Submission Evaluation Logic (Lua-based)

This section outlines the internal logic used in GRAVITON to validate and configure submitted jobs. A custom Lua script is integrated into SLURM’s job submission pipeline, enforcing site policies such as partition selection, memory allocation, and input validation.

This system ensures that all jobs conform to the resource management model of GRAVITON and provides a predictable, fair scheduling environment for all users.

Key Rules Enforced

  • Only users with a valid accounting group (som or com) are allowed to submit jobs.

  • Partition assignment is automatic, based on the selected QoS:

    • --qos=mpi → assigned to the parallel partition

    • Any other valid QoS → assigned to the serial partition

  • Memory per core is assigned automatically according to the partition:

    • serial partition: 3.8 GB per core

    • parallel partition: 4.3 GB per core

  • If the job includes --constraint=double_mem, memory per core is doubled accordingly.
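
As a sketch, the rules above can be condensed into a single decision function. The following Python mirrors the Lua script's logic for illustration only: the function and variable names are ours, and only the accounting groups, partitions, QoS behavior, and memory figures come from the rules above.

```python
# Illustrative sketch (Python stand-in for the Lua submit logic).
# Accounts, partitions, and memory values mirror the rules listed above.

VALID_ACCOUNTS = {"som", "com"}
MEM_PER_CORE_GB = {"serial": 3.8, "parallel": 4.3}

def evaluate(account, qos, double_mem=False):
    """Return (partition, memory per core in GB), or raise for a bad account."""
    if account not in VALID_ACCOUNTS:
        raise ValueError("invalid accounting group: %s" % account)
    # --qos=mpi routes to the parallel partition; every other QoS to serial.
    partition = "parallel" if qos == "mpi" else "serial"
    mem = MEM_PER_CORE_GB[partition]
    if double_mem:  # --constraint=double_mem doubles memory per core
        mem *= 2
    return partition, mem
```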

Important

Although submitting under the mpi QoS may seem advantageous, since it assigns more memory per core (4.3 GB instead of 3.8 GB), this QoS automatically routes your job to the parallel partition.

However, the somcosmoXX nodes, which are significantly more powerful than the standard grwnXX nodes, are not included in the parallel partition.

In contrast, all other QoS options (s6h, s24h) send jobs to the serial partition, which prioritizes the somcosmoXX nodes when allocating jobs.

Using mpi therefore excludes your job from the most powerful nodes available in GRAVITON.

Disallowed SLURM Options

When submitting jobs on GRAVITON, users must follow strict rules regarding SLURM directives. Many resource-related options are automatically managed by the system and must not be manually specified.

Warning

Users are not allowed to manually define any of the following SLURM options:

  • --nodes

  • --ntasks-per-node

  • --ntasks-per-socket

  • --mem / --mem-per-cpu

  • --partition

  • --cores-per-socket

These parameters are computed automatically based on the selected QoS and task count, ensuring uniform resource allocation and fair scheduling. Job scripts that include any of these options may be rejected, fail to execute, or be silently overridden by internal policies.
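
For illustration, a check of this kind can be sketched as a scan over the #SBATCH lines of a job script. This Python stand-in is not the production Lua filter; only the option list is taken from above.

```python
import re

# Options users must not set manually; mirror of the list above.
DISALLOWED = ["--nodes", "--ntasks-per-node", "--ntasks-per-socket",
              "--mem", "--mem-per-cpu", "--partition", "--cores-per-socket"]

def find_disallowed(script_text):
    """Return the disallowed options found on #SBATCH lines of a job script."""
    found = []
    for line in script_text.splitlines():
        if not line.lstrip().startswith("#SBATCH"):
            continue
        for opt in DISALLOWED:
            # Match the option as a whole word, so --mem does not also
            # match inside --mem-per-cpu.
            if re.search(re.escape(opt) + r"(=|\s|$)", line):
                found.append(opt)
    return found
```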

Allowed Directives

Use the following SLURM directives to describe your job:

  • --qos=: Required. Selects the scheduling class and determines CPU/node limits and memory allocation.

  • --ntasks=: Required. Total number of MPI processes to launch.

  • --cpus-per-task=: Typically 1 (unless using hybrid MPI+OpenMP).

  • --time=: Required. Maximum wall clock time for the job.

    It may seem trivial, but setting this limit accurately improves scheduling: SLURM can backfill your job into short idle windows when the requested time fits. For example, if your job runs in under 7 hours, specify:

    --time=7:00:00

  • --constraint=double_mem: Optional. Requests double memory per core. See the Memory Policy section for full details.

All other directives related to memory, node count, or placement are automatically handled by the scheduler and should be omitted.
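
Putting the allowed directives together, a minimal job script might look as follows. This is an illustrative sketch: the payload line is a placeholder, and the qos, ntasks, and time values are examples only.

```shell
#!/bin/bash
#SBATCH --qos=s6h                 # required: scheduling class
#SBATCH --ntasks=16               # required: total number of MPI processes
#SBATCH --cpus-per-task=1         # typically 1 (no hybrid MPI+OpenMP here)
#SBATCH --time=7:00:00            # required: wall-clock limit
##SBATCH --constraint=double_mem  # optional: uncomment to double memory per core

# No --partition, --mem, or --nodes: the Lua script sets these automatically.
echo "job payload runs here"      # placeholder for launching your application
```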

Job Submission Flow Diagram

  graph TD
  A[Job submitted]
  B{Accounting group?}
  B1[Priority = Normal]
  B2[Priority = Normal]
  C{Past member?}
  D{--mem? <br> --partition? <br> --nodes?}
  H{QoS?}
  P1[Priority = High]
  P2[Priority = Normal]
  P3[Priority = Low]
  I[Assign partition = parallel]
  J[Assign partition = serial]
  K1[Mem = 4.3 GB x CPU]
  K2[Mem = 3.8 GB x CPU]
  L{--constraint=double_mem?}
  M[Mem = Mem x 2]
  N[Job accepted ✔]

  A --> B
  B -->|SOM| B1 --> C
  B -->|COM| B2 --> C
  B -->|NO| Z1[Reject ❌]
  C -- Yes --> Z2[Reject ❌]
  C -- No --> D
  D -- Yes --> Z2
  D -- No --> H
  H -->|mpi| P1 --> I --> K1 --> L
  H -->|s6h| P2 --> J
  H -->|s24h| P3 --> J
  J --> K2 --> L
  L -- Yes --> M --> N
  L -- No --> N

The diagram above illustrates the complete decision process that SLURM, via the Lua script, applies to submitted jobs.