# Resource Architecture and QoS Policies
This section describes the resource allocation architecture of GRAVITON, including how user groups, partitions, and QoS (Quality of Service) levels interact to determine job eligibility and priority.
## User Accounts and Groups
Users in GRAVITON belong to one of the following accounting groups, which define their resource access and scheduling priority:
| Account | Description | Priority |
|---|---|---|
| | Subgroup SOM | Normal |
| | Subgroup COM | Normal |
Each user is associated with a single group. This grouping is used only for monitoring and statistics; fairshare is not applied per group, only at the user level.
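Since fairshare applies only per user, your account association and current fairshare standing can be inspected with the standard SLURM client tools. A minimal sketch (the exact columns shown depend on the site's accounting configuration):

```shell
# Show which accounting group (account) your user belongs to,
# and which QoS levels the association allows.
sacctmgr show association where user=$USER format=Account,User,QOS

# Show user-level fairshare usage and the resulting priority factor.
sshare -U
```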
## Partitions and QoS Mapping
Resource usage in GRAVITON is controlled by a combination of partitions and QoS policies. Users do not select partitions directly; instead, the system automatically assigns jobs based on:

- Requested number of tasks (`--ntasks`)
- Requested number of CPUs per task (`--cpus-per-task`)
- Selected QoS (`--qos`)
- Internal constraints defined in the SLURM configuration
| QoS | Intended Use | CPU Range | Max Nodes per Job | Max Time | Partition | Priority |
|---|---|---|---|---|---|---|
| | Lightweight jobs (≤ 6h) | 1–28 | 1 | 6h | | Normal |
| | Medium serial/parallel jobs (≤ 24h) | 1–56 | 1 | 24h | | Low |
| | Large-scale MPI jobs (multi-node, ≥ 57 CPUs) | ≥ 57 | ∞ | 7 days | | High |
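The mapping above can be illustrated with a minimal batch script. This is a sketch, not a site-verified template: the QoS names are site-specific and left as a placeholder (`<qos-name>`), and `my_program` is a hypothetical executable. Note that `--nodes`, `--mem`, and `--partition` are deliberately absent, since the scheduler derives them from the three parameters listed earlier.

```shell
#!/bin/bash
# Minimal GRAVITON job script: partition and node count are derived
# from --ntasks, --cpus-per-task, and --qos alone.
#SBATCH --job-name=example
#SBATCH --ntasks=28            # total tasks; 1-28 CPUs falls in the single-node tier
#SBATCH --cpus-per-task=1
#SBATCH --qos=<qos-name>       # placeholder: use the site-defined QoS for this tier
#SBATCH --time=06:00:00        # must fit within the QoS max time (6h for this tier)

srun ./my_program              # hypothetical executable
```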
## Partition Architecture
GRAVITON is composed of two main resource pools:

- `serial`: Intended for jobs requiring a single node. Cores are allocated from:
  - grwn[01-21]: 56-core nodes with 200 Gb/s InfiniBand
  - somcosmo[01-02]: 96-core nodes with 25GbE Ethernet
- `parallel`: Designed for distributed jobs needing multiple nodes with a high-speed interconnect. Only nodes with InfiniBand:
  - grwn[01-21]
The two partitions are kept isolated so that scheduling and performance can be optimized for each type of workload.
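The two pools and their node types can be inspected with `sinfo`. A sketch, assuming the partition names match the pool names `serial` and `parallel`:

```shell
# List both pools: partition name, node count, CPUs per node,
# memory per node, and the node list.
sinfo -p serial,parallel -o "%P %D %c %m %N"
```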
## Policy Enforcement
To ensure consistency and fairness in job scheduling:
- Users must not specify `--nodes`, `--mem`, or `--partition`.
- The QoS defines implicit limits on CPUs and node count.
- Memory constraints are enforced dynamically using SLURM's Lua plugin.
- The system rejects jobs that violate QoS restrictions.
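In practice this means a submission that sets a forbidden flag is rejected at submit time, while an equivalent request expressed only through the allowed parameters is accepted. A hedged sketch (`job.sh` and `<qos-name>` are placeholders; the exact error message depends on the Lua plugin configured on GRAVITON):

```shell
# Rejected: --partition (likewise --nodes or --mem) must not be set explicitly.
sbatch --partition=parallel --ntasks=4 job.sh

# Accepted: all limits are derived from --ntasks, --cpus-per-task, and --qos.
sbatch --ntasks=4 --cpus-per-task=1 --qos=<qos-name> job.sh
```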