MPI Jobs on GRAVITON
GRAVITON supports MPI-based parallel execution for both single-node and multi-node jobs. Depending on the number of cores requested, the job will be automatically directed to the appropriate partition.
Partition and QOS Mapping
Jobs requesting 56 cores or fewer are considered single-node jobs. These jobs are automatically assigned to the ``serial`` partition. To run this type of job, you must specify one of the following QOS options:

- ``s6h``
- ``s24h``
- ``mpi``
Each QOS corresponds to a specific user group or scientific domain. See the architecture section for details on their intended use and limits.
Jobs requesting more than 56 cores are treated as multi-node jobs, and must explicitly request the ``mpi`` QOS to access the ``parallel`` partition, which enables high-speed inter-node communication via InfiniBand.
The system will determine the partition and resource allocation based on your QOS and core count. Users must not manually specify ``--nodes`` or ``--partition``, as these are automatically managed.
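The automatic mapping described above can be sketched as a small helper function. This is illustrative only: ``partition_for`` is a hypothetical name, and the 56-core threshold is the one documented on this page.

```shell
# Hypothetical helper mirroring GRAVITON's automatic partition mapping:
# 56 cores or fewer -> serial partition; more than 56 -> parallel partition.
partition_for() {
    local cores=$1
    if [ "$cores" -le 56 ]; then
        echo serial      # single-node job
    else
        echo parallel    # multi-node job; requires --qos=mpi
    fi
}

partition_for 12    # serial
partition_for 128   # parallel
```

The scheduler applies this mapping for you; the sketch only shows which partition to expect for a given ``--ntasks`` value.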
Example 1: MPI Job on a Single Node
This example runs an MPI job using up to 56 cores on a single node.
#!/bin/bash
#SBATCH --job-name=mpi_single
#SBATCH --output=slurm_logs/job_%j.out
#SBATCH --error=slurm_logs/job_%j.err
#SBATCH --qos=s6h
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH
srun ./my_mpi_program
Explanation
- ``--ntasks=12``: 12 MPI processes will be launched on a single node.
- ``--qos=s6h``: suitable for short serial-partition jobs (up to 28 cores and 6 hours; see Guidelines below).
- ``srun``: the preferred launcher for MPI on GRAVITON.
Example 2: MPI Job on Multiple Nodes
This example shows how to launch a multi-node MPI job using more than 56 cores.
#!/bin/bash
#SBATCH --job-name=mpi_multi
#SBATCH --output=slurm_logs/job_%j.out
#SBATCH --error=slurm_logs/job_%j.err
#SBATCH --qos=mpi
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
#SBATCH --time=02:00:00
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH
srun ./my_mpi_program
Explanation
- ``--ntasks=128``: the job will span multiple nodes with 128 MPI processes.
- ``--qos=mpi``: required for access to the ``parallel`` partition, which enables InfiniBand-based high-performance communication across nodes.
- ``srun``: ensures tight integration with SLURM's resource allocation.
Guidelines
- Use ``--qos=s6h`` for small to medium jobs that can run within a single node, with up to 28 CPUs and a maximum runtime of 6 hours. This QOS has normal priority and is ideal for short workloads.
- Use ``--qos=s24h`` for single-node jobs requiring up to 56 CPUs and a maximum runtime of 24 hours. This QOS has low priority and is suitable for longer or more demanding tasks that still fit within a single node.
- Use ``--qos=mpi`` for large-scale MPI jobs that need more than 56 CPUs across multiple nodes. This QOS has high priority but will place your job in the ``parallel`` partition, which excludes access to the high-performance ``somcosmoXX`` nodes.
- Never manually specify ``--nodes`` or ``--partition``. These are handled automatically by the system and will lead to job rejection or misbehavior if used.
- Always use ``srun`` instead of ``mpirun`` to launch parallel jobs. ``srun`` ensures proper SLURM integration and resource binding.
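The decision rules above can be condensed into a sketch. ``suggest_qos`` is a hypothetical helper, and the 28/56-core and 6/24-hour limits are those listed in the guidelines; consult the architecture section for the authoritative values.

```shell
# Hypothetical helper applying the QOS guidelines above.
# Arguments: requested cores, requested runtime in hours.
suggest_qos() {
    local cores=$1 hours=$2
    if [ "$cores" -gt 56 ]; then
        echo mpi    # multi-node: parallel partition, high priority
    elif [ "$cores" -le 28 ] && [ "$hours" -le 6 ]; then
        echo s6h    # single node, short job, normal priority
    else
        echo s24h   # single node, up to 56 CPUs / 24 h, low priority
    fi
}

suggest_qos 12 1    # s6h
suggest_qos 40 10   # s24h
suggest_qos 128 2   # mpi
```

Note that this sketch does not validate upper bounds (e.g. a 56-core request exceeding 24 hours would still print ``s24h`` but be rejected by the scheduler).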
Using ``srun`` vs ``mpirun``
On GRAVITON, jobs that use MPI can be launched either using ``srun`` (the native SLURM launcher) or ``mpirun`` (the MPI runtime). However, it is strongly recommended to use ``srun`` whenever possible, as it provides better integration with the SLURM scheduler and resource allocation system.
Recommended usage:
srun ./my_program
This command will automatically launch the correct number of processes based on the value of ``--ntasks`` and manage the communication environment accordingly.
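Inside a batch job, ``srun`` picks the task count up from environment variables that SLURM exports from your ``#SBATCH`` directives. A minimal sketch (``describe_allocation`` is a hypothetical helper; ``SLURM_NTASKS`` and ``SLURM_CPUS_PER_TASK`` are standard SLURM variables):

```shell
# SLURM exports these from --ntasks / --cpus-per-task in the sbatch script;
# srun reads them automatically. This sketch just reports what srun would do.
describe_allocation() {
    echo "srun will start ${SLURM_NTASKS:-1} task(s), ${SLURM_CPUS_PER_TASK:-1} CPU(s) each"
}

describe_allocation
```

Run inside the single-node example above, this would report 12 tasks with 1 CPU each.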
Why prefer ``srun``?
- ``srun`` integrates directly with SLURM's resource allocation.
- It ensures that task placement, environment variables, and resource constraints are applied correctly.
- It avoids potential mismatches between what SLURM allocated and what MPI tries to use.
- It supports better logging and accounting within SLURM.
When to use ``mpirun``?
In some advanced cases, or for programs compiled with specific MPI implementations that tightly couple to their own runtime (mpirun/mpiexec), it may still be necessary to use:
mpirun ./my_program
If using ``mpirun``, make sure it is from the same OpenMPI version provided by GRAVITON, and that your environment is correctly set:
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH
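One quick way to confirm that the ``mpirun`` on your ``PATH`` is the GRAVITON-provided build is to check its path against the install prefix used above. ``check_mpirun_path`` is a hypothetical helper, not a GRAVITON-provided command:

```shell
# Hypothetical check: does a given mpirun path come from the GRAVITON
# OpenMPI install prefix exported above?
check_mpirun_path() {
    case "$1" in
        /usr/mpi/gcc/openmpi-4.1.7rc1/*) echo ok ;;
        *) echo mismatch ;;
    esac
}

check_mpirun_path /usr/mpi/gcc/openmpi-4.1.7rc1/bin/mpirun   # ok
check_mpirun_path /usr/bin/mpirun                            # mismatch
```

After exporting the ``PATH`` shown above, ``check_mpirun_path "$(command -v mpirun)"`` should report ``ok``.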
In both cases, SLURM will still enforce the resources defined in your ``sbatch`` script.
Summary
| Launcher | Recommended Use Case |
|---|---|
| ``srun`` | Preferred for most MPI jobs on GRAVITON |
| ``mpirun`` | Use only if your program requires it or has issues when launched via ``srun`` |