MPI Jobs on GRAVITON
GRAVITON supports MPI-based parallel execution for both single-node and multi-node jobs. Depending on the number of cores requested, the job will be automatically directed to the appropriate partition.
Partition and QOS Mapping
Jobs requesting 56 cores or fewer are considered single-node jobs. These jobs are automatically assigned to the ``serial`` partition. To run this type of job, you must specify one of the following QOS options:

- ``s6h``
- ``s24h``

The third QOS, ``mpi``, is reserved for multi-node jobs and is described below.
Each QOS corresponds to a specific user group or scientific domain. See the architecture section for details on their intended use and limits.
Jobs requesting more than 56 cores are treated as multi-node jobs and must explicitly request the ``mpi`` QOS to access the ``parallel`` partition, which enables high-speed inter-node communication via InfiniBand.
The system will determine the partition and resource allocation based on your QOS and core count. Users must not manually specify ``--nodes`` or ``--partition``, as these are automatically managed.
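To see how these partitions and QOS limits are currently configured, the standard SLURM query commands can be used. This is a minimal sketch; the exact fields and values shown depend on GRAVITON's site configuration:

# Summarize the partitions and their node/time limits
sinfo --summarize

# List the QOS definitions, including priority and wall-time/CPU limits
sacctmgr show qos format=Name,Priority,MaxWall,MaxTRESPU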
Example 1: MPI Job on a Single Node
This example runs an MPI job using up to 56 cores on a single node.
#!/bin/bash
#SBATCH --job-name=mpi_single
#SBATCH --output=slurm_logs/job_%j.out
#SBATCH --error=slurm_logs/job_%j.err
#SBATCH --qos=s6h
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH
srun ./my_mpi_program
Explanation
- ``--ntasks=12``: 12 MPI processes will be launched on a single node.
- ``--qos=s6h``: Suitable for serial-partition jobs (≤ 56 cores).
- ``srun``: Preferred launcher for MPI on GRAVITON.
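If you just want to exercise the script, any MPI binary will do. A minimal sketch of the compile-and-submit workflow, assuming a hypothetical source file ``my_mpi_program.c`` and that the script above is saved as ``mpi_single.sh``:

# Compile with the OpenMPI compiler wrapper shipped on GRAVITON
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
mpicc -O2 -o my_mpi_program my_mpi_program.c   # my_mpi_program.c is a placeholder name

# Submit the job and watch the queue
sbatch mpi_single.sh
squeue -u $USER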
Example 2: MPI Job on Multiple Nodes
This example shows how to launch a multi-node MPI job using more than 56 cores.
#!/bin/bash
#SBATCH --job-name=mpi_multi
#SBATCH --output=slurm_logs/job_%j.out
#SBATCH --error=slurm_logs/job_%j.err
#SBATCH --qos=mpi
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
#SBATCH --time=02:00:00
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH
srun ./my_mpi_program
Explanation
- ``--ntasks=128``: The job will span multiple nodes with 128 MPI processes.
- ``--qos=mpi``: Required for access to the ``parallel`` partition, which enables InfiniBand-based high-performance communication across nodes.
- ``srun``: Ensures tight integration with SLURM’s resource allocation.
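Before running a long production job, it can be worth confirming how the 128 tasks were actually spread across nodes. A minimal sketch of two checks that can be added to the script (the output depends on the nodes the scheduler assigns):

# Count how many MPI tasks landed on each node
srun hostname | sort | uniq -c

# Inspect the allocation SLURM made for this job
scontrol show job "$SLURM_JOB_ID" | grep -E 'NumNodes|NodeList'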
Guidelines
- Use ``--qos=s6h`` for small to medium jobs that can run within a single node, with up to 28 CPUs and a maximum runtime of 6 hours. This QOS has normal priority and is ideal for short workloads.
- Use ``--qos=s24h`` for single-node jobs requiring up to 56 CPUs and a maximum runtime of 24 hours. This QOS has low priority and is suitable for longer or more demanding tasks that still fit within a single node.
- Use ``--qos=mpi`` for large-scale MPI jobs that need more than 56 CPUs across multiple nodes. This QOS has high priority but will place your job in the ``parallel`` partition, which excludes access to the high-performance ``somcosmoXX`` nodes.
- Never manually specify ``--nodes`` or ``--partition``. These are handled automatically by the system and will lead to job rejection or misbehavior if used; you can still inspect what was allocated, as shown below.
- Always use ``srun`` instead of ``mpirun`` to launch parallel jobs. ``srun`` ensures proper SLURM integration and resource binding.
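Although ``--nodes`` and ``--partition`` must not be requested manually, the allocation SLURM derives from your QOS and core count can still be inspected from inside the job. A minimal sketch using standard SLURM environment variables:

# Report what the scheduler actually allocated for this job
echo "Partition:     $SLURM_JOB_PARTITION"
echo "Nodes:         $SLURM_JOB_NUM_NODES ($SLURM_JOB_NODELIST)"
echo "Tasks:         $SLURM_NTASKS"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"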
Using srun vs mpirun
On GRAVITON, jobs that use MPI can be launched either using srun (the native SLURM launcher) or mpirun (the MPI runtime). However, it is strongly recommended to use ``srun`` whenever possible, as it provides better integration with the SLURM scheduler and resource allocation system.
Recommended usage:
srun ./my_program
This command will automatically launch the correct number of processes based on the value of --ntasks and manage the communication environment accordingly.
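For a quick sanity check of what ``srun`` will launch, the program can temporarily be replaced by a trivial command. This sketch relies only on standard per-task SLURM variables such as ``SLURM_PROCID``:

# Each task prints its SLURM rank and the node it runs on
srun bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'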
Why prefer ``srun``?
- ``srun`` integrates directly with SLURM’s resource allocation.
- It ensures that task placement, environment variables, and resource constraints are applied correctly.
- It avoids potential mismatches between what SLURM allocated and what MPI tries to use.
- It supports better logging and accounting within SLURM.
When to use ``mpirun``?
In some advanced cases, or for programs compiled with specific MPI implementations that tightly couple to their own runtime (mpirun/mpiexec), it may still be necessary to use:
mpirun ./my_program
If using mpirun, make sure it is from the same OpenMPI version provided by GRAVITON, and that your environment is correctly set:
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH
In both cases, SLURM will still enforce the resources defined in your sbatch script.
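If you do fall back to ``mpirun``, it is safest to tie the number of ranks explicitly to what SLURM granted, so the launcher never oversubscribes the allocation. A minimal sketch inside an sbatch script, assuming the environment is set as above:

# Launch exactly as many ranks as requested via --ntasks
mpirun -np "$SLURM_NTASKS" ./my_program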
Summary
| Launcher | Recommended Use Case |
|---|---|
| ``srun`` | Preferred for most MPI jobs on GRAVITON |
| ``mpirun`` | Use only if your program requires it or has issues when launched via ``srun`` |