MPI Jobs on GRAVITON

GRAVITON supports MPI-based parallel execution for both single-node and multi-node jobs. Depending on the number of cores requested, the job will be automatically directed to the appropriate partition.

Partition and QOS Mapping

  • Jobs requesting 56 cores or fewer are considered single-node jobs. These jobs are automatically assigned to the serial partition.

    To run this type of job, you must specify one of the following QOS options:

    • cosmo

    • hep

    • std

    Each QOS corresponds to a specific user group or scientific domain; see the architecture section for details on their intended use and limits. A way to list the QOS values available to your account is sketched just after this list.

  • Jobs requesting more than 56 cores are treated as multi-node jobs, and must explicitly request the lattice QOS to access the parallel partition, which enables high-speed inter-node communication via InfiniBand.
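
If you are not sure which QOS values your account is allowed to use, SLURM's accounting tools can list them; a minimal check (assuming the standard SLURM accounting setup):

# List the QOS values associated with your user account
sacctmgr show assoc user=$USER format=User,QOS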

The system will determine the partition and resource allocation based on your QOS and core count. Users must not manually specify ``--nodes`` or ``--partition``, as these are automatically managed.
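
After submitting with sbatch, you can confirm which partition and QOS the scheduler assigned; a quick check (the script name my_job.sh is just a placeholder):

sbatch my_job.sh
# Show job ID, partition, QOS, node count, and state for your jobs
squeue -u $USER -o "%.10i %.12P %.10q %.6D %.10T"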

Example 1: MPI Job on a Single Node

This example runs an MPI job using up to 56 cores on a single node.

#!/bin/bash
#SBATCH --job-name=mpi_single
#SBATCH --output=slurm_logs/job_%j.out
#SBATCH --error=slurm_logs/job_%j.err
#SBATCH --qos=hep
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00

export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH

srun ./my_mpi_program

Explanation

  • --ntasks=32: 32 MPI processes will be launched on a single node.

  • --qos=hep: Suitable for serial-partition jobs (≤ 56 cores).

  • srun: Preferred launcher for MPI on GRAVITON.
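
If you build my_mpi_program yourself, it should be compiled against the same OpenMPI installation referenced in the script. A minimal sketch using the OpenMPI compiler wrapper (the source file name mpi_hello.c is hypothetical):

# Use the compiler wrapper from the GRAVITON OpenMPI installation
export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
mpicc -O2 -o my_mpi_program mpi_hello.c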

Example 2: MPI Job on Multiple Nodes

This example shows how to launch a multi-node MPI job using more than 56 cores.

#!/bin/bash
#SBATCH --job-name=mpi_multi
#SBATCH --output=slurm_logs/job_%j.out
#SBATCH --error=slurm_logs/job_%j.err
#SBATCH --qos=lattice
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
#SBATCH --time=02:00:00

export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH

srun ./my_mpi_program

Explanation

  • --ntasks=128: The job will span multiple nodes with 128 MPI processes.

  • --qos=lattice: Required for access to the parallel partition, which enables InfiniBand-based high-performance communication across nodes.

  • srun: Ensures tight integration with SLURM’s resource allocation.
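
To verify how the 128 tasks are spread across the allocated nodes, you can run a quick check inside the job script before (or instead of) the real program:

# Count how many tasks land on each allocated node
srun hostname | sort | uniq -c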

Guidelines

  • Use one of the serial-partition QOS options (cosmo, hep, or std, depending on your group) for small or medium jobs that fit within a single node (≤ 56 cores).

  • Use --qos=lattice for large-scale MPI jobs requiring multiple nodes.

  • Never manually specify --nodes or --partition.

  • Use srun instead of mpirun for better SLURM integration.

Using srun vs mpirun

On GRAVITON, MPI jobs can be launched either with srun (the native SLURM launcher) or with mpirun (the MPI runtime). However, it is strongly recommended to use ``srun`` whenever possible, as it provides tighter integration with the SLURM scheduler and resource allocation system.

Recommended usage:

srun ./my_program

This command will automatically launch the correct number of processes based on the value of --ntasks and manage the communication environment accordingly.
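
As a quick sanity check that srun launches exactly --ntasks processes, you can have each task print its SLURM rank and host (run inside a job allocation):

# Each task prints its rank, the total task count, and the node it runs on
srun bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'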

Why prefer ``srun``?

  • srun integrates directly with SLURM’s resource allocation.

  • It ensures that task placement, environment variables, and resource constraints are applied correctly.

  • It avoids potential mismatches between what SLURM allocated and what MPI tries to use.

  • It supports better logging and accounting within SLURM.

When to use ``mpirun``?

In some advanced cases, or for programs built against an MPI implementation that is tightly coupled to its own runtime (mpirun/mpiexec), it may still be necessary to use:

mpirun ./my_program

If using mpirun, make sure it is from the same OpenMPI version provided by GRAVITON, and that your environment is correctly set:

export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH
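
A quick way to confirm that the mpirun being picked up is the one from this installation:

# Should resolve to /usr/mpi/gcc/openmpi-4.1.7rc1/bin/mpirun and report the matching version
which mpirun
mpirun --version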

In both cases, SLURM will still enforce the resources defined in your sbatch script.

Summary

Launcher    Recommended Use Case
--------    --------------------
srun        Preferred for most MPI jobs on GRAVITON
mpirun      Use only if your program requires it or has issues when launched via srun