.. _mpi_jobs:

MPI Jobs on GRAVITON
=====================

GRAVITON supports MPI-based parallel execution for both **single-node** and
**multi-node** jobs. Depending on the number of cores requested, the job is
automatically directed to the appropriate partition.

Partition and QOS Mapping
--------------------------

- Jobs requesting **56 cores or fewer** are considered **single-node** jobs.
  These jobs are automatically assigned to the ``serial`` partition. To run
  this type of job, you must specify one of the following QOS options:

  - ``cosmo``
  - ``hep``
  - ``std``

  Each QOS corresponds to a specific user group or scientific domain. See the
  architecture section for details on their intended use and limits.

- Jobs requesting **more than 56 cores** are treated as **multi-node** jobs
  and must explicitly request the ``lattice`` QOS to access the ``parallel``
  partition, which enables high-speed inter-node communication via
  **InfiniBand**.

The system determines the partition and resource allocation from your QOS and
core count. Users must **not** manually specify ``--nodes`` or
``--partition``; these are managed automatically.

Example 1: MPI Job on a Single Node
-----------------------------------

This example runs an MPI job using up to 56 cores on a single node.

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=mpi_single
   #SBATCH --output=slurm_logs/job_%j.out
   #SBATCH --error=slurm_logs/job_%j.err
   #SBATCH --qos=hep
   #SBATCH --ntasks=32
   #SBATCH --cpus-per-task=1
   #SBATCH --time=01:00:00

   export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
   export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH

   srun ./my_mpi_program

**Explanation**

- ``--ntasks=32``: 32 MPI processes are launched on a single node.
- ``--qos=hep``: one of the QOS options valid for serial-partition jobs
  (≤ 56 cores); use ``cosmo`` or ``std`` if that matches your user group.
- ``srun``: the preferred MPI launcher on GRAVITON.

Example 2: MPI Job on Multiple Nodes
------------------------------------

This example shows how to launch a multi-node MPI job using more than 56 cores.

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=mpi_multi
   #SBATCH --output=slurm_logs/job_%j.out
   #SBATCH --error=slurm_logs/job_%j.err
   #SBATCH --qos=lattice
   #SBATCH --ntasks=128
   #SBATCH --cpus-per-task=1
   #SBATCH --time=02:00:00

   export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
   export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH

   srun ./my_mpi_program

**Explanation**

- ``--ntasks=128``: the job spans multiple nodes with 128 MPI processes.
- ``--qos=lattice``: required for access to the ``parallel`` partition, which
  enables InfiniBand-based high-performance communication across nodes.
- ``srun``: ensures tight integration with SLURM's resource allocation.

Guidelines
----------

- Use one of the single-node QOS options (``cosmo``, ``hep``, or ``std``) for
  small and medium jobs that fit within a single node (≤ 56 cores).
- Use ``--qos=lattice`` for large-scale MPI jobs requiring multiple nodes.
- Never manually specify ``--nodes`` or ``--partition``.
- Use ``srun`` instead of ``mpirun`` for better SLURM integration.
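Both examples above launch ``./my_mpi_program`` but do not show how it is
built or submitted. The following is a minimal sketch, assuming a source file
``my_mpi_program.c`` and a job script ``mpi_single.sbatch`` (both hypothetical
names) and the GRAVITON OpenMPI installation path used in the examples:

.. code-block:: bash

   # Put the GRAVITON-provided OpenMPI toolchain on the PATH
   # (same paths as in the job scripts above)
   export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
   export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH

   # Compile the MPI source with the OpenMPI compiler wrapper
   mpicc -O2 -o my_mpi_program my_mpi_program.c

   # Submit the batch script; SLURM prints the assigned job ID
   sbatch mpi_single.sbatch

   # Check the job's state and the partition it was routed to
   squeue -u "$USER"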
Using ``srun`` vs ``mpirun``
----------------------------

On GRAVITON, MPI jobs can be launched either with ``srun`` (the native SLURM
launcher) or with ``mpirun`` (the MPI runtime). However, it is **strongly
recommended** to use ``srun`` whenever possible, as it provides better
integration with the SLURM scheduler and resource allocation system.

Recommended usage:

.. code-block:: bash

   srun ./my_program

This command automatically launches the correct number of processes based on
the value of ``--ntasks`` and manages the communication environment
accordingly.

**Why prefer srun?**

- ``srun`` integrates directly with SLURM's resource allocation.
- It ensures that task placement, environment variables, and resource
  constraints are applied correctly.
- It avoids potential mismatches between what SLURM allocated and what MPI
  tries to use.
- It supports better logging and accounting within SLURM.

**When to use mpirun?**

In some advanced cases, or for programs compiled with specific MPI
implementations that are tightly coupled to their own runtime
(``mpirun``/``mpiexec``), it may still be necessary to use:

.. code-block:: bash

   mpirun ./my_program

If you use ``mpirun``, make sure it comes from the **same OpenMPI version
provided by GRAVITON** and that your environment is set accordingly:

.. code-block:: bash

   export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:$PATH
   export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:$LD_LIBRARY_PATH

In both cases, SLURM still enforces the resources defined in your ``sbatch``
script.

**Summary**

+--------------+----------------------------------------------+
| Launcher     | Recommended Use Case                         |
+==============+==============================================+
| ``srun``     | Preferred for most MPI jobs on GRAVITON      |
+--------------+----------------------------------------------+
| ``mpirun``   | Use only if your program requires it or      |
|              | has issues when launched via ``srun``        |
+--------------+----------------------------------------------+
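If you do fall back to ``mpirun``, a quick sanity check (a minimal sketch,
assuming the ``export`` lines above have already been run in your session)
confirms that the launcher being picked up is the GRAVITON-provided OpenMPI
rather than another installation on your ``PATH``:

.. code-block:: bash

   # Should resolve under /usr/mpi/gcc/openmpi-4.1.7rc1/bin
   which mpirun

   # Prints the Open MPI version of the launcher that will be used
   mpirun --version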