Basic SLURM Commands

This section introduces the most commonly used SLURM commands on the GRAVITON cluster. These commands allow users to submit jobs, monitor their execution, check the state of the cluster, and manage job queues effectively.

Job Monitoring

  • Check the status of your jobs:

    squeue -u $(whoami)
    

    Displays all jobs you have submitted that are currently running or pending in the queue.

  • Check detailed job information:

    scontrol show job <job_id>
    

    Provides in-depth details about a specific job, such as its state, node allocation, requested resources (including memory), submission time, and more.

  • Cancel a job:

    scancel <job_id>
    

    Cancels a job you have submitted.
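When scripting around these commands, it can help to extract a single field from the scontrol show job output, which is printed as Key=Value pairs. The following is a minimal sketch. The helper name job_field is hypothetical (it is not a SLURM or GRAVITON command), and it assumes the requested value contains no spaces, which holds for fields such as JobState, NodeList and RunTime.

```shell
# Hypothetical helper: print the value of one Key=Value field from
# `scontrol show job` output read on stdin.
# Assumes the value itself contains no spaces.
job_field() {
  # Put every Key=Value pair on its own line, then match the key exactly.
  # sub() strips only the first "Key=", so values like TRES=cpu=4,... survive.
  tr -s ' \n' '\n' | awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print }'
}
```

Example usage: scontrol show job <job_id> | job_field JobState prints the job's current state, such as RUNNING or PENDING.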

Cluster Status

While the SLURM command sinfo is commonly used to view the status of nodes and partitions, on GRAVITON it may produce confusing output because of overlapping partition configurations.

Instead, we recommend using the custom utility grstatus, which provides a clean and human-readable summary of resource usage across the cluster.

  • Check CPU and memory availability on all nodes:

    grstatus
    

    This command displays the number of allocated and total CPUs, as well as memory usage (in MB), for each compute node currently available in GRAVITON.

    Example output:

    NODE         ALLOC_CPU  TOTAL_CPU  ALLOC_MEM  REAL_MEM
    grwn01       8          56         20480      245000
    grwn02       0          56         0          245000
    somcosmo01   16         96         8192       365000
    

This tool is especially useful for quickly assessing node availability before submitting your jobs, and its output reflects SLURM's resource data in real time.
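To narrow the grstatus summary down, for example to nodes with enough free CPUs for a planned job, the free resources can be computed from the allocated and total columns. This is a sketch that assumes grstatus prints exactly the five columns shown in the example output above, with memory in MB; the function name free_nodes is hypothetical.

```shell
# Hypothetical filter: read grstatus output on stdin and print each node with
# at least MIN_CPUS free CPUs, followed by its free CPUs and free memory (MB).
# Assumes the column layout NODE ALLOC_CPU TOTAL_CPU ALLOC_MEM REAL_MEM.
free_nodes() {
  local min_cpus="${1:-1}"
  awk -v min="$min_cpus" '
    $1 == "NODE" { next }             # skip the header line
    NF >= 5 {
      free_cpu = $3 - $2              # TOTAL_CPU - ALLOC_CPU
      free_mem = $5 - $4              # REAL_MEM - ALLOC_MEM
      if (free_cpu >= min) printf "%s %d %d\n", $1, free_cpu, free_mem
    }'
}
```

Example usage: grstatus | free_nodes 16 lists the nodes that currently have at least 16 unallocated CPUs.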

Job Submission

  • Submit a job using a batch script:

    sbatch my_job_script.sh
    

    Submits a batch job for execution. The script must contain SLURM directives (e.g., #SBATCH --time=…) and the commands to run.

  • Run a job interactively:

    srun --pty bash
    

    Opens an interactive shell on a compute node, useful for testing or debugging.

  • Run a quick test job:

    srun hostname
    

    Executes a short command (e.g., hostname) on a compute node and returns the result.
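A minimal batch script for sbatch might look like the sketch below. The job name, resource values and the commented-out program (./my_program) are illustrative placeholders, not GRAVITON defaults; check which partition and limits apply to your account and add an #SBATCH --partition=… line if needed. Because #SBATCH lines are ordinary comments to bash, the script can also be run with plain bash as a quick check outside SLURM.

```shell
#!/bin/bash
# Minimal sbatch script sketch; all values below are illustrative placeholders.
#SBATCH --job-name=my_test_job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --time=00:30:00
# %x expands to the job name and %j to the job ID in the output file name.
#SBATCH --output=%x_%j.out

# Use as many threads as CPUs allocated by SLURM; fall back to 1 when the
# script is run outside SLURM (e.g. with plain bash as a dry run).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "Job ${SLURM_JOB_ID:-none} on $(uname -n) with ${OMP_NUM_THREADS} thread(s)"

# Replace with your real workload (./my_program is a hypothetical placeholder):
# srun ./my_program
```

When submitted with sbatch my_job_script.sh, SLURM reports the assigned job ID, and the job's output is written to my_test_job_<job_id>.out in the submission directory.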

Queue Information

  • See how busy the system is:

    squeue
    

    Lists all jobs currently running or pending on the cluster (not filtered by user).
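For a more compact picture of how busy the cluster is, the full squeue listing can be reduced to a count of jobs per state. A minimal sketch, using squeue's -h (no header) and -o '%T' (job state) options; the function name queue_summary is hypothetical, and any extra arguments are passed through to squeue.

```shell
# Hypothetical helper: count jobs per state (RUNNING, PENDING, ...),
# most common state first. Extra arguments are forwarded to squeue.
queue_summary() {
  squeue -h -o '%T' "$@" | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}
```

Running queue_summary covers the whole cluster, while queue_summary -u $(whoami) restricts the count to your own jobs. Each output line shows a state followed by the number of jobs in that state.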