Basic SLURM Commands
This section introduces the most commonly used SLURM commands on the GRAVITON cluster. These commands allow users to submit jobs, monitor their execution, check the state of the cluster, and manage job queues effectively.
Job Monitoring
Check the status of your jobs:
squeue -u $(whoami)
Displays all jobs you have submitted that are currently running or pending in the queue.
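For a quick summary rather than a full listing, the same command can be piped into a small counter. The sketch below is only an illustration: the helper name count_job_states is hypothetical, and it assumes squeue is called with -h (no header) and -o "%T" (print only the job state), both standard squeue options.

```shell
# Count jobs per state from squeue output read on stdin.
# Expects one state name per line, as produced by:
#   squeue -h -u "$(whoami)" -o "%T"
# count_job_states is a hypothetical helper name, not a GRAVITON tool.
count_job_states() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# On the cluster (requires SLURM):
#   squeue -h -u "$(whoami)" -o "%T" | count_job_states

# Offline demonstration with made-up states:
printf 'RUNNING\nPENDING\nRUNNING\n' | count_job_states
```

The helper only reads standard input, so it can be tried on any machine before being used against the real queue.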
Check detailed job information:
scontrol show job <job_id>
Provides in-depth details about a specific job, such as its state, node allocation, requested memory, and submission time.
Cancel a job:
scancel <job_id>
Cancels a job you have submitted.
Cluster Status
While the SLURM command sinfo is commonly used to view the status of nodes and partitions, in GRAVITON it may produce confusing output due to overlapping partition configurations.
Instead, we recommend using the custom utility grstatus, which provides a clean and human-readable summary of resource usage across the cluster.
Check CPU and memory availability on all nodes:
grstatus
This command displays the number of allocated and total CPUs, as well as memory usage (in MB), for each compute node currently available in GRAVITON.
Example output:
NODE        ALLOC_CPU  TOTAL_CPU  ALLOC_MEM  REAL_MEM
grwn01      8          56         20480      245000
grwn02      0          56         0          245000
somcosmo01  16         96         8192       365000
This tool is especially useful for quickly assessing node availability before you submit jobs. Its output is updated in real time based on SLURM metrics.
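The number of free CPUs per node is TOTAL_CPU minus ALLOC_CPU. A minimal sketch of that arithmetic follows, assuming grstatus prints a single header row followed by whitespace-separated columns in the order shown in the example output. The helper name free_cpus is hypothetical.

```shell
# Print free CPUs per node from grstatus-style output read on stdin.
# Assumes one header row, then columns in this order:
#   NODE ALLOC_CPU TOTAL_CPU ALLOC_MEM REAL_MEM
# free_cpus is a hypothetical helper name, not part of grstatus.
free_cpus() {
    awk 'NR > 1 && NF >= 3 { print $1, $3 - $2 }'
}

# On the cluster: grstatus | free_cpus

# Offline demonstration using the example output from this section:
free_cpus <<'EOF'
NODE ALLOC_CPU TOTAL_CPU ALLOC_MEM REAL_MEM
grwn01 8 56 20480 245000
grwn02 0 56 0 245000
somcosmo01 16 96 8192 365000
EOF
```

For the example data this reports 48 free CPUs on grwn01, 56 on grwn02, and 80 on somcosmo01. If the real grstatus layout differs from the example, the column numbers in the awk expression would need adjusting.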
Job Submission
Submit a job using a batch script:
sbatch my_job_script.sh
Submits a batch job for execution. The script must contain SLURM directives (e.g., #SBATCH --time=…) and commands to run.
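As an illustration of what such a script might look like, here is a minimal sketch. All resource values (job name, output file, CPU count, memory, time limit) are placeholders chosen for the example, not GRAVITON defaults or recommendations, and no partition is specified because this section does not list partition names.

```shell
#!/bin/bash
# Minimal example batch script. All values below are illustrative placeholders.
#SBATCH --job-name=example_job        # name shown in squeue
#SBATCH --output=example_job_%j.out   # %j is replaced by the job ID
#SBATCH --ntasks=1                    # one task
#SBATCH --cpus-per-task=1             # one CPU for that task
#SBATCH --mem=1G                      # memory requested per node
#SBATCH --time=00:10:00               # wall-time limit (HH:MM:SS)

# SLURM_JOB_ID is set by SLURM inside a job; fall back to a label otherwise
# so the script can also be run directly for a quick syntax check.
job_label="${SLURM_JOB_ID:-not-in-slurm}"

echo "Job ${job_label} running on $(hostname)"
```

Saved as my_job_script.sh, this would be submitted with sbatch my_job_script.sh. Because the #SBATCH lines are ordinary comments to bash, the script can also be run directly with bash my_job_script.sh to check it before submission.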
Run a job interactively:
srun --pty bash
Opens an interactive shell on a compute node, useful for testing or debugging.
Run a quick test job:
srun hostname
Executes a short command (here, hostname) on a compute node and returns the result.
Queue Information
See how busy the system is:
squeue
Lists all jobs currently running or pending on the cluster (not filtered by user).