Basic SLURM Commands

This section introduces the most commonly used SLURM commands on the GRAVITON cluster. These commands allow users to submit jobs, monitor their execution, check the state of the cluster, and manage job queues effectively.

Job Monitoring

Check the status of your jobs:
```
squeue -u $(whoami)
```
Displays all jobs you have submitted that are currently running or pending in the queue.

Check detailed job information:
```
scontrol show job <job_id>
```
Provides in-depth details about a specific job, such as its node allocation, requested memory, submission time, and more.

Cancel a job:
```
scancel <job_id>
```
Cancels a job you have submitted.
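
The monitoring commands above can be combined into a simple wait loop. The following is a minimal sketch, not a GRAVITON-provided tool: wait_for_job is a hypothetical helper name, and it assumes that squeue prints nothing for a job that has left the queue.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM or GRAVITON): block until a job
# is no longer listed by squeue, i.e. it has finished or was cancelled.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"   # seconds between polls; 30 is an arbitrary default

    # -h suppresses the header line, so the output is empty once the job is gone.
    # Errors for unknown or already-purged job IDs are discarded.
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Example use (commented out so the sketch has no side effects):
#   wait_for_job <job_id> 60
```

Polling too often puts load on the scheduler, so an interval of tens of seconds is usually a reasonable choice.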

Cluster Status

Although the SLURM command sinfo is commonly used to view the status of nodes and partitions, on GRAVITON its output can be confusing because of overlapping partition configurations.

Instead, we recommend using the custom utility grstatus, which provides a clean and human-readable summary of resource usage across the cluster.

Check CPU and memory availability on all nodes:

```
grstatus
```

This command displays the number of allocated and total CPUs, as well as memory usage (in MB), for each compute node currently available in GRAVITON.

Example output:

```
NODE         ALLOC_CPU  TOTAL_CPU  ALLOC_MEM  REAL_MEM
grwn01       8          56         20480      245000
grwn02       0          56         0          245000
somcosmo01   16         96         8192       365000
```

This tool is especially useful for quickly assessing node availability before you submit jobs. Its output is updated in real time based on SLURM metrics.
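
To turn the grstatus table into a list of free CPUs per node, the output can be piped through awk. This is a rough sketch that assumes the column layout shown in the example output (node name, allocated CPUs, total CPUs); free_cpus is a hypothetical helper name, not a cluster command.

```shell
#!/bin/bash
# Hypothetical helper: read grstatus-style output on stdin and print
# "node  free_cpus" for each node, where free = TOTAL_CPU - ALLOC_CPU.
# Assumes column 1 = node name, column 2 = ALLOC_CPU, column 3 = TOTAL_CPU.
free_cpus() {
    awk 'NR > 1 && NF >= 3 { printf "%-12s %d\n", $1, $3 - $2 }'
}

# Example use (commented out; requires grstatus on the cluster):
#   grstatus | free_cpus
```

For the example output above, this would report 48 free CPUs on grwn01, 56 on grwn02, and 80 on somcosmo01.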

Job Submission

Submit a job using a batch script:
```
sbatch my_job_script.sh
```
Submits a batch job for execution. The script must contain SLURM directives (e.g., #SBATCH --time=…) and commands to run.
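
As an illustration of what such a script can look like, here is a minimal sketch. All resource values (job name, CPU count, memory, time limit, output file name) are placeholders chosen for the example, not GRAVITON defaults. Depending on the cluster configuration, a partition directive may also be needed.

```shell
#!/bin/bash
#SBATCH --job-name=example_job        # placeholder name shown in squeue
#SBATCH --ntasks=1                    # one task
#SBATCH --cpus-per-task=1             # one CPU for that task
#SBATCH --mem=1G                      # placeholder memory request
#SBATCH --time=00:10:00               # placeholder wall-time limit (HH:MM:SS)
#SBATCH --output=example_job_%j.out   # %j expands to the job ID

# SLURM sets SLURM_JOB_ID inside a job; use a placeholder when run by hand.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_id} running on $(hostname)"

# Replace this line with the commands of the actual workload.
echo "Workload goes here"
```

Because the #SBATCH lines are ordinary comments to bash, the script can also be run directly for a quick syntax check before it is submitted with sbatch.
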
Run a job interactively:
```
srun --pty bash
```
Opens an interactive shell on a compute node, useful for testing or debugging.

Run a quick test job:
```
srun hostname
```
Executes a short command (e.g., hostname) on a compute node and returns the result.

Queue Information

See how busy the system is:
```
squeue
```
Lists all jobs currently running or pending on the cluster (not filtered by user).
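
On a busy cluster the full squeue listing can be long. A compact alternative is to count jobs per state, as in the sketch below; queue_summary is a hypothetical helper name, and the only SLURM features used are the -h (no header) and -o (output format) options of squeue with the %T job-state field.

```shell
#!/bin/bash
# Hypothetical helper: print how many jobs are in each state
# (e.g. RUNNING, PENDING) across the whole cluster.
queue_summary() {
    # %T prints the job state in its long form, one line per job.
    squeue -h -o "%T" | sort | uniq -c
}

# Example use (commented out; requires SLURM):
#   queue_summary
```

Each output line contains a count followed by a state name, which gives a quick impression of how many jobs are waiting compared with how many are running.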