.. _slurm:

Basic SLURM Commands
====================

This section introduces the most commonly used SLURM commands on the GRAVITON cluster. These commands let you submit jobs, monitor their execution, check the state of the cluster, and manage job queues.

Job Monitoring
--------------

- **Check the status of your jobs**:

  .. code-block:: bash

     squeue -u $(whoami)

  Displays all jobs you have submitted that are currently running or pending in the queue.

- **Check detailed job information**:

  .. code-block:: bash

     scontrol show job <job_id>

  Provides in-depth details about a specific job, such as its node allocation, requested memory, submission time, and more. Replace ``<job_id>`` with the numeric job ID reported by ``squeue``.

- **Cancel a job**:

  .. code-block:: bash

     scancel <job_id>

  Cancels a job you have submitted.

Cluster Status
--------------

The SLURM command ``sinfo`` is commonly used to view the status of nodes and partitions, but on GRAVITON it may produce confusing output because partition configurations overlap. Instead, we recommend the custom utility ``grstatus``, which provides a clean, human-readable summary of resource usage across the cluster.

- **Check CPU and memory availability on all nodes**:

  .. code-block:: bash

     grstatus

  Displays the number of allocated and total CPUs, as well as memory usage (in MB), for each compute node currently available in GRAVITON.

  Example output:

  .. code-block:: text

     NODE         ALLOC_CPU  TOTAL_CPU  ALLOC_MEM  REAL_MEM
     grwn01       8          56         20480      245000
     grwn02       0          56         0          245000
     somcosmo01   16         96         8192       365000

  This tool is especially useful for quickly assessing node availability before you submit jobs. Its output is updated in real time from SLURM metrics.

Job Submission
--------------

- **Submit a job using a batch script**:

  .. code-block:: bash

     sbatch my_job_script.sh

  Submits a batch job for execution. The script must contain SLURM directives (e.g., ``#SBATCH --time=...``) followed by the commands to run.

- **Run a job interactively**:

  .. code-block:: bash

     srun --pty bash

  Opens an interactive shell on a compute node, which is useful for testing or debugging.

- **Run a quick test job**:

  .. code-block:: bash

     srun hostname

  Executes a short command (e.g., ``hostname``) on a compute node and returns the result.

Queue Information
-----------------

- **See how busy the system is**:

  .. code-block:: bash

     squeue

  Lists all jobs currently running or pending on the cluster (not filtered by user).
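
As a rough illustration of the batch script that ``sbatch`` expects (see Job Submission), the following minimal sketch shows how ``#SBATCH`` directives and ordinary commands fit together. The job name, output file name, and resource values are hypothetical placeholders rather than GRAVITON defaults, and no partition is set because the available partitions are not listed in this section.

.. code-block:: bash

   #!/bin/bash
   # Hypothetical job name shown by squeue.
   #SBATCH --job-name=example_job
   # Output file; %j is replaced by the job ID.
   #SBATCH --output=example_job_%j.out
   # Wall-time limit in HH:MM:SS (illustrative value).
   #SBATCH --time=00:10:00
   # One task with one CPU and 1 GB of memory (illustrative values).
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=1
   #SBATCH --mem=1G

   # Commands to run on the allocated node.
   # SLURM_JOB_ID is set by SLURM; the fallback lets the script run outside a job.
   msg="Job ${SLURM_JOB_ID:-unknown} running on $(hostname)"
   echo "$msg"

Saved as ``my_job_script.sh``, this script could be submitted with ``sbatch my_job_script.sh`` and then followed with ``squeue -u $(whoami)``. Adjust the requested time, CPUs, and memory to what your job actually needs.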