
eResearch

SAMPLE SLURM GPU SUBMISSION SCRIPT

This guide explains how to submit a simple program for execution on the GPU node.

To submit a job to the HPC system, it is recommended to write a script file similar to the one below; a script can easily be edited and re-submitted.

When choosing resources, keep the QoS limits on the HPC cluster in mind. The limits for the H100s are shown in the table below. If you want to run two jobs on the H100s at the same time, halve the QoS limits for each job so that both can be scheduled concurrently.

Partition    gpucomputeq
CPU cores    80
GPUs         2
Memory       480 GB
Wall time    24 hours (1 day)
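For instance, a per-job resource request that halves each limit so two jobs can run side by side might look like the following sketch (values taken from the table above; adjust to your workload):

```shell
#!/bin/bash
##### Per-job request when running two jobs at once on the H100s #####
#SBATCH -p gpucomputeq
#SBATCH -c 40            ## half of the 80 CPU-core limit
#SBATCH --gres=gpu:1     ## half of the 2-GPU limit
#SBATCH --mem=240G       ## half of the 480 GB memory limit
```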

Note: all “[…]” placeholders are variables that you must define.

Example Script (example.slurm)

#!/bin/bash
##### Select resources #####
#SBATCH -J [name of job]
#SBATCH -c [number of CPU cores required, most likely 1]
#SBATCH --mem=[amount of memory required]G
#SBATCH -p [partition name]  ## gpucomputeq
#SBATCH --gres=gpu:[number of GPUs needed]
#SBATCH -t [how long the job should run, in minutes]    ## Remove this line if the length of time required is unknown
#
##### Output File #####
#SBATCH -o [output_file].out    ## If omitted, defaults to slurm-[job number].out
#
##### Error File #####
#SBATCH -e [error_file].err     ## If omitted, defaults to slurm-[job number].err
#
##### Mail Options #####
#SBATCH --mail-type=ALL   ## BEGIN, END, FAIL, REQUEUE, STAGE_OUT, ALL, TIME_LIMIT_[50|80|90]
#SBATCH --mail-user=[your email address]
#
##### Change to current working directory #####
cd $SLURM_SUBMIT_DIR

##### Execute Program #####
./[program executable]

Real Example

#!/bin/bash
###### Select resources ######
#SBATCH -J Job1
#SBATCH -c 10
#SBATCH --mem=40G
#SBATCH -p gpucomputeq
#SBATCH --gres=gpu:1
###### Output File ######
#SBATCH -o job1.out
###### Error File ######
#SBATCH -e Job1.err
###### Mail Options ######
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=l.decosta@cqu.edu.au

###### Change to current working directory ######
cd $SLURM_SUBMIT_DIR

###### Execute Program ######
module load Python/3.12.3-GCCcore-13.3.0
python ./myprogram.py
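If you want to confirm that a job actually received a GPU, a short check can be added at the start of the “Execute Program” section. Slurm sets `CUDA_VISIBLE_DEVICES` for jobs that request `--gres=gpu`; a sketch:

```shell
##### Optional GPU sanity check #####
# Slurm sets CUDA_VISIBLE_DEVICES to the index(es) of the allocated GPU(s).
echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
nvidia-smi    ## prints the visible GPU(s) and their current utilisation
```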

Executing the script on the HPC System

To submit a job, log in to one of the “login nodes” and run the following command in a terminal:

sbatch [slurm_script_file].slurm

To check whether your job is running, queued, or completed, use one of the following commands:

squeue

squeue -u [username]
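A few other standard Slurm commands are also handy once a job has been submitted (the job number is printed by `sbatch`; `sacct` output depends on the cluster’s accounting configuration):

```shell
# Cancel a job you no longer need
scancel [job number]

# Show detailed information about a queued or running job
scontrol show job [job number]

# Summarise a completed job's resource usage (requires job accounting)
sacct -j [job number]
```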

Support

eresearch@cqu.edu.au

tasac@cqu.edu.au OR 1300 666 620

Hacky Hour (3pm – 4pm every Tuesday)

High Performance Computing Teams site