Large Language Models / Natural Language Processing on the HPC System (CPU or GPU modules)
On the Ada Lovelace HPC facility we have access to leading-edge GPUs (NVIDIA H100s), which are ideal for Large Language Models. This document supplies examples of running LLMs both on GPUs and on CPUs only, to help you get your LLM model running.
It should be noted that the Ada Lovelace cluster has only 6 GPUs, compared to 2,304 hyperthreaded CPU cores. This matters because the HPC is a shared resource, so when it comes time for you to use it there is much more likely to be a queue for the GPUs than for the CPUs. If the wait for the hardware exceeds the runtime the GPUs would save your program, the benefit of using GPUs is negated entirely. Of course, you may not know in advance how much faster the GPUs will be compared to the CPUs.
What Large Language Models can I use?
At the time of writing, the following Large Language Models have been successfully tested (with the models sourced from Hugging Face – https://huggingface.co/models). More are expected to be tested in future and will be documented here when available.
- Phi-3-mini-128k-instruct – https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
- Phi-3-medium-128k-instruct – https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
- Phi-3-mini-4k-instruct – https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- Phi-3-medium-4k-instruct – https://huggingface.co/microsoft/Phi-3-medium-4k-instruct
- Meta-Llama-3-8B-Instruct – https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
- Meta-Llama-3-70B-Instruct – https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
- openai/whisper-large-v3 – https://huggingface.co/openai/whisper-large-v3
Note: You will need to request access if you want to use the Meta Large Language Models.
What HPC software modules should I use?
At present, only one custom HPC software module has been developed to run these Large Language Models, but more may be created in future when further LLMs are tested. You can load the LLM module using the following command:
module load Python/3.9.6-GCCcore-11.2.0-llm
Then you have the choice of using the following modules:

| Custom HPC Module Name | How to load it | Works with the following LLMs | General Comments |
| --- | --- | --- | --- |
| CPU Only | module load Python/3.9.6-GCCcore-11.2.0-llm | | This is the first Python environment that was created to work with LLMs on the HPC system. |
| GPU (required to run the LLM on a GPU) | | | You will need to load both of these HPC software modules if you want to try using LLMs with GPUs. This is the first Python environment that was created to work with LLMs on the HPC system. |
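Whichever stack you load, a quick sanity check on a login or interactive node can confirm that the module and the Python packages used in the examples below are available. This is only a minimal sketch; the exact versions printed will depend on the stack.
# Confirm the module is loaded and the key packages import cleanly
module load Python/3.9.6-GCCcore-11.2.0-llm
module list
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"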
If you require additional python packages installed for a particular LLM you are trying, you have the following options:
- Request the additional python package to be installed on an HPC software python stack.
- Install the package into a custom Conda environment.
- Install the python packages locally; this can be done by following the instructions at running-python-on-hpc, where the "Installing private modules / packages for individual use" section documents the process (a rough sketch is also shown below).
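As a rough sketch of the third option (the package name here is only a placeholder), a package can usually be installed into your home directory with pip's --user flag after loading the LLM Python module:
module load Python/3.9.6-GCCcore-11.2.0-llm
# Installs under ~/.local, so no administrator rights are required
pip install --user some-package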
The following sections provide example Slurm submission scripts and python code to run some of the popular Large Language Models on CQUniversity's HPC system.
Slurm scripts are submission scripts designed to submit a job to CQUniversity's HPC facility, which will run as soon as the resources are available.
Microsoft Phi-3 (CPU only)
In this example, we have created the directory/folder "/LLM" in the user's home directory. You may wish to change this to an alternative location. Ensure you have changed into this "/LLM" location before creating the following files.
Python Code example
Create the following file phi3-test.py
Note: you can change model_id to one of the following:
- model_id="microsoft/Phi-3-mini-128k-instruct"
- model_id="microsoft/Phi-3-medium-128k-instruct"
- model_id="microsoft/Phi-3-mini-4k-instruct"
- model_id="microsoft/Phi-3-medium-4k-instruct"
import torch
import transformers

# Select the Phi-3 model to run (see the list above for alternatives)
model_id = "microsoft/Phi-3-mini-128k-instruct"

torch.random.manual_seed(0)

# Build a CPU-only text-generation pipeline
pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="cpu",
)

messages = [
    {"role": "user", "content": "write a hello world program in python"},
]

generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    # "temperature": 0.2,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
HPC Slurm Submission script
Create the following submission script, phi3-test.slurm, that we can use on the CQUniversity HPC facility. This script will submit the python file phi3-test.py that is located in the /home/<username>/LLM/ directory to the HPC scheduler. The submission script asks for 48 CPU cores and 300 GB of memory.
#!/bin/bash
#### Select resources #####
#SBATCH -J phi-3
#SBATCH --cpus-per-task=48
#SBATCH --mem=300g
#### Output File #####
#SBATCH -o output_%j.log
#### Error File #####
#SBATCH -e error_%j.log
##### Partition #####
#SBATCH -p workq
##### Change to current working directory #####
cd /home/<username>/LLM/
##### Execute Program #####
module load Python/3.9.6-GCCcore-11.2.0-llm
python phi3-test.py
Once the python file "phi3-test.py" and submission script "phi3-test.slurm" have been created and "<username>" has been replaced with your own HPC username, you can submit the job using the following command:
sbatch phi3-test.slurm
Once the program has completed, the output should be viewable in the following files:
/home/<username>/LLM/output_<jobid>.log
/home/<username>/LLM/error_<jobid>.log
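While the job is queued or running, the standard Slurm commands can be used to keep an eye on it; for example (the job ID below is only a placeholder):
# List your queued and running jobs
squeue -u $USER
# Follow the output log as it is written (replace 123456 with the job ID reported by sbatch)
tail -f /home/<username>/LLM/output_123456.log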
Meta-Llama (CPU only)
In this example, we have created the directory/folder "/LLM" in the user's home directory. You may wish to change this to an alternative location. Ensure you have changed into this "/LLM" location before creating the following files.
Python Code example
Create the following file llama3-test.py
Note: To access the Meta Llama 3 models, you will need to register for access on Hugging Face. Visit https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct to register. Once you have been granted access, you will need to change the line access_token = "<enter your hugging face access token here>" to include your own access token. See https://huggingface.co/docs/hub/security-tokens for help on creating an access token. Note: you can change model_id to one of the following:
- model_id="meta-llama/Meta-Llama-3-8B-Instruct"
- model_id="meta-llama/Meta-Llama-3-70B-Instruct"
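If you would rather not paste the token into the script below, one variation (an untested suggestion, not part of the example tested on the HPC) is to export it as an environment variable in your Slurm script or shell session, e.g. export HF_TOKEN=<your token>, and read it from Python:
import os

# Read the Hugging Face token from the environment instead of hard-coding it in the script
access_token = os.environ["HF_TOKEN"]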
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Define model and access token
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
access_token = "<enter your hugging face access token here>"

# Set device
torch.set_default_device("cpu")

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=access_token)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_auth_token=access_token
)

# Set up the text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="cpu"
)

# Provide the prompt
prompt = "Write a Hello World program in Python."

# Generate a response
output = generator(
    prompt,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    return_full_text=False
)

# Print the result
print(output[0]["generated_text"])
HPC Slurm Submission script
Create the following submission script, llama3-test.slurm, that we can use on the CQUniversity HPC facility. This script will submit the python file llama3-test.py that is located in the /home/<username>/LLM/ directory to the HPC scheduler. The submission script asks for 1 CPU core and 50 GB of memory.
#!/bin/bash
#### Select resources #####
#SBATCH -J lla-3
#SBATCH --cpus-per-task=1
#SBATCH --mem=50g
#### Output File #####
#SBATCH -o output_%j.log
#### Error File #####
#SBATCH -e error_%j.log
##### Partition #####
#SBATCH -p workq
##### Change to current working directory #####
cd /home/<username>/LLM/
##### Execute Program #####
module load Python/3.9.6-GCCcore-11.2.0-llm
python llama3-test.py
Once the python file "llama3-test.py" and submission script "llama3-test.slurm" have been created and "<username>" has been replaced with your own HPC username, you can submit the job using the following command:
sbatch llama3-test.slurm
Once the program has completed, the output should be viewable in the following files:
/home/<username>/LLM/output_<jobid>.log
/home/<username>/LLM/error_<jobid>.log
OpenAI Whisper (CPU only)
This is still being installed; this section will be updated after it has been tested.
Tips for using GPUs
To use the GPUs you'll need to:
- Select a partition that has a GPU.
- Tell Slurm how many GPUs you'll need, usually 1.
- Load the drivers for the GPU.
For partitions, 'gpucomputeq' will be the best-performing and largest partition available. This goes in the "#SBATCH -p" option.
To request the GPU from Slurm, you'll need to add the line "#SBATCH -G 1".
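Putting these options together, the resource section of a GPU submission script might look something like the following sketch; the job name, CPU count and memory are placeholders you should adjust for your own workload, and the GPU driver/CUDA modules still need to be loaded before running your python script:
#!/bin/bash
#### Select resources #####
#SBATCH -J phi-3-gpu
#SBATCH -p gpucomputeq
#SBATCH -G 1
#SBATCH --cpus-per-task=8
#SBATCH --mem=100g
#### Output File #####
#SBATCH -o output_%j.log
#### Error File #####
#SBATCH -e error_%j.log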
Modules that are only compatible with the GPU nodes won't appear when running module avail on a regular node. To see the full list of software, you'll need to be in a GPU interactive session or run a small sbatch script that executes the command non-interactively on the H100.
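Once the GPU-enabled stack is loaded, a quick check inside your python code will confirm the H100 is actually visible; in the earlier examples the other change you would typically make is switching device_map from "cpu" to "cuda" (or "auto"). This sketch assumes the GPU Python stack provides a CUDA-enabled build of PyTorch:
import torch

# Confirm PyTorch can see the GPU before building the pipeline
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))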
How to get help
If you need help with anything on the HPC, the most direct way to get help is to contact HPC support via eresearch@cqu.edu.au.
Alternatively, there is a Microsoft Teams team dedicated to the CQU HPC community, which currently has 60 members. We also run a weekly Hacky Hour over Zoom where you can ask any questions relating to the HPC or eResearch, or simply attend to get involved; you don't need to ask any questions, all attendance is welcome! Further details on both the MS Teams site and the Hacky Hour can be found here.