Large Language Models / Natural Language Processing on the HPC System (CPU or GPU modules)
On the Marie Curie HPC facility, we only have access to older GPUs (NVIDIA P100s), and many of the Large Language Models will not run successfully on these GPUs because they do not have enough onboard GPU memory. In this document we supply examples for running LLMs on both "GPU" and "CPU only" so you can hopefully get your LLM model running.
It should be noted that the Marie Curie cluster only has 4 GPUs available, compared to 528 CPU cores. This is important because the HPC is a shared resource, and when it comes time for you to use it, it is much more likely that there will be a queue for the GPUs than for the CPUs. If the wait time for the hardware exceeds the runtime the GPUs save your program, the benefit of using GPUs is completely negated. Of course, you may not know in advance how much faster the GPUs will be compared to the CPUs.
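If you are unsure which to choose, checking the current state of the queues before you submit can help you judge likely wait times. A minimal sketch using standard PBS commands, run from a login node (the exact queue layout on Marie Curie may differ):
# List the queues and how many jobs are currently queued and running in each
qstat -q
# Show the status of your own jobs (Q = queued, R = running)
qstat -u $USER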
What Large Language Models can I use?
At the time of writing, the following Large Language Models have been successfully tested (based on sourcing the models from Hugging Face – https://huggingface.co/models). It is expected that more will be tested in future and this document will be updated accordingly.
- Phi-3-mini-128k-instruct – https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
- Phi-3-medium-128k-instruct – https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
- Phi-3-mini-4k-instruct – https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- Phi-3-medium-4k-instruct – https://huggingface.co/microsoft/Phi-3-medium-4k-instruct
- Meta-Llama-3-8B-Instruct – https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
- Meta-Llama-3-70B-Instruct – https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
- openai/whisper-large-v3 – https://huggingface.co/openai/whisper-large-v3
Note: You will need to request access if you want to use the Meta Large Language Models.
What HPC software modules should I use?
At present, there are only a couple of custom HPC software modules that have been developed to run these Large Language Models, but it is expected that more might be created in future when further LLMs are tested.
The first thing to do is unload all the default loaded HPC software modules using the following command.
module purge
Then you have the choice of using one of the following modules:
| Custom HPC Module Name | How to load it | Works with the following LLMs | General Comments |
|---|---|---|---|
| CPU Only: Python/3.9.6-GCCcore-11.2.0-llm-phi-3 | module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3 | Phi-3, Meta-Llama-3 | This is the first Python environment that was created to work with LLMs on the HPC system. |
| CPU Only: Python/3.9.6-GCCcore-11.2.0-jason | module load Python/3.9.6-GCCcore-11.2.0-jason | | This is a test Python environment that was created to work with LLMs on the HPC system. This might be unstable as testing continues. |
| CPU Only: Python/3.9.6-GCCcore-11.2.0-whisper | module load Python/3.9.6-GCCcore-11.2.0-whisper | openai/whisper-large-v3 | This is a custom Python environment that was created to work with the Whisper LLM. |
| GPU: Python/3.9.6-GCCcore-11.2.0-llm-phi-3 plus cuDNN/8.7.0.84-CUDA-11.8.0 (required to run the LLM on a GPU) | module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3 cuDNN/8.7.0.84-CUDA-11.8.0 | Phi-3, Meta-Llama-3 | You will need to load both of these HPC software modules if you want to try using LLMs with GPUs. This is the first Python environment that was created to work with LLMs on the HPC system. |
| GPU: Python/3.9.6-GCCcore-11.2.0-jason plus cuDNN/8.7.0.84-CUDA-11.8.0 (required to run the LLM on a GPU) | module load Python/3.9.6-GCCcore-11.2.0-jason cuDNN/8.7.0.84-CUDA-11.8.0 | | You will need to load both of these HPC software modules if you want to try using LLMs with GPUs. This is a test Python environment that was created to work with LLMs on the HPC system. This might be unstable as testing continues. |
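For example, to prepare one of these environments you could run something like the following (the module names are taken directly from the table above; module purge, module load and module list are standard environment-module commands):
module purge
# CPU-only Phi-3 environment
module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3

# Or, for a GPU run, load the CUDA libraries as well
module purge
module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3 cuDNN/8.7.0.84-CUDA-11.8.0

# Confirm what is currently loaded
module list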
If you require additional Python packages installed for a particular LLM you are trying, you have two options:
- Request the additional Python packages to be installed on one of the HPC software Python stacks
- Install the Python packages locally; this can be done by following the instructions found at running-python-on-hpc, where the "Installing private modules / packages for individual use" section documents the process (a short sketch is also shown below this list)
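For the second option, the usual pattern is to load the Python module you intend to use and install into your home directory with pip's --user flag. This is only a sketch; the running-python-on-hpc page is the authoritative reference, and the package name below is just a placeholder:
module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3
# "accelerate" is only an example package name; substitute the package you actually need
python -m pip install --user accelerate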
The following sections provide example PBS submission scripts and Python code to run some of the popular Large Language Models on CQUniversity's HPC system.
PBS scripts are submission scripts designed to submit a job to CQUniversity's HPC facility, which will run as soon as the resources are available.
Microsoft Phi-3 (CPU only)
In this example, we have created the directory/folder "LLM" in the user's home directory. You may wish to change this to an alternative location. Ensure you have changed into this "LLM" location before creating the following files.
Python Code example
Create the following file phi3-test.py
Note, you can change model_id to one of the following:
- model_id="microsoft/Phi-3-mini-128k-instruct"
- model_id="microsoft/Phi-3-medium-128k-instruct"
- model_id="microsoft/Phi-3-mini-4k-instruct"
- model_id="microsoft/Phi-3-medium-4k-instruct"
import torch
import transformers

# Choose one of the Phi-3 model IDs listed above
model_id = "microsoft/Phi-3-mini-128k-instruct"

torch.random.manual_seed(0)

# Build a CPU-only text-generation pipeline for the chosen model
pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="cpu",
)

messages = [
    {"role": "user", "content": "write a hello world program in python"},
]

generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    # "temperature": 0.2,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
HPC PBS Submission script
We will create a generic HPC submission script, phi3-test.pbs, that we can use on the CQUniversity HPC facility. This script will submit the Python file phi3-test.py, located in the /home/<username>/LLM/ directory, to the HPC scheduler.
The submission script asks for 24 CPU cores (the maximum available within a standard HPC compute node) and 150 GB of memory (the maximum available within a standard HPC compute node).
####select resources #####
#PBS -N phi-3
#PBS -l ncpus=24
#PBS -l mem=150g
#### Output File #####
#PBS -o /home/<username>/LLM/phi3-test.out
#### Error File #####
#PBS -e /home/<username>/LLM/phi3-test.err
##### Queue #####
#PBS -q workq
##### Change to current working directory #####
cd /home/<username>/LLM/
##### Execute Program #####
. /etc/profile.d/modules.sh
module purge
module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3
python phi3-test.py
Once the Python file phi3-test.py and the submission script phi3-test.pbs have been created, and <username> has been replaced with your own HPC username, you can submit the job using the following command:
qsub phi3-test.pbs
Once the program has completed, the output should be viewable in the following files:
/home/<username>/LLM/phi3-test.out
/home/<username>/LLM/phi3-test.err
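The submission script above is CPU only. If you would like to try the same Phi-3 example on one of the GPUs, the sketch below shows roughly what a GPU submission script could look like: it loads the GPU module combination from the table earlier in this document. Note that the GPU resource request shown (ngpus=1) and the core/memory values are assumptions about the site's PBS configuration, so check with HPC support if the job is rejected; you would also need to change device_map="cpu" to device_map="cuda" in phi3-test.py.
#### select resources #####
#PBS -N phi-3-gpu
#PBS -l ncpus=12
#PBS -l ngpus=1
#PBS -l mem=150g
#### Output File #####
#PBS -o /home/<username>/LLM/phi3-gpu-test.out
#### Error File #####
#PBS -e /home/<username>/LLM/phi3-gpu-test.err
##### Queue #####
#PBS -q workq
##### Change to current working directory #####
cd /home/<username>/LLM/
##### Execute Program #####
. /etc/profile.d/modules.sh
module purge
module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3 cuDNN/8.7.0.84-CUDA-11.8.0
python phi3-test.py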
Meta-Llama (CPU only)
In this example, we have created the directory/folder "LLM" in the user's home directory. You may wish to change this to an alternative location. Ensure you have changed into this "LLM" location before creating the following files.
Python Code example
Create the following file llama3-test.py
Note: To access the Meta Llama 3 models, you will need to register for access on Hugging Face. Visit https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct to register. Once you have registered, you will need to change the line access_token = "<enter your hugging face access token here>" to include your own access token. See https://huggingface.co/docs/hub/security-tokens for help on creating and using an access token.
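As an alternative to hard-coding the token in the script, you can supply it through your environment. Recent versions of the huggingface_hub library read the HF_TOKEN environment variable, and huggingface-cli login caches a token under your home directory; whether the command line tool is available depends on the Python module you have loaded, so treat this as a sketch only:
# Option 1: export the token for the current session (this line can also be added to your PBS script)
export HF_TOKEN="<your hugging face access token>"
# Option 2: log in once interactively and let the token be cached
huggingface-cli login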
Note, you can change model_id to one of the following:
- model_id="meta-llama/Meta-Llama-3-8B-Instruct"
- model_id="meta-llama/Meta-Llama-3-70B-Instruct"
import torch
import transformers

# Choose one of the Llama 3 model IDs listed above
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

torch.set_default_device('cpu')

# Your Hugging Face access token (required for the gated Meta Llama 3 models)
access_token = "<enter your hugging face access token here>"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="cpu",
    token=access_token,
)

messages = [
    {"role": "user", "content": "write a hello world program in python"},
]

# Stop generating at either the model's EOS token or Llama 3's end-of-turn token
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    messages,
    max_new_tokens=1024,
    return_full_text=False,
    eos_token_id=terminators,
    do_sample=True,
    # temperature=0.6,
    top_p=0.9,
)

print(outputs[0]["generated_text"])
HPC PBS Submission script
We will create a generic HPC submission script, llama3-test.pbs, that we can use on the CQUniversity HPC facility. This script will submit the Python file llama3-test.py, located in the /home/<username>/LLM/ directory, to the HPC scheduler.
The submission script asks for 24 CPU cores (the maximum available within a standard HPC compute node) and 150 GB of memory (the maximum available within a standard HPC compute node).
#### select resources #####
#PBS -N llama-3
#PBS -l ncpus=24
#PBS -l mem=150g
#### Output File #####
#PBS -o /home/<username>/LLM/llama3-test.out
#### Error File #####
#PBS -e /home/<username>/LLM/llama3-test.err
##### Queue #####
#PBS -q workq
##### Change to current working directory #####
cd /home/<username>/LLM/
##### Execute Program #####
. /etc/profile.d/modules.sh
module purge
module load Python/3.9.6-GCCcore-11.2.0-llm-phi-3
python llama3-test.py
Once the Python file llama3-test.py and the submission script llama3-test.pbs have been created, and <username> has been replaced with your own HPC username, you can submit the job using the following command:
qsub llama3-test.pbs
Once the program has completed, the output should be viewable in the following files:
/home/<username>/LLM/llama3-test.out
/home/<username>/LLM/llama3-test.err
OpenAI Whisper (CPU only)
In this example, we have created the directory/folder "LLM" in the user's home directory. You may wish to change this to an alternative location. Ensure you have changed into this "LLM" location before creating the following files.
Python Code example
Create the following file whisper-test.py
Note, you can change model_id to one of the following:
- model_id="openai/whisper-large-v3"
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use a GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe a sample audio clip from a public dataset
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
HPC PBS Submission script
We will create a generic HPC submission script, whisper-test.pbs, that we can use on the CQUniversity HPC facility. This script will submit the Python file whisper-test.py, located in the /home/<username>/LLM/ directory, to the HPC scheduler.
The submission script asks for 24 CPU cores (the maximum available within a standard HPC compute node) and 150 GB of memory (the maximum available within a standard HPC compute node).
####select resources #####
#PBS -N whisper-test
#PBS -l ncpus=24
#PBS -l mem=150g
#### Output File #####
#PBS -o /home/<username>/LLM/whisper-test.out
#### Error File #####
#PBS -e /home/<username>/LLM/whisper-test.err
##### Queue #####
#PBS -q workq
##### Change to current working directory #####
cd /home/<username>/LLM/
##### Execute Program #####
. /etc/profile.d/modules.sh
module purge
module load FFmpeg/4.3.2-GCCcore-11.2.0
module load Python/3.9.6-GCCcore-11.2.0-whisper
python whisper-test.py
Once the Python file whisper-test.py and the submission script whisper-test.pbs have been created, and <username> has been replaced with your own HPC username, you can submit the job using the following command:
qsub whisper-test.pbs
Once the program has completed, the output should be viewable in the following files:
/home/<username>/LLM/whisper-test.out
/home/<username>/LLM/whisper-test.err
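The example above transcribes a sample clip from a public dataset. To transcribe your own recording, copy the audio file into the LLM directory on the HPC and pass its path to the pipeline instead of the dataset sample (for example, result = pipe("/home/<username>/LLM/myaudio.mp3"); the FFmpeg module loaded in the submission script handles the audio decoding). A sketch of copying the file from your own machine, where the login hostname is a placeholder you will need to replace:
scp myaudio.mp3 <username>@<hpc-login-node>:/home/<username>/LLM/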
How to get help
If you need help with anything on the HPC, the most direct way to get help is to contact HPC support via eresearch@cqu.edu.au.
Alternatively, there is a Microsoft Teams team dedicated to the CQU HPC community, which currently has 60 members! We also run a weekly Hacky Hour over Zoom where you can ask any questions relating to the HPC or eResearch, or simply attend to get involved; you don't need to ask any questions, all attendance is welcome! Further details on both the MS Teams site and the Hacky Hour can be found here.