Machine and Deep Learning on Ada
CPU or GPU modules
Before you start using modules on the HPC, consider whether you should be using CPU-enabled or GPU-enabled modules. This guide won’t go into depth about the differences, but as a very basic overview: CPU-enabled modules are extremely versatile and capable of performing almost any task or calculation, whereas GPU-enabled modules are less versatile but, given the right workload, can dramatically outperform CPUs.
Before you make a decision, keep in mind that the Marie Curie cluster has only 6 GPUs available, compared with 2,304 hyperthreaded CPU cores. This matters because the HPC is a shared resource: when it comes time for you to use it, a queue is much more likely for the GPUs than for the CPUs. If the extra wait for the hardware exceeds the runtime the GPUs save your program, the benefit of using GPUs is negated. Of course, you may not know in advance how much faster the GPUs will be. It is entirely possible that they will perform at the same speed as, or in rare cases even slower than, their CPU counterparts. However, GPUs are expected to save you hours or days of computing time in the vast majority of use cases. This is why we recommend running your code on both CPU and GPU nodes to find out which works best for you.
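The trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch only, not an HPC tool: the function names and every number in it are hypothetical, and you would substitute the queue and run times you actually observe for your own jobs.

```python
def total_turnaround(queue_wait_hours, runtime_hours):
    """Time from submitting a job to getting results back."""
    return queue_wait_hours + runtime_hours


def gpu_is_worthwhile(cpu_wait, cpu_runtime, gpu_wait, gpu_runtime):
    """Return True if the GPU job finishes sooner overall than the CPU job."""
    return total_turnaround(gpu_wait, gpu_runtime) < total_turnaround(cpu_wait, cpu_runtime)


# Hypothetical figures (hours): the GPU run is 4x faster but queues for longer.
print(gpu_is_worthwhile(cpu_wait=0.5, cpu_runtime=8.0, gpu_wait=3.0, gpu_runtime=2.0))   # True: 5.0 < 8.5
print(gpu_is_worthwhile(cpu_wait=0.5, cpu_runtime=8.0, gpu_wait=10.0, gpu_runtime=2.0))  # False: 12.0 > 8.5
```

In the second case the GPU run itself is still four times faster, but the longer queue means the CPU job returns results first.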
Searching for Modules
You can see the full list of available modules by using the module avail command. You can also narrow the search by adding the name of the module you are looking for after the command, i.e. module avail followed by the module name.
If you’re not getting any results with the above, it doesn’t necessarily mean the HPC doesn’t have this software installed, as it could also be a syntax error with your search.
As an example, none of the commands below retrieve any results when looking for PyTorch:
module avail pytorch
module avail Pytorch
module avail torch
Alternatively, we can use a script that Jason Bell wrote. Not only does this script ignore case, it also tells us which version of Python the desired module is installed within, which is something we’ll need to know when loading the modules. The script can be run with the command check_python_module.sh followed by the module name, e.g.
check_python_module.sh TensorFlow
Python/3.12.3-GCCcore-13.3.0-deep-learning-cpu
Python/3.12.3-GCCcore-13.3.0
Python/3.13.1-GCCcore-14.2.0
Python/3.9.6-GCCcore-11.2.0-bare
Python/3.9.6-GCCcore-11.2.0-llm
Please note TensorFlow is available in many more versions of Python, all of which are visible with this command. For simplicity’s sake, this example shows only a handful of Python versions.
As you can see from the above, the script will print the directory of each Python version line by line and if it finds the module you have searched for within that version of Python, it will print it below. So from the example above we can see that TensorFlow is NOT installed within Python-3.7.4, Python-3.7.6 and Python-3.8.0 but it is installed within Python/3.12.3-GCCcore-13.3.0-deep-learning-cpu.
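To illustrate why ignoring case matters when searching, here is a minimal sketch of case-insensitive matching. It is not the check_python_module.sh script itself, whose internals this guide does not show; the function name and the module names in the list are hypothetical placeholders.

```python
def find_modules(available, query):
    """Return module names containing `query`, ignoring upper/lower case."""
    needle = query.casefold()
    return [name for name in available if needle in name.casefold()]


# Hypothetical module names, for illustration only.
available = ["PyTorch/x.y.z", "TensorFlow/x.y.z", "SciPy-bundle/x.y.z"]

# A case-sensitive search misses "PyTorch" when we type "pytorch".
print([name for name in available if "pytorch" in name])  # []

# Folding both sides to the same case finds it.
print(find_modules(available, "pytorch"))  # ['PyTorch/x.y.z']
```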
Loading the Correct Modules
Now that we know the modules we want to use are installed and what version of Python they are installed within, it is time to load them.
We do this with the module load command. For this example we’ll use the deep learning package that supports TensorFlow, which we discovered in the previous example.
module load Python/3.12.3-GCCcore-13.3.0-deep-learning-cpu
Now we can launch our choice of IDE. For this example I’ll start Spyder with the command spyder
Please note you do not need to run a third-party application such as Spyder to check whether modules are loaded correctly. You can also do this by writing a script and running it from the console, but as this is a getting started guide, I believe it is easier to start and test in Spyder.
We can check that the modules are loaded and able to be imported by running the following script in Spyder.
import MODULENAME
print(MODULENAME.__version__)
e.g.
import tensorflow
print(tensorflow.__version__)
Output: 2.11.0
This script prints the version of the module if it has been successfully loaded.
Recommended Modules
If you’re just starting out in Machine Learning, we would recommend looking at Python/3.12.3-GCCcore-13.3.0-deep-learning-cpu first. This package has been specifically built to accommodate the majority of our users, and it loads a variety of the popular ML modules and their dependencies.
The following list highlights the popular modules and which packages they can be found within. If there is no package currently available we have suggested a compatible Python version or the command to load the module. We plan to develop more packages in the future for our users. A CPU focused package is currently in development.
Module | CPU compatible | GPU compatible
TensorFlow | Python/3.12.3-GCCcore-13.3.0-deep-learning-cpu |
Keras | Python/3.12.3-GCCcore-13.3.0-deep-learning-cpu |
PyTorch | Python/3.12.3-GCCcore-13.3.0-deep-learning-cpu |
How to get unavailable modules
If you want to use modules that are unavailable on the HPC, you have two options:
- Contact HPC support via eresearch@cqu.edu.au
- Create your own Conda environment and install the modules you need. This option is explored in detail here.
How to get help
If you need help with anything on the HPC, the most direct way to get help is to contact HPC support via eresearch@cqu.edu.au.
Alternatively, there is a Microsoft Teams team dedicated to the CQU HPC community, which currently has 60 members. We also run a weekly Hacky Hour over Zoom where you can ask any questions relating to the HPC or eResearch. You don’t need to have a question to attend; everyone is welcome to come along and get involved. Further details on both the MS Teams site and the Hacky Hour can be found here.