Machine and Deep Learning
CPU or GPU modules
Before we get started with using modules on the HPC, it is worth considering whether you should be using CPU- or GPU-enabled modules. This guide won’t go into depth about the differences, but as an extremely basic overview: CPU-enabled modules are extremely versatile and capable of performing almost any task or calculation, while GPU-enabled modules are less versatile but, given the right circumstances, can rapidly outperform CPUs.
Before you make a decision, keep in mind that the Marie Curie cluster has only 4 GPUs compared to its 528 CPU cores. The HPC is a shared resource, so when it comes time for you to use it, there is much more likely to be a queue for the GPUs than for the CPUs. If the wait time for the hardware exceeds the runtime the GPUs would save your program, the benefit of using GPUs is negated entirely. Of course, you may not know in advance how much faster the GPUs will be; it is entirely possible they will perform at the same speed or, in rare cases, even slower than their CPU counterparts. In the vast majority of use cases, however, the GPUs are expected to save you hours or days of computing time. This is why we recommend running your code on both CPU and GPU nodes to find out which works best for you.
Checking What Modules are Already Loaded
We’ve tried to make the barrier to entry as low as possible for users, so even in a fresh shell a number of popular modules are loaded by default. You can see what is currently loaded by using the module list command (more module commands can be found here).
The example below shows the default modules reported by the module list command:
[kenzlerb@marie ~]$ module list
 1) scripts    2) user-scripts    3) mathematica    4) matlab     5) python     6) python3
 7) netcdf     8) R               9) jdk           10) ansys     11) gsl       12) blast
13) abaqus    14) geos           15) gdal          16) udunits   17) rstudio   18) ea-utils
19) fastqc    20) velvet         21) metavelvet    22) velvetoptimiser         23) bowtie2
24) samtools
Searching for Modules
You can see the full list of available modules by using the module avail command. You can also narrow this command down by adding the name of the module you are looking for, e.g. module avail PyTorch (please note Linux is case sensitive, so keep this in mind when searching for modules).
e.g.
[kenzlerb@marie ~]$ module avail PyTorch
PyTorch/1.10.0-fosscuda-2020b                  PyTorch/1.12.0-foss-2022a             PyTorch/1.12.0-foss-2022a-CUDA-11.7.0
PyTorch/1.7.1-fosscuda-2020b                   PyTorch/1.8.1-foss-2020b              PyTorch/1.8.1-fosscuda-2020b
PyTorch/1.9.0-fosscuda-2020b                   PyTorch3D/0.4.0-fosscuda-2020b-PyTorch-1.7.1
PyTorch-Lightning/1.7.7-foss-2022a             PyTorch-Lightning/1.8.4-foss-2022a
If you’re not getting any results with the above, it doesn’t necessarily mean the HPC doesn’t have the software installed; it could also be a syntax error in your search.
For example, the following commands will not retrieve any results when looking for PyTorch:
module avail pytorch
module avail Pytorch
module avail torch
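If you want a search that ignores case, one workaround (a sketch, not an official HPC tool) is to pipe the full listing through grep -i. Note that module avail prints its listing to stderr, so stderr must be redirected into the pipe. The block below demonstrates the filter against a few sample entries from the listing shown earlier.

```shell
# On the HPC you would run:
#   module avail 2>&1 | grep -i torch
# (`module avail` writes to stderr, hence the 2>&1 redirect).
# Here the same case-insensitive filter is demonstrated against a
# small sample of the module names shown above.
printf '%s\n' \
  'PyTorch/1.10.0-fosscuda-2020b' \
  'PyTorch/1.8.1-foss-2020b' \
  'TensorFlow/2.11.0-foss-2022a' |
grep -i torch   # matches both PyTorch lines despite the lowercase search term
```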
Alternatively, we can use a script that Jason Bell wrote. Not only does this script ignore case, it also tells us which versions of Python the desired module is installed within, which is something we’ll need to know when loading modules. The script can be run with the command check_python_module.sh followed by the module name, e.g.
check_python_module.sh TensorFlow
/apps/modules/modulefiles/python-3.7.4
/apps/modules/modulefiles/python-3.7.6
/apps/modules/modulefiles/python-3.8.0
/apps/modules/modulefiles/python-3.8.2-deep-learning
tensorflow                      2.11.0
tensorflow-estimator            2.11.0
tensorflow-gpu                  2.2.0rc3
tensorflow-io-gcs-filesystem    0.31.0
Please note TensorFlow is available in many more versions of Python, all of which are visible with this command; for simplicity’s sake, this example shows only 4 versions of Python.
As you can see from the above, the script prints the directory of each Python version line by line, and if it finds the module you searched for within that version of Python, it prints the match below it. From the example we can see that TensorFlow is NOT installed within python-3.7.4, python-3.7.6 or python-3.8.0, but it is installed within python-3.8.2-deep-learning.
It may also be useful to know the installation locations of the modules on the HPC, therefore I have listed them below:
/apps/modules/modulefiles – (This is the directory for all modules installed without Easybuild)
/apps/easybuild/modules/all – (This is the directory for all the modules installed with Easybuild)
Note: you do not need any knowledge of Easybuild for this guide; just know there are two different directory locations for the modules installed on Marie Curie.
Loading the Correct Modules
Now that we know the modules we want to use are installed, and which version of Python they are installed within, it is time to load them. We suggest unloading the default Python modules first to avoid compatibility issues. We can do this with the following command: module unload python python3
Next, we load the version of Python we want to use with the module load command. For this example we’ll use the deep learning package that supports TensorFlow, which we discovered in the previous example:
module load python-3.8.2-deep-learning
Now we can load our IDE of choice; for this example we’ll launch Spyder with the command spyder.
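Putting the steps in this section together, a typical session (assuming the python-3.8.2-deep-learning module found earlier) looks like:

```shell
module unload python python3            # drop the default Python modules first
module load python-3.8.2-deep-learning  # the Python build that ships TensorFlow
spyder                                  # optionally launch the IDE
```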
Please note you do not need to run a third-party application such as Spyder to check whether modules are loaded correctly; you can also write a script and run it through the console. As this is a getting-started guide, however, we believe it is easiest to start and test in Spyder.
We can check that the modules are loaded and able to be imported by running the following script in Spyder.
import MODULENAME
print(MODULENAME.__version__)
e.g.
import tensorflow
print(tensorflow.__version__)
Output: 2.11.0
This script prints the version of the module if it has been successfully loaded.
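If you prefer to run the same check from the console rather than Spyder, it fits in a one-liner. The block below uses the standard-library json module purely as a stand-in, since it exposes a __version__ attribute on any Python install; substitute the module you actually loaded.

```shell
# Stand-in check: replace `json` with the module you loaded, e.g.
#   python3 -c 'import tensorflow; print(tensorflow.__version__)'
python3 -c 'import json; print(json.__version__)'
```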
Recommended Modules
If you’re just starting out in Machine Learning, we would recommend looking at deep-learning-packages3 first. This package has been specifically built to accommodate the majority of our users; it loads a variety of the popular ML modules and their dependencies.
If you’ve been in the Machine Learning space for longer, or are using older module versions that require Python 2, then check out deep-learning-packages and deep-learning-packages2.
The following table lists the popular modules and the packages they can be found within. Where no package is currently available, we have suggested a compatible Python version or the command to load the module. We plan to develop more packages in the future for our users; a CPU-focused package is currently in development.
| Module | CPU compatible | GPU compatible |
| --- | --- | --- |
| TensorFlow | python-3.8.2-deep-learning | deep-learning-packages3-gpu |
| Keras | python-3.8.2-deep-learning | deep-learning-packages3-gpu |
| PyTorch | python-3.8.2-deep-learning | deep-learning-packages3-gpu |
| DarkNet | Not contained in any packages; can be loaded individually with darknet-21-02-2018-CPU-OPENMP | Not contained in any packages; can be loaded individually with darknet-21-02-2018-GPU-OPENMP-CUDNN-OPENCV |
| OpenCV | deep-learning-packages, deep-learning-packages2 | |
How to get unavailable modules
If you want to use modules that are unavailable on the HPC, you have two options:
- Contact HPC support via eresearch@cqu.edu.au
- Create your own Conda environment and install the modules you need; this option is explored in detail here.
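As a rough sketch of the second option (assuming Conda is available on your account; the environment and package names below are placeholders, not prescribed by the HPC team):

```shell
conda create -n my-ml-env python=3.10   # create an isolated environment
conda activate my-ml-env                # switch into it
pip install tensorflow                  # install the modules you need
```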
How to get help
If you need help with anything on the HPC, the most direct way to get help is to contact HPC support via eresearch@cqu.edu.au.
Alternatively, there is a Microsoft Teams team dedicated to the CQU HPC community, which currently has 60 members! We also run a weekly Hacky Hour over Zoom where you can ask any questions relating to the HPC or eResearch, or simply attend to get involved; you don’t need to ask any questions, and all attendance is welcome! Further details on both the MS Teams site and the Hacky Hour can be found here.