Chapter 7 High Performance Computing at UQ
7.1 UQ’s Research Computing Centre
The Research Computing Centre (RCC) provides coordinated management and support of The University of Queensland’s sustained and substantial investment in eResearch. The RCC helps UQ researchers across disciplines make the most of the University’s eResearch technologies, such as High Performance Computing (HPC), data storage, data management, visualisation, workflow and videoconferencing.
Specifically for the MME lab, the RCC coordinates access to and support for three HPC clusters and the Research Data Manager (RDM).
7.1.1 HPC access
Go to https://www.qriscloud.org.au/ and register for an account. Log in with your UQ staff account, which may require logging out of any other UQ websites first. This step is automated, and you will be granted access immediately.
You must apply separately for each HPC cluster you want to access, providing a “technical justification”. Applications are also made at https://www.qriscloud.org.au/.
Tinaroo does not appear on the QRIScompute request page; go here instead.
The three HPC clusters under the RCC banner are:
FlashLite
The technical justification for FlashLite is:
- You need to read and write vast amounts of data, such as CMIP6 data.
- You keep running out of memory on Tinaroo or Awoonga
- David Abramson has asked us to demonstrate the BeeGFS filesystem caching enhancements
An example submission, made by Phil Dyer (access was approved very quickly):
MathMarEcol Bioregionalisation on FlashLite
My lab is working on large-scale marine ecology and climate models. The input data is around 30-50TB. David Abramson has asked us to use FlashLite to demonstrate the BeeGFS filesystem caching enhancements, which are only available on FlashLite.
Tinaroo
The technical justification for Tinaroo is:
- Your code runs faster when you can use more nodes (MPI)
- Need a GUI, such as with MATLAB
Using more nodes means your code runs across more than one computer at once. If you are only using multiple cores within a single computer, then Awoonga is a better choice.
Awoonga
The technical justification for Awoonga is:
- Your code is written to use multiple cores within your computer
- You are doing embarrassingly parallel work or parameter sweeps, where each run is independent
The zooplankton size spectrum modelling is embarrassingly parallel, because each grid cell can be computed while ignoring the surrounding grid cells. Computing GCMs would not be embarrassingly parallel, because at every time step each grid cell needs to know the state of its neighbours to compute its next state.
An example submission made by Phil Dyer, approved within 5 minutes.
MathMarEcol bioregionalisation
My code currently works in a shared memory mode within a single node. Most of the computations are embarrassingly parallel. I run parameter sweeps across different datasets as well, which can be run independently on separate nodes. I can make use of 20-40+ cores, and typically need about 5 to 10 GB per core within a run.
7.1.1.1 Network Access
You must be on the UQ intranet to access the servers. If you are connected to eduroam, UQ wired ethernet, or UQ wifi, then you are already on the intranet.
Everywhere else you will need to install the UQ VPN software and log in; follow the UQ ITS instructions for setting up the VPN.
7.1.1.2 The terminal
To access any of the HPCs, you will need a terminal program on your computer.
On macOS, this is Terminal.
On Windows, this is Command Prompt or Windows PowerShell (not Windows PowerShell ISE). If you don’t have an up-to-date Windows 10 machine, install PuTTY.
Some people like the Ubuntu app on Windows 10; just remember the /mnt/c syntax to get to the files on your computer.
7.1.2 Logging in
Once you have your terminal open, connect to a cluster with ssh using the login alias from the table below (see the example sketch after the table). Your username and password are the same as your UQ login.
Cluster | Login Nodes | Cluster Login Alias |
---|---|---|
Awoonga | awoonga1.rcc.uq.edu.au | awoonga.rcc.uq.edu.au |
FlashLite | flashlite1.rcc.uq.edu.au, flashlite2.rcc.uq.edu.au | flashlite.rcc.uq.edu.au |
Tinaroo | tinaroo1.rcc.uq.edu.au, tinaroo2.rcc.uq.edu.au | tinaroo.rcc.uq.edu.au |
Wiener | wiener.hpc.dc.uq.edu.au | wiener.hpc.dc.uq.edu.au |
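As a minimal sketch (the `<username>` placeholder is your UQ username), logging in to Awoonga from a terminal looks like this; swap in the alias of whichever cluster you have access to:

```bash
ssh <username>@awoonga.rcc.uq.edu.au
```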
7.1.3 Submitting a Job
TODO: Add a section on submitting a job via PBS to Awoonga/FlashLite.
For the moment, you can get more information here. NOTE: You need to be on the VPN if you are not on campus.
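Until that section is written, here is a hedged sketch of a PBSpro submission script, assuming a file called `myjob.pbs` (the resource values and file name are placeholders, not a tested Awoonga/FlashLite configuration):

```bash
#!/bin/bash
#PBS -N myjob                      # job name
#PBS -l select=1:ncpus=1:mem=4GB   # one chunk with 1 CPU and 4GB of memory
#PBS -l walltime=01:00:00          # one hour of run time
cd $PBS_O_WORKDIR                  # start in the directory the job was submitted from
echo "hello world"
```

Submit it with `qsub myjob.pbs` and check its status with `qstat -u <username>`.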
7.2 SMP High Performance Computing
http://research.smp.uq.edu.au/computing/getafix.html
The School of Mathematics and Physics has its own HPC called Getafix. Getafix is a cluster, and jobs are assigned to nodes by the SLURM scheduler.
7.2.1 Getafix Structure
Each computer in the cluster is called a node. Each node has its own CPUs, memory and graphics card. Nodes all share a common network drive, so all your data is available on all the nodes. Most nodes in Getafix have about 24 cores; a few have more and a few have less. Each node also typically has about 250GB of memory.
One node is special: the login node. The login node works like any remote computer and you can run jobs on it, but out of respect for other users, you should only do setup operations and short tests on the login node.
7.2.2 SLURM
SLURM is the program that balances all the jobs from all users across the cluster. You specify what you want the cluster to do, and how much time, how many nodes and CPUs, and how much memory you think you will need. SLURM then waits until the cluster has enough resources available and starts your job. If you try to use more resources than you asked for, your job will be killed. If you ask for lots of resources, it might take a while for enough nodes to become available, since smaller jobs will usually fill up the free resources as other jobs end.
7.2.3 Other Resources
The cluster has access to shared resources, particularly shared storage and networking, so your data is available on all the nodes at the same location.
7.2.4 Getting into Getafix
7.2.4.1 Request Access
Email ITS at help@its.uq.edu.au and request access to the SMP cluster, which is called Getafix.
This can take a few days. Make sure you explain your connection to the School of Mathematics and Physics. HDR students often have both a staff and a student account; let ITS know which one you want to log in with.
7.2.4.2 Logging into Getafix
Once you have your terminal open and you are logged into the UQ VPN (if off campus), connect with ssh, where `<username>` is your staff or student login.
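A minimal sketch of the connection command (the hostname matches the one in the error message below):

```bash
ssh <username>@getafix.smp.uq.edu.au
```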
If a prompt appears saying “The authenticity of host … can’t be established”, answer yes.
Then enter your password.
If you see:
> ssh: Could not resolve hostname getafix.smp.uq.edu.au: This is usually
> a temporary error during hostname resolution and means that the local
> server did not receive a response from an authoritative server.
Then you either typed the hostname incorrectly or you are not on the UQ internal network; see Network Access.
If you really want R X11 plots to appear on your desktop, add `-X` to the login command. This works on Mac and Linux. Windows may need an X11 server, which you can most easily get from the Ubuntu app. I strongly recommend saving plots into files and copying them to your computer; X11 can be very slow to render.
7.2.5 Putting your data on Getafix
You need to have your data on Getafix. Where to put it?
7.2.5.1 Home
Your home directory (usually /home/<username>) is often shortened to `~`.
You have a 40GB quota here. The home directory is typically used for programs and configuration files. R libraries will install here by default, which is fine. Don’t use the home directory for data sets, results or plots.
7.2.5.2 Data
You have a 400GB quota here. Put your data sets and scripts in the `/data/<username>/` directory.
7.2.5.3 Data1, Data2 … Data6
Most of the time you use `/data/`, but there are other data folders that you can access on request. `data1` and `data2` are older system drives that we are not using. `data3` to `data6` are high-speed drives for reading and writing very large files. These could be useful for work on GCMs.
7.2.6 Sending your data to Getafix
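A hedged sketch of the usual approach: copy files from your own computer with `scp` or `rsync` (the file and directory names below are placeholders):

```bash
# Copy a single file from your computer to your Getafix data directory
scp mydata.csv <username>@getafix.smp.uq.edu.au:/data/<username>/

# Copy (and later re-sync) a whole directory; rsync only transfers what has changed
rsync -av myproject/ <username>@getafix.smp.uq.edu.au:/data/<username>/myproject/
```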
7.2.7 Running jobs on Getafix
7.2.7.1 Modules
To allow different versions of software to be used by different users (like python 3.4 versus python 3.6), Getafix uses Linux modules. On Getafix, R is in a module, and won’t work until you load the module. You can use the default version of a module, or specify a particular version of a module if your job depends on it.
To see what modules are available, type `module avail`. When you see a module you want to use, load it with `module load <module name>`. For the R work in this chapter, the modules I load include `R` and `pandoc` (see the sketch below).
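A short sketch of the module commands (the version number shown is illustrative only; check what is actually installed on Getafix):

```bash
module avail          # list everything available
module load R         # load the default R version
module load R/3.6.1   # or pin a specific version (example version only)
module list           # show what is currently loaded
```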
7.2.7.3 Installing packages
Install packages to a local library in your home directory.
Accept the default library location unless you have a reason not to.
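A minimal sketch (the package name is just an example): start R on the login node after loading the R module, and let R create the personal library when it asks.

```r
# In an R session on the login node (after `module load R`)
install.packages("dplyr", repos = "https://cloud.r-project.org")
# The first time, R offers to create a personal library under your home directory;
# accept the default location.
```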
If your package fails to install, check if there is a system package missing.
For example, to install `sf`, you need to load more modules first (a hedged sketch is below). With those loaded, `sf` should install.
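`sf` links against the GDAL, GEOS, PROJ and udunits system libraries, so the corresponding modules need to be loaded first. The module names below are hypothetical; check `module avail` for the exact names on Getafix:

```bash
# Hypothetical module names -- confirm with `module avail` on Getafix
module load gdal
module load geos
module load proj
module load udunits
```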
If it still does not work, send an email to help@its.uq.edu.au, they have been helpful setting me up when something was missing or broken.
7.2.7.4 Parameters for Rmd files
You can pass parameters into Rmd files from the command line. You need to set up the `params` in the YAML header, see https://bookdown.org/yihui/rmarkdown/parameterized-reports.html
Using Rscript, you can render a report using a specified set of parameters like this:
paramlist='list(nCores =24, gRange = 1:300 )'
Rscript -e 'knitr::opts_knit$set(progress = TRUE, verbose = TRUE); rmarkdown::render("/data/uqpdyer/demo.Rmd", params = '"${paramlist}"', output_format = "html_notebook", output_file = "outfile.html", output_dir = "/data/uqpdyer", envir = new.env())'
This is not as convenient as setting parameters in RStudio or within an R session. You have to be very careful about quotes, and the command line is hard to edit.
Sending parameters to an R script in bulk is much easier, but lacks the reproducibility of Rmd.
7.2.7.5 SLURM
So far, R has been running on the login node. Some things are easiest to do on the login node, like installing packages and transferring files. Computationally heavy tasks should be done on other nodes. To use the other nodes, you submit jobs with SLURM.
SLURM is a job scheduler. People request jobs by specifying the resources they need and the scripts to run, and SLURM queues the jobs until some nodes are available to run the job. You can also request an interactive session, but it is more efficient to set up jobs and come back to get the results.
7.2.7.5.1 sbatch
The command `sbatch` is used to submit jobs, which run when the resources become available.
You use SLURM by calling `sbatch script.sh`, where `script.sh` looks like:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=0-00:10
echo "hello world"
The `#SBATCH` lines at the top are parameters for the `sbatch` command. You can also pass them as flags in the command call itself, in which case `script.sh` simplifies down to just the commands to run; a sketch of both is below.
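A minimal sketch of passing the same options on the command line instead of in the script:

```bash
sbatch --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1G --time=0-00:10 script.sh
```

with `script.sh` reduced to:

```bash
#!/bin/bash
echo "hello world"
```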
When working with R, we run scripts or arbitrary code snippets with `Rscript`, but the `sbatch` command still expects the `Rscript` call to be wrapped in a shell script, here `script_for_r.sh`:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem-per-cpu=2G
#SBATCH --time=0-01:00
module load R
Rscript myRscript.R
So running `sbatch` looks the same: `sbatch script_for_r.sh`.
Here is a meaningful R script, `myRscript.R`, that we can test out. Note that I have specified 20 CPUs in both `script_for_r.sh` and `myRscript.R`:
library(foreach)
library(doParallel)
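# Start a small cluster of 2 workers and register it as the foreach backend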
cl <- makeCluster(2)
registerDoParallel(cl)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %dopar% {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}
})[3]
ptime
stime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %do% {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}
})[3]
stime
stopCluster(cl)
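# Scale up to 20 workers, matching --cpus-per-task=20 in script_for_r.sh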
cl <- makeCluster(20)
registerDoParallel(cl)
ptime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %dopar% {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}
})[3]
ptime
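# Shut down the workers when finished
stopCluster(cl)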
7.2.7.5.2 salloc and srun
If you really need to run something interactively, such as a MATLAB GUI, make sure you log in with `-X` or `-Y`, and then request an interactive allocation:
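A hedged sketch of an interactive session (the resource values are placeholders, and the exact X11 options depend on how SLURM is configured on Getafix):

```bash
# Ask for an interactive allocation: 1 task, 1 CPU, 4G of memory, 1 hour
salloc --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4G --time=01:00:00
# Then run a shell (or your GUI program) on the allocated node
srun --pty bash
```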
7.2.7.5.3 Many nodes
So far, we have used only one node. The memory and CPUs are all in the same node, and R behaves in exactly the same way as on your machine, only with more cores. You are clearly limited to a speed-up of around 20 times with a single node.
To go beyond the single node limit, you have a few options.
- Divide your job up
- If you can break your job up into many independent runs, then you can submit a job for each run, and collect all the different outputs after all the jobs have run.
- I have done this to test out the effects of changing combinations of parameters.
- Rslurm
- The Rslurm package will generate a set of scripts that submit a collection of jobs. Rslurm does the work of dividing up your job for you.
- Take care with this package: the default settings will allocate far too many jobs and overwhelm Getafix.
- MPI
- Message Passing Interface (MPI) is a library that shares information across nodes.
- MPI requires extra setup in your code, and you need to use `mpirun` or `srun` inside the sbatch script:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=0-00:10
mpirun echo "running with MPI"
srun echo "run with SRUN"
doMPI and Rmpi are packages for working with MPI in R.
TODO: I have not figured out MPI on the cluster yet; it didn’t “just work”.
- future.batchtools
- The future.batchtools package lets you submit SLURM jobs from R. The benefit of using future.batchtools is that you write R code using future syntax, then specify a SLURM plan, and all futures at a level specified by the plan are sent off as jobs. The script waits until the job finishes.
- I recommend running a single node task with 1 cpu and a modest amount of ram, but a long running time, then all the heavy lifting is done by new jobs submitted by your script.
- Take care with this package: I haven’t checked the defaults, but it might request too many resources, like Rslurm.
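A hedged sketch of the future.batchtools idea, assuming a batchtools SLURM template is already set up on Getafix (the resource names must match whatever that template expects, so treat them as placeholders):

```r
library(future.batchtools)

# Each future is submitted as its own SLURM job under this plan
plan(batchtools_slurm,
     resources = list(ncpus = 1, memory = "2G", walltime = 3600))

# %<-% creates a future: this block is queued and run as a SLURM job
fit %<-% {
  glm(Sepal.Length ~ Sepal.Width, data = iris)
}

fit  # reading the value waits for the job to finish
```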
7.2.7.5.4 R with parameters
Being able to pass in parameters is useful for dividing your job up.
Some examples of sbatch scripts that use R parameters:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=7G
#SBATCH --time=1-22:00
module load R
Rscript script.R $1 $2
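The script above would then be submitted twice with different arguments, for example:

```bash
sbatch script.sh input1 input2
sbatch script.sh input1 input3
```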
`$1` becomes the first argument, `input1`, both times, and `$2` becomes the second argument: `input2` in the first run and `input3` in the second.
Rendering an Rmd file with parameters:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=7G
#SBATCH --time=1-22:00
module load R
module load pandoc #essential for knitr and rmarkdown
paramlist='list(nCores =24, gRange = 1:300 )'
Rscript -e 'knitr::opts_knit$set(progress = TRUE, verbose = TRUE); rmarkdown::render("/data/uqpdyer/demo.Rmd", params = '"${paramlist}"', output_format = "html_notebook", output_file = "outfile.html", output_dir = "/data/uqpdyer", envir = new.env())'
If you want to generate a collection of rendered Rmd files, you have to write up a script for each set of parameters.
The other way to generate a set of Rmd files with different parameters, in parallel, is to copy the file and change the parameters in each file. Then you can use job arrays by naming the files from 1 to n.
The simplest approach I can think of to run many different Rmd files is to have a shell script that calls an R script that renders an Rmd. Parameters passed to the shell script go all the way down to the Rmd file.
args = commandArgs(trailingOnly = TRUE)
knitr::opts_knit$set(progress = TRUE, verbose = TRUE)
paramlist=list(nCores =as.numeric(args[1]), gRange = as.numeric(args[2]):as.numeric(args[3]))
rmarkdown::render("/data/uqpdyer/demo.Rmd", params = paramlist, output_format = "html_notebook",
output_file = "outfile.html", output_dir = "/data/uqpdyer", envir = new.env())
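A sketch of the matching pieces, with hypothetical file names (`render_report.R` for the R script above, `render_report.sh` for the wrapper):

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=7G
#SBATCH --time=1-22:00
module load R
module load pandoc
# Pass the three command-line arguments straight through to the R script,
# which forwards them as params to the Rmd file
Rscript render_report.R $1 $2 $3
```

which would be submitted as, for example, `sbatch render_report.sh 24 1 300`.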
7.2.8 Debugging Rmd files
Debugging regular R code isn’t great, but debugging Rmd files non-interactively is even worse.
Here are some tricks:
7.2.8.2 Load in cache dir
The qwraps2 package on github has a function, `lazyload_cache`, which will pull in the saved cache so you can interact with it in RStudio.
https://github.com/dewittpe/qwraps2/blob/master/R/lazyload_cache.R
Once you call `lazyload_cache()` on the cache generated by knitr, your environment has all the variables seen up until a failure point.
7.3 General HPC Guidance
7.3.1 Getting Results back
The batch jobs are not interactive. You set them up to run and come back for the results. Usually you save a file to disk.
All printed output goes to a file called `slurm-<job number>.out`. In SLURM you can change this name with the `--output` option in your sbatch script.
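For example, a one-line sketch (the file name prefix is a placeholder):

```bash
#SBATCH --output=myjob-%j.out   # %j is replaced with the job number
```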
7.3.2 Data.frames
If you have R objects you want to see, save them to disk.
- `saveRDS`: R binary format, single object; you choose the object name when you `readRDS` it back in
- `fwrite`: the data.table csv writer; fast, and gives you a csv
- `feather`: a newer binary format backed by Hadley Wickham; fast, and compatible with Python
- `fst`: another newer binary format; extremely fast, and you can read in subsets of the data.frame
- `archivist`: puts everything into a database and gives a unique hash to every object. Good for reproducibility; you can’t mix up objects from different runs later on because the hash always changes
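A minimal sketch of the first option (the paths and object names are placeholders):

```r
# At the end of the batch job: save the object to disk
saveRDS(results, "/data/<username>/results_run1.rds")

# Later, in an interactive session: read it back under whatever name you like
results <- readRDS("/data/<username>/results_run1.rds")
```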
7.3.3 Plots
Always save plots to disk, unless you are using Rmd files, which will put the plot into the output html anyway.
Use `ggsave()` for ggplots; for other kinds of R plots, open a device with `png()`, run your plotting code, then call `dev.off()`.
It is a good idea to set the width, height and dpi for all your plots.
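A short sketch of both (file names and sizes are placeholders):

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()
ggsave("sepal_plot.png", p, width = 18, height = 12, units = "cm", dpi = 300)

# Base R equivalent: open the device, plot, then close the device
png("sepal_base.png", width = 1800, height = 1200, res = 300)
plot(iris$Sepal.Length, iris$Sepal.Width)
dev.off()
```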
7.3.4 Playing nice
The cluster is a shared resource.
Try to be as precise as possible when allocating resources, so others can use what you don’t need.
Be careful with any package that submits jobs for you (`Rslurm`, `future.batchtools`). Check that the jobs are requesting a reasonable set of resources.
7.3.5 Job Submission Tips
If you have trouble getting access to the resources you want, consider whether you can split your job into smaller parts. The queue systems (PBSpro and SLURM) will normally favour smaller, shorter jobs. If you request a single job that needs, for example, 24 cores, you may face a long (days) delay before it starts, because the scheduler needs to find 24 free cores on the same node, which may require existing jobs to finish. While your job is queued, other smaller jobs will jump in front of you and get time. If the cores don’t need to talk to each other, consider submitting the work as an array job, with 24 individual runs of 1 core each (see the sketch below).
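A hedged sketch of a SLURM array job: each of the 24 array tasks gets 1 core, and `$SLURM_ARRAY_TASK_ID` tells the script which run it is (the script name is a placeholder):

```bash
#!/bin/bash
#SBATCH --array=1-24
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G
#SBATCH --time=0-01:00
module load R
# Each array task runs the same script with a different task ID
Rscript run_one_task.R $SLURM_ARRAY_TASK_ID
```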
7.3.6 What HPC should I use?
The MME lab has users across all HPCs (Awoonga: Phil, FlashLite: Isaac, Tinaroo: Patrick, Getafix: Patrick). Our staff and students will continue to use the system that works best for their individual needs.
Our experience is that getting time on Getafix is the easiest, apart from the end of semester when a lot of undergraduates are completing assignments. However, Getafix is more of a stand-alone cluster: it is run by SMP and does not have fast write access to the UQ Research Data Manager (RDM). The RCC HPCs all have fast write access to the RDM.
As it has a smaller community of users, the staff in charge of Getafix seem to be more flexible about installing new modules on request. Because the RCC is responsible for HPC support for the rest of UQ, they need more rigorous protocols to install new modules, but they will do so if asked.
7.3.7 Parallel coding in R
7.3.7.1 Add a section on parallelising in SLURM/PBSpro
TODO: Include something on epilogue and prologue scripts, and the pros and cons of parallelising inside versus outside of R.
7.3.7.2 parLapply, par*apply etc.
The parallel package gives you par* versions of the apply family of functions.
They are easy to drop in if you usually use the apply family from base R: you replace `apply`, `lapply`, `sapply` with `parApply`, `parLapply`, `parSapply`.
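A minimal sketch (the worker count is a placeholder; match it to the cores you requested from SLURM):

```r
library(parallel)

cl <- makeCluster(4)  # start 4 worker processes
squares <- parLapply(cl, 1:100, function(i) i^2)
stopCluster(cl)       # always shut the workers down again
```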
7.3.7.3 foreach and %dopar%
The `foreach` package has a few uses.
First, it changes how you use `for` loops in R. Base R `for` loops usually work by changing a variable defined outside the loop. `foreach` is more like an `apply`, where a result is returned, so `foreach` works without needing to use side effects.
Second, because `foreach` can work without side effects, it can run the loop in parallel easily. There are a number of backends you can use, including `doParallel`, `doMPI` and `doFuture`. You can quickly change how parallel works by replacing `%do%` with `%dopar%` and changing the registerDo* function at the start of your script.
If you already like using `foreach` over the apply family, moving to parallel is easy. The main downside is that `foreach` is not particularly efficient with resources. If you are mostly using the apply family, switching to `foreach` is much harder.
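A small sketch of the swap (the worker count is a placeholder); a fuller worked example is in the SLURM section above:

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 4)  # register a parallel backend with 4 workers

# Changing %do% to %dopar% is all that is needed to run the loop in parallel
res <- foreach(i = 1:8, .combine = c) %dopar% {
  sqrt(i)
}
```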
7.3.7.4 future
The `future` package is built around the idea of a promise. The promise is “eventually the variable will have a result assigned to it, but not straight away”. You replace `<-` with `%<-%` and the line of code becomes a promise, rather than something to do right away.
Generally, you don’t need to change your code much, just add `%` around your `<-` at strategic points, but you can gain a lot of flexibility, and there are many backends.
`future.apply` gives you parallel versions of the apply family. `doFuture` mixes `future` and `foreach`.
`future.batchtools` spawns jobs on the cluster that return their results into your session. `future.callr` does local parallel calculations in separate R processes. `future` on its own will do local multiprocess calculations and can use multiple nodes.
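A minimal sketch of a local future (the worker count is a placeholder):

```r
library(future)
plan(multisession, workers = 4)  # run futures in 4 background R sessions

# %<-% makes this a promise: it starts running in the background immediately
x %<-% {
  Sys.sleep(5)
  sum(rnorm(1e6))
}

# Other work can happen here; reading x blocks until the result is ready
x
```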
7.3.7.5 General Coding habits for going parallel
- Put things into lists
- It is easy to loop over lists, and you can quickly parallelise the processing of a list.
- Don’t do graphics in parallel
- Generally speaking, you can specify ggplot objects in parallel, but if you try to print or save them in parallel, R crashes.
- You probably have to work very carefully if you open multiple `png()` devices for plotting base R plots.
- Make functions small
- If you repeat something frequently, put it into a function. You can then put functions into packages. Try to keep functions small, then they are easier to use in parallel. Generally don’t try to work in parallel within a function.
- If you want a function to use parallel processing, I strongly recommend you use either `future` or `foreach` with `%dopar%`. By default, they both drop back to sequential processing, but the user can set up parallel processing before calling the function to get speed improvements.
- Look for loops, but don’t use base R `for` loops
- Base R `for` loops cannot be used in parallel, because they rely on side effects (see the sketch below).
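A short sketch of the difference:

```r
# A base R for loop that relies on a side effect (filling `out` from outside the loop):
out <- numeric(10)
for (i in 1:10) {
  out[i] <- sqrt(i)
}

# The same computation without side effects, so it can be parallelised
# by swapping %do% for %dopar% once a backend is registered:
library(foreach)
out <- foreach(i = 1:10, .combine = c) %do% {
  sqrt(i)
}
```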