HPC Offloading¶
Ryax is able to offload part of a workflow to an High Performance Computing (HPC) cluster. It is especially useful when you need to automate pre and post processing for large simulation. It means that in the same workflow, some actions will run on Ryax Kubernetes cluster and some will run on an external HPC site through SSH.
Requirements¶
Ryax currently only support Slurm as cluster manager.
It requires an SSH access to the frontend node and a shared home directory across nodes to be able to manges Action IO and capture logs.
Ryax is leveraging Singularity (a.k.a Apptainer) to package and run containers on the HPC cluster. Thus, you will need Singularity to be installed on the cluster which can only be done by an administrator.
Python 3 must present on the frontend because it is required to wrap actions when using the custom_script
that run directly on the node and not in Singularity.
Ryax will create a .ryax_cache
folder in your home directory that will contain the container images, the IO, and the working directory of each Run. Make sure that you have enough space on your home (it really depends on your needs).
Be sure that the Runner configuration enables hpcOffloading. In your Ryax installation configuration, the values.yaml
must contain:
runner:
values:
hpcOffloading: true
To sum up the requirements for the HPC cluster are:
Slurm
SSH access to the frontend
Python3 installed on the frontend
Singularity on all nodes
Some room in your home folder
Simple Usage¶
To enable HPC offloading for a Ryax Action, you must add the hpc
add-on to the ryax_metadata.yaml
:
apiVersion: "ryax.tech/v2.0"
kind: Processor
spec:
# ...
addons:
hpc: {}
Then, you will be able to set the HPC offloading parameters when you instantiate your action in a workflow.
Your actions will run just like regular Ryax Actions but using a Singularity containers inside the HPC cluster. Slurm runs the containers in parallel following the parameters of the sbatch header (typically setting number of nodes and number of tasks). As usual, Ryax will provide inputs and retrieve outputs upon execution completion.
Warning
If the custom_script parameter is set the advanced mode is automatically used. Be sur that this parameter is Null to stay in this mode.
Advanced Usage for parallel job¶
If you want you action to run in parallel on multiple node, for example for an MPI application, you will need to define the custom_script
parameter. This script will be injected in the sbatch script and run directly on the fronted node. Action inputs will be available as environment variables and outputs must be exported as environment variable at the end of the script. Also, the Singularity image that was built and deployed for you by Ryax is available in the RYAX_ACTION_IMG
environment variable. The current working directory a temporary folder create for each run.
For example, if you define your action’s inputs/outputs like this in your ryax_metadata.yaml
file:
spec:
inputs:
- help: Number of process to run on
human_name: Number of process
name: nb_process
type: integer
outputs:
- help: Results directory of the tutorial
human_name: Result directory
name: results_dir
type: directory
In this example, the custom_script
can access nb_process
as bash variable and the result_dir
is exported at the end:
set -e # Avoid silent error
set -x # Debug
# Show inputs
echo Number of process: $nb_process
# Generate Hostfile
scontrol show hostnames > ./hostfile
mpirun --hostfile ./hostfile -np $nb_process singularity exec $RYAX_ACTION_IMG
# Export outputs
export results_dir=$PWD
echo Done!
To install MPI inside your Singularity image don’t forget to add dependencies in your ryax_metadata.yaml
file:
spec:
dependencies:
- openmpi
- openssh
The MPI library inside your container must be compatible with the one on the cluster. For more information on why we run MPI this way and the limitation of MPI with Singularity please have a look at the Singularity documentation.
HPC Actions repository¶
You can find examples of HPC enabled Ryax Actions in our public repository: https://gitlab.com/ryax-tech/workflows/hpc-actions.git
Contributions are welcomed!