Install Ryax on Kubernetes¶
We assume that you are comfortable with Kubernetes. To keep this guide short, we leave out details on the Kubernetes commands.
Requirements¶
All you need to install Ryax is a Kubernetes cluster and Docker installed on your machine. You can get a managed Kubernetes instance from any Cloud provider. For a local development installation, please refer to the Getting Started Guide.
Supported Kubernetes versions:
Kubernetes > 1.19
Hardware:
At least 2 CPU cores
4 GB of memory
40 GB of available disk space
Note that, depending on the Actions you run on your cluster, you might need more resources.
Preparatory Steps¶
Make sure your configuration points to the intended cluster:
kubectl config current-context
Use a Kubernetes cluster dedicated to Ryax: we offer no guarantee that Ryax runs smoothly alongside other applications.
Make sure you have complete admin access to the cluster. Try to run kubectl auth can-i create ns or kubectl auth can-i create pc, for instance:
$ kubectl auth can-i create ns
Warning: resource 'namespaces' is not namespace scoped
yes
Have access to a DNS server where you can add a new A or CNAME entry for your cluster.
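You can bundle the preparatory checks above into a small shell function. This is a minimal sketch. It assumes a POSIX-compatible shell (bash or dash) with kubectl on your PATH. The function name ryax_preflight is hypothetical and is not part of Ryax.

```shell
# Hypothetical helper (not part of Ryax): run the preparatory checks.
# Returns non-zero as soon as a check fails.
ryax_preflight() {
  local ctx resource answer
  # Which cluster does kubectl currently target?
  ctx=$(kubectl config current-context) || return 1
  echo "Current context: $ctx"
  # Admin rights on cluster-scoped resources: namespaces (ns), priority classes (pc)
  for resource in ns pc; do
    # stderr is dropped to hide the "not namespace scoped" warning
    answer=$(kubectl auth can-i create "$resource" 2>/dev/null)
    if [ "$answer" != "yes" ]; then
      echo "Missing permission: create $resource" >&2
      return 1
    fi
  done
  echo "Admin access: OK"
}
```

Define the function in your shell, then run ryax_preflight. It prints the current context and stops at the first missing permission. It does not check the DNS requirement.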
Configure your Installation¶
Installing Ryax is analogous to installing a Helm chart. To begin, we start from a default configuration and make a few tweaks so that everything is compatible with your Kubernetes provider. You will still be able to fine-tune your installation later on.
Warning
Special warning for EKS (AWS Elastic Kubernetes Service)
Ryax requires persistent storage, and by default EKS does not provide any storage driver. Install the EBS CSI plugin with:
# Get this from `eksctl get clusters`
cluster_name=<My cluster name>
eksctl utils associate-iam-oidc-provider --cluster=$cluster_name --approve
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster $cluster_name \
--role-name AmazonEKS_EBS_CSI_DriverRole \
--role-only \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve
eksctl create addon --name aws-ebs-csi-driver \
--cluster $cluster_name \
--service-account-role-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/AmazonEKS_EBS_CSI_DriverRole \
--force
See the official documentation for more details.
Also be aware that you cannot use Fargate, because it does not support persistent storage.
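After installing the add-on, you can check that the storage prerequisites are in place before going further. This sketch assumes kubectl access to the EKS cluster. check_eks_storage is a hypothetical helper name. ebs.csi.aws.com is the CSIDriver name registered by the AWS EBS CSI driver.

```shell
# Hypothetical helper: verify persistent storage prerequisites on EKS.
check_eks_storage() {
  # The EBS CSI driver registers a CSIDriver object named ebs.csi.aws.com
  if ! kubectl get csidriver ebs.csi.aws.com >/dev/null 2>&1; then
    echo "EBS CSI driver not found" >&2
    return 1
  fi
  # kubectl marks the default storage class with "(default)" after its name
  if ! kubectl get storageclass | grep -q '(default)'; then
    echo "No default storage class: set storageClass in values.yaml" >&2
    return 1
  fi
  echo "Storage prerequisites: OK"
}
```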
Initialize¶
First create a directory to organize the Ryax installation and initialize it with the default configuration:
mkdir ryax_install
cd ryax_install
docker run \
-v $PWD:/data/volume \
ryaxtech/ryax-adm:latest init --values volume/values.yaml
You are now in the ryax_install folder, and a values.yaml file containing the default configuration has been created.
Note
All the following commands assume that you are in the ryax_install directory.
To explain the configuration fields, here is an example of a simple Ryax configuration file:
# The Ryax version.
# Check here to get the latest version: https://github.com/RyaxTech/ryax-engine/releases
version: 25.02.0
# Cluster DNS
clusterName: myclustername
domainName: example.com
# Log level for all Ryax services
logLevel: info
# Set the storage size for each stateful service
datastore:
  pvcSize: 10Gi
minio:
  pvcSize: 40Gi
registry:
  pvcSize: 20Gi
# Enable Prometheus + Grafana monitoring
monitoring:
  enabled: true
# Use HTTPS by default
tls:
  enabled: true
# Automate HTTPS with Let's Encrypt
certManager:
  enabled: false
# Depends on your Kubernetes instance. Leave it empty to use the default
storageClass: ""
worker:
  values:
    config:
      site:
        name: aws-kubernetes-cluster-1
        spec:
          nodePools:
            - name: small
              cpu: 2
              memory: 4G
              selector:
                eks.amazonaws.com/nodegroup: default
The Ryax installation is based on Helm charts, one for each service, with a helmfile that defines the whole cluster configuration.
To customize your installation, you can set any configuration field using the values keyword. A detailed description of all the values can be found in ryax-adm/helm-charts/values.yaml.
Settings¶
Set the version field to the Ryax version, for example 23.10.0. The latest stable version can be found on the releases page.
The clusterName and domainName fields define the name you give to your cluster, which is used in various places. One of them is the URL of your cluster, <clusterName>.<domainName>, so it has to be consistent with your DNS.
If you do not intend to configure DNS for your cluster, leave these fields at their default values and disable certManager. In that case, you will access Ryax directly through its IP address, and the HTTPS certificate will be self-signed.
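To see which names your DNS must serve, you can read clusterName and domainName from the configuration. This is a minimal sketch. It assumes both keys are plain top-level "key: value" lines in values.yaml, with no inline comments. ryax_dns_names is a hypothetical helper. It also prints the wildcard entry required later in Configure DNS.

```shell
# Hypothetical helper: print the DNS names Ryax expects, read from a values file.
# Assumes clusterName and domainName are unindented "key: value" lines.
ryax_dns_names() {
  local file="${1:-values.yaml}" cluster domain
  # Strip the key and any surrounding double quotes
  cluster=$(sed -n 's/^clusterName:[[:space:]]*//p' "$file" | tr -d '"')
  domain=$(sed -n 's/^domainName:[[:space:]]*//p' "$file" | tr -d '"')
  if [ -z "$cluster" ] || [ -z "$domain" ]; then
    echo "clusterName or domainName missing in $file" >&2
    return 1
  fi
  echo "$cluster.$domain"
  echo "*.$cluster.$domain"
}
```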
Warning
Depending on your Kubernetes cluster setup, you might have issues with Cert Manager, which is used to obtain a valid HTTPS certificate. See the Cert Manager compatibility documentation for more details.
If you want to deal with the certificate yourself, you can disable it with:
certManager:
  enabled: false
An important configuration field is storageClass. If it is not set, Ryax uses the default storage class provided by the Kubernetes cluster for all services. However, these volumes hold the internal database (datastore), the object store for workflow I/O (filestore), and a container registry for the Ryax Action containers (registry). All of these affect the performance of your Ryax instance, so we recommend SSD-backed storage for all services to avoid delays in state persistence, deployments, and runs.
For more fine-grained settings, you can set the storage class of each service independently with the storageClass field inside that service's section.
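To choose a storage class, such as an SSD-backed one, you can list the classes available in your cluster together with their provisioner and parameters. This is a minimal sketch. list_storage_classes is a hypothetical name, and which parameters indicate SSD storage depends on your provider.

```shell
# Hypothetical helper: list storage classes with provisioner and parameters,
# e.g. to spot an SSD-backed class for the storageClass field.
list_storage_classes() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,PARAMETERS:.parameters'
}
```

The plain kubectl get storageclass output, without custom columns, also marks the default class with "(default)".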
Regarding volume sizes, we recommend that you start small: most storage providers let you extend volumes later on. The default values give comfortable volume sizes to start working on the platform.
Starting from Ryax version 24.10.0, which supports multiple sites, the configuration must provide at least one worker that reflects the primary Kubernetes cluster of Ryax.
To configure your Worker, select one or more node pools (sets of homogeneous nodes) and give the Worker some information about their nodes.
Note
Why use node pools? Because they allow Ryax to leverage Kubernetes node autoscaling, including scaling to zero!
The worker configuration in the example above is a simple configuration for an AWS managed Kubernetes cluster. Let's explain each field:
site.name: the name that identifies the site in Ryax.
site.spec.nodePools: the node pool definitions (a node pool is a set of homogeneous nodes; each resource value is given per node).
name: name of the node pool.
cpu: number of allocatable CPU cores per node.
memory: amount of allocatable memory in bytes per node.
selector: Kubernetes node selector specifying which nodes belong to the node pool.
These fields, in particular the selector, might change depending on the cloud provider. Below is an example selector for Azure:
selector:
  kubernetes.azure.com/agentpool: default
All the node pool information can be obtained with:
kubectl describe nodes
For resource values, look at the Allocatable fields. For the selector, find the label(s) that uniquely identify your node pool.
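The following sketch gathers the node pool values in one go. The label column helps you pick the selector, and the allocatable columns give the cpu and memory values. show_node_pools is a hypothetical helper. The label key you pass depends on your provider, for example eks.amazonaws.com/nodegroup on EKS. kubectl reports memory as a Kubernetes quantity (often in Ki), so you may need to convert it to the format used in your configuration.

```shell
# Hypothetical helper: show the values needed for a nodePools entry.
# Usage: show_node_pools <label-key>
show_node_pools() {
  local label_key="$1"
  # -L adds a column with the value of the given label for each node
  kubectl get nodes -L "$label_key"
  # Allocatable CPU and memory per node (what Kubernetes can schedule)
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```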
For more details about the Worker configuration, please see the Worker reference documentation.
Note
For a multi-site installation, see the Worker Installation Documentation.
Install Ryax¶
First, make sure that your Kubernetes context is set properly: either your KUBECONFIG variable is set and points to your cluster, or the ~/.kube/config file contains your cluster configuration. See Preparatory Steps to check your cluster access.
Warning
Depending on your Cloud provider, you might have to mount its configuration inside the container. For the following providers, add the associated option to the docker run command:
Microsoft Azure:
-v $HOME/.azure:/root/.azure
Google Cloud:
-v $HOME/.config/gcloud:/root/.config/gcloud
AWS:
-v $HOME/.aws:/root/.aws
Once you have customized your configuration, you can install Ryax on your cluster (don't forget to add the extra options, see the previous warning):
docker run \
-v $PWD:/data/volume \
-v $HOME/.kube/config:/data/kubeconfig.yml \
ryaxtech/ryax-adm:latest apply --values volume/values.yaml --suppress-diff
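To avoid forgetting the provider-specific mounts, you could wrap the docker run command in a shell function. This is only a sketch. ryax_adm and the CLOUD_PROVIDER variable are hypothetical names introduced here, and the mounts are the ones listed in the warning above.

```shell
# Hypothetical wrapper (names are illustrative): runs ryax-adm with the kubeconfig
# and, if CLOUD_PROVIDER is azure, gcloud or aws, the matching provider mount.
ryax_adm() {
  local provider_mount
  case "${CLOUD_PROVIDER:-}" in
    azure)  provider_mount="$HOME/.azure:/root/.azure" ;;
    gcloud) provider_mount="$HOME/.config/gcloud:/root/.config/gcloud" ;;
    aws)    provider_mount="$HOME/.aws:/root/.aws" ;;
    "")     provider_mount="" ;;  # no provider-specific configuration
    *) echo "Unknown CLOUD_PROVIDER: $CLOUD_PROVIDER" >&2; return 1 ;;
  esac
  if [ -n "$provider_mount" ]; then
    docker run -v "$PWD:/data/volume" -v "$HOME/.kube/config:/data/kubeconfig.yml" \
      -v "$provider_mount" ryaxtech/ryax-adm:latest "$@"
  else
    docker run -v "$PWD:/data/volume" -v "$HOME/.kube/config:/data/kubeconfig.yml" \
      ryaxtech/ryax-adm:latest "$@"
  fi
}
```

For example: CLOUD_PROVIDER=aws ryax_adm apply --values volume/values.yaml --suppress-diff. The helm command for ryax-init needs --entrypoint before the image name, so this wrapper does not cover it.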
Note
Optionally, you can populate your cluster with some initial actions to use in your workflows (don't forget to add the extra options, see the previous warning):
docker run \
-v $PWD:/data/volume \
-v $HOME/.kube/config:/data/kubeconfig.yml \
--entrypoint=helm \
ryaxtech/ryax-adm:latest \
upgrade --install ryax-init ./helm-charts/ryax-init -n ryaxns
If the installation fails, check the logs, check your configuration and try again. If you are lost, or have any questions, please join our Discord server. We will be happy to help!
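As a first troubleshooting step, you can list the pods that are not running and print their last log lines. This sketch assumes the Ryax services run in the ryaxns namespace, which is the namespace used by the commands in this guide. ryax_unhealthy_pods is a hypothetical helper. It relies on STATUS being the third column of kubectl get pods --no-headers.

```shell
# Hypothetical helper: show pods whose STATUS is neither Running nor Completed,
# with the last log lines of each one.
ryax_unhealthy_pods() {
  local ns="${1:-ryaxns}" pods pod
  # STATUS is the third column (e.g. CrashLoopBackOff, Pending, Error)
  pods=$(kubectl get pods -n "$ns" --no-headers | awk '$3 != "Running" && $3 != "Completed" {print $1}')
  if [ -z "$pods" ]; then
    echo "No unhealthy pods in $ns"
    return 0
  fi
  for pod in $pods; do
    echo "== $pod"
    # May print nothing if the container never started
    kubectl logs -n "$ns" "$pod" --tail=20
  done
  return 1
}
```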
Configure DNS¶
The last step is configuring your DNS so that you can connect to your cluster. The address you should register is <clusterName>.<domainName>.
To retrieve the external IP of your cluster, run this one-liner:
kubectl -n kube-system get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# Or, depending on your provider:
kubectl -n kube-system get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Or simply look at the output of kubectl -n kube-system get svc traefik, under the EXTERNAL-IP column.
Depending on your Cloud provider, you will get either an IP address, which requires an A entry, or a hostname (AWS), which requires a CNAME entry.
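You can combine the two lookups above to print the record type to create. This is a sketch. ryax_lb_record is a hypothetical helper, and it reads the first ingress entry of the traefik service in kube-system.

```shell
# Hypothetical helper: print "A <ip>" or "CNAME <hostname>" for the traefik LoadBalancer.
ryax_lb_record() {
  local ip host
  ip=$(kubectl -n kube-system get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  host=$(kubectl -n kube-system get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  if [ -n "$ip" ]; then
    echo "A $ip"            # IP address: create an A entry
  elif [ -n "$host" ]; then
    echo "CNAME $host"      # hostname (e.g. AWS): create a CNAME entry
  else
    echo "No external address yet for traefik" >&2
    return 1
  fi
}
```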
Now create a DNS entry for the cluster and another for every subdomain using a wildcard (*) entry:
<clusterName>.<domainName>
*.<clusterName>.<domainName>
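Once the entries are created, you can check that both names resolve. This sketch assumes a Linux machine where getent is available. check_ryax_dns is a hypothetical helper, and probe is an arbitrary subdomain used only to test the wildcard entry. DNS changes can take some time to propagate.

```shell
# Hypothetical helper: check that the cluster name and a wildcard subdomain resolve.
# Usage: check_ryax_dns <clusterName> <domainName>
check_ryax_dns() {
  local base="$1.$2" name
  # "probe" is an arbitrary subdomain that only the wildcard entry can match
  for name in "$base" "probe.$base"; do
    if getent hosts "$name" >/dev/null; then
      echo "OK $name"
    else
      echo "Does not resolve: $name" >&2
      return 1
    fi
  done
}
```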
Once your entries are created, and only if tls and certManager are enabled, you will have to wait for Let's Encrypt to issue a valid certificate. You can check with:
kubectl get certificates -n ryaxns
The READY column should show True.
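Instead of polling, you can block until the certificates are ready with kubectl wait. It works on any resource that exposes a Ready condition, such as cert-manager Certificates. wait_ryax_certificates is a hypothetical helper, and the 600s default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until all Certificates in ryaxns report Ready.
# Optional argument: timeout (default 600s, an arbitrary choice).
wait_ryax_certificates() {
  kubectl wait --for=condition=Ready certificates --all -n ryaxns --timeout="${1:-600s}"
}
```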
Access to your cluster¶
Now you can access your cluster at https://<clusterName>.<domainName> in your web browser.
The default credentials are user1/pass1.
Warning
Change this password and user as soon as you’re logged in!
Cluster Update¶
The Ryax configuration is declarative, so in order to update your cluster you just have to change the configuration and apply it.
Note
You need to configure your Kubernetes cluster access and set the Cloud provider specific options; see the installation process for more details.
The Ryax configuration is stored as a secret inside your cluster after each successful apply. You can get the current cluster configuration from the cluster itself with:
docker run \
-v $PWD:/data/volume \
-v $HOME/.kube/config:/data/kubeconfig.yml \
ryaxtech/ryax-adm:latest init --from-cluster --values volume/ryax_values.yaml
Warning
Before any update, make a backup (see the backup documentation) and check the changelog for any extra steps needed.
Then simply change the version field in the configuration and apply it as in the installation steps described above.
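You can also script the version change. This is a minimal sketch. It assumes version is an unindented top-level key, as in the example configuration. set_ryax_version is a hypothetical helper. It rewrites the file through a temporary copy rather than with sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: set the top-level "version:" field of a values file in place.
# Usage: set_ryax_version <file> <version>
set_ryax_version() {
  local file="$1" version="$2" tmp status
  # Only an unindented "version:" line is changed; nested keys are left alone
  if ! grep -q '^version:' "$file"; then
    echo "No top-level version field in $file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Edit into a temporary copy, then copy back so the file keeps its permissions
  sed "s/^version:.*/version: $version/" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, after retrieving the configuration from the cluster, run set_ryax_version ryax_values.yaml 25.02.0. Then run the apply command with --values volume/ryax_values.yaml.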
Ryax IntelliScale¶
Ryax IntelliScale is a resource management optimization technique (Vertical Pod Autoscaling) that sizes the resources allocated within nodes based on previous executions. It tracks the usage of CPUs, RAM, GPUs and GPU VRAM, and recommends and adjusts the resources of follow-up executions based on actual usage. You can find more details, along with configuration information, in Ryax IntelliScale.
In particular, when used for GPUs, it performs dynamic GPU fractioning by leveraging the NVIDIA MIG mechanism (available on specific recent NVIDIA architectures).
Enable MIG¶
MIG, or Multi-Instance GPU, is a technology developed by NVIDIA that allows a single GPU to be partitioned into multiple instances. Each instance operates with its own dedicated resources, enabling various workloads to run simultaneously on a single GPU, which optimizes utilization and maximizes data center investment. For AI applications, MIG can be particularly beneficial as it allows for the efficient distribution of resources, ensuring that each task has the necessary computational power with a certain isolation from other processes running on the same GPU.
For IntelliScale to take MIG into account in its GPU tracking and recommendations, you need to set up the supported MIG node pools in the worker configuration. IntelliScale works at the level of each Kubernetes cluster, so it needs to be configured for each worker.
An example for a configuration on Scaleway is given below:
# The following entries belong under the worker node pools (site.spec.nodePools)
nodePools:
  - cpu: 23
    gpu: 7
    gpu_mode: mig-1g.10gb
    memory: 240G
    name: gpu-pool-mig-1g-10gb
    selector:
      k8s.scaleway.com/pool-name: gpu-pool-mig-1g-10gb
  - cpu: 23
    gpu: 2
    gpu_mode: mig-3g.40gb
    memory: 240G
    name: gpu-pool-mig-3g-40gb
    selector:
      k8s.scaleway.com/pool-name: gpu-pool-mig-3g-40gb
  - cpu: 23
    gpu: 1
    gpu_mode: mig-7g.80gb
    memory: 240G
    name: gpu-pool-mig-7g-80gb
    selector:
      k8s.scaleway.com/pool-name: gpu-pool-mig-7g-80gb
In the example above, we configured three node pools, each representing a different MIG configuration. Ryax IntelliScale tracks the GPU usage of each execution and recommends the most suitable node pool for follow-up runs. Let's explain each field:
cpu: the number of allocatable CPUs of each node of the node pool.
gpu: the number of allocatable GPU instances for each node of the node pool.
gpu_mode: the MIG mode in which this node pool is configured.
- To enable MIG on this node pool, the value must strictly follow the pattern "mig-xg.ygb" (xg.ygb is the standard MIG slice format, with x MIG GIs and y GB of GPU memory). If instead you want nodes with entire GPUs (not controlled by IntelliScale), the value should be the single word "full".
- We currently suggest using only the mig-1g.10gb, mig-3g.40gb or mig-7g.80gb modes, since they leave fewer resources unused.
memory: the amount of allocatable RAM for each node of the node pool.
name: the name of the node pool.
selector: the node pool selector within Kubernetes.
- To be handled by IntelliScale, a node pool with MIG-mode GPUs must have a node selector whose value strictly follows the format "gpu-pool-mig-xg-ygb", where xg-ygb is the standard MIG slice format "xg.ygb" with '.' replaced by '-'.
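The naming rule for MIG node pools is mechanical, so a small function can check it. This is a sketch. mig_pool_name is a hypothetical helper. It validates a gpu_mode value against the mig-xg.ygb pattern and prints the matching gpu-pool-mig-xg-ygb name described above.

```shell
# Hypothetical helper: derive the IntelliScale node pool name from a gpu_mode value.
# mig-<x>g.<y>gb becomes gpu-pool-mig-<x>g-<y>gb ('.' replaced by '-').
mig_pool_name() {
  if [ "$1" = "full" ]; then
    echo "gpu_mode 'full' is not managed by IntelliScale" >&2
    return 1
  fi
  # Enforce the strict "mig-xg.ygb" pattern before converting
  if ! printf '%s\n' "$1" | grep -Eq '^mig-[0-9]+g\.[0-9]+gb$'; then
    echo "Invalid gpu_mode: $1" >&2
    return 1
  fi
  printf 'gpu-pool-%s\n' "$1" | tr '.' '-'
}
```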
Ideally, you should enable autoscaling on these node pools, from 0 to n nodes.
For more details regarding MIG, please refer to NVIDIA's User Guide. There is also another concrete example of how MIG is applied to industrial workflows.
Use local registry only¶
Ryax uses an internal registry to store action images. To allow other Kubernetes sites to join, you must associate a valid domain name with Ryax by setting domainName and clusterName. Then, configure DNS resolution for both *.<clusterName>.<domainName> and <clusterName>.<domainName>, pointing to the public IP address of the correct Kubernetes cluster.
If your cluster is not accessible from outside your private network, you need to use a NodePort to connect to the registry. This allows action pods to deploy; however, you will not be able to connect external Kubernetes sites.
To do so, disable tls in ryax_values.yaml. This disables the registry credentials and makes the internal registry available through a NodePort:
# Enable Ryax to work on the local site only, with no external access to the registry
# Note that with tls disabled, you cannot add sites outside your local network
tls:
  enabled: false
This starts one pod per node, named ryax-registry-cert-setup-xxxxx, which configures certificates to access the internal registry through 127.0.0.1:30012. Action pods in the ryaxns-execs namespace then pull their images through that NodePort.
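To check this setup, you can compare the number of nodes with the number of ryax-registry-cert-setup pods. This is a sketch. check_local_registry is a hypothetical helper. It searches all namespaces, because the namespace of these pods is not stated here, and it assumes every node should run one of them.

```shell
# Hypothetical helper: compare node count with ryax-registry-cert-setup pod count.
check_local_registry() {
  local nodes pods
  nodes=$(kubectl get nodes --no-headers | grep -c .)
  # One ryax-registry-cert-setup-xxxxx pod is expected per node (all namespaces searched)
  pods=$(kubectl get pods -A --no-headers | grep -c 'ryax-registry-cert-setup-')
  echo "nodes=$nodes cert-setup-pods=$pods"
  [ "$nodes" -eq "$pods" ]
}
```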
Troubleshooting¶
Cannot upgrade, ryax-adm gives rabbitmq password error¶
When changing the configuration with ryax-adm apply, you might get RabbitMQ errors like the one below:
COMBINED OUTPUT:
Error: Failed to render chart: exit status 1: Error: execution error at (rabbitmq/templates/secrets.yaml:4:17):
PASSWORDS ERROR: You must provide your current passwords when upgrading the release.
Note that even after reinstallation, old credentials may be needed as they may be kept in persistent volume claims.
Further information can be obtained at https://docs.bitnami.com/general/how-to/troubleshoot-helm-chart-issues/#credential-errors-while-upgrading-chart-releases
'auth.password' must not be empty, please add '--set auth.password=$RABBITMQ_PASSWORD' to the command. To get the current value:
export RABBITMQ_PASSWORD=$(kubectl get secret --namespace "ryaxns" ryax-broker-secret -o jsonpath="{.data.rabbitmq-password}" | base64 -d)
Use --debug flag to render out invalid YAML
You can find the correct password with:
kubectl get secret --namespace "ryaxns" ryax-broker-secret -o jsonpath="{.data.rabbitmq-password}" | base64 -d
To fix this error, add a rabbitmq section with the correct password, as below (replace secret with your password):
rabbitmq:
  values:
    auth:
      password: secret
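You can combine the password lookup and the YAML section into one step. This is a sketch. rabbitmq_password_values is a hypothetical helper that reuses the kubectl command above. It prints the rabbitmq section with the password single-quoted and YAML-escaped, ready to merge into your values file. Note that it prints the password in clear text.

```shell
# Hypothetical helper: print the rabbitmq override with the password stored in the cluster.
rabbitmq_password_values() {
  local password escaped
  password=$(kubectl get secret --namespace "ryaxns" ryax-broker-secret \
    -o jsonpath="{.data.rabbitmq-password}" | base64 -d)
  if [ -z "$password" ]; then
    echo "Could not read the password from ryax-broker-secret" >&2
    return 1
  fi
  # YAML single-quoted string: a literal ' is written as ''
  escaped=$(printf '%s' "$password" | sed "s/'/''/g")
  printf "rabbitmq:\n  values:\n    auth:\n      password: '%s'\n" "$escaped"
}
```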
All actions’ pods on ryaxns-execs are in imagePullBackOff¶
If pods in ryaxns-execs are in imagePullBackOff, you are probably having trouble accessing the registry through the external domain name. Make sure that your DNS is configured and that the Ryax traefik service uses the correct IP address or fully qualified hostname. You can check the Services with:
kubectl get service -A | grep -i LoadBalancer
Make sure that the IP address or hostname associated with the traefik LoadBalancer is correct.
Make sure to add your DNS entry with a wildcard. For instance, if you configure clusterName as example and domainName as ryax.io, make sure that you have DNS entries *.example.ryax.io and example.ryax.io pointing to the correct IP address. See also Configure DNS.
If you do not want to configure external access to your cluster, you won't be able to connect external Kubernetes workers, but you can always have a local worker. In this case, to configure the internal registry, refer to Use local registry only.
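To see which registry address the failing pods try to reach, you can list them together with their image. This is a sketch. ryax_image_pull_errors is a hypothetical helper. It only looks at the first container of each pod, and it treats the first component of the image reference as the registry host.

```shell
# Hypothetical helper: list action pods that cannot pull their image,
# with the image reference and the registry host it points to.
ryax_image_pull_errors() {
  local pod image
  kubectl get pods -n ryaxns-execs --no-headers |
    awk '$3 == "ImagePullBackOff" || $3 == "ErrImagePull" {print $1}' |
    while read -r pod; do
      image=$(kubectl get pod -n ryaxns-execs "$pod" -o jsonpath='{.spec.containers[0].image}')
      # ${image%%/*} keeps everything before the first '/' (the registry host)
      echo "$pod $image registry=${image%%/*}"
    done
}
```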