Install Ryax on Kubernetes

We assume that you are comfortable with Kubernetes. To keep this guide short, we leave out details on the Kubernetes commands.

Requirements

All you need to install Ryax is a Kubernetes cluster and Docker installed on your machine. You can get a managed Kubernetes instance from any cloud provider. For a local development installation, please refer to the Getting Started Guide.

Supported Kubernetes versions:

  • Kubernetes > 1.19

Hardware:

  • At least 2 CPU cores

  • 4 GB of memory

  • 40 GB of available disk space

Note that depending on the Actions that you run on your cluster you might need more resources.

Preparatory Steps

  • Make sure your configuration points to the intended cluster: kubectl config current-context.

  • Dedicate your Kubernetes cluster to Ryax: we offer no guarantee that Ryax runs smoothly alongside other applications.

  • Make sure you have complete admin access to the cluster. For instance, try running kubectl auth can-i create ns or kubectl auth can-i create pc (priority classes).

    $ kubectl auth can-i create ns
    Warning: resource 'namespaces' is not namespace scoped
    yes
    
  • Have access to a DNS server where you can add a new A or CNAME entry for your cluster.
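The checks above can be bundled into a small bash function that you run before installing. This is a minimal sketch, not part of ryax-adm: the function name ryax_preflight and the list of resources it checks are our own assumptions, so extend the list to match your needs.

```shell
# Hypothetical pre-flight helper (not part of ryax-adm).
# Prints the active kubectl context, then checks a few cluster-admin
# permissions. The resource list is an assumption; extend it as needed.
ryax_preflight() {
  local ctx res rc=0
  ctx=$(kubectl config current-context) || return 1
  echo "Current context: $ctx"
  for res in namespaces priorityclasses clusterroles; do
    # `kubectl auth can-i` prints "yes" or "no"; the scope warning goes to stderr
    if [ "$(kubectl auth can-i create "$res" 2>/dev/null)" = "yes" ]; then
      echo "create $res: yes"
    else
      echo "create $res: no"
      rc=1
    fi
  done
  return "$rc"
}
```

Run ryax_preflight before the installation; a non-zero exit status means that at least one of the checked permissions is missing.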

Configure your Installation

Installing Ryax is analogous to installing a Helm chart. We start from a default configuration and make a few tweaks so that everything is compatible with your Kubernetes provider. Rest assured that you will be able to fine-tune your installation later on.

Warning

Special warning for EKS (AWS Elastic Kubernetes Service)

Ryax requires persistent storage, and by default EKS does not provide any storage driver. Install the EBS CSI plugin with:

# Get this from `eksctl get clusters`
cluster_name=<My cluster name>

eksctl utils associate-iam-oidc-provider --cluster=$cluster_name --approve

eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster $cluster_name \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve

account_id=$(aws sts get-caller-identity --query Account --output text)
eksctl create addon --name aws-ebs-csi-driver \
  --cluster $cluster_name \
  --service-account-role-arn arn:aws:iam::${account_id}:role/AmazonEKS_EBS_CSI_DriverRole \
  --force

See the official documentation for more details.

Also be aware that you cannot use Fargate, because it does not support persistent storage.

Initialize

First create a directory to organize the Ryax installation and initialize it with the default configuration:

mkdir ryax_install
cd ryax_install
docker run \
  -v $PWD:/data/volume \
  ryaxtech/ryax-adm:latest init --values volume/values.yaml

You are now in the ryax_install folder, where a values.yaml file containing the default configuration was created.

Note

All the following commands assume that you are in the ryax_install directory.

To explain the configuration fields, here is an example of a simple configuration file for Ryax:

# The Ryax version.
# Check here to get the latest version: https://github.com/RyaxTech/ryax-engine/releases
version: 25.02.0

# Cluster DNS
clusterName: myclustername
domainName: example.com

# Log level for all Ryax services
logLevel: info

# Set the storage size for each stateful service
datastore:
  pvcSize: 10Gi
minio:
  pvcSize: 40Gi
registry:
  pvcSize: 20Gi

# Enable Prometheus + Grafana monitoring
monitoring:
  enabled: true

# Use HTTPS by default
tls:
  enabled: true

# Automate HTTPS with Let's Encrypt
certManager:
  enabled: false

# Depends on your Kubernetes instance. Leave it empty to use the default
storageClass: ""

worker:
  values:
    config:
      site:
        name: aws-kubernetes-cluster-1
        spec:
          nodePools:
          - name: small
            cpu: 2
            memory: 4G
            selector:
              eks.amazonaws.com/nodegroup: default

The Ryax installation is based on Helm charts, one for each service, with a helmfile that defines the whole cluster configuration.

To customize your installation, you can set any configuration field using the values keyword. A detailed description of all the values can be found in ryax-adm/helm-charts/values.yaml.

Settings

Set the version field to the Ryax version, for example: 23.10.0. The latest stable version can be found on the releases page.

The clusterName and domainName fields define the name you give to your cluster, which is used in various places. One of those places is the URL of your cluster, <clusterName>.<domainName>, so it has to be consistent with your DNS.

If you do not intend to configure DNS for your cluster, leave these fields at their default values and disable certManager. In this case, be aware that you will access Ryax directly through its IP address and that the HTTPS certificate will be self-signed.

Warning

Depending on your Kubernetes cluster setup, you might have issues with Cert Manager, which is used to get a valid HTTPS certificate. See the Cert Manager compatibility documentation for more details.

If you want to deal with the certificate yourself, you can disable it with:

certManager:
  enabled: false

An important setting is storageClass. If it is not set, Ryax uses the default storage class provided by the Kubernetes cluster for all services. However, the volumes store the internal database (datastore), the object store for workflow I/O (filestore), and the container registry for the Ryax Action containers (registry), which all affect the performance of your Ryax instance. We therefore recommend SSD-backed storage for all services, to avoid delays in state persistence, deployments, and runs. For finer-grained control, you can set the storage class of each service independently with the storageClass field inside that service's section. Regarding volume sizes, we recommend starting small: most storage providers let you extend volumes later on. The default values give comfortable volume sizes to start working on the platform.
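To see which storage class Ryax would use when storageClass is left empty, look for the class marked (default) in the output of kubectl get storageclass. The bash sketch below extracts it from the standard storageclass.kubernetes.io/is-default-class annotation; the function name is hypothetical.

```shell
# Hypothetical helper: print the cluster's default StorageClass, i.e. the
# class used by Ryax services when storageClass is left empty.
default_storage_class() {
  # One line per class: "<name><TAB><is-default-class annotation>"
  kubectl get storageclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' \
    | awk -F'\t' '$2 == "true" { print $1 }'
}
```

If this prints nothing, the cluster has no default class, and you should set storageClass explicitly.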

Starting from Ryax version 24.10.0, which introduced multi-site support, the configuration must provide at least one worker reflecting the primary Kubernetes cluster on which Ryax is installed.

To configure your Worker, you need to select one or more node pools (sets of homogeneous nodes) and give the Worker some information about their nodes.

Note

Why node pools? Because they allow Ryax to leverage Kubernetes node autoscaling, including scaling to zero!

The worker section of the example above is a simple configuration for a managed Kubernetes cluster on AWS (EKS). Let’s explain each field:

  • site.name: the name that identifies the site in Ryax.

  • site.spec.nodePools: the node pool definitions (a node pool is a set of homogeneous nodes, so each resource value is given per node).

    • name: name of the node pool.

    • cpu: number of allocatable CPU cores per node.

    • memory: amount of allocatable memory per node, in bytes (a suffix such as G can be used, as in the example above).

    • selector: Kubernetes node selector (node labels) that specifies which nodes belong to the node pool.

These fields might change depending on the cloud provider. For example, on Azure the selector uses the AKS agent pool label:

selector:
  kubernetes.azure.com/agentpool: default

All node pool information can be obtained with:

kubectl describe nodes

To obtain resource values, look at the Allocatable section of each node. For the selector, find the label(s) that uniquely identify your node pool.
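The following bash sketch gathers the Allocatable values of all nodes at once and converts a memory quantity into the G notation used in the example. Both function names are hypothetical, and rounding down to whole decimal gigabytes is our own conservative choice, not a Ryax requirement.

```shell
# Hypothetical helpers, not part of Ryax.

# Print "<node><TAB><allocatable cpu><TAB><allocatable memory>" for every node.
node_allocatable() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.cpu}{"\t"}{.status.allocatable.memory}{"\n"}{end}'
}

# Convert a Kubernetes memory quantity (Ki/Mi/Gi suffix or plain bytes)
# to whole decimal gigabytes, rounded down, e.g. 3843456Ki -> 3G.
mem_to_g() {
  local v="$1" bytes
  case "$v" in
    *Ki) bytes=$(( ${v%Ki} * 1024 )) ;;
    *Mi) bytes=$(( ${v%Mi} * 1024 * 1024 )) ;;
    *Gi) bytes=$(( ${v%Gi} * 1024 * 1024 * 1024 )) ;;
    ''|*[!0-9]*) echo "unsupported quantity: $v" >&2; return 1 ;;
    *) bytes=$v ;;
  esac
  echo "$(( bytes / 1000000000 ))G"
}
```

To find a label that separates your pools, kubectl get nodes --show-labels lists every label of every node.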

For more details about the Worker configuration, see the Worker reference documentation.

Note

For a multi-site installation, see the Worker Installation Documentation.

Install Ryax

First, make sure that your Kubernetes context is set properly: either your KUBECONFIG variable is set and points to your cluster, or the ~/.kube/config file contains your cluster configuration. See Preparatory Steps to check your cluster access.

Warning

Depending on the cloud provider you are using, you might have to mount its configuration inside the container. For the following providers, add the associated option to the docker run commands:

  • Microsoft Azure: -v $HOME/.azure:/root/.azure

  • Google Cloud: -v $HOME/.config/gcloud:/root/.config/gcloud

  • AWS: -v $HOME/.aws:/root/.aws

Once you have customized your configuration, you can install Ryax on your cluster (don’t forget to add the provider-specific option from the previous warning):

docker run \
  -v $PWD:/data/volume \
  -v $HOME/.kube/config:/data/kubeconfig.yml \
  ryaxtech/ryax-adm:latest apply --values volume/values.yaml --suppress-diff

Note

Optionally, you can populate your cluster with some initial actions to use in your workflows (don’t forget to add the provider-specific option from the previous warning):

docker run \
  -v $PWD:/data/volume \
  -v $HOME/.kube/config:/data/kubeconfig.yml \
  --entrypoint=helm \
  ryaxtech/ryax-adm:latest \
  upgrade --install ryax-init ./helm-charts/ryax-init -n ryaxns

If the installation fails, check the logs, check your configuration and try again. If you are lost, or have any questions, please join our Discord server. We will be happy to help!

Configure DNS

The last step is configuring your DNS so that you can connect to your cluster. The address you should register is <clusterName>.<domainName>.

To retrieve the external IP of your cluster, run this one-liner:

kubectl -n kube-system get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# OR, depending on your provider
kubectl -n kube-system get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

Or simply look at the output of kubectl -n kube-system get svc traefik, under the EXTERNAL-IP column.

Depending on your cloud provider, you will get either an IP address, which requires an A entry, or a hostname (on AWS, for example), which requires a CNAME entry.

Now create one DNS entry for the cluster and a wildcard entry for all of its subdomains:

  • <clusterName>.<domainName>

  • *.<clusterName>.<domainName>
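Putting these steps together, this bash sketch reads the Traefik address and prints the two records to create, with their type. The function names and output format are our own; you still create the records in your DNS provider's interface.

```shell
# Hypothetical helpers, not part of Ryax.

# Print the Traefik load balancer IP if the provider sets one, else its hostname.
lb_address() {
  kubectl -n kube-system get svc traefik \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}

# Print "<name> <type> <target>" for the cluster entry and the wildcard entry.
dns_records() {
  local target="$1" cluster="$2" domain="$3" type=CNAME
  local ipv4_re='^[0-9]+(\.[0-9]+){3}$'
  # An IPv4 address needs an A record; a hostname (e.g. on AWS) needs a CNAME
  if [[ "$target" =~ $ipv4_re ]]; then
    type=A
  fi
  printf '%s %s %s\n' "$cluster.$domain" "$type" "$target"
  printf '%s %s %s\n' "*.$cluster.$domain" "$type" "$target"
}
```

For example: dns_records "$(lb_address)" myclustername example.com.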

Once your entries are created, and only if TLS and certManager are enabled, you will have to wait for Let’s Encrypt to provide a valid certificate. You can check with:

kubectl get certificates -n ryaxns

The READY column should show True.

Access your cluster

Now you can access your cluster at https://<clusterName>.<domainName> in your web browser.

The default credentials are user1 / pass1.

Warning

Change this password and user as soon as you’re logged in!

Cluster Update

The Ryax configuration is declarative, so in order to update your cluster you just have to change the configuration and apply it.

Note

You need to configure access to your Kubernetes cluster and set the cloud-provider-specific options; see the installation process above for details.

The Ryax configuration is stored as a secret inside your cluster after each successful apply. You can retrieve the current configuration from the cluster itself with:

docker run \
    -v $PWD:/data/volume \
    -v $HOME/.kube/config:/data/kubeconfig.yml \
    ryaxtech/ryax-adm:latest init --from-cluster --values volume/ryax_values.yaml

Warning

Before any update, make a backup (see ./create-backups.html) and check the changelog to see whether any extra steps are needed.

Now change the version field in the configuration and apply it as in the installation steps described above. If you retrieved the configuration from the cluster, pass --values volume/ryax_values.yaml instead of volume/values.yaml.
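The version bump itself can be scripted. This bash sketch (the function name is hypothetical) rewrites only the top-level version: line of the file retrieved with init --from-cluster and leaves the rest of the file untouched.

```shell
# Hypothetical helper: set the top-level `version:` field of a values file.
# Writes through a temp file instead of `sed -i`, whose syntax differs
# between GNU and BSD sed.
set_ryax_version() {
  local file="$1" version="$2" tmp rc=0
  grep -q '^version:' "$file" || { echo "no top-level version field in $file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Copy back with `cat` so the original file keeps its permissions
  { sed "s/^version:.*/version: $version/" "$file" > "$tmp" && cat "$tmp" > "$file"; } || rc=1
  rm -f "$tmp"
  return "$rc"
}
```

For example, run set_ryax_version ryax_values.yaml 25.02.0, then the apply command with --values volume/ryax_values.yaml.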

Ryax IntelliScale

Ryax IntelliScale is a resource management optimization technique (vertical pod autoscaling) that performs an optimal sizing of allocated resources within nodes based on previous executions. It tracks the usage of CPUs, RAM, GPUs, and GPU VRAM, and recommends and adjusts the resources of follow-up executions based on their actual usage. You can find more details, along with configuration information, in Ryax IntelliScale.

In particular, for GPUs it performs dynamic GPU fractioning by leveraging the NVIDIA MIG mechanism (available only on specific recent NVIDIA architectures).

Enable MIG

MIG, or Multi-Instance GPU, is a technology developed by NVIDIA that allows a single GPU to be partitioned into multiple instances. Each instance operates with its own dedicated resources, enabling various workloads to run simultaneously on a single GPU, which optimizes utilization and maximizes data center investment. For AI applications, MIG can be particularly beneficial as it allows for the efficient distribution of resources, ensuring that each task has the necessary computational power with a certain isolation from other processes running on the same GPU.

For IntelliScale to take MIG into account in its GPU tracking and recommendations, you need to set up the supported MIG node pools in the worker configuration. IntelliScale operates at the level of each Kubernetes cluster, so it must be configured for each worker.

An example for a configuration on Scaleway is given below:

# These node pools go under worker.values.config.site.spec.nodePools (see the worker example above)
          nodePools:
          - cpu: 23
            gpu: 7
            gpu_mode: mig-1g.10gb
            memory: 240G
            name: gpu-pool-mig-1g-10gb
            selector:
              k8s.scaleway.com/pool-name: gpu-pool-mig-1g-10gb
          - cpu: 23
            gpu: 2
            gpu_mode: mig-3g.40gb
            memory: 240G
            name: gpu-pool-mig-3g-40gb
            selector:
              k8s.scaleway.com/pool-name: gpu-pool-mig-3g-40gb
          - cpu: 23
            gpu: 1
            gpu_mode: mig-7g.80gb
            memory: 240G
            name: gpu-pool-mig-7g-80gb
            selector:
              k8s.scaleway.com/pool-name: gpu-pool-mig-7g-80gb

In the example above, we configured three node pools, each representing a different MIG configuration. Ryax IntelliScale tracks the GPU usage of each execution and recommends the most suitable node pool for follow-up runs. Let’s explain each field:

  • cpu: the number of allocatable CPUs on each node of the node pool.

  • gpu: the number of allocatable GPU instances on each node of the node pool.

  • gpu_mode: the MIG mode with which this node pool is configured.
    - To enable MIG on this node pool, the value must strictly follow the pattern “mig-xg.ygb”, the standard MIG profile name with x compute slices and y GB of GPU memory. For nodes with entire GPUs (which are not under the control of IntelliScale), the value must be the single word “full”.
    - Currently, we suggest using only the mig-1g.10gb, mig-3g.40gb, or mig-7g.80gb modes, since they leave fewer resources unused.

  • memory: amount of allocatable RAM on each node of the node pool.

  • name: name of the node pool.

  • selector: node pool selector within Kubernetes.
    - To be handled by IntelliScale, a node pool with MIG-mode GPUs must have a node selector whose value strictly follows the format “gpu-pool-mig-xg-ygb”, where xg-ygb is the MIG profile name “xg.ygb” with ‘.’ replaced by ‘-’.
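The naming rules above are easy to get wrong, so here is a bash sketch that validates a gpu_mode value and derives the matching selector value. It only encodes the rules stated above; the function name and its exit codes are our own, and Ryax does not ship this helper.

```shell
# Hypothetical helper: validate a gpu_mode value and print the node pool
# selector value that IntelliScale expects for it.
# Exit codes: 0 = MIG mode, 1 = invalid value, 2 = "full" (not managed by IntelliScale).
mig_selector() {
  local mode="$1" mig_re='^mig-[0-9]+g\.[0-9]+gb$'
  if [ "$mode" = "full" ]; then
    return 2
  fi
  if ! [[ "$mode" =~ $mig_re ]]; then
    echo "invalid gpu_mode: $mode" >&2
    return 1
  fi
  # Replace every '.' with '-' and add the prefix: mig-1g.10gb -> gpu-pool-mig-1g-10gb
  echo "gpu-pool-${mode//./-}"
}
```

In the Scaleway example above, the same value is used both as the node pool name and as the k8s.scaleway.com/pool-name selector.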

Ideally, you should enable autoscaling on these node pools, from 0 to n nodes, so that unused MIG node pools can scale to zero.

For more details regarding MIG, please refer to NVIDIA’s User Guide. Here is also a concrete example of how MIG is applied to industrial workflows.

Use local registry only

Ryax uses an internal registry to store action images. To allow other Kubernetes sites to join, you must associate a valid domain name with Ryax by setting domainName and clusterName. Then, configure name resolution for both *.clusterName.domainName and clusterName.domainName so that they point to the public IP address of the correct Kubernetes cluster.

If your cluster is not reachable from outside your private network, you need to use a NodePort to connect to the registry. This allows action pods to deploy, but you will not be able to connect external Kubernetes sites. To do so, disable tls in ryax_values.yaml; this disables registry credentials and makes the internal registry available through a NodePort:

# Enable ryax to work on local site only, no external access to registry
# Notice that disabling tls you cannot add sites outside your local network
tls:
  enabled: false

This starts one pod per node, named ryax-registry-cert-setup-xxxxx, which configures certificates to access the internal registry through 127.0.0.1:30012. Action pods in the ryaxns-execs namespace then pull their images through that NodePort.

Troubleshooting

Cannot upgrade, ryax-adm gives rabbitmq password error

When changing the configuration with ryax-adm apply, you might get RabbitMQ errors like the one below.

COMBINED OUTPUT:
  Error: Failed to render chart: exit status 1: Error: execution error at (rabbitmq/templates/secrets.yaml:4:17):
  PASSWORDS ERROR: You must provide your current passwords when upgrading the release.
                   Note that even after reinstallation, old credentials may be needed as they may be kept in persistent volume claims.
                   Further information can be obtained at https://docs.bitnami.com/general/how-to/troubleshoot-helm-chart-issues/#credential-errors-while-upgrading-chart-releases
      'auth.password' must not be empty, please add '--set auth.password=$RABBITMQ_PASSWORD' to the command. To get the current value:
          export RABBITMQ_PASSWORD=$(kubectl get secret --namespace "ryaxns" ryax-broker-secret -o jsonpath="{.data.rabbitmq-password}" | base64 -d)
  Use --debug flag to render out invalid YAML

You can find the correct password with:

kubectl get secret --namespace "ryaxns" ryax-broker-secret -o jsonpath="{.data.rabbitmq-password}" | base64 -d

To fix this error, add a rabbitmq section with the correct password to your configuration, as below (replace secret with your password):

rabbitmq:
  values:
    auth:
      password: secret
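Instead of copying the password by hand, you can generate this fragment. The bash sketch below (hypothetical function name) runs the same kubectl command as above and writes the password as a single-quoted YAML string.

```shell
# Hypothetical helper: read the current RabbitMQ password from the cluster
# (same kubectl command as above) and print the values fragment to add.
rabbitmq_values() {
  local password q="'"
  password=$(kubectl get secret --namespace ryaxns ryax-broker-secret \
    -o jsonpath='{.data.rabbitmq-password}' | base64 -d)
  if [ -z "$password" ]; then
    echo "could not read the RabbitMQ password" >&2
    return 1
  fi
  # Single-quoted YAML scalar: an embedded ' is written as ''
  printf "rabbitmq:\n  values:\n    auth:\n      password: '%s'\n" "${password//$q/$q$q}"
}
```

If your configuration has no rabbitmq section yet, you can append the output directly, for example rabbitmq_values >> values.yaml.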

All actions’ pods in ryaxns-execs are in ImagePullBackOff

If pods in ryaxns-execs are in ImagePullBackOff, they probably cannot access the registry through the external domain name. Make sure that your DNS is configured and that the Ryax Traefik service uses the correct IP or fully qualified hostname. You can check the LoadBalancer services with:

kubectl get service -A  | grep -i LoadBalancer

Make sure that the IP/hostname associated with the Traefik LoadBalancer is correct, and that you added a wildcard DNS entry. For instance, if you set clusterName to example and domainName to ryax.io, make sure that you have DNS entries for *.example.ryax.io and example.ryax.io pointing to the correct IP address. See also Configure DNS.
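To confirm both entries at once, this bash sketch resolves the cluster name and an arbitrary name under the wildcard, and compares the results with the load balancer IP. It assumes a Linux machine with getent (glibc); the function name and the dns-check subdomain are hypothetical.

```shell
# Hypothetical helper: check that <cluster>.<domain> and a name under the
# wildcard entry both resolve to the expected IPv4 address.
check_dns() {
  local cluster="$1" domain="$2" expected="$3" name got rc=0
  for name in "$cluster.$domain" "dns-check.$cluster.$domain"; do
    # First IPv4 address returned by the system resolver (empty if none)
    got=$(getent ahostsv4 "$name" | awk '{ print $1; exit }')
    if [ "$got" = "$expected" ]; then
      echo "OK   $name -> $got"
    else
      echo "FAIL $name -> ${got:-no answer} (expected $expected)"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: check_dns example ryax.io "$(kubectl -n kube-system get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}')". If your provider gives a hostname instead of an IP, resolve that hostname first and pass one of its addresses; load balancer IPs can change, so treat a mismatch as a hint rather than proof.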

If you do not want to configure external access to your cluster, you won’t be able to connect external Kubernetes workers, but you can always have a local worker. In this case, to configure the internal registry, see Use local registry only.