Deploy Ray using Helm
To distribute our computational requirements and run heavy jobs like synthesizing your data, we will need to deploy a Ray cluster using Helm for our API to connect to. The chart can be found in the GitHub repository, or it can be supplied as a Helm repository URL and used with Helm directly. Please contact the Syntho team for this repository URL.
This part of the documentation assumes access to the folder helm/ray in the master branch of the aforementioned GitHub repository.
A note for OpenShift users: Ray requires us to create many threads within a pod, especially when scaling the Ray cluster. By default, OpenShift limits how many processes a pod can spawn, due to its usage of CRI-O as its Container Runtime Interface (CRI). We recommend updating this limit: see the section Additional changes for OpenShift/CRI-O
Setting the image
In the values.yaml file in helm/ray, set the following fields to ensure the correct Docker image is used:
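A minimal sketch of what this could look like; the exact keys (for example image.repository and image.tag) and the registry path depend on the chart version supplied to you, so treat the values below as placeholders:

```yaml
image:
  # Registry, repository and tag will be provided by the Syntho Team (placeholders shown)
  repository: <registry>/<syntho-ray-image>
  # Prefer pinning a specific tag over "latest"
  tag: <image-tag>
```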
The image tag will be provided by the Syntho Team. In some cases, the latest tag can be used, but we recommend setting a specific tag.
In addition to setting the correct Docker image, reference the Kubernetes Secret that was created for pulling images under imagePullSecrets:
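Assuming the default secret name mentioned below, this could look like:

```yaml
imagePullSecrets:
  - name: syntho-cr-secret
```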
This value is set to syntho-cr-secret by default.
License key - Ray
The license key can be set under SynthoLicense in the values.yaml file. An example of this would be:
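A placeholder sketch (the actual value is the license string provided by Syntho):

```yaml
SynthoLicense: <license-key-provided-by-syntho>
```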
Please use the license key provided by Syntho.
Cluster name
The default cluster name is set to ray-cluster. In case this needs to be adjusted, you can do so by changing clustername:
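For example:

```yaml
clustername: ray-cluster
```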
Workers and nodes
First of all, the Syntho Team will have some recommendations on what the exact size of the cluster should be given your data requirements. If you haven't received any information about this, please contact the Syntho team first to discuss the optimal setup. The rest of this section will give an example configuration to show what that looks like in the Helm chart.
Depending on the size and number of nodes in the cluster, adjust the number of workers that Ray has available for tasks. Ray needs at least one head instance. To increase performance, we can create additional worker groups as well. Under head we can set the resources for the head node. The head node is mostly used for administrative tasks in Ray, while the worker nodes pick up most of the tasks for the Syntho Application.
For a production environment, we recommend a pool of workers next to the head node. The Syntho Team can indicate which resources should be assigned to the head node and worker nodes. Here is an example configuration of a cluster with a head node and one worker pool of 1 machine with 16 CPUs and 64 GB of RAM:
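A sketch of such a configuration, assuming the chart exposes head and worker sections with standard Kubernetes resource blocks; the exact keys and the head node sizing should be confirmed with the Syntho Team:

```yaml
head:
  # The head node mostly handles administrative Ray tasks; sizing to be advised by the Syntho Team
  resources:
    requests:
      cpu: 4
      memory: 16Gi
    limits:
      cpu: 4
      memory: 16Gi

worker:
  # One worker pool with a single machine of 16 CPUs / 64 GB RAM
  replicas: 1
  resources:
    requests:
      cpu: 16
      memory: 64Gi
    limits:
      cpu: 16
      memory: 64Gi
```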
If autoscaling is enabled in Kubernetes, new nodes will be created once the Ray requirements exceed the available resources. Please discuss with the Syntho Team which setup best fits your data requirements.
For development or experimental environments, a less advanced setup is usually sufficient. In this case, we recommend starting with only a head node type and no workers or additional autoscaling setup. The Syntho Team will again advise on the size of this node, given the data requirements. An example configuration using a node with 16 CPUs and 64 GB of RAM would be:
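Again a sketch under the same assumptions about the chart's keys, with no worker pool defined:

```yaml
head:
  # Single head node handles all Ray tasks in this setup
  resources:
    requests:
      cpu: 16
      memory: 64Gi
    limits:
      cpu: 16
      memory: 64Gi
```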
Additionally, nodeSelector, tolerations and affinity can be defined for each type of node, to control exactly where the pods get scheduled. securityContext and annotations can also be set for each type of worker/head node.
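A brief illustrative sketch, using hypothetical node labels and taints that would need to match your own cluster:

```yaml
head:
  nodeSelector:
    # Hypothetical label; use a label that exists on your nodes
    node-type: syntho-ray
  tolerations:
    # Hypothetical taint; adjust to the taints configured on your nodes
    - key: dedicated
      operator: Equal
      value: ray
      effect: NoSchedule
```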
Shared storage of Ray workers
We require an additional Persistent Volume for the Ray workers to share some metadata about the currently running tasks. This is included in the Helm chart and uses the ReadWriteMany access mode. In the section storage you can adjust the storageClassName to use for this. Please make sure that you're using a storageClass that supports the ReadWriteMany access mode.
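For example (the storage class name is a placeholder; use one available in your cluster that supports ReadWriteMany):

```yaml
storage:
  # Must be a storage class that supports the ReadWriteMany access mode
  storageClassName: <rwx-capable-storage-class>
```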
Volume mounts
Additional changes for OpenShift/CRI-O
Certain orchestrators or setups use CRI-O as the container runtime interface (CRI). OpenShift 4.x currently configures CRI-O with a default limit of 1024 processes. When scaling, this limit can easily be reached by Ray. We recommend raising this limit to around 8096 processes.
The OpenShift documentation describes the steps to increase this limit, and the CRI-O documentation has more information about the settings that can be used.
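As a rough sketch of what this looks like in OpenShift, a ContainerRuntimeConfig resource can raise the PID limit for a machine config pool; the pool selector and resource name below are illustrative and should be adapted to your cluster:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: raise-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      # Apply to the worker machine config pool (adjust to your pools)
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    # Raise the per-container PID limit from the default of 1024
    pidsLimit: 8096
```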
Deploy
Once the values have been set correctly in values.yaml under helm/ray, we can deploy the application to the cluster using the following command:
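A sketch of the command, assuming the release name ray-cluster and the syntho namespace used elsewhere on this page, and that it is run from the root of the repository:

```sh
# Install the Ray chart with the values configured above
helm install ray-cluster ./helm/ray --namespace syntho --values ./helm/ray/values.yaml
```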
Once deployed, we can find the service name in Kubernetes for the Ray application. In the case of using the name ray-cluster, as in the command above, the service name (and hostname to use in the variable ray_address for the Core API values section) is ray-cluster-ray-head.
Lastly, we can check the ray-operator pod and the subsequent Ray head or Ray worker pods. Running kubectl logs deployment/ray-operator -n syntho will show us the logs of the operator.
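For example, to list the pods and follow the operator logs (assuming the syntho namespace):

```sh
# List the operator, head and worker pods
kubectl get pods -n syntho

# Follow the operator logs
kubectl logs deployment/ray-operator -n syntho --follow
```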