Troubleshooting
Errors when deploying with Argo CD
If you use Argo CD to manage the operator, you will encounter an error complaining that the CRDs are too long. A similar problem is described in the linked issue. The recommended solution is to split the operator into two Argo CD apps:
The first app only installs the CRDs, with the Replace=true sync option set directly.
The second app installs the Helm chart with skipCrds=true (a feature introduced in Argo CD 2.3.0).
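As a rough sketch of this two-app split, the manifests below show one possible pair of Argo CD Application resources. The app names, repoURL, targetRevision, chart paths, and namespaces are placeholders chosen for illustration and must be replaced with the values for your own setup. In an Application manifest, the skipCrds setting is expressed as spec.source.helm.skipCrds, and Replace=true is an entry under spec.syncPolicy.syncOptions.

```yaml
# App 1: installs only the CRDs. Replace=true makes Argo CD replace the
# resources instead of applying them, which avoids the "too long" error.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: operator-crds                # hypothetical app name
  namespace: argocd                  # assumes Argo CD runs in "argocd"
spec:
  project: default
  source:
    repoURL: https://example.com/your-org/your-repo.git   # placeholder
    targetRevision: main                                   # placeholder
    path: path/to/chart/crds                               # placeholder: directory holding the CRDs
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    syncOptions:
      - Replace=true
---
# App 2: installs the Helm chart itself and skips the CRDs,
# since App 1 already manages them (requires Argo CD 2.3.0 or later).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: operator                     # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/your-org/your-repo.git   # placeholder
    targetRevision: main                                   # placeholder
    path: path/to/chart                                    # placeholder: chart directory
    helm:
      skipCrds: true
  destination:
    server: https://kubernetes.default.svc
    namespace: operator-namespace    # placeholder target namespace
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```

Keeping the CRDs in their own app means the Replace=true behavior applies only to the CRDs and not to the rest of the chart's resources.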
Permission issues during Ray pods startup
In some cases, permission issues can arise with one of the mounted volumes (either /tmp/ray-data or /tmp/ray-workflows), causing the Ray cluster pod(s) to not start up correctly. In this case, enabling the InitContainer within the Ray head section of the Helm chart might resolve the issue (it is enabled by default).
If this does not resolve the issue and you still get an error when the Ray cluster pod(s) start up, we recommend adjusting the securityContext for every pod. If any workers are enabled for the cluster, the same securityContext changes need to be added to the worker configurations.
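For illustration only, a pod-level securityContext adjustment of this kind might look like the fragment below. It uses standard Kubernetes PodSecurityContext fields, but the UID and GID values are hypothetical, and where the block belongs in the Helm values (head and worker sections) depends on the chart's own layout, so check the chart's values file for the actual keys and recommended values.

```yaml
# Pod-level securityContext sketch (standard Kubernetes fields).
# All numeric IDs below are hypothetical and should match the user
# that the Ray container image is expected to run as.
securityContext:
  runAsUser: 1000     # hypothetical UID for the container processes
  runAsGroup: 1000    # hypothetical primary GID
  fsGroup: 1000       # GID applied to mounted volumes that support ownership management
```

Setting fsGroup is often what resolves write-permission errors on mounted volumes, because the volume contents become group-accessible to the pod's processes.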