Information about security considerations
The autolaunch feature empowers you to create HTTP links that automatically deploy an environment. This is an invaluable tool for initiating trainings effortlessly. However, exercise caution while using it as it could pose a security risk to the user. Consider disabling this feature if it doesn't suit your requirements or if security is a primary concern.
Onyxia is primarily designed to allocate resources such as a namespace and an S3 bucket to an individual user for work purposes. Additionally, it incorporates a feature that allows multiple users to share access to the same resources within a project. While this can be extremely beneficial for collaboration, be aware that it might be exploited by a malicious user within the group to leverage the privileges of another project member. Always monitor shared resources and maintain proper user access control to prevent such security breaches.
Willing to submit PRs on the Onyxia codebase?
Your Onyxia instance, today
TLDR. Here is how you can get an Onyxia instance running in a matter of seconds.
With this, you will obtain an instance operating in a degraded mode, which lacks features such as authentication, S3 explorer, secret management, etc. However, you will still have the capability to launch services from the catalog.
In this section, we will set up Onyxia from the ground up, along with all the associated technologies. This includes MinIO for S3, Keycloak for OIDC, and Vault for managing secrets.
First you'll need a Kubernetes cluster. If you have one already you can skip this section.
Hashicorp maintains great tutorials for terraforming Kubernetes clusters on AWS, GCP or Azure.
Pick one of the three and follow the guide.
You can stop after the configure kubectl section.
Ingress controller
Deploy an ingress controller on your cluster:
DNS
Let's assume you own the domain name my-domain.net, for the rest of the guide you should replace my-domain.net by a domain you actually own.
Now you need to get the external address of your cluster. Run the command and write down the External IP assigned to the LoadBalancer.
Depending on the cloud provider you are using it can be an IPv4, an IPv6 or a domain. On AWS for example, it will be a domain like xxx.elb.eu-west-1.amazonaws.com.
If you see <pending>, wait a few seconds and try again.
Once you have the address, create the following DNS records:
If the address you got was an IPv4 (x.x.x.x), create an A record instead of a CNAME.
If the address you got was an IPv6 (y:y:y:y:y:y:y:y), create an AAAA record.
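The rule above (CNAME for a hostname, A for an IPv4, AAAA for an IPv6) can be sketched as a small helper. This is only an illustration: `recordTypeFor` is a hypothetical function, not part of Onyxia, and the address checks are deliberately simplistic.

```typescript
// Hypothetical helper, not part of Onyxia: picks the DNS record type
// matching the kind of external address reported for the LoadBalancer.
type RecordType = "A" | "AAAA" | "CNAME";

function recordTypeFor(address: string): RecordType {
    // Four dot-separated decimal groups: treat as IPv4.
    if (/^(\d{1,3}\.){3}\d{1,3}$/.test(address)) {
        return "A";
    }
    // Only hex digits and colons, with at least one colon: treat as IPv6.
    if (/^[0-9a-fA-F:]+$/.test(address) && address.indexOf(":") !== -1) {
        return "AAAA";
    }
    // Anything else is assumed to be a hostname.
    return "CNAME";
}

// The two names used throughout this guide, pointing at an AWS-style address.
const externalAddress = "xxx.elb.eu-west-1.amazonaws.com";
const records = ["onyxia.my-domain.net", "*.lab.my-domain.net"].map(name => ({
    name,
    type: recordTypeFor(externalAddress),
    value: externalAddress
}));
```

With the AWS-style address used here, both entries in `records` come out as CNAME records.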
https://onyxia.my-domain.net will be the URL for your instance of Onyxia. The URLs of the services created by Onyxia are going to look like: https://<something>.lab.my-domain.net
You can customise "onyxia" and "lab" to your liking, for example you could choose datalab.my-domain.net and *.kub.my-domain.net.
SSL
In this section we will obtain a TLS certificate issued by Let's Encrypt using the certbot command-line tool, then get our ingress controller to use it.
If you are already familiar with certbot, you're probably used to running it on a remote host via SSH. Here you are expected to run it on your own machine, so we'll use the DNS challenge instead of the HTTP challenge.
The obtained certificate needs to be renewed every three months.
To avoid having to remember to re-run the certbot command periodically, you can set up cert-manager and configure a DNS01 challenge provider on your cluster, but that's out of scope for Onyxia.
You may need to delegate your DNS servers to one of the supported DNS service providers.
Now we want to create a Kubernetes secret containing our newly obtained certificate:
Lastly, we want to tell our ingress controller to use this TLS certificate. To do so, run:
This command will open your configured text editor. Go to line 56 and add:
If you are on a Mac or Windows computer, you can install Docker Desktop and then enable Kubernetes.
Docker desktop isn't available on Linux, you can use Kind instead.
Port Forwarding
You'll need to forward the TCP ports 80 and 443 to your local machine. It's done from the administration panel of your domestic internet Box. If you're on a corporate network, no luck for you I'm afraid.
DNS
Let's assume you own the domain name my-domain.net, for the rest of the guide you should replace my-domain.net by a domain you actually own.
Get your internet box routable IP and create the following DNS records:
If you have a DDNS domain, you can create a CNAME instead, for example:
https://onyxia.my-domain.net will be the URL for your instance of Onyxia.
The URLs of the services created by Onyxia are going to look like: https://xxx.lab.my-domain.net
You can customise "onyxia" and "lab" to your liking, for example you could choose datalab.my-domain.net and *.kub.my-domain.net.
SSL
In this section we will obtain a TLS certificate issued by Let's Encrypt using the certbot command-line tool.
The obtained certificate needs to be renewed every three months.
To avoid having to remember to re-run the certbot command periodically, you can set up cert-manager and configure a DNS01 challenge provider on your cluster, but that's out of scope for Onyxia.
You may need to delegate your DNS servers to one of the supported DNS service providers.
Now we want to create a Kubernetes secret containing our newly obtained certificate:
Ingress controller
We'll install ingress-nginx in our cluster but any other ingress controller will do.
In this section we assume that:
You have a Kubernetes cluster and kubectl configured.
onyxia.my-domain.net and *.lab.my-domain.net are pointing to your cluster's external address, my-domain.net being a domain that you own. You can customise "onyxia" and "lab" to your liking, for example you could choose datalab.my-domain.net and *.kub.my-domain.net.
You have an ingress controller configured with a default TLS certificate for *.lab.my-domain.net and onyxia.my-domain.net.
As of today the default service catalog will only work with ingress-nginx.
This will be addressed in the near future.
Throughout this guide we proceed as if everything were instantaneous. In reality, if you are testing on a small cluster, you will need to wait several minutes after running helm install for the services to be ready.
Use kubectl get pods to see if your pods are up and ready.
You can now access https://onyxia.my-domain.net and start services. Congratulations! 🥳
You have the ability to customize the user interface (UI) of Onyxia through the provision of specific environment variables to the UI. For details on the available options, please consult the 'UI Customization' section of this file.
If you are unsure about how to supply these variables, refer to the later section of this guide where we discuss how to provide the KEYCLOAK_* parameters. You'll then be able to add your UI-related parameters alongside them.
At the moment there is no authentication process; everyone can access our platform and start services.
Let's set up Keycloak to enable users to create an account and log in to our Onyxia.
For deploying our Keycloak we use codecentric's helm chart.
You can now log in to the administration console of https://auth.lab.my-domain.net using the credentials you have defined with KEYCLOAK_USER and KEYCLOAK_PASSWORD.
Create a realm called "datalab" (or something else), go to Realm settings
On the tab General
User Profile Enabled: On
On the tab login
User registration: On
Forgot password: On
Remember me: On
On the tab email, we give an example with AWS SES. If you don't have an SMTP server at hand, you can skip this by going to Authentication (on the left panel) -> Tab Required Actions -> Uncheck "set as default action" Verify Email. Be aware that with email verification disabled, anyone will be able to sign up to your service.
From: noreply@lab.my-domain.net
Host: email-smtp.us-east-2.amazonaws.com
Port: 465
Authentication: enabled
Username: **************
Password: ***************************************
When clicking "save" you'll be asked for a test email. You have to provide one that corresponds to a pre-existing user, or you will get a silent error and the credentials won't be saved.
On the tab Themes
Login theme: onyxia-web (you can also select the login theme on a per client basis)
Email theme: onyxia-web
On the tab Localization
Internationalization: Enabled
Supported locales: <Select the languages you wish to support>
Create a client called "onyxia"
Root URL: https://onyxia.my-domain.net/
Valid redirect URIs: https://onyxia.my-domain.net/*
Web origins: *
Login theme: onyxia-web
In Authentication (on the left panel) -> Tab Required Actions, enable Terms and Conditions and set it as default action.
Now you want to ensure that the usernames chosen by your users comply with Onyxia's requirement (only alphanumerical characters) and define a list of email domains allowed to register to your service.
Go to Realm Settings (on the left panel) -> Tab User Profile (this tab shows up only if User Profile is enabled in the General tab, and you can enable User Profile only if you have started Keycloak with -Dkeycloak.profile=preview) -> JSON Editor.
Now you can edit the file as suggested in the following DIFF snippet. Be mindful that in this example we only allow emails @gmail.com and @hotmail.com to register; you will want to edit that.
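To make the two constraints concrete, here is a minimal sketch of what they check. It is not the Keycloak User Profile JSON: in this guide the rules are enforced by Keycloak itself, and the function names below are hypothetical.

```typescript
// Illustrative only: in the guide these rules are enforced by Keycloak's
// User Profile configuration, not by application code.
const allowedEmailDomains = ["gmail.com", "hotmail.com"]; // example domains from the text

function isValidUsername(username: string): boolean {
    // Only alphanumerical characters, and at least one of them.
    return /^[a-zA-Z0-9]+$/.test(username);
}

function isAllowedEmail(email: string): boolean {
    const atIndex = email.lastIndexOf("@");
    // Reject addresses with no "@" or with nothing before it.
    if (atIndex <= 0) {
        return false;
    }
    const domain = email.slice(atIndex + 1).toLowerCase();
    return allowedEmailDomains.indexOf(domain) !== -1;
}
```

For example, `isValidUsername("j.doe")` is rejected because of the dot, and `isAllowedEmail("jdoe@example.com")` is rejected because the domain is not in the list.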
Now that our Keycloak server is fully configured, we just need to update our Onyxia deployment to let it know about it.
Update the onyxia-values.yaml file that you created previously; don't forget to replace all the occurrences of my-domain.net by your actual domain.
Don't forget as well to replace the terms of service of the sspcloud by your own terms of service. CORS should be enabled on those .md links (Access-Control-Allow-Origin: *).
Now that you have updated onyxia-values.yaml, restart onyxia-web with the new configuration.
Now your users should be able to create an account, log in, and start services in their own Kubernetes namespace.
Onyxia-web uses the AWS Security Token Service API to obtain tokens and provide users with storage features. We support any S3 storage compatible with this API. In this context, we are using MinIO, which is compatible with the Amazon S3 storage service, and we demonstrate how to integrate it with Keycloak.
Before configuring MinIO, let's create a new client for Keycloak (from the previous existing "datalab" realm).
Create a client called "minio".
Client ID: minio
Client Protocol: openid-connect
Root URL: https://minio.lab.my-domain.net/
Complete the content of client "minio" with the following values.
Access Type: confidential
Valid Redirect URIs (two values are required): https://minio.lab.my-domain.net/* and https://minio-console.lab.my-domain.net/*
Web origins: *
Save the content; a new tab called Credentials should appear. Navigate to the Credentials tab and copy the secret value for the next section.
Navigate to Mappers tab and create a protocol Mapper.
Name: policy
Mapper Type: Hardcoded claim
Complete the content of Mapper "policy" with the following values.
Token Claim Name: policy
Claim value: stsonly
Add to ID token: on
Add to access token: on
Add to userinfo: on
We recommend that you follow the MinIO documentation for this installation; you must activate OIDC authentication. We will use the official Helm chart in this tutorial. All Helm configuration values can be found within this link.
Replace COPY_SECRET_FROM_KEYCLOAK_MINIO_CLIENT by the secret value defined in the "minio" Keycloak client (see previous section).
MinIO is now deployed and is accessible on the console url.
By default, there are 16 MinIO containers running. If this number is too large for your Kubernetes cluster, you can limit it by configuring the 'replicas' key.
Before configuring the Onyxia region to create tokens, we should go back to Keycloak and create a new client to enable onyxia-web to request tokens for MinIO. This client is a little more complex than the others if you want to manage durations (here 7 days). It should also have a claim named policy with the value stsonly, matching our last deployment of MinIO.
From "datalab" realm, create a client called "onyxia-minio"
Client ID: onyxia-minio
Client Protocol: openid-connect
Root URL: https://onyxia.my-domain.net/
Complete the content of client "onyxia-minio" with the following values.
Access Type: public
Valid Redirect URIs: https://onyxia.my-domain.net/*
Web origins: *
Advanced Settings:
Access Token Lifespan: 7 days
Client Session Idle: 7 days
Client Session Max: 7 days
Save the content and navigate to Mappers tab and create two protocol Mappers.
Create the first Mapper called "policy".
Token Name: policy
Mapper Type: Hardcoded claim
Token Claim Name: policy
Claim value: stsonly
Add to ID token: on
Add to access token: on
Add to userinfo: on
Create the second Mapper called "audience-minio".
Token Name: audience-minio
Mapper Type: Audience
Included Custom Audience: minio
Add to ID token: on
Add to access token: on
S3 storage is configured inside a region in Onyxia API. Several options let you describe this storage and give Onyxia-web everything it needs to generate tokens: the Keycloak parameters used to access the storage API, the duration of the STS tokens, the bucket name (a standard prefix plus a claim of the user's JWT, used to derive a unique identifier), and whether Onyxia-web should try to create this bucket silently. There are also options for projects. You should look up all the options for the version you need on GitHub.
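As an illustration of the bucket naming idea (a standard prefix combined with a claim from the user's JWT), here is a minimal sketch. The type and function names, the "user-" prefix and the choice of the preferred_username claim are assumptions made for the example; the actual configuration keys are the ones defined by Onyxia API for your version.

```typescript
// Hypothetical shapes: the real region configuration keys are defined by Onyxia API.
type BucketNaming = {
    prefix: string;    // standard prefix shared by all user buckets
    claimName: string; // name of the JWT claim providing the unique identifier
};

function bucketNameFor(
    naming: BucketNaming,
    decodedJwt: Record<string, unknown>
): string {
    const claimValue = decodedJwt[naming.claimName];
    // Without the claim there is no way to derive a unique bucket name.
    if (typeof claimValue !== "string" || claimValue === "") {
        throw new Error(`Claim "${naming.claimName}" is missing from the token`);
    }
    return `${naming.prefix}${claimValue}`;
}

// Example with an already decoded token payload (assumed values).
const bucketName = bucketNameFor(
    { prefix: "user-", claimName: "preferred_username" },
    { preferred_username: "jdoe", policy: "stsonly" }
);
// bucketName === "user-jdoe"
```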
Onyxia-web uses Vault as a storage for two kinds of secrets: 1. secrets or information generated by Onyxia to store various values (UI preferences, for example) 2. user secrets. Vault must be configured with the JWT or OIDC authentication method.
As Vault needs to be initialized with a master key, it can't be directly configured with all parameters such as OIDC or access policies and roles. So, as a first step, we create a Vault in dev mode (do not use this in production; do your initialization with one of the recommended configurations: Shamir, GCP, another Vault).
Create a client called "vault"
Root URL: https://vault.lab.my-domain.net/
Valid redirect URIs: https://vault.lab.my-domain.net/*
Web origins: *
Every Onyxia instance may or may not have its own catalog. There are three default catalogs:
This collection of charts helps users launch many IDEs with various binary stacks (Python, R), with or without GPU support. Docker images are built here and help us provide a homogeneous stack.
This collection of charts helps users launch many database systems. Most of them are based on bitnami/charts.
This collection of charts helps users start automation tools for their data science activity.
You can always find the source of the catalog by clicking on the "contribute to the... " link.
If you take this other instance, it has only one catalog, helm-charts-sill.
The available catalogs in a given Onyxia instance are configured at install time, example with datalab.sspcloud.fr:
In order to contribute you have to be familiar with Helm and to be familiar with Helm you need to be familiar with Kubernetes objects.
In Onyxia we use the values.schema.json file to know what options should be displayed to the user at the service configuration step and what default values Onyxia should inject.
Let's consider a sample of the values.schema.json of the InseeFrLab/helm-charts-datascience's Jupyter chart:
And it translates into this:
Note the "git.name", "git.email" and "git.token"; this enables onyxia-web to pre-fill the fields.
If the user took the time to fill in their profile information, onyxia-web knows the Git username, email and personal access token of the user.
Here is defined the structure of the context that you can use in the overwriteDefaultWith field:
You can also concatenate string values using mustache syntax.
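To give an idea of how such a template could be resolved against the context, here is a minimal sketch. It is not Onyxia's implementation and not a full mustache engine (no sections, no escaping): `render` and `lookup` are hypothetical helpers, and the context below only reuses the git.name and git.email paths mentioned earlier.

```typescript
// Simplified context: nested objects whose leaves are strings.
type Context = { [key: string]: string | Context };

// Follows a dotted path such as "git.name" through the context.
function lookup(context: Context, path: string): string | undefined {
    let current: string | Context | undefined = context;
    for (const key of path.split(".")) {
        if (current === undefined || typeof current === "string") {
            return undefined;
        }
        current = current[key];
    }
    return typeof current === "string" ? current : undefined;
}

// Replaces every {{path}} placeholder; unknown paths become empty strings.
function render(template: string, context: Context): string {
    return template.replace(
        /\{\{\s*([\w.]+)\s*\}\}/g,
        (_match: string, path: string) => lookup(context, path) ?? ""
    );
}

const rendered = render("{{git.name}} <{{git.email}}>", {
    git: { name: "Jane Doe", email: "jane@example.com" }
});
// rendered === "Jane Doe <jane@example.com>"
```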
You probably want to be able to define a limit to the amount of resources a user can request when launching a service.
It's possible to do it at the catalog level but it's best to enable the person who is deploying Onyxia to define boundaries for his deployment regions.
This is the purpose of the x-onyxia param useRegionSliderConfig.
You now have all the relevant information to submit PRs on the existing catalogs or even to create your own.
Remember that a helm chart repository is nothing more than a GitHub repo with a special github Action setup to publish the charts on GitHub Pages.
If you are looking for a repo to start from have a look at this one, it has a directory where you can put the icons of your services.
Using Onyxia (as a data scientist)
See also
It's the Onyxia user guide dedicated to our staff.
There are 3 main components accessible on the onyxia web interface :
catalogs and services launched by the users (Kubernetes access)
a file browser (S3 access)
secret browser (Vault access)
The following documents Onyxia when configured with the default service catalogs:
This collection of charts helps users start automation tools for their data science activity.
The Onyxia user experience may be very different from one catalog of service to another.
The catalog defines what options are available through Onyxia.
Users can edit various parameters. Onyxia derives some values from the chart's values schema and the configuration of the instance. For example, some identity tokens can be injected by default (because Onyxia connects users to many APIs).
After launching a service, notes are shown to the user. They can retrieve those notes via the README button. Chart administrators should explain how to connect to the services (URL, account) and what happens on deletion.
Users can manage their files on S3. There is no support for renaming in S3, so don't be surprised. Onyxia is educational: any action performed in the S3 browser of the UI is echoed in a console as the equivalent CLI command.
User can do the following S3 actions :
download files
upload files
delete files
Of course, our default catalogs include all the necessary tools to connect to S3.
Our advice is to never download files to your container, but to ingest the data directly in memory.
Users can manage their secrets on Vault. There is also a CLI console.
Onyxia only uses a key-value v2 secret engine in Vault. Users can store secrets there and inject them into their services if the Helm chart is configured for it.
Of course, our default catalogs include all the necessary tools to connect to Vault.
The TypeScript App that runs in the browser.
This is the documentation for .
The primary breaking change in this release pertains to Keycloak configuration. With this update, you're no longer limited to using Keycloak; any OIDC-compliant identity provider is now supported. To accommodate this new feature, you'll need to make some adjustments to the configuration of your Onyxia instance.
You don't need to specify the issuerURI in multiple locations as we have done here.
If you're using just one identity server (for example, you have only one Keycloak server), you can set the issuerURI solely in api->env->oidc.issuer-uri.
Technologies at play in Onyxia-web
To find your way in Onyxia, the best approach is to start by getting a surface-level understanding of the libraries that are leveraged in the project.
Modules marked by 🐔 are our own.
We are fully committed on keeping everything type safe. If you are a seasoned developer but not fully comfortable with TypeScript yet a good way to get you quickly up to speed is to go through the of the official website.
You can skip anything related to class; we don't do OOP in the project.
We try, whenever we see an opportunity for it, to publish chunks of the code we write for Onyxia-web as standalone NPM modules. It helps keep the complexity in check. We use TS-CI as a starter for everything we publish on NPM.
If you want to test some changes made to onyxia-ui in onyxia-web before releasing a new version of onyxia-ui to NPM you can link locally onyxia-ui in onyxia-web.
Now you can make changes in ~/github/onyxia/ui/ and see the live updates.
If you want to install or update some dependencies, you must remove the node_modules, do your updates, then link again.
The library we use for styling.
Rules of thumbs when it comes to styling:
Onyxia is mostly used on desktop computer screens. It's not worth the effort to create a fully fledged responsive design for the UI. screen-scaler enables us to design for a single canonical screen size. The library takes charge of scaling or shrinking the image depending on the real size of the screen. It also asks the user to rotate the screen when the app is rendered in portrait mode.
To launch Storybook locally run the following command:
We need to be able to do:
Then, somehow, access OIDC_URL in the code like process.env["OIDC_URL"].
It enables us to run onyxia-web against a specific infrastructure while keeping the app Docker image generic.
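On the consuming side, reading such a variable with a fallback could look like the sketch below. `getEnv` and `RawEnv` are hypothetical names; `rawEnv` stands in for whatever object exposes the injected variables at runtime (process.env in the text), and the sketch says nothing about how the values get injected into the static bundle.

```typescript
// Hypothetical sketch: `rawEnv` stands in for the object exposing the
// injected variables at runtime (process.env in the text above).
type RawEnv = Record<string, string | undefined>;

function getEnv(rawEnv: RawEnv, name: string, defaultValue: string): string {
    const value = rawEnv[name];
    // Treat unset and empty variables the same way.
    return value === undefined || value === "" ? defaultValue : value;
}

// Example: the variable is provided by the person deploying the app.
const oidcUrl = getEnv(
    { OIDC_URL: "https://auth.lab.my-domain.net" },
    "OIDC_URL",
    ""
);
```

Passing the environment object as a parameter keeps the helper easy to test, since no global state is involved.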
It's a collection of general-purpose React hooks. Let's document the few use cases you absolutely need to understand:
It's a build tool that enables us to implement the login and register pages that users see when they are redirected to Keycloak for authentication.
For internationalization and translation.
The framework used to implement strict separation of concerns between the UI and the Core and high modularity of the code.
A lot of what we do is powered under the hood by EVT. You don't need to know EVT to work on onyxia-web; however, in order to demystify the parts of the code that involve it, here are the key ideas to take away:
We also heavily rely on . It's a collection of utilities that help write cleaner TypeScript code. It is crucial to understand at least , , and to be able to contribute to the codebase.
Anything contained in the directory.
The UI toolkit used in the project, you can find the setup of in onyxia-web here: .
is fully compatible with .
Onyxia-UI offers but you can also use components in the project, their aspect will automatically be adapted to blend in with the theme.
We currently offer built-in support for:
France:
Ultraviolet:
Verdant:
Onyxia (default):
You can also .
The fonts are loaded in the . It's important to keep it that way for Keycloakify.
To release a new version of . You just need to bump the and push. will automate publish .
Every component should acceptprop it should always .
A component should not size or position itself. It should always be the responsibility of the parent component to do it. In other words, you should never have height, width, top, left, right, bottom or margin in of your components.
You should never have a color or a dimension hardcoded elsewhere than in . Use theme.spacing()
(, , ) and .
It enables us to test the graphical components in isolation. .
In theory it shouldn't be possible, onyxia-web is an SPA, it is just static JS/CSS/HTML. If we want to bundle values in the code, we should have to recompile. But this is where comes into play.
Checkout :
All the accepted environment variables are defined here: . They are all prefixed with REACT_APP_
to be compatible . Default values are defined in this file.
Only in development (yarn start
) is also loaded and have priority over .env
Then, in the code the variable can be accessed .
Please try not to access the environment variables too liberally throughout the code. In principle they should only be accessed . We try to keep things as much as possible.
For the sake of performance we enforce that every component be wrapped into . This way a component only re-renders if one of its props has changed.
However if you use inline functions or as callbacks props your components will re-render every time anyway:
We always use for callback props. And for callback prop in lists.
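The reason inline callbacks defeat memoization can be shown without React at all. The sketch below is a simplified model of the shallow prop comparison a memoized component relies on; `shallowEqual` and the `renderProps*` functions are written for the example and are not the library's code.

```typescript
// Simplified model of a shallow prop comparison: same keys, identical values.
function shallowEqual(
    a: Record<string, unknown>,
    b: Record<string, unknown>
): boolean {
    const keysA = Object.keys(a);
    const keysB = Object.keys(b);
    return (
        keysA.length === keysB.length &&
        keysA.every(
            key =>
                Object.prototype.hasOwnProperty.call(b, key) &&
                Object.is(a[key], b[key])
        )
    );
}

// Inline callback: a new function object is created on every "render".
const renderPropsInline = () => ({ label: "Launch", onClick: () => {} });

// Stable callback: the same function reference is reused across "renders".
const onClick = () => {};
const renderPropsStable = () => ({ label: "Launch", onClick });

// false: the two onClick functions are different objects, so a re-render happens.
const inlineUnchanged = shallowEqual(renderPropsInline(), renderPropsInline());
// true: nothing changed, so the memoized component can skip rendering.
const stableUnchanged = shallowEqual(renderPropsStable(), renderPropsStable());
```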
It is very handy to be able to get the height and the width of components dynamically. It prevents from having to hardcode dimension when we don’t need to. For that we use ``
If the app is being run on Keycloak the isn't undefined and it means that we should render the login/register pages.
If you want to test, uncomment and run yarn start. You can also test the login pages in a local Keycloak container by running yarn keycloak. All the instructions will be printed on the console.
The keycloak-theme.jar file is automatically and by the CI.
The library we use for routing. It's like but type safe.
We plan to move to Vite when will support it.
The project is a non-ejected using (you can find the template repo that was used as a base for this project).
We use instead of the default react-scripts to be able to use custom Webpack plugins without having to eject the App. The custom Webpack plugins that we use are defined here . Currently the only one we use is .
Anything contained in the directory.
EVT is an event management library (like is).
If we need to perform particular actions when a value gets changed, we use.
We use Ctx to detach event handlers when we no longer need them. (See line 108 on )
In React, we use the hook to work with DOM events.
Onyxia Project Core Team Future Developments Roadmap
Transforming the existing file browser into a comprehensive data explorer is a central aspect of our development roadmap. This enhancement aims to provide data scientists with immediate access to the initial rows of various file formats (including but not limited to parquet, CSV, and JSON) directly via the Onyxia-web interface. This will effectively integrate a basic SQL engine, DuckDB Wasm, for swift data access and manipulation.
Enhancing accessibility is a vital and immediate priority for the Onyxia project. Currently, the platform lacks certain accessibility features which we plan to implement in the immediate future.
As it stands, the Onyxia project does not have the capacity to set quotas for S3 buckets or create custom policies. These responsibilities are currently delegated to other administrators. However, we are in the process of developing an S3 operator that will simplify the process of onboarding onto S3, thereby reducing dependency on external administrators.
src/ui contains the React application; it's the UI of the app.
src/core contains the 🧠 of the app.
Nothing in the src/core directory should relate to React. A concept like React hooks, for example, is out of scope for the src/core directory.
src/core should never import anything from src/ui, even types.
It should be possible, for example, to port onyxia-web to Vue.js or React Native without changing anything in the src/core directory.
The goal of src/core is to expose an API that serves the UI.
The API exposed should be reactive. We should not expose to the UI functions that return promises; instead, the functions we expose should update states and the UI should react to these state updates.
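A minimal sketch of what "reactive" means here, using a hand-rolled store rather than the project's actual redux setup. All names (`createStarsUsecase`, `StarsState`, `queryStars`) are hypothetical, and error handling is left out to keep the example short.

```typescript
type StarsState = {
    repoName: string | undefined;
    stars: number | undefined;
    isFetching: boolean;
};

function createStarsUsecase(fetchStars: (repoName: string) => Promise<number>) {
    let state: StarsState = { repoName: undefined, stars: undefined, isFetching: false };
    const listeners = new Set<(state: StarsState) => void>();

    const setState = (next: StarsState) => {
        state = next;
        listeners.forEach(listener => listener(state));
    };

    return {
        getState: () => state,
        // The UI subscribes once and re-renders on every state update.
        subscribe: (listener: (state: StarsState) => void) => {
            listeners.add(listener);
            return () => {
                listeners.delete(listener);
            };
        },
        // Returns void on purpose: the result is only observable through the state.
        queryStars: (repoName: string): void => {
            setState({ repoName, stars: undefined, isFetching: true });
            fetchStars(repoName).then(stars =>
                setState({ repoName, stars, isFetching: false })
            );
        }
    };
}
```

The UI never awaits anything: it calls `queryStars` and renders whatever `getState()` holds, first the fetching state and later the result.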
Whenever we need to interact with the infrastructure we define a port in src/core/port. A port is only a type definition. In our case the infrastructure is: the Keycloak server, the Vault server, the MinIO server and a Kubernetes API (Onyxia-API).
In src/core/adapters are the implementations of the ports. For each port we should have at least two implementations, a dummy and a real one. This enables the app to still run, albeit in degraded mode, if one piece of the infrastructure is missing. Say we don't have a Vault server: we should still be able to launch containers.
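The port/adapter split can be sketched as follows. The `SecretsManager` port and both factory functions are simplified, hypothetical examples and do not mirror the actual files in the repository; the real Vault adapter is passed in as a parameter so the sketch stays self-contained.

```typescript
// Port: only a type definition (hypothetical and simplified).
type SecretsManager = {
    get: (path: string) => Promise<string | undefined>;
    put: (path: string, value: string) => Promise<void>;
};

// Dummy adapter: keeps secrets in memory so the app can run without a Vault server.
function createDummySecretsManager(): SecretsManager {
    const store = new Map<string, string>();
    return {
        get: async path => store.get(path),
        put: async (path, value) => {
            store.set(path, value);
        }
    };
}

// Picks the adapter at startup depending on what the deployment provides.
function createSecretsManager(params: {
    vaultUrl: string | undefined;
    createVaultAdapter: (vaultUrl: string) => SecretsManager;
}): SecretsManager {
    const { vaultUrl, createVaultAdapter } = params;
    return vaultUrl === undefined
        ? createDummySecretsManager()
        : createVaultAdapter(vaultUrl);
}
```

The rest of the core only ever sees the `SecretsManager` type, so it does not need to know which of the two implementations it is talking to.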
In src/lib/usecases we expose APIs for the UI to consume.
Let's say we want to create a new page in onyxia-web where users can type in a repo name and get the current number of stars the repo has on GitHub.
UPDATE: This video remains relevant, but please note that the clean architecture setup has been considerably improved in the latest releases. A dedicated repo has been created to explain it in detail.
The main takeaway is that app has been renamed ui and lib has been renamed core.
You might wonder why some values, instead of being redux state, are returned by thunks functions.
For example, it might seem more natural to do:
Instead of what we actually do, which is:
However the rule is to never store as a redux state, values that are not susceptible to change. Redux states are values that we observe, any redux state changes should trigger a re-render of the React components that uses them. Conversely, there is no need to observe a value that will never change. We can get it once and never again, get it in a callback or wherever.
But, you may object, users do log in and log out; isUserLoggedIn is not a constant!
Actually, from the standpoint of the web app, it is. When a user who isn't authenticated clicks on the login button, they are redirected away. When they return to the app, everything is reloaded from scratch.
Now let's say we want the search to be restricted to a given GitHub organization. (Example: InseeFrLab.) The GitHub organization should be specified as an environment variable by the person in charge of deploying Onyxia. e.g.:
If no ORG_NAME is provided by the administrator, the app should always show 999 stars for any repo name queried.
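One way to picture this requirement is a factory that returns a dummy implementation when ORG_NAME is missing. This is a sketch under assumptions, not the code from the video: `createGetStars` is a hypothetical name and the actual GitHub call is injected as `fetchStars` so the example makes no network request.

```typescript
type GetStars = (repoName: string) => Promise<number>;

function createGetStars(params: {
    orgName: string | undefined; // value of ORG_NAME, if the administrator set it
    fetchStars: (orgName: string, repoName: string) => Promise<number>;
}): GetStars {
    const { orgName, fetchStars } = params;
    if (orgName === undefined) {
        // No ORG_NAME configured: always report 999 stars.
        return async () => 999;
    }
    // Otherwise restrict every query to the configured organization.
    return repoName => fetchStars(orgName, repoName);
}
```

With `orgName: "InseeFrLab"`, a query for "onyxia" is forwarded as `fetchStars("InseeFrLab", "onyxia")`.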
Currently users can save their GitHub Personal access token in their Onyxia account but not yet their GitLab token. Let's see how we would implement that.
The easy action to take when the user selects another project is to simply reload the page (window.location.reload()). We want to avoid doing this to enable what we call "hot project swapping":
To implement this behavior you have to leverage the evtAction middleware from clean-redux. It enables you to register functions to be run when certain actions are dispatched.
Unlike the other video, the following one is voiced. Find the relevant code here.
The backend REST API in Java
This is the documentation for InseeFrLab/onyxia -> api/.
It's the part of the App that runs in the clusters. It handles the things that can't be done directly from the frontend.
Previously, the Helm chart of Onyxia was hosted on the inseefrlab/helm-charts repo and has now been moved to inseefrlab/onyxia. As a result you would now install Onyxia like this:
In the following we assume the current version of Onyxia is 4.1.4, but you are encouraged to use the latest version instead.
If you use ArgoCD for deploying onyxia:
You no longer need to manually manage the version of and , now, if you want to update Onyxia, you just update the chart version number.
For the Keycloak theme, the version is now synchronized with the Onyxia version.
Also note that the theme will now appear as "onyxia" in the dropdown. Previously it was "onyxia-web".