Container Manager

About

The Container Manager provides a flexible platform for starting arbitrary containers on behalf of users. It also takes the complexity of handling multiple users away from the containers themselves.

In NOMAD we use it in the analytics and remote visualization parts, for Beaker, Jupyter, Creedo and noVNC (we also briefly tested other options such as Zeppelin). The code contains settings for these use cases, but other image types can easily be added (see the Configuration section).

The main thing that might need some tweaking before it can be used elsewhere is authentication. Currently it tries to use the NOMAD IdP (identity provider), and you are most likely not registered with it as a trusted service provider. However, the authentication back-end we use internally is Passport, which probably supports your preferred authentication method. For example, with NODE_ENV=localSetup a local list of users and passwords is already used for authentication.

So, if it makes sense, you are welcome to use it in your own project.

The code is reasonably clean, but everything is a bit too monolithic right now.

Context

Big Data is a very trendy field, which is nice and exciting: there is a lot of development and marketing. As a consequence, however, it isn't always easy to tell what will survive and what will be abandoned. In a real project you have to make choices; some of ours turned out to be right, others a bit less so.

In NOMAD we wanted to give every user the chance to do real analytics. This meant giving users a real programming language, and thus the possibility to do real damage. We therefore decided to use a separate Docker container, managed by Kubernetes, for every user. The choice of Docker and Kubernetes turned out well. Both are more active and better supported than when we started, and Docker, which initially pushed its own orchestration tools hard, now officially endorses Kubernetes. This approach also gives a lot of flexibility in what to run in the container.

In NOMAD we run Beaker, Jupyter, Creedo and noVNC. Beaker was a less fortunate choice, as it has since been replaced by BeakerX, a Jupyter extension.

A central piece of this setup is a component that starts containers on behalf of users and then connects each user with their container. We call this piece of software the Container Manager.

The Container Manager was developed by Ankit Kariryaa and me. It started out quite heavy, using replication controllers and services defined in the code. It then became the current lightweight, pod-based controller that uses Handlebars templates.

The following schema shows the basic idea and workflow behind the Container Manager.

[Figure: Container Manager Overview]

The Container Manager makes sure that the user is logged in and starts a container for them if needed. It then forwards (basically) all further requests to this container.
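The workflow can be sketched in TypeScript as below. This is a minimal illustration under assumptions, not the Container Manager's code:

- FakeCluster is a hypothetical in-memory stand-in for the Kubernetes API.
- podNameFor is a hypothetical naming scheme.
- Port 8888 is an arbitrary choice.
- Instead of actually proxying the request, the function just returns the target URL it would be forwarded to.

```typescript
// Hypothetical stand-in for a pod as seen through the Kubernetes API.
interface Pod {
  podName: string;
  ip: string;
}

// Hypothetical in-memory "cluster": pods are created instantly.
class FakeCluster {
  private pods = new Map<string, Pod>();
  private nextHost = 10;

  find(podName: string): Pod | undefined {
    return this.pods.get(podName);
  }

  create(podName: string): Pod {
    const pod: Pod = { podName, ip: `10.0.0.${this.nextHost++}` };
    this.pods.set(podName, pod);
    return pod;
  }
}

// Hypothetical naming scheme: one pod per (image type, user), reduced to the
// lowercase letters, digits and dashes allowed in Kubernetes pod names.
function podNameFor(imageType: string, user: string): string {
  return `${imageType}-${user}`.toLowerCase().replace(/[^a-z0-9-]/g, "-");
}

type Route =
  | { kind: "login" }
  | { kind: "proxy"; target: string; started: boolean };

function routeRequest(
  cluster: FakeCluster,
  user: string | undefined, // undefined means "not logged in"
  imageType: string,
  path: string,
  port = 8888, // hypothetical port the container listens on
): Route {
  // 1. Make sure the user is logged in.
  if (user === undefined) return { kind: "login" };
  // 2. Start the user's container only if it is not running yet.
  const podName = podNameFor(imageType, user);
  const existing = cluster.find(podName);
  const pod = existing ?? cluster.create(podName);
  // 3. Forward the request to the user's own container.
  return {
    kind: "proxy",
    target: `http://${pod.ip}:${port}${path}`,
    started: existing === undefined,
  };
}

const cluster = new FakeCluster();
console.log(routeRequest(cluster, "alice", "jupyter", "/tree"));
```

A second request from the same user for the same image finds the existing pod and reuses it, so only the first request pays the start-up cost.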

The replacements are the values substituted into the pod template (see Configuration), and they define the pod that should be started. The user can override them if k8component.entryPoint.replacementsFromQueryParameters is true. This is an important feature: it lets users test new containers or run frozen versions that reproduce published results.

Configuration

We tried to keep the Container Manager as simple as possible, while still making it easy to customize to one's needs.

Adapting it to most use cases means overriding or extending the default values (config/default.hjson). The override can have two parts:

- An image-specific override, controlled by the NODE_APP_INSTANCE environment variable. For Jupyter, for example, NODE_APP_INSTANCE=jupyter, and the overrides are in config/default-jupyter.hjson.
- A machine-specific override, controlled by the NODE_ENV environment variable. For example, a development setup can define NODE_ENV=localSetup and use the overrides in config/localSetup.hjson.

How the overrides are combined is described in the documentation of the config npm package.
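The layering can be illustrated with a small deep-merge sketch. In it, later layers win and nested objects are merged key by key. It only imitates the config package, which defines the actual load order and merge rules. The configuration values (including the namespace key) are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merge `override` into `base` without mutating either input.
// In this sketch, arrays and scalars in `override` replace the base value.
function deepMerge(base: Json, override: Json): Json {
  if (!isObject(base) || !isObject(override)) return override;
  const out: JsonObject = { ...base };
  for (const key of Object.keys(override)) {
    out[key] = Object.prototype.hasOwnProperty.call(out, key)
      ? deepMerge(out[key], override[key])
      : override[key];
  }
  return out;
}

// The three files discussed in the text, in increasing priority.
function configFiles(instance?: string, env?: string): string[] {
  const files = ["config/default.hjson"];
  if (instance) files.push(`config/default-${instance}.hjson`);
  if (env) files.push(`config/${env}.hjson`);
  return files;
}

// Hypothetical parsed contents of the three files above, in the same order.
const layers: Json[] = [
  { k8component: { templatePath: "kube/defaultTemplate.yaml", namespace: "analytics", image: { port: 8888 } } },
  { k8component: { image: { name: "jupyter" } } },
  { k8component: { namespace: "dev" } },
];

const merged = layers.reduce((acc, layer) => deepMerge(acc, layer), {} as Json);
console.log(configFiles("jupyter", "localSetup"));
console.log(JSON.stringify(merged, null, 2));
```

The image-specific layer only adds image.name, and the machine-specific layer only changes namespace. Everything else is inherited from the defaults.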

The values in k8component and k8component.image are combined with the values passed in the <prefix>/cM/start/ start URL to create a set of replacements. These replacements are then used to instantiate the Handlebars template given by the configuration value k8component.templatePath (kube/defaultTemplate.yaml by default).
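To make the replacement step concrete, here is a sketch with a tiny `{{key}}` substituter standing in for Handlebars. It has no helpers, and unlike Handlebars it throws on unknown keys instead of rendering them as empty. The pod template, the precedence order in buildReplacements and all values are assumptions for illustration. The real template is the one at k8component.templatePath, and key protection and image validation are discussed under Security.

```typescript
type Replacements = Record<string, string>;

// Assumed precedence (later wins): k8component values, then
// k8component.image values, then values from the start URL.
function buildReplacements(component: Replacements, image: Replacements, fromStartUrl: Replacements): Replacements {
  return { ...component, ...image, ...fromStartUrl };
}

// Replace every {{key}} with replacements[key]; unknown keys are an error.
// A replacer function is used so that "$" in values is inserted literally.
function render(template: string, replacements: Replacements): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(replacements, key)) {
      throw new Error(`no replacement for "${key}"`);
    }
    return replacements[key];
  });
}

// Hypothetical pod template (not the project's kube/defaultTemplate.yaml).
const podTemplate = [
  "apiVersion: v1",
  "kind: Pod",
  "metadata:",
  '  name: "{{podName}}"',
  '  namespace: "{{namespace}}"',
  "spec:",
  "  containers:",
  "    - name: main",
  '      image: "{{image}}"',
].join("\n");

const replacements = buildReplacements(
  { namespace: "analytics" },
  { image: "registry.example.org/nomad/jupyter:v1" },
  { image: "registry.example.org/nomad/jupyter:v2" }, // e.g. from the start URL
);
// podName is always chosen by the Container Manager itself.
const rendered = render(podTemplate, { ...replacements, podName: "jupyter-alice" });
console.log(rendered);
```

This substituter performs no escaping at all, so it is exactly the kind of template instantiation that is exposed to the YAML injection discussed under Security.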

A custom command, using parameters passed in the start URL (like a path), can be run in the container. The user can then be redirected to different places depending on what the command returns.
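A hypothetical sketch of such a command rule follows. The CommandRule shape, its field names and the `test -f` example are illustrative assumptions, not the real configuration keys. Parameters from the start URL are substituted into an argument list, and the command's exit code selects the redirect target.

```typescript
// Hypothetical rule: which command to run and where to send the user afterwards.
interface CommandRule {
  command: string[]; // argument list, may contain {{param}} placeholders
  redirects: { [exitCode: string]: string }; // exit code -> redirect target
  defaultRedirect: string;
}

// Substitute {{param}} placeholders; unknown parameters are an error.
function fill(text: string, params: Record<string, string>): string {
  return text.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(params, key)) {
      throw new Error(`missing parameter "${key}"`);
    }
    return params[key];
  });
}

// Build the argument list to execute inside the container.
function expandCommand(rule: CommandRule, params: Record<string, string>): string[] {
  return rule.command.map((arg) => fill(arg, params));
}

// Pick where to send the user based on the command's exit code.
function chooseRedirect(rule: CommandRule, exitCode: number, params: Record<string, string>): string {
  const target = rule.redirects[String(exitCode)];
  return fill(target !== undefined ? target : rule.defaultRedirect, params);
}

// Example: open the notebook if the file exists, otherwise show the file browser.
const openNotebook: CommandRule = {
  command: ["test", "-f", "/data/{{path}}"],
  redirects: { "0": "/notebooks/{{path}}" },
  defaultRedirect: "/tree",
};

const params = { path: "demo.ipynb" };
console.log(expandCommand(openNotebook, params));
console.log(chooseRedirect(openNotebook, 0, params));
```

Keeping the command as an argument list, rather than a shell string, means a crafted path cannot append extra shell commands when the list is executed without a shell. The path may still need validation, for example against `..` components.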

Thus, basically without any programming, one can configure the Container Manager to start one's own containers.

Security

If you let users use a service on the internet, you have to worry about security; if you let them execute arbitrary programs, doubly so. Absolute security is not possible, especially when users have so much freedom, so some effort should also go into not being too attractive a target.

Besides low-level OS and network code, the two main attack vectors are the Container Manager itself and the container that is run on behalf of the user.

Container Manager attacks

The Container Manager can be attacked through bugs in the software stack we use, mainly Node.js, Express and node-http-proxy. I won't discuss these here. All these projects have large communities that fix such issues, and one only has to update or follow the community's advice (just as with OS issues).

Aside from unexpected bugs, the main attack vector is malicious user input in the start URI that defines the replacements. As said, this is a nice feature, but if one is not careful it can be misused. For this reason it is possible to define keysToProtect, i.e. keys that cannot be overridden (the default config does this). Moreover, some keys (like podName and user) always take the values given by the Container Manager and cannot be overridden even in the config.
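The override rules can be sketched as a merge in three steps:

1. Start from the configuration values.
2. Apply user values for keys that are not protected, but only when replacementsFromQueryParameters is on.
3. Apply the values the Container Manager always sets itself.

Names and values (such as memoryLimit) are illustrative. Silently ignoring protected keys, rather than rejecting the request, is an assumption of this sketch.

```typescript
type Replacements = Record<string, string>;

function mergeReplacements(
  configValues: Replacements,
  userValues: Replacements, // e.g. parsed from the start URL
  replacementsFromQueryParameters: boolean,
  keysToProtect: string[],
  managerValues: Replacements, // e.g. podName and user
): Replacements {
  const protectedKeys = new Set(keysToProtect);
  const merged: Replacements = { ...configValues };
  if (replacementsFromQueryParameters) {
    for (const key of Object.keys(userValues)) {
      // Protected keys keep their configured value; the user's value is ignored.
      if (!protectedKeys.has(key)) merged[key] = userValues[key];
    }
  }
  // Values set by the Container Manager always win, even over the config.
  return { ...merged, ...managerValues };
}

const replacements = mergeReplacements(
  { image: "registry.example.org/nomad/jupyter:v1", memoryLimit: "2Gi" },
  { image: "registry.example.org/nomad/jupyter:v0.9", memoryLimit: "500Gi", podName: "someone-else" },
  true,
  ["memoryLimit"],
  { podName: "jupyter-alice", user: "alice" },
);
console.log(replacements);
```

The user can still choose the image here. That is why the image value needs its own check, as described next.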

Being able to override the image is a nice feature, as it allows testing new images and running frozen ones. At the same time, one does not want to allow arbitrary images (a Bitcoin miner, for example). Our solution is to validate the source of the image by checking it against the (non-overridable) regular expression given in imageReStr. As long as only trusted users can publish images that match the expression, this guarantees that only images created by trusted users can be run.
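Here is a sketch of the image check, assuming a hypothetical trusted registry path as the value of imageReStr. The function anchors the whole expression; otherwise a trusted name embedded inside a longer, untrusted image reference would also match. The text does not say whether the real Container Manager anchors the expression itself, so writing the anchors into imageReStr explicitly is the safer habit.

```typescript
// Hypothetical value for imageReStr: images from one trusted registry path,
// with an optional tag.
const imageReStr = "registry\\.example\\.org/nomad/[a-z0-9-]+(:[A-Za-z0-9._-]+)?";

// Anchor the whole expression; the (?:...) group keeps any alternation
// inside the pattern between the anchors.
function isTrustedImage(image: string, pattern: string): boolean {
  return new RegExp(`^(?:${pattern})$`).test(image);
}

const crafted = "evil.example.com/registry.example.org/nomad/jupyter";
console.log(new RegExp(imageReStr).test(crafted)); // true: unanchored match on a substring
console.log(isTrustedImage(crafted, imageReStr)); // false
console.log(isTrustedImage("registry.example.org/nomad/jupyter:v1.2", imageReStr)); // true
```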

Another issue is code injection, or rather YAML injection. The replacements are used to create the Kubernetes YAML file that then instantiates the pods, so a malicious value could modify the YAML in unexpected ways. For example,

 a: "{{x}}"

with x="\"\n b: \"y" expands to

 a: ""
 b: "y"

To avoid this kind of attack there are some helpers, notably e, which escapes a value so that it stays inside a double-quoted string, and n, which ensures that the result is an integer. Thus

 a: "{{e x}}"

expands to

 a: "\"\n b: \"y"

which avoids the injection. If user overrides are allowed (k8component.entryPoint.replacementsFromQueryParameters is true), inserting the escape helpers in the right places is the template writer's burden. Most likely, every variable that can be overridden needs a helper.
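Helpers in the spirit of e and n can be sketched as follows. These are illustrations, not the Container Manager's implementations:

- e reuses JSON string escaping, whose escape sequences are also valid in YAML double-quoted scalars. It additionally escapes DEL and C1 control characters, which JSON leaves raw.
- n accepts only plain decimal integers.

```typescript
// Sketch of an `e`-style helper: escape a value so it stays inside a YAML
// double-quoted scalar. JSON escapes (\" \\ \n \t \uXXXX ...) are valid YAML
// escapes; DEL and C1 control characters are written as \uXXXX as well.
function e(value: unknown): string {
  const quoted = JSON.stringify(String(value)); // includes the surrounding quotes
  return quoted
    .slice(1, -1)
    .replace(/[\u007f-\u009f]/g, (c) => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));
}

// Sketch of an `n`-style helper: accept only integers. Leading zeros are
// rejected because YAML 1.1 parsers may read them as octal.
function n(value: unknown): string {
  const s = String(value).trim();
  if (!/^-?(0|[1-9][0-9]*)$/.test(s)) {
    throw new Error(`not an integer: ${JSON.stringify(s)}`);
  }
  return s;
}

const x = '"\n b: "y'; // the malicious value from the example above
console.log(`a: "${x}"`); // unescaped: prints two lines, a: "" and  b: "y"
console.log(`a: "${e(x)}"`); // escaped: prints one line, a: "\"\n b: \"y"
console.log(n("42"));
```

In a Handlebars setup, functions like these would be registered as helpers (with Handlebars.registerHelper) so that a template can call {{e x}}.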

Kubernetes recently introduced Role-Based Access Control (RBAC). With it, a good mitigation strategy is to restrict the role used by the Container Manager so that it can create pods only in the analytics namespace.

Attack in the target container

The security of the container run for the user depends mostly on which container is run and what it allows the user to do, not on the Container Manager.

A big part of the security comes from relying on the sandboxing capabilities of Linux and Docker.

Normally the container should run as an unprivileged user. That way, the first hurdle for an attacker is to overcome the OS protections and become root in the container.

The container should also mount only the data that the user is allowed to see, and mount it read-only unless the user is allowed to edit it. Thus, even if the user (in the worst case) becomes root in the container, they cannot see or edit data they shouldn't.

Obviously, a user who manages to escape the sandbox and become root on the node can still do real damage, so monitoring the Kubernetes cluster is important.

A more subtle attack does not try to subvert the target container much. Instead, it uses the container to reach services that should not be accessible, exploiting the fact that the container sits in an internal network. For example, if the container accesses a database or an Elasticsearch index directly, it is probably not too difficult to connect to them directly with the same credentials and issue different commands. The way to go here is Kubernetes network policies, together with services that authenticate their clients or do not expose dangerous commands even internally. Unfortunately, the right solution depends on the services and the environment, and thus requires some thinking.

Update: recently (4.11.2018) a Kubernetes bug exposed exactly this kind of issue. Users in a pod could talk to the Kubernetes service, which should allow only encrypted communication requiring a valid certificate. This allowed unauthenticated commands to the storage server, potentially taking over the whole cluster.

Scaling

All communication with the pods goes through the Container Manager, which is therefore a bottleneck. Normally the tasks driven by the Container Manager are relatively heavy, since they use a container per user, so one does not expect a large number of concurrent users. Still, every user might create quite a load, so one might run into problems. Luckily, the Container Manager can easily be scaled. It only does some local caching to avoid overloading Kubernetes, and it has no shared state besides what resides in Kubernetes. Multiple instances can therefore work in parallel, and one can scale by running several Container Managers and load balancing between them.
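The statelessness argument can be illustrated with a small time-to-live cache. This is a sketch, not the Container Manager's cache; the class, the TTL and the injectable clock are assumptions. Every instance has its own cache and the authoritative state stays in Kubernetes, so instances never have to coordinate. The cost is that a cached answer may be up to one TTL out of date.

```typescript
// Per-instance cache in front of (hypothetical) Kubernetes lookups.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value if still fresh, otherwise call `load` and cache it.
  get(key: string, load: () => V): V {
    const t = this.now();
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > t) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: t + this.ttlMs });
    return value;
  }

  // Drop an entry, e.g. after this instance created or deleted the pod itself.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Usage with a fake clock and a fake lookup that counts calls.
let clock = 0;
let lookups = 0;
const podStatus = new TtlCache<string>(1000, () => clock);
const lookup = () => `status-${++lookups}`;
console.log(podStatus.get("jupyter-alice", lookup)); // first call hits the "API"
clock = 500;
console.log(podStatus.get("jupyter-alice", lookup)); // served from the cache
```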

Container cleanup

Something I have not discussed so far is how to stop a running container. The user can of course stop it from the <prefix>/cM/view-containers endpoint, or via the delete API call to <prefix>/cM/container/<name>. Currently, however, the Container Manager never stops a user's containers, even if the user has not connected for a while. If the pod has to be stopped, the container itself (or a companion container) has to decide when to stop and end itself, for example by monitoring its accesses. The Container Manager could track accesses and terminate the pod after some idle threshold, but that needs extra state and is currently not done.
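Idle-based cleanup, which the Container Manager currently does not do, could look roughly like the following hypothetical sketch. It records the last access per pod on every proxied request, and periodically lists pods idle for longer than a threshold so they can be deleted. The map is local to one instance; with several instances behind a load balancer, this state would have to be shared between them.

```typescript
// Hypothetical tracker of last access times, local to one instance.
class IdleTracker {
  private readonly lastAccess = new Map<string, number>();

  // Call on every proxied request for a pod (times in milliseconds).
  touch(podName: string, at: number): void {
    this.lastAccess.set(podName, at);
  }

  // Pods whose last access lies at least `thresholdMs` before `now`.
  idlePods(now: number, thresholdMs: number): string[] {
    const idle: string[] = [];
    this.lastAccess.forEach((at, podName) => {
      if (now - at >= thresholdMs) idle.push(podName);
    });
    return idle;
  }

  // Forget a pod once it has been deleted.
  forget(podName: string): void {
    this.lastAccess.delete(podName);
  }
}

const tracker = new IdleTracker();
tracker.touch("jupyter-alice", 0);
tracker.touch("jupyter-bob", 50000);
console.log(tracker.idlePods(60000, 30000)); // candidates for deletion
```

A pod that was started but never accessed through this instance is never reported. A real implementation would have to seed the tracker, for example with the pod's start time.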

Alternatives

When we started NOMAD there weren't really any alternatives to the Container Manager. The options were either to write a custom proxy, or to make the container multi-user, i.e. able to serve different users differently. But, as said, this field is evolving quickly, and the Container Manager fills a real need. So in the meantime, some tools have emerged that partially cover what the Container Manager does.

Traefik, for example, is a dynamic proxy that can forward requests in a flexible way, but container creation is not part of it. A Kubernetes Ingress can also cover part of the proxying, but it cannot handle multiple users.

If one is interested only in Jupyter notebooks, JupyterHub does what we do, but specifically for Jupyter. Interestingly, they chose the same core technology (node-http-proxy) as we did. For Jupyter-only needs it might be a good alternative, and it might have some Jupyter-specific advantages. Indeed, Adam Fekete is looking into this.

Conclusions

The Container Manager is open source. It has an open git repository on GitLab, and the original repository is hosted at the MPCDF. You can use it for your own projects, or indirectly through the NOMAD analytics and remote visualization.

Go and use it!

