The choice of Docker and Kubernetes turned out to be a sound one. Both are more active and better supported than when we started, and Docker, which initially pushed its own orchestration tools very hard, now officially endorses Kubernetes. Google, Amazon, Azure, and OpenShift all provide more or less turnkey solutions for it, and Kubernetes itself is (I think) the best way to support cloud deployments without becoming hostage to a single provider (i.e. to avoid vendor lock-in).
This choice has not been without issues. Installation and support in a supercomputing center can be more challenging: there was little experience with such solutions, and the technology is changing quickly. As a consequence, we had to do things ourselves, which was very time consuming. Indeed, every time we had to install Kubernetes, the installation was quite different (I installed it on SUSE, Ubuntu and CentOS).
Normally one has to check the documentation a bit (luckily it is usually quite clear, but almost every time there was at least one issue). Part of this is also a consequence of being an early adopter: installation has been getting simpler over time. The move to CentOS as the OS of choice also simplified the procedure. Here is the last installation of the production cluster, development cluster, and visualization cluster. Before that, Harsha had used Ansible scripts on the clean machines, and before that we had our own patched installation scripts for SUSE Linux.
The advantage of Kubernetes is that it makes discovery, installation and scaling of the various services uniform.
Every service has a service YAML file that creates a Service object with a fixed name. Inside the namespace, the service can be reached simply through its name at its port. Toward the outside, if one uses a node port, all nodes of the cluster make the service available under the same port. This avoids the bottleneck of a single entry point to the service, as it scales with the cluster. Normally a deployment then creates the pods that actually serve the service (endpoints in Kubernetes speak). The deployment can be scaled if needed.
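As a rough illustration of the service and deployment pairing described above, here is a minimal Python sketch that builds both manifests as plain dictionaries and prints them as JSON, which kubectl accepts as well as YAML. The service name `example-api`, the image `example/api:latest`, the ports, and the helper functions are all hypothetical and chosen only for the example. They are not taken from the actual NOMAD configuration.

```python
import json


def make_service(name, port, node_port):
    """Build a NodePort Service manifest (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},  # fixed name, used for in-cluster lookup
        "spec": {
            "type": "NodePort",
            # The selector ties the service to pods carrying this label.
            "selector": {"app": name},
            "ports": [
                {"port": port, "targetPort": port, "nodePort": node_port}
            ],
        },
    }


def make_deployment(name, image, port, replicas=1):
    """Build a Deployment whose pods match the service selector."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # scale by changing this number
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Assumed example values; 30080 lies in the default NodePort range 30000-32767.
service = make_service("example-api", 8080, 30080)
deployment = make_deployment("example-api", "example/api:latest", 8080, replicas=2)

if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List", "items": [service, deployment]}
    print(json.dumps(manifest, indent=2))
```

With these assumed values, other pods in the same namespace would reach the service as `example-api:8080`, while from outside every node of the cluster would answer on port 30080.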
The services need to be delivered to the user. To do this we use an Nginx reverse proxy whose configuration is generated from Handlebars templates that use information extracted automatically from Kubernetes, as explained here.
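The idea of generating the proxy configuration from a template can be sketched as follows. This is not the actual setup: it swaps Handlebars for Python's standard `string.Template`, and the service list is hard-coded, whereas in practice it would come from Kubernetes (for instance by parsing the output of `kubectl get services -o json`). The service names, host names, ports, and the `render_nginx` helper are hypothetical.

```python
from string import Template

# One reverse-proxy location per service; $name, $host and $port are
# filled in from the service description.
LOCATION = Template(
    "    location /$name/ {\n"
    "        proxy_pass http://$host:$port/;\n"
    "    }\n"
)


def render_nginx(services, listen=80):
    """Render a minimal Nginx server block for the given services."""
    blocks = "".join(LOCATION.substitute(s) for s in services)
    return "server {\n    listen %d;\n%s}\n" % (listen, blocks)


# Assumed example data standing in for what would be read from Kubernetes.
services = [
    {"name": "archive", "host": "node1.example", "port": 30080},
    {"name": "analytics", "host": "node1.example", "port": 30081},
]

config = render_nginx(services)

if __name__ == "__main__":
    print(config)
```

Regenerating the file whenever the set of services changes keeps the proxy in step with the cluster without hand-editing the Nginx configuration.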
The actual deployment of the NOMAD Archive and Analytics is described here.