Whether to use Kubernetes or not is the question. This takes me back to the old Hadoop argument. People used to ask me to set up Hadoop clusters for them. As soon as I enquired how much data they had, it became immediately apparent that they were looking for a different solution than Hadoop. We can ask the same question with Kubernetes.
What are you trying to achieve by using it? Suppose the answer is a highly scalable infrastructure for your containerised applications and other workflows. Kubernetes might be the correct answer, but there are other options that are far less complex. Suppose instead that you are trying to run custom workflows and data processing pipelines, leveraging OCI technology because it may make the support and administration of your workloads far more manageable. Kubernetes may also be the correct answer here; we’ll try to establish this later. If the answer is just to run some containers, then Kubernetes is almost always the wrong answer.
A bit of history
Kubernetes was designed originally by Google, taking some of the ideas from its Borg cluster and making them more usable for general-purpose workloads. The Kubernetes project is now managed by the Cloud Native Computing Foundation, which provides it with an open-source and relatively vendor-agnostic space to reside and mature outside the grips of a single organisation.
What is Kubernetes good at?
So, what is Kubernetes good at? Kubernetes is great for running complex containerised workloads distributed across multiple servers, whether you need HA, scale, or the ability to target specific hosts for services to run on. It’s good at dealing with failures, restoring services and balancing load for high-traffic workloads. As Kubernetes has matured, so have its use cases, which now go beyond running websites and serving traffic; these days, Kubernetes Jobs and CronJobs allow for batch data processing, for example. With the greater reliance on streaming systems for data processing, traffic management and more, Kubernetes has also become a crucial tool in the armoury of many systems administrators. If multi-cloud, hybrid, or non-cloud hosting is what you’re after, then Kubernetes may also be what you’re looking for. Most cloud vendors offer a hosted version of Kubernetes. Still, running a different distribution in a self-hosted context may be what you need for various reasons.
What else is out there?
Okay, so you’re looking at deploying some containers. What should you use? This comes down to how complex that deployment is, what the SLA requirements are and what support you have or will need to run the workloads.
For example, say you want to run a containerised Node.js app that has no backend; you want people to be able to hit a webpage and not much more. The first question here is, does it need to be containerised in the first place? Setting that aside, unless it has spiky, high-volume traffic patterns, the most straightforward approach is to spin up a virtual machine and run the Docker CLI commands on the box to start the container and expose its port. This gives you a single machine hosting your container. If the container falls over, you can tell Docker to restart it, and so on. But you don’t get scaling, you don’t get any HA and the like.
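As a rough sketch of that single-VM approach, the Python below only assembles the `docker run` command you might type on the box; it does not execute it. The image name, container name and ports are made up for illustration, and the restart policy is one reasonable choice rather than a recommendation.

```python
import shlex


def docker_run_command(image, host_port, container_port, name="web",
                       restart="unless-stopped"):
    """Build the argument list for a detached `docker run` with a restart policy."""
    return [
        "docker", "run",
        "--detach",                                    # run in the background
        "--name", name,                                # stable name for later `docker logs` etc.
        "--restart", restart,                          # let Docker restart the container if it exits
        "--publish", f"{host_port}:{container_port}",  # expose the app on the VM
        image,
    ]


# Hypothetical image and ports, for illustration only.
command = docker_run_command("example/node-app:1.0", 80, 3000)
print(shlex.join(command))
```

Printing the command rather than running it keeps the sketch safe to try anywhere; on a real VM you would paste the output into a shell or pass the list to `subprocess.run`.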
What about connecting containers? If you wanted to extend this Node.js app and run a Postgres backend, you could host them both on the same machine; this is perfectly acceptable. You can use Docker’s internal networking to ensure they can talk to one another. You can also mount volumes into the containers so that the data is persisted to the virtual machine or a remote drive somewhere. Docker Compose is another Docker-centric tool that can simplify the deployment of multi-container setups without the complexity of Kubernetes. It takes a text file with your container definitions in it and starts those services with a single command, rather than you scripting the service bootstrap manually.
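To make the Compose idea concrete, here is a minimal sketch that generates such a definition file from Python. Compose files are normally written as YAML, but JSON is valid YAML, so the standard library is enough. The app image, the password and the `DATABASE_URL` variable are assumptions for illustration; your app may read its connection settings differently.

```python
import json


def compose_definition(app_image, db_password):
    """Return a Compose definition with a web app and a Postgres backend."""
    return {
        "services": {
            "web": {
                "image": app_image,
                "ports": ["80:3000"],
                # Hypothetical variable name; "db" resolves via Docker's internal network.
                "environment": {
                    "DATABASE_URL": f"postgres://postgres:{db_password}@db:5432/postgres"
                },
                "depends_on": ["db"],
            },
            "db": {
                "image": "postgres:16",
                "environment": {"POSTGRES_PASSWORD": db_password},
                # Named volume so the data survives container restarts.
                "volumes": ["pgdata:/var/lib/postgresql/data"],
            },
        },
        "volumes": {"pgdata": {}},
    }


# Placeholder values, for illustration only.
definition = compose_definition("example/node-app:1.0", "change-me")
print(json.dumps(definition, indent=2))
```

Saving the output as, say, `compose.json` and running `docker compose -f compose.json up -d` should bring both services up together.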
But what about scale? What about self-healing? What about distributing the load across multiple machines? These are all valid questions. But still, if you’re hosting your services in the cloud, other options reduce the complexity whilst giving you what you’re craving.
Let’s look at Amazon’s Elastic Container Service. This service is designed to run containers, always a good start, but it’s designed to do it without the same level of complexity as Kubernetes. You can have your containers backed by EC2 or AWS Fargate; you can select the number of “tasks” to be run, mount drives, and all the rest. But you don’t need to provision servers yourself, and you don’t need to ensure they keep running. So, we’re removing some of the manual nature of running containers directly on EC2 by leveraging a wrapper that runs and manages the containers for you on top of EC2. Another bonus of ECS is that you pay only for the underlying compute, so if you’re looking to run simple containers on EC2, rather than manually cranking up EC2 servers with the Docker runtime on them, having ECS do it for you can make that process more resilient with just a little work upfront.
Another excellent example of simple Docker deployments in a managed manner is Google’s Cloud Run. Cloud Run is designed for short-running, stateless web requests. It will deal with scale automatically for you, so that as requests come in, it scales appropriately. Cloud Run and ECS can add HTTPS endpoints for you via load balancers to secure requests. Of course, we were talking about a Postgres-backed web service before. One of the significant benefits of deploying into the cloud without Kubernetes is that you don’t have to use self-hosted databases and other data containers; you can leverage Amazon’s RDS, Google Cloud SQL and others to support your workloads, reducing the scaling and data security burden of running these databases manually.
How do I choose which to use?
When should you use Kubernetes? I was recently working on a project where portability was an important concept. We had a lot of infrastructure deployed in Kubernetes, but some parts were Lambda functions. So, we pulled those functions out and turned them into “serverless” functions on our Kubernetes cluster. That allowed us to run the same web service and API backend across multiple cloud vendors and locally on laptops, desktops, etc. This is a perfectly valid use case. We also contemplated Docker Compose and similar, but in the end, as we already had workloads in Amazon’s Kubernetes service, we decided that moving it all in made the most sense in this case.
Scale
If you want fine-grained control over your scalability, then Kubernetes may also be a suitable candidate. Being able to scale onto different VMs for different types of workloads, auto-scale on different metrics and the like can be a great way to optimise your compute for specific workloads across a single application.
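To give a feel for what auto-scaling on a metric means, the sketch below follows the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: scale the current replica count by the ratio of the observed metric to the target, round up, and clamp to your limits. It is a simplified illustration, not the controller’s actual code; the default limits here are made up, and the real autoscaler also handles things like unready pods and stabilisation windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA-style calculation of how many replicas to run."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Never go below or above the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Four replicas averaging 90% CPU against a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))
```

The same function works for any metric you can express as a current value and a target, such as requests per second or queue depth.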
Health
Kubernetes can also be great for data processing workloads. Its dynamic scaling and self-healing can make running things like Kafka clusters beneficial. Cloud vendors often have their own Kafka-esque implementations. You might find Kubernetes more effective for low latency, highly tunable deployments than other Docker-backed services.
Automation
Of course, every vendor has its own SDK, and you can automate much of what they offer. But not everything is accessible. If you want to be able to automate containerised workloads, running them inside a Kubernetes cluster makes sense. Not so long ago, we had a project that required the ability to deploy a random number of containers to process some data in parallel when a user clicked a button in the interface. It all had to happen seamlessly under the hood. We found Amazon’s Fargate to be great; it is quick to spin up, billed per second (with a one-minute minimum), and suitable for sporadic workloads.
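One way this kind of fan-out could be expressed in Kubernetes is a Job with `parallelism` and `completions` set to the number of workers you need. The sketch below builds such a manifest in Python; it is not how our project did it, and the job name, image and worker count are hypothetical. It only generates the manifest and does not talk to a cluster.

```python
import json


def parallel_job_manifest(name, image, workers):
    """Build a batch/v1 Job that runs `workers` pods side by side."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completions": workers,       # how many pods must finish successfully
            "parallelism": workers,       # how many may run at the same time
            "completionMode": "Indexed",  # each pod gets its own JOB_COMPLETION_INDEX
            "backoffLimit": 2,            # retries before the Job is marked failed
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "worker", "image": image}],
                }
            },
        },
    }


# Hypothetical job: five workers, each picking a slice of the data by its index.
manifest = parallel_job_manifest("process-upload", "example/processor:1.0", workers=5)
print(json.dumps(manifest, indent=2))
```

Since `kubectl` accepts JSON as well as YAML, the printed manifest could be piped straight into `kubectl apply -f -`, or submitted from the backend with a Kubernetes client library when the user clicks the button.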
Replication
Replication is a slightly more complex topic; if you run workloads using Docker Compose, you can run them almost the same on a developer’s laptop as in the cloud. Doing this with Kubernetes is possible, but it takes a lot more setup and, in general, maintenance. So, expect some divergence at this point. Tools like Helm charts can make this slightly more comfortable, but it still takes work, especially if you’re trying to debug connection or similar issues on a remote setup.
Taking all this into account, if you are in a situation where you’re either lifting and shifting legacy applications to containers or you don’t have a bunch of in-house experience with Kubernetes, networking strategies, deployments, services, load balancing and the like, then finding an alternative solution may be the better choice. Cost is another factor that can decide whether to port to Kubernetes or something else. Kubernetes can be expensive; tuning it takes time and effort, and additional services must be considered on top of your workload. Backplane services and containers must run on each node to keep the network and similar running; this takes time, management and consideration.
Conclusion
Returning to where we started, when Hadoop was all the rage, people often tried to run all their workloads in it. It would work, but in most cases it wasn’t worth the effort to get the clusters up and running, the ETLs rewritten to MapReduce and so on. With Kubernetes, you will often find the same. Sure, you can containerise your workloads. A lot of the time, this is beneficial regardless, as it means you can control the runtime environment much more easily, but does it mean every workload should be destined for Kubernetes? Absolutely not. If your environmental complexity is low, Kubernetes probably doesn’t fit the bill. If you lack in-house resources, Kubernetes probably doesn’t fit the bill. And if you know you will migrate most of your services from their incumbents to cloud-specific tooling, you don’t need Kubernetes to run whatever is left.
If you’re looking for an environment that can scale wildly, deal with the self-healing of services, and manage deployment targets and services automatically, potentially in a way that works across multiple clouds almost agnostically, Kubernetes may be what you’re looking for. The tooling around Kubernetes today is much better than a few years ago. GitOps methodologies and similar tools make it far easier for software development shops to turn their pipelines into something designed for Kubernetes. Don’t be frightened by Kubernetes. There are plenty of ways to get started, from Docker Desktop’s built-in Kubernetes to K3s, MicroK8s, Minikube and more, and those are just for running locally. When you reach the cloud, you can run the same distributions or launch AKS, EKS, GKE, and others. The ecosystem is enormous, but remember that Kubernetes has a learning curve; it’s not just a Docker runtime and shouldn’t be treated as one.