3 years managing Kubernetes clusters, my 10 lessons.

Herve Khg
4 min read · Nov 12, 2023

Over the past three years, I’ve navigated the sometimes turbulent waters of managing Kubernetes clusters. This journey, filled with challenges and discoveries, has given me a deep understanding of this cutting-edge technology and its many facets. In this article, I wish to share with you the ten most valuable lessons I’ve learned as a Kubernetes cluster manager.

These lessons span a range of topics, from managing the underlying infrastructure to optimizing deployment processes, and include best practices for ensuring the scalability and security of your clusters. Whether you’re new to the world of Kubernetes or a seasoned expert, these insights will provide you with an enriching perspective on how to effectively manage your Kubernetes clusters.

Let’s dive into these lessons together, the fruit of three years of experience, successes, and challenges overcome.

Lesson 1: Use Kubernetes in the cloud

Unless you face an extreme constraint, there is no need to manage Kubernetes’ underlying infrastructure yourself. You’ll spend your time debugging problems that add no value to your business. Being an expert in kube-apiserver, kubelet, etcd, kube-proxy, etc., is great, but maintaining them yourself every day is not where the value lies, and you don’t need that expertise to manage a cluster effectively. Delegate this low-level work to a cloud provider (AWS, Azure, GCP, OVH, etc.), which will do it better than you. At HK-TECH, we chose AWS and its managed Kubernetes service, EKS (ECS is not Kubernetes!).

Lesson 2: Deploy your entire Kubernetes-related infrastructure with code.

No part of your cluster should be configured by hand in the console, not even a simple tag. Above all, avoid the “I fixed it quickly in the console, I’ll update the code later” mindset. Spoiler: you never will.

Lesson 3: Avoid overusing Helm charts that you don’t fully control.

Yes, they’re great, they get you running fast, and they spare you the headache of writing your own YAML, until the day an update breaks everything. If you’re really short on time, at least make the effort to understand every variable in the values.yaml file and avoid relying on default values. At HK-TECH, the rule is no Helm charts; at most, we retrieve a chart’s templates and adapt them ourselves.

Lesson 4: Kubernetes doesn’t like lift and shift.

You’ll need to get your hands dirty and redesign your old apps to make them cloud-compatible. It’s not up to Kubernetes to adapt to your app; the app has to adapt to Kubernetes. If you’re not in a position to recode your apps, you may be better off staying on your old VMs.

Lesson 5: To mesh or not to mesh?

Don’t install a service mesh if you don’t need one. How do you know whether you need one? Ask yourself two questions: Do the applications in my cluster communicate with each other? Do the exchanges between those applications need to be secured? If the answer to both is yes, then a service mesh can be useful. I have no specific recommendation; in my experience, the options are broadly similar.

Lesson 6: Avoid multiplying tools.

The Kubernetes ecosystem offers tons of ancillary tools that promise miracles for managing your clusters: Argo CD, Lens, k9s, KEDA, krew, kubectx, kubens, kail… Avoid stacking them up; good old kubectl covers 90% of your needs. Personally, I limit myself to kubectx, kubens, and k9s, which bring a real gain when administering my clusters.

Lesson 7: You must always define resource limits (memory and CPU) allocated to your pods.

This prevents a poorly coded or misconfigured application from gobbling up all your cluster’s resources and taking down your other applications one after another because a few pods are too greedy. It’s also one of the reasons to be wary of Helm charts and to always check the source of the manifests behind the pretty packaging.
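As a rough illustration of that kind of check, here is a minimal Python sketch that scans a pod spec (already loaded as a dictionary, for example from a rendered manifest) and reports containers that lack a CPU or memory limit. The function name, the example container names, and the images are all hypothetical; the only thing taken from Kubernetes itself is the `resources.limits` field layout.

```python
from typing import Dict, List

# Resources we expect every container to cap.
REQUIRED_LIMITS = ("cpu", "memory")


def containers_missing_limits(pod_spec: dict) -> Dict[str, List[str]]:
    """Return {container name: [missing limit names]} for a pod spec dict."""
    missing: Dict[str, List[str]] = {}
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        limits = (container.get("resources") or {}).get("limits") or {}
        absent = [name for name in REQUIRED_LIMITS if name not in limits]
        if absent:
            missing[container.get("name", "<unnamed>")] = absent
    return missing


# Hypothetical pod spec: one compliant container, two that are not.
example_spec = {
    "containers": [
        {
            "name": "api",
            "image": "example/api:1.0",
            "resources": {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "500m", "memory": "256Mi"},
            },
        },
        {
            "name": "sidecar",
            "image": "example/sidecar:1.0",
            "resources": {"limits": {"memory": "64Mi"}},  # no CPU limit
        },
        {"name": "worker", "image": "example/worker:1.0"},  # no resources at all
    ]
}

if __name__ == "__main__":
    for name, absent in containers_missing_limits(example_spec).items():
        print(f"{name}: missing limits for {', '.join(absent)}")
```

A check like this could run in CI against the manifests you are about to apply, including those produced from a third-party chart.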

Lesson 8: Think stateless.

Ideally, avoid persisting data in your pods. If that really isn’t possible, prefer network file storage (NAS) over disk mounts. Otherwise, you’ll be unpleasantly surprised to find that some pods in your deployment can’t access the persisted data. A disk can typically be mounted on only one node at a time, so if your pods are spread across several nodes, the pods on the node holding the disk will all see the same data, but the pods on other nodes won’t see it at all. NAS-type storage such as EFS avoids this problem.
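In Kubernetes terms, this difference shows up in the access mode of the PersistentVolumeClaim: `ReadWriteOnce` means the volume can be mounted read-write by a single node, while `ReadWriteMany` allows many nodes. The sketch below builds two claims as Python dictionaries and checks which one can be shared across nodes. The claim names and the storage class names (`gp3`, `efs-sc`) are assumptions for illustration; use whatever classes exist in your cluster.

```python
def make_claim(name: str, storage_class: str, access_mode: str, size: str = "5Gi") -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


def writable_from_many_nodes(claim: dict) -> bool:
    """True if pods on different nodes can all mount the volume read-write."""
    return "ReadWriteMany" in claim.get("spec", {}).get("accessModes", [])


# Disk-backed claim: only one node can mount it read-write (assumed class name).
disk_claim = make_claim("app-data-disk", "gp3", "ReadWriteOnce")

# NAS-backed claim, e.g. EFS: shareable across nodes (assumed class name).
efs_claim = make_claim("app-data-shared", "efs-sc", "ReadWriteMany")

if __name__ == "__main__":
    for claim in (disk_claim, efs_claim):
        name = claim["metadata"]["name"]
        print(f"{name}: shareable across nodes = {writable_from_many_nodes(claim)}")
```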

Lesson 9: Configure HPA (Horizontal Pod Autoscaler).

If you want to stop working like in the old world and benefit from Kubernetes’ ability to adjust resource usage automatically according to demand, you need to configure an HPA on all your application projects. (This is another limitation of Helm charts, where an HPA is unfortunately often missing.)
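To get a feel for what the HPA does, the Kubernetes documentation describes its core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The Python sketch below reproduces only that simplified rule, clamped between a minimum and a maximum replica count; the function and parameter names are mine, and the real controller also accounts for things such as pod readiness and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA rule: scale proportionally to metric / target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 pods at 90% average CPU with a 60% target: scale out.
    print(desired_replicas(4, 90, 60))
    # 4 pods at 30% average CPU with a 60% target: scale in.
    print(desired_replicas(4, 30, 60))
```

Note that utilization-based scaling relies on the resource requests defined on your containers, which is one more reason to set resources explicitly.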

Lesson 10: Don’t be afraid of change.

On average, plan for three version upgrades of your cluster per year, roughly one every four months. Some upgrades are transparent, but many bring changes that affect your workloads. To prepare, I recommend reading and re-reading the release notes, as well as the feedback of those who upgraded before you. What I recommend, and what we’ve implemented at HK-TECH, is to always stay one release behind the latest version (unless a release contains security changes).
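The “one release behind” rule is easy to automate. Here is a minimal Python sketch, assuming you already have the list of minor versions your provider offers; the function name and the version numbers in the example are purely illustrative.

```python
from typing import List


def target_version(available: List[str]) -> str:
    """Pick the minor version just below the newest one on offer."""
    # Compare versions numerically so that 1.10 sorts after 1.9.
    minors = sorted({tuple(int(part) for part in v.split(".")[:2]) for v in available})
    if not minors:
        raise ValueError("no versions available")
    # With a single version on offer, there is nothing older to pick.
    chosen = minors[-2] if len(minors) >= 2 else minors[-1]
    return f"{chosen[0]}.{chosen[1]}"


if __name__ == "__main__":
    # Illustrative list only; query your provider for the real one.
    print(target_version(["1.26", "1.27", "1.28"]))
```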

Happy Kubernetes!

Follow me on LinkedIn: https://www.linkedin.com/in/herv%C3%A9-ga%C3%ABl-kouamo-157633197/

Herve Khg

Multi-cloud (Azure/AWS) Systems/DevOps Engineer in France.