Projects:Kubernetes

Uit Hackerspace Nijmegen Wiki
Naar navigatie springen Naar zoeken springen

According to Wikipedia, Kubernetes (commonly stylized as k8s) is an open-source container orchestration system for automating application deployment, scaling, and management. I've heard a lot about it, and it seems to solve some problems I'm encountering sometimes, so I'd like to get to know it better. While doing so, I wanted to make a write-up of what I found out.

This page is not intended as a tutorial, but it will link to various tutorials I found useful. In this way, I hope that it can be used as a reference for someone setting up Kubernetes to be able to learn about it more quickly.

Disclaimer: I'm a beginner in the Kubernetes area, so this tutorial will contain errors. Please correct errors where you find them, and if possible, write some explanation! The "I" in this article is Sjors, but parts are possibly written by other people.

Problem

We start with a Linux desktop, running some version of Arch Linux. Someone wants to run an application on it for which only binaries are available, but those binaries were compiled on Ubuntu. Now, system libraries are different between Arch Linux and Ubuntu, and while it's possible to create binaries that run independent of system libraries (called "static binaries"), in this particular situation let's assume the binaries aren't static, but you still want to run them on your Arch installation.

You can get a second machine and run Ubuntu on it, or similarly you could use a Virtual Machine. But, there's a simpler and more efficient solution: Docker allows you to install, within your Linux distribution (the "host"), another Linux distribution (let's call it "guest" for now). The host and the guest may be completely different – technically, only the kernel of the host is used, and of course the host and guest must have compatible processor architectures. You run the guest environment within Docker (a "Docker Container") and run that binary in it.

This is how easy that is:

sjors@somebox:~$ lsb_release -a
LSB Version: 1.4
Distributor ID: Arch
Description: Arch Linux
Release: rolling
Codename: n/a
sjors@somebox:~$ docker run -ti ubuntu:bionic bash
root@a5b210d251c2:/# apt-get update && apt-get -y install lsb-release
[....]
root@a5b210d251c2:/# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.1 LTS
Release:	18.04
Codename:	bionic

The -it flag makes this an interactive container that can run bash; just as easily, you can run detached containers running webservers, and use the integrated port-forwarding features to make them accessible to the outside.

Now, Docker has some problems of its own:

  • You start Docker containers by accessing the Docker daemon; the daemon runs containers as root and allows you to start a container with a bind-mount of "/". Basically, having access to the Docker daemon means you have root on the system, but you need access to do anything. It's all-or-nothing.
  • When your Docker machine goes down, all containers are gone. You'll have to either restart all containers manually, or have boot-scripts that set them up, but there's no automatic restart mechanism.
  • When you want to run more Docker containers than fit on one machine, there's no horizontal scaling mechanism built-in.
  • When you want to run a service multiple times, e.g. for redundancy, you need to schedule them manually multiple times and roll your own method of load-balancing them.

Kubernetes provides a solution for these problems, while otherwise looking very much like Docker. In fact, when you're familiar with Docker some of the commands below will also be very familiar to you.

Concepts

In this section, I'll explain some of Kubernetes' concepts quickly, and add links if you want to know more.

  • Container: Like with Docker, this is one 'guest environment' in which you can run anything. Usually, Kubernetes containers are, in fact, Docker containers.
  • Pod: The basic unit you actually schedule in Kubernetes. Usually, a Pod contains one Container, but a Pod can consist of multiple Containers which can be a very useful feature. More on that later.
  • Volume: As in Docker, changes to containers are temporary and will be gone when the container stops. If you want to keep those changes after a restart, like in Docker, you make a Volume. They are also useful to share data between containers. In Kubernetes, Volumes are kept over restarts of Containers, but not over restarts of Pods, unless they are Persistent. More on that later.
  • Service: When your Pod contains some application, such as a webserver, you can make its TCP port available as a Service so that people (inside or outside the cluster) can connect to it. For an application you want to run redundantly, multiple Pods can be started; you'll configure them to share the same Service. This way, when you connect to the Service, you'll get one of the running Pods behind it. Instant redundancy!
  • Namespace: Kubernetes supports running multiple "virtual clusters" on the infrastructure of one "physical cluster". Those virtual clusters are called "namespaces", and you can restrict access to certain namespaces. Normally, you're only working with the "default" namespace.
  • Deployment: To do.

Those are some concepts that allow you to use a Kubernetes cluster. In this guide, we'll also be setting up the infrastructure behind that:

  • Node: A machine that actually runs the Containers. Can be bare-metal or a virtual machine, or even an embedded IoT device. A Node runs a process called "Kubelet" which interacts with the Docker daemon (usually) to set everything up, but normally you never communicate directly with it.
  • Control Plane: A set of some applications (API server, etcd, controller manager, scheduler...) that make sure the cluster is "healthy". For example, it starts Pods when you request it to, but also when a Node goes down that was running Pods, it restarts those Pods elsewhere. These applications all run in Pods, in a separate namespace called "kube-system".
  • Master node: Otherwise a normal Node, but it runs the Control Plane applications. By default, a Master node will only run Pods for these applications, but you can configure it to allow normal Pods too. There can be multiple Master nodes, for redundancy of the cluster.

Understanding networking

Networking within a Kubernetes cluster isn't difficult, but requires some specific explanation. Applications within Kubernetes containers need to be able to access each other, regardless of whether they run on the same Node or not. To make it worse, Nodes may be running behind firewalls or may be running in different subnets. Luckily, Kubernetes has some mechanisms that make it very robust against this.

When setting up a Kubernetes cluster, there are two important internal IP ranges throughout the cluster:

  • The Pod network range. This internal range is automatically split over Nodes, and Pods get individual addresses from it.
* For example, you can set this to 10.123.0.0/16; the master node will likely get 10.123.0.0/24 and the second Node you add after that gets 10.123.1.0/24 and so on. A Pod running on this second node may have 10.123.1.55 as an IP address. (If the Pod has multiple containers, all of them will have the same IP address.)
  • The service network range. When you register a Service, such as "my-fun-webserver", it automatically gets an IP address within this range. An application called the 'kube-proxy', running automatically on every Node, then takes care that any communication with this IP address is forwarded to one of the actual Pods providing that service (by configuring iptables). Fun fact: the Kubernetes API server registers itself as a service and is always available at the first host address of the range you chose.
* For example, your service network range may be 10.96.0.0/16; the Kubernetes API service makes itself available at 10.96.0.1. When you communicate with this IP address, the communication is automatically translated (by iptables) to be sent to the Pod IP address of the Kubernetes API service, e.g. 10.123.1.55.

It's important that these ranges don't overlap, and they also both shouldn't overlap with any relevant IP ranges within your existing network! The Kubernetes folks suggest you use something within 10.0.0.0/8 if your local network range is within 192.168.0.0/16 and vice-versa.

To do: explain CNI

Setting it all up

With that all behind us, let's start setting up our first cluster!