Some amount of confusion exists around how we currently ensure, and in future intend to ensure, resilience of the Kubernetes (and, by implication, Kubernetes Cluster Federation) control plane. This document is an attempt to capture that definitively. It covers areas including self-healing, high availability, bootstrapping and recovery. Most of the information in this document already exists in the form of GitHub comments, PRs/proposals, scattered documents, and corridor conversations, so this document is primarily a consolidation and clarification of existing ideas.
Control Plane Component | Resilience Plan | Current Status |
----------------------- | --------------- | -------------- |
API Server | Multiple stateless, self-hosted, self-healing API servers behind an HA load balancer, built out by the default "kube-up" automation on GCE, AWS and basic bare metal (BBM). Note that the single-host approach of having etcd listen only on localhost to ensure that only the API server can connect to it will no longer work, so alternative security will be needed in that regard (firewall rules, SSL certs, or something else). All necessary flags are currently supported to enable SSL between the API server and etcd (OpenShift runs like this out of the box), but this needs to be woven into the "kube-up" and related scripts (see the etcd TLS client sketch below the table). Detailed design of self-hosting and the related bootstrapping and catastrophic failure recovery will be covered in a separate design doc. | No scripted self-healing or HA on GCE, AWS or basic bare metal currently exists in the OSS distro. To be clear, "no self-healing" means that even if multiple API servers are provisioned for HA purposes, nothing replaces them if they fail, so eventually the system will fail. Self-healing and HA can be set up manually by following documented instructions, but this is not currently an automated process, and it is not tested as part of continuous integration. So it's probably safest to assume that it doesn't actually work in practice. |
Controller manager and scheduler | Multiple self-hosted, self-healing, warm-standby, stateless controller managers and schedulers with leader election and automatic failover of API server clients (see the leader-election sketch below the table), automatically installed by the default "kube-up" automation. | As above. |
etcd | Multiple (3-5) etcd quorum members behind a load balancer with session affinity (to prevent clients from being bounced from one to another). Regarding self-healing, if a node running etcd goes down, it is always necessary to do three things: allocate a replacement node, start an etcd replica on it, and have that replica recover the etcd state, either by reattaching the persistent disk (PD) of the failed member or, where no PD is used, via dynamic member addition from the remaining quorum (see the member-replacement sketch below the table). | Somewhat vague instructions exist on how to set some of this up manually in a self-hosted configuration, but automatic bootstrapping and self-healing are not described (and are not implemented for the non-PD cases). This all still needs to be automated and continuously tested. |
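
To give a rough idea of the client-side TLS configuration involved in securing the API server to etcd connection, the following Go sketch builds an etcd v3 client that authenticates with a client certificate and verifies the server against a dedicated CA, using the current etcd client library. This is only an illustration of the pattern, not the actual kube-apiserver code; the certificate paths and the endpoint are hypothetical.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Hypothetical certificate paths; in a real deployment these would be
	// distributed by the cluster bootstrapping automation (e.g. "kube-up").
	const (
		certFile = "/etc/kubernetes/pki/apiserver-etcd-client.crt"
		keyFile  = "/etc/kubernetes/pki/apiserver-etcd-client.key"
		caFile   = "/etc/kubernetes/pki/etcd-ca.crt"
	)

	// Client certificate presented to etcd (etcd must be configured to
	// require client certificate authentication for this to be enforced).
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		log.Fatalf("loading client key pair: %v", err)
	}

	// CA used to verify the etcd server's serving certificate.
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		log.Fatalf("reading CA file: %v", err)
	}
	caPool := x509.NewCertPool()
	if !caPool.AppendCertsFromPEM(caPEM) {
		log.Fatal("no CA certificates found in CA file")
	}

	cli, err := clientv3.New(clientv3.Config{
		// Hypothetical endpoint; no longer bound to localhost only.
		Endpoints:   []string{"https://etcd-0.example.internal:2379"},
		DialTimeout: 5 * time.Second,
		TLS: &tls.Config{
			Certificates: []tls.Certificate{cert},
			RootCAs:      caPool,
		},
	})
	if err != nil {
		log.Fatalf("creating etcd client: %v", err)
	}
	defer cli.Close()
}
```

The point of the sketch is that once etcd stops listening only on localhost, mutual TLS (or firewalling) replaces network locality as the access control for the etcd endpoint.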
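The warm-standby behaviour of the controller manager and scheduler comes from leader election against the API server: all replicas run, but only the current lease holder executes its control loops. The sketch below uses the client-go leaderelection package with a Lease lock to illustrate the pattern; the namespace, lock name and identity are hypothetical, and this is not presented as the exact mechanism wired into the shipped components.

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("building in-cluster config: %v", err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Identity of this replica; error ignored for brevity in this sketch.
	hostname, _ := os.Hostname()

	// Lease object all replicas compete for; only the holder runs the
	// actual control loops. Namespace and name are hypothetical.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Namespace: "kube-system",
			Name:      "example-controller-manager",
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: hostname,
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// The warm standby becomes active here: start the
				// controller / scheduling loops.
				log.Println("became leader; starting control loops")
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// Lost the lease (e.g. partitioned or crashed); exit so
				// another replica can take over cleanly.
				log.Println("lost leadership; exiting")
				os.Exit(0)
			},
		},
	})
}
```

Because the lock lives in the API server, failover is automatic: if the leader dies, its lease expires and one of the standby replicas acquires it and starts working.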
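To make the non-PD recovery path concrete, the sketch below shows dynamic member replacement using the etcd v3 cluster API: the failed member is removed from the quorum and the replacement registered, after which the new etcd process on the replacement node syncs state from its peers. Endpoints, member name and peer URL are hypothetical, and real automation would also need the TLS setup shown earlier (omitted here for brevity).

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// replaceMember removes a failed etcd member and registers its replacement,
// so that the new etcd process can join the quorum and sync state from peers.
func replaceMember(ctx context.Context, cli *clientv3.Client, failedName, newPeerURL string) error {
	// Look up the ID of the failed member by name.
	resp, err := cli.MemberList(ctx)
	if err != nil {
		return err
	}
	var failedID uint64
	for _, m := range resp.Members {
		if m.Name == failedName {
			failedID = m.ID
		}
	}
	if failedID == 0 {
		return fmt.Errorf("member %q not found in cluster", failedName)
	}

	// Remove the dead member first so the quorum stops waiting for it.
	if _, err := cli.MemberRemove(ctx, failedID); err != nil {
		return err
	}

	// Register the replacement; the etcd process on the new node must then
	// be started with --initial-cluster-state=existing so it syncs its
	// state from the remaining quorum members.
	if _, err := cli.MemberAdd(ctx, []string{newPeerURL}); err != nil {
		return err
	}
	return nil
}

func main() {
	// Hypothetical endpoints of the surviving quorum members; TLS config
	// omitted in this sketch.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-1.example.internal:2379", "https://etcd-2.example.internal:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("creating etcd client: %v", err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := replaceMember(ctx, cli, "etcd-0", "https://etcd-0-replacement.example.internal:2380"); err != nil {
		log.Fatalf("replacing member: %v", err)
	}
}
```

In the PD case the state travels with the reattached disk and the member can rejoin under its old identity; in the non-PD case the sequence above (remove, add, start with existing cluster state) is what any self-healing automation would have to perform and what continuous testing would need to exercise.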