A Kubernetes application configuration (e.g. for a Pod, Replication Controller, Service, etc.) should be deployable, without modification, into any Kubernetes Cluster or Federation of Clusters. More specifically, a typical configuration should work correctly (although possibly not optimally) across any of the following environments:
It should be possible to explicitly opt out of portability across some subset of the above environments in order to take advantage of non-portable load balancing and DNS features of one or more environments. More specifically, for example:
Cross-cluster Federated load balancing is built on top of the following:
The Cluster Federation API for load balancing should be compatible with the equivalent Kubernetes API, to ease porting of clients between Kubernetes and federations of Kubernetes clusters. Further details below.
To be useful, our load balancing solution needs to work properly with real client applications. There are a few different classes of those...
These are the most common external clients, and are generally well-written. See below.
Examples:
Examples:
Examples:
Each cluster hosts one or more Cluster Federation master components (Federation API servers, controller managers with leader election, and etcd quorum members). This is documented in more detail in a separate design doc: Kubernetes and Cluster Federation Control Plane Resilience.
In the description below, assume that 'n' clusters, named 'cluster-1' ... 'cluster-n', have been registered against a Cluster Federation "federation-1", each with their own set of Kubernetes API endpoints, e.g. http://endpoint-1.cluster-1, http://endpoint-2.cluster-1, ... http://endpoint-m.cluster-n.
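To make the examples below concrete, assume that the federation and each cluster are reachable via kubeconfig contexts that share their names (the same context names used in the kubectl examples throughout this section; the specific commands here are just for illustration):

$ kubectl --context="cluster-1" get nodes        # talks directly to cluster-1's API endpoint(s)
$ kubectl --context="federation-1" get services  # talks to a Federation API server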
Federated Services are fairly straightforward. They consist of multiple equivalent underlying Kubernetes Services, each with its own external endpoint, and a load balancing mechanism across them. Let's work through how exactly that works in practice.
Our user creates the following Federated Service (against a Federation API endpoint):
$ kubectl create -f my-service.yaml --context="federation-1"
where my-service.yaml contains the following:
apiVersion: v1
kind: Service
metadata:
  labels:
    run: my-service
  name: my-service
  namespace: my-namespace
spec:
  ports:
  - port: 2379
    protocol: TCP
    targetPort: 2379
    name: client
  - port: 2380
    protocol: TCP
    targetPort: 2380
    name: peer
  selector:
    run: my-service
  type: LoadBalancer
The Cluster Federation control system in turn creates one equivalent service (identical config to the above) in each of the underlying Kubernetes clusters, each of which results in something like this:
$ kubectl get -o yaml --context="cluster-1" service my-service
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2015-11-25T23:35:25Z
  labels:
    run: my-service
  name: my-service
  namespace: my-namespace
  resourceVersion: "147365"
  selfLink: /api/v1/namespaces/my-namespace/services/my-service
  uid: 33bfc927-93cd-11e5-a38c-42010af00002
spec:
  clusterIP: 10.0.153.185
  ports:
  - name: client
    nodePort: 31333
    port: 2379
    protocol: TCP
    targetPort: 2379
  - name: peer
    nodePort: 31086
    port: 2380
    protocol: TCP
    targetPort: 2380
  selector:
    run: my-service
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 104.197.117.10
Similar services are created in cluster-2 and cluster-3, each of which is allocated its own spec.clusterIP and status.loadBalancer.ingress.ip.
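For illustration, these per-cluster addresses can be read back directly from each underlying service; this is just a sketch using jsonpath against the v1 Service fields shown above:

$ for c in cluster-1 cluster-2 cluster-3; do
    echo -n "$c: "
    kubectl get service my-service --namespace="my-namespace" --context="$c" \
      -o jsonpath='{.spec.clusterIP} {.status.loadBalancer.ingress[0].ip}'
    echo
  done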
In the Cluster Federation federation-1, the resulting federated service looks as follows:
$ kubectl get -o yaml --context="federation-1" service my-service
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2015-11-25T23:35:23Z
  labels:
    run: my-service
  name: my-service
  namespace: my-namespace
  resourceVersion: "157333"
  selfLink: /api/v1/namespaces/my-namespace/services/my-service
  uid: 33bfc927-93cd-11e5-a38c-42010af00007
spec:
  clusterIP:
  ports:
  - name: client
    nodePort: 31333
    port: 2379
    protocol: TCP
    targetPort: 2379
  - name: peer
    nodePort: 31086
    port: 2380
    protocol: TCP
    targetPort: 2380
  selector:
    run: my-service
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: my-service.my-namespace.my-federation.my-domain.com
Note that the federated service differs from the underlying per-cluster services in two visible ways: its spec.clusterIP is left empty (there is no single cluster service network for it to live on), and its status.loadBalancer.ingress reports a federation-wide DNS hostname rather than a per-cluster load balancer IP.
In addition to the set of underlying Kubernetes services (one per cluster) described above, the Cluster Federation control system has also created a DNS name (e.g. on Google Cloud DNS or AWS Route 53, depending on configuration) which provides load balancing across all of those services. For example, in a very basic configuration:
$ dig +noall +answer my-service.my-namespace.my-federation.my-domain.com
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.117.10
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.74.77
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.38.157
Each of the above IP addresses (which are just the external load balancer ingress IPs of each cluster's service) is of course load balanced across the pods comprising the service in each cluster.
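To confirm this for a single cluster (a sketch; the endpoints list is just the standard Kubernetes view of the pods backing that cluster's service):

$ kubectl get endpoints my-service --namespace="my-namespace" --context="cluster-1"
  # the pod IP:port pairs ultimately served behind cluster-1's ingress IP (104.197.117.10)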
In a more sophisticated configuration (e.g. on GCE or GKE), the Cluster Federation control system automatically creates a GCE Global L7 Load Balancer which exposes a single, globally load-balanced IP:
$ dig +noall +answer my-service.my-namespace.my-federation.my-domain.com
my-service.my-namespace.my-federation.my-domain.com 180 IN A 107.194.17.44
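On GCE, assuming the control system provisions this via a standard global forwarding rule (the exact resource naming is up to the implementation and not specified here), the underlying load balancer can be inspected with gcloud, for example:

$ gcloud compute forwarding-rules list --global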
Optionally, the Cluster Federation control system also configures the local DNS servers (SkyDNS) in each Kubernetes cluster to preferentially return the local clusterIP for the service in that cluster, with other clusters' external service IPs (or a global load-balanced IP) also configured for failover purposes:
$ dig +noall +answer my-service.my-namespace.my-federation.my-domain.com
my-service.my-namespace.my-federation.my-domain.com 180 IN A 10.0.153.185
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.74.77
my-service.my-namespace.my-federation.my-domain.com 180 IN A 104.197.38.157
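One way to observe this preference from inside a cluster is to resolve the name from a throwaway pod; a sketch, assuming a busybox image can be pulled inside cluster-1 (the pod name dns-test is purely for this example):

$ kubectl run -i -t dns-test --context="cluster-1" --image=busybox --restart=Never -- \
    nslookup my-service.my-namespace.my-federation.my-domain.com
  # from inside cluster-1, the local clusterIP (10.0.153.185) should be returned first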
If Cluster Federation Global Service Health Checking is enabled, multiple service health checkers running across the federated clusters collaborate to monitor the health of the service endpoints, and automatically remove unhealthy endpoints from the DNS record (e.g. a majority quorum is required to vote a service endpoint unhealthy, to avoid false positives due to individual health checker network isolation).
So far we have a federated service defined, with a resolvable load balancer hostname by which clients can reach it, but no pods serving traffic directed there. So now we need a Federated Replication Controller. These are also fairly straightforward, consisting of multiple underlying Kubernetes Replication Controllers that do the hard work of keeping the desired number of Pod replicas alive in each Kubernetes cluster.
$ kubectl create -f my-service-rc.yaml --context="federation-1"
where my-service-rc.yaml contains the following:
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    run: my-service
  name: my-service
  namespace: my-namespace
spec:
  replicas: 6
  selector:
    run: my-service
  template:
    metadata:
      labels:
        run: my-service
    spec:
      containers:
      - image: gcr.io/google_samples/my-service:v1
        name: my-service
        ports:
        - containerPort: 2379
          protocol: TCP
        - containerPort: 2380
          protocol: TCP
The Cluster Federation control system in turn creates one equivalent replication controller (identical config to the above, except for the replica count) in each of the underlying Kubernetes clusters, each of which results in something like this:
$ kubectl get -o yaml rc my-service --context="cluster-1"
apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2015-12-02T23:00:47Z
  labels:
    run: my-service
  name: my-service
  namespace: my-namespace
  selfLink: /api/v1/namespaces/my-namespace/replicationcontrollers/my-service
  uid: 86542109-9948-11e5-a38c-42010af00002
spec:
  replicas: 2
  selector:
    run: my-service
  template:
    metadata:
      labels:
        run: my-service
    spec:
      containers:
      - image: gcr.io/google_samples/my-service:v1
        name: my-service
        ports:
        - containerPort: 2379
          protocol: TCP
        - containerPort: 2380
          protocol: TCP
        resources: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always
status:
  replicas: 2
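For illustration, the per-cluster split can be read back with a quick loop over the same cluster contexts (a sketch):

$ for c in cluster-1 cluster-2 cluster-3; do
    echo -n "$c: "
    kubectl get rc my-service --namespace="my-namespace" --context="$c" -o jsonpath='{.status.replicas}'
    echo
  done
  # in this example each cluster reports 2 replicas, making up the total of 6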
The exact number of replicas created in each underlying cluster will of course depend on what scheduling policy is in force. In the above example, the scheduler created an equal number of replicas (2) in each of the three underlying clusters, to make up the total of 6 replicas required. To handle entire cluster failures, various approaches are possible, including:
The implementation approach and architecture are very similar to those of Kubernetes, so if you're familiar with how Kubernetes works, none of what follows will be surprising. One additional design driver not present in Kubernetes is that the Cluster Federation control system aims to be resilient to individual cluster and availability zone failures, so the control plane spans multiple clusters. More specifically:
Cluster Controllers in the Federation control system watch the Federation API server/etcd state and apply changes to the underlying Kubernetes clusters accordingly. They also provide the anti-entropy mechanism that reconciles the Cluster Federation's "desired desired" state against each cluster's "actual desired" state.
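As a rough illustration only (not the controller's actual implementation), the comparison such a controller performs is conceptually similar to diffing the federation-level object against the corresponding cluster-level object:

$ diff \
    <(kubectl get service my-service --namespace="my-namespace" --context="federation-1" -o yaml) \
    <(kubectl get service my-service --namespace="my-namespace" --context="cluster-1" -o yaml)
  # spec differences represent changes still to be pushed down to the cluster;
  # cluster-local fields (clusterIP, nodePorts, status) are expected to differ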