Docs: Add kubernetes and troubleshooting info

Tom Denham, 7 years ago
commit cf47f5aa6e
5 changed files with 151 additions and 23 deletions
  1. Documentation/Kubernetes.md (+36 -0)
  2. Documentation/building.md (+15 -0)
  3. Documentation/running.md (+1 -1)
  4. Documentation/troubleshooting.md (+81 -1)
  5. README.md (+18 -21)

+ 36 - 0
Documentation/Kubernetes.md

@@ -0,0 +1,36 @@
+# kubeadm
+
+For information on deploying flannel manually using the Kubernetes installer toolkit kubeadm, see [Installing Kubernetes on Linux with kubeadm][kubeadm].
+
+NOTE: If `kubeadm` is used, then pass `--pod-network-cidr=10.244.0.0/16` to `kubeadm init` to ensure that the `podCIDR` is set.
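+
+As a minimal sketch (the CIDR below matches the default in `kube-flannel.yml`; adjust it if you change the manifest):
+
+```bash
+# Initialise the control plane so every node is assigned a podCIDR flannel can use
+kubeadm init --pod-network-cidr=10.244.0.0/16
+```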
+
+kubeadm has RBAC enabled by default, so you must apply the `kube-flannel-rbac.yml` manifest as well as the `kube-flannel.yml` manifest.
+
+* `kubectl apply -f kube-flannel-rbac.yml -f kube-flannel.yml`
+
+If you didn't apply the `kube-flannel-rbac.yml` manifest, you'll see errors in your flanneld logs about failing to connect:
+* `Failed to create SubnetManager: error retrieving pod spec...`
+
+If you forgot to apply the `kube-flannel-rbac.yml` manifest and notice that flannel fails to start, then it is safe to just apply the `kube-flannel-rbac.yml` manifest without running `kubectl delete -f kube-flannel.yml` first.
+* `kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml`
+
+# kube-flannel.yml
+
+The `flannel` manifest defines three things:
+1. A service account for `flannel` to use.
+2. A ConfigMap containing both a CNI configuration and a `flannel` configuration. The `Network` in the `flannel` configuration should match the pod network CIDR. The choice of `Backend` is also made here and defaults to VXLAN. A sketch of the corresponding `net-conf.json` is shown after this list.
+3. A DaemonSet to deploy the `flannel` pod on each Node. The pod has two containers: 1) the `flannel` daemon itself, and 2) a container for deploying the CNI configuration to a location that the `kubelet` can read.
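+
+For example, a minimal sketch of inspecting that configuration (the ConfigMap name and namespace are those used by the stock manifest; the output assumes the default `10.244.0.0/16` network and VXLAN backend):
+
+```bash
+# Print the flannel network configuration held in the kube-flannel-cfg ConfigMap
+kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'
+# Expected output (defaults assumed):
+# {
+#   "Network": "10.244.0.0/16",
+#   "Backend": {
+#     "Type": "vxlan"
+#   }
+# }
+```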
+
+When you run pods, they will be allocated IP addresses from the pod network CIDR. No matter which node those pods end up on, they will be able to communicate with each other.
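+
+A quick way to verify this (the pod name and IP below are placeholders) is to note each pod's IP and node, then ping one pod from another:
+
+```bash
+# List pods with their IPs and the nodes they landed on
+kubectl get pods -o wide
+# From one pod, reach another pod's IP on a different node (assumes the image includes ping)
+kubectl exec <POD_A> -- ping -c 3 <POD_B_IP>
+```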
+
+## The flannel CNI plugin
+
+The flannel CNI plugin can be found in the CNI plugins [repository](https://github.com/containernetworking/plugins). For additional details, see the [README](https://github.com/containernetworking/plugins/tree/master/plugins/meta/flannel).
+
+Kubernetes 1.6 requires CNI plugin version 0.5.1 or later.
+
+# Troubleshooting
+
+See [troubleshooting](troubleshooting.md).
+
+[kubeadm]: https://kubernetes.io/docs/getting-started-guides/kubeadm/

+ 15 - 0
Documentation/building.md

@@ -22,3 +22,18 @@ You will now have a `flanneld-amd64` binary in the `dist` directory.
     * `make release`
 3. Attach all the files in `dist` to the GitHub release.
 4. Run `make docker-push-all` to push all the images to a registry.
+
+# Obtaining master builds
+
+A new build of flannel is created for every commit to master. These builds can be obtained from [https://quay.io/repository/coreos/flannel-git](https://quay.io/repository/coreos/flannel-git?tab=tags)
+
+* `latest` is always the current HEAD of master. Use with caution.
+* The image tags have a number of components e.g. `v0.7.0-109-gb366263c-amd64`
+  * The last release was `v0.7.0`
+  * This version is 109 commits newer
+  * The commit hash is `b366263c` (the leading `g` simply marks it as a git hash, as in `git describe` output)
+  * The platform is `amd64`
+
+These builds can be useful when a particular commit is needed for a specific feature or bugfix.
+
+NOTE: the image name is `quay.io/coreos/flannel-git` for master builds. *Releases* are named `quay.io/coreos/flannel` (there is no `-git` suffix).
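+
+For example, pulling a specific master build by its full tag (the tag below is the example from above; substitute the one you need):
+
+```bash
+# Pull a master build from the flannel-git repository
+docker pull quay.io/coreos/flannel-git:v0.7.0-109-gb366263c-amd64
+```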

+ 1 - 1
Documentation/running.md

@@ -23,7 +23,7 @@ flanneld -subnet-file /vxlan.env -etcd-prefix=/vxlan/network
 
 1. Download a `flannel` binary.
 ```bash
-wget https://github.com/coreos/flannel/releases/download/v0.7.0/flanneld-amd64 && chmod +x flanneld-amd64
+wget https://github.com/coreos/flannel/releases/download/v0.7.1/flanneld-amd64 && chmod +x flanneld-amd64
 ```
 2. Run the binary.
 ```bash

+ 81 - 1
Documentation/troubleshooting.md

@@ -1,4 +1,84 @@
-### Firewalls
+# Troubleshooting
+
+# General
+
+## Logging
+Flannel uses the `glog` library but only supports logging to stderr. The severity level can't be changed, but the verbosity can be changed with the `-v` option. Flannel does not make extensive use of the verbosity level, but increasing the value from `0` (the default) will result in some additional logs. To get the most detailed logs, use `-v=10`.
+
+```
+-v value
+    	log level for V logs
+-vmodule value
+    	comma-separated list of pattern=N settings for file-filtered logging
+-log_backtrace_at value
+    	when logging hits line file:N, emit a stack trace
+```
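+
+For example, a sketch of raising the verbosity when launching flanneld directly (the etcd endpoint is illustrative; adjust it for your environment):
+
+```bash
+# Run flanneld with maximum log verbosity; logs go to stderr
+sudo flanneld -v=10 -etcd-endpoints=http://127.0.0.1:2379
+```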
+
+When running under systemd (e.g. on CoreOS Container Linux), the logs can be viewed with `journalctl -u flanneld`.
+
+When flannel is running as a pod on Kubernetes, the logs can be viewed with `kubectl logs --namespace kube-system <POD_ID> -c kube-flannel`. You can find the pod IDs with `kubectl get po --namespace kube-system -l app=flannel`.
+
+## Interface selection and the public IP
+Most backends require that each node has a unique "public IP" address. This address is chosen when flannel starts. Because leases are tied to the public address, if the address changes, flannel must be restarted.
+
+The interface chosen and the public IP in use are logged during startup, e.g.
+```
+I0629 14:28:35.866793    5522 main.go:386] Determining IP address of default interface
+I0629 14:28:35.866987    5522 main.go:399] Using interface with name enp62s0u1u2 and address 172.24.17.174
+I0629 14:28:35.867000    5522 main.go:412] Using 10.10.10.10 as external address
+```
+
+### Vagrant
+Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address `10.0.2.15`, is for external traffic that gets NATed.
+
+This may lead to problems with flannel. By default, flannel selects the first interface on a host. This leads to all hosts thinking they have the same public IP address. To prevent this issue, pass the `--iface=eth1` flag to flannel so that the second interface is chosen.
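+
+For example (the interface name assumes Vagrant's usual second, private-network interface):
+
+```bash
+# Make flannel advertise the private-network interface instead of the NATed one
+sudo flanneld --iface=eth1
+```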
+
+## Permissions
+Depending on the backend being used, flannel may need to run with super user permissions. Examples include creating VXLAN devices or programming routes. If you see errors similar to the following, confirm that the user running flannel has the right permissions (or try running with `sudo`).
+ * `Error adding route...`
+ * `Add L2 failed`
+ * `Failed to set up IP Masquerade`
+ * `Error registering network: operation not permitted`
+
+## Performance
+
+### Control plane
+Flannel is known to scale to a very large number of hosts. A delay in contacting pods on a newly created host may indicate control plane problems. Flannel doesn't need much CPU or RAM, but the first thing to check is that it has adequate resources available. Flannel is also reliant on the performance of the datastore, either etcd or the Kubernetes API server, so check that they are performing well.
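+
+A couple of rough first checks (these assume the etcd v2 `etcdctl` tooling and a reachable kube-apiserver):
+
+```bash
+# Check etcd cluster health when etcd is the datastore
+etcdctl cluster-health
+# Check control plane component health when the Kubernetes API is the datastore
+kubectl get componentstatuses
+```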
+
+### Data plane
+Flannel relies on the underlying network, so that's the first thing to check if you're seeing poor data plane performance.
+
+There are two flannel-specific choices that can have a big impact on performance:
+1) The type of backend. For example, if encapsulation is used, `vxlan` will always perform better than `udp`. For maximum data plane performance, avoid encapsulation.
+2) The size of the MTU can have a large impact. To achieve maximum raw bandwidth, a network supporting a large MTU should be used. Flannel writes an MTU setting to the `subnet.env` file. This file is read by either the Docker daemon or the CNI flannel plugin, which does the networking for individual containers. To troubleshoot, first ensure that the network interface that flannel is using has the right MTU. Then check that the correct MTU is written to `subnet.env`. Finally, check that the containers have the correct MTU on their virtual ethernet device (see the sketch below).
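+
+A minimal sketch of those checks (the device name assumes the VXLAN backend, the subnet file path assumes flannel's default, and the pod image must include `ip`):
+
+```bash
+# MTU of the interface flannel created (flannel.1 is the VXLAN device)
+ip link show flannel.1
+# MTU flannel wrote for container networking
+cat /run/flannel/subnet.env
+# MTU of a container's virtual ethernet device
+kubectl exec <POD_ID> -- ip link show eth0
+```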
+
+
+## Firewalls
 When using `udp` backend, flannel uses UDP port 8285 for sending encapsulated packets.
+
 When using `vxlan` backend, kernel uses UDP port 8472 for sending encapsulated packets.
+
 Make sure that your firewall rules allow this traffic for all hosts participating in the overlay network.
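+
+For example, with plain iptables (the rules are illustrative and will vary with your firewall setup; in practice, restrict the source addresses to your hosts):
+
+```bash
+# Allow flannel's encapsulation traffic between hosts
+iptables -A INPUT -p udp --dport 8285 -j ACCEPT   # udp backend
+iptables -A INPUT -p udp --dport 8472 -j ACCEPT   # vxlan backend
+```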
+
+# Kubernetes Specific
+The flannel kube subnet manager relies on the fact that each node already has a `podCIDR` defined.
+
+You can check the `podCIDR` for your nodes with one of the following two commands:
+* `kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'`
+* `kubectl get node <NODE_NAME> -o template --template={{.spec.podCIDR}}`
+
+If your nodes do not have a podCIDR, then either use the `--pod-cidr` kubelet command-line option or the `--allocate-node-cidrs=true --cluster-cidr=<cidr>` controller-manager command-line options, as sketched below.
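+
+These are illustrative fragments only (the CIDRs are examples, and all other required flags are omitted):
+
+```bash
+# Have the controller manager allocate a podCIDR to every node
+kube-controller-manager --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16
+# ...or assign a fixed podCIDR to a single node's kubelet (standalone kubelet use)
+kubelet --pod-cidr=10.244.1.0/24
+```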
+
+If `kubeadm` is being used then pass `--pod-network-cidr=10.244.0.0/16` to `kubeadm init` which will ensure that all nodes are automatically assigned a `podCIDR`.
+
+It's possible to manually set the `podCIDR` for each node.
+* `kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}'`
+
+## Log messages
+
+* `failed to read net conf` - flannel expects to be able to read the net conf from `/etc/kube-flannel/net-conf.json`. In the provided manifest, this is set up in the `kube-flannel-cfg` ConfigMap.
+* `error parsing subnet config` - The net conf is malformed. Double check that it has the right content and is valid JSON.
+* `node <NODE_NAME> pod cidr not assigned` - The node doesn't have a `podCIDR` defined. See above for more info.
+* `Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-abc123': the server does not allow access to the requested resource` - The Kubernetes cluster has RBAC enabled. Run `kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml`.
+
+

+ 18 - 21
README.md

@@ -4,40 +4,38 @@
 
 [![Build Status](https://travis-ci.org/coreos/flannel.png?branch=master)](https://travis-ci.org/coreos/flannel)
 
-Flannel is a virtual network that gives a subnet to each host for use with container runtimes.
-
-Platforms like Kubernetes assume that each container (pod) has a unique, routable IP inside the cluster. The advantage of this model is that it reduces the complexity of doing port mapping.
+Flannel is a simple and easy to configure layer 3 network fabric designed for Kubernetes.
 
 ## How it works
 
-Flannel runs an agent, `flanneld`, on each host and is responsible for allocating a subnet lease out of a preconfigured address space. Flannel uses either [etcd][etcd] or the Kubernetes API to store the network configuration, allocated subnets, and auxiliary data (such as host's IP). Packets are forwarded using one of several [backend mechanisms][backends].
+Flannel runs a small, single binary agent called `flanneld` on each host, and is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space.
+Flannel uses either the Kubernetes API or [etcd][etcd] directly to store the network configuration, the allocated subnets, and any auxiliary data (such as the host's public IP).
+Packets are forwarded using one of several [backend mechanisms][backends] including VXLAN and various cloud integrations.
 
-The following diagram demonstrates the path a packet takes as it traverses the overlay network:
+### Networking details
 
-![Life of a packet](./packet-01.png)
+Platforms like Kubernetes assume that each container (pod) has a unique, routable IP inside the cluster.
+The advantage of this model is that it removes the port mapping complexities that come from sharing a single host IP.
 
-## Getting started
+Flannel is responsible for providing a layer 3 IPv4 network between multiple nodes in a cluster. Flannel does not control how containers are networked to the host, only how the traffic is transported between hosts. However, flannel does provide a CNI plugin for Kubernetes and guidance on integrating with Docker.
 
-The easiest way to deploy flannel with Kubernetes is to use one of several deployment tools and distributions that network clusters with flannel by default. CoreOS's [Tectonic][tectonic] sets up flannel in the Kubernetes clusters it creates using the open source [Tectonic Installer][tectonic-installer] to drive the setup process.
+Flannel is focused on networking. For network policy, other projects such as [Calico][calico] can be used.
 
-Flannel can use the Kubernetes API as its backing store, meaning there's no need to deploy a discrete `etcd` cluster for `flannel`. This `flannel` mode is known as the *kube subnet manager*.
+## Getting started on Kubernetes
 
-### Adding flannel
+The easiest way to deploy flannel with Kubernetes is to use one of several deployment tools and distributions that network clusters with flannel by default. For example, CoreOS's [Tectonic][tectonic] sets up flannel in the Kubernetes clusters it creates using the open source [Tectonic Installer][tectonic-installer] to drive the setup process.
 
-Flannel can be added to any existing Kubernetes cluster. It's simplest to add `flannel` before any pods using the pod network have been started.
+Though not required, it's recommended that flannel use the Kubernetes API as its backing store, which avoids the need to deploy a discrete `etcd` cluster for `flannel`. This `flannel` mode is known as the *kube subnet manager*.
 
-For information on deploying flannel manually, using the (currently alpha) Kubernetes installer toolkit kubeadm, see [Installing Kubernetes on Linux with kubeadm][installing-with-kubeadm].
+### Deploying flannel manually
 
-### Using flannel
+Flannel can be added to any existing Kubernetes cluster, though it's simplest to add `flannel` before any pods using the pod network have been started.
 
-Once applied, the `flannel` manifest defines three things:
-1. A service account for `flannel` to use.
-2. A ConfigMap containing both a CNI configuration and a `flannel` configuration. The network in the `flannel` configuration should match the pod network CIDR. The choice of `backend` is also made here and defaults to VXLAN.
-3. A DaemonSet to deploy the `flannel` pod on each Node. The pod has two containers 1) the `flannel` daemon itself, and 2) a container for deploying the CNI configuration to a location that the `kubelet` can read.
+See [Kubernetes](Documentation/Kubernetes.md) for more details.
 
-When you run pods, they will be allocated IP addresses from the pod network CIDR. No matter which node those pods end up on, they will be able to communicate with each other.
+## Getting started on Docker
 
-Kubernetes 1.6 requires CNI plugin version 0.5.1 or later.
+flannel is also widely used outside of Kubernetes. When deployed outside of Kubernetes, etcd is always used as the datastore. For more details on integrating flannel with Docker, see [Running](Documentation/running.md).
 
 ## Documentation
 - [Building (and releasing)](Documentation/building.md)
@@ -68,8 +66,7 @@ See [reporting bugs][reporting] for details about reporting any issues.
 
 Flannel is under the Apache 2.0 license. See the [LICENSE][license] file for details.
 
-
-[kubeadm]: https://kubernetes.io/docs/getting-started-guides/kubeadm/
+[calico]: http://www.projectcalico.org
 [pod-cidr]: https://kubernetes.io/docs/admin/kubelet/
 [etcd]: https://github.com/coreos/etcd
 [contributing]: CONTRIBUTING.md