Kubenet Networking
How does kubenet work?
A good place to start is the official documentation and also this Calico Tutorial Video.
In the documentation it says that kubenet “does not, of itself, implement more advanced features like cross-node networking or network policy. It is typically used together with a cloud provider that sets up routing rules for communication between nodes, or in single-node environments.”
Of course, “cross-node networking” is critically important (if there is more than one node).
Let’s look at how kops implements a solution on AWS. kops uses kubenet as its default network plugin and relies on Kubernetes’ built-in cloud-provider routing for cross-node networking rather than an additional, more complicated plugin.
(You are welcome to merely read the following steps, or actually run the commands yourself to reproduce the results. Either way is fine.)
1. Launch a kops cluster (change all parameters as necessary):
kops create cluster --name=kubernetes-cluster.example.com \
  --state=s3://example-kops --zones=us-east-1a,us-east-1b \
  --node-count=2 --node-size=t2.micro --master-size=t2.medium \
  --dns-zone=example.com
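The create command only writes the cluster configuration to the state store. To actually build the cluster, and then check that it came up, you will typically also run something like the following (same placeholder name and state as above):

kops update cluster --name=kubernetes-cluster.example.com --state=s3://example-kops --yes
kops validate cluster --name=kubernetes-cluster.example.com --state=s3://example-kops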
2. SSH into the master node:
kubectl get nodes -o wide
ssh admin@$IP
3. View the flags on kube-controller-manager:
ps -ef | grep kube-controller-manager
# the results are:
# root 4928 4911 1 19:43 ? 00:00:09 /usr/local/bin/kube-controller-manager --allocate-node-cidrs=true --cloud-provider=aws --cluster-cidr=100.96.0.0/11 ...
kube-controller-manager is running with --allocate-node-cidrs=true. It allocates a pod CIDR for each node out of the cluster CIDR, 100.96.0.0/11, given by --cluster-cidr.
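As a quick sanity check on the arithmetic (assuming the default per-node mask of /24, controlled by the kube-controller-manager flag --node-cidr-mask-size), a /11 cluster CIDR has room for 2^(24-11) = 8192 per-node /24 ranges:

# assumes the default --node-cidr-mask-size of 24
echo $((2 ** (24 - 11)))   # 8192 possible /24 node ranges inside 100.96.0.0/11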
4. Let’s launch a new k8s node, and see what happens in the kube-controller-manager logs.
In the AWS Auto Scaling group for the nodes, increase the desired and maximum capacity by one. A new node will launch.
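If you prefer to do this from the command line, something like the sketch below should work. The Auto Scaling group name is an assumption — kops normally names it after the instance group and cluster, e.g. nodes.kubernetes-cluster.example.com — so confirm it first with aws autoscaling describe-auto-scaling-groups.

# assumption: the nodes ASG is named nodes.kubernetes-cluster.example.com
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name nodes.kubernetes-cluster.example.com \
  --desired-capacity 3 --max-size 3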
Observe the logs:
kubectl get pods -n kube-system
kubectl logs -f kube-controller-manager-ip-172-20-52-31.ec2.internal -n kube-system | tee test.txt
The results:
I0401 20:04:55.529607 1 range_allocator.go:253] Node ip-172-20-82-30.ec2.internal is already in a process of CIDR assignment.
I0401 20:04:55.532245 1 ttl_controller.go:271] Changed ttl annotation for node ip-172-20-82-30.ec2.internal to 0 seconds
I0401 20:04:55.546108 1 range_allocator.go:373] Set node ip-172-20-82-30.ec2.internal PodCIDR to [100.96.3.0/24]
I0401 20:04:57.287907 1 node_lifecycle_controller.go:787] Controller observed a new Node: "ip-172-20-82-30.ec2.internal"
I0401 20:04:57.287947 1 controller_utils.go:167] Recording Registered Node ip-172-20-82-30.ec2.internal in Controller event message for node ip-172-20-82-30.ec2.internal
I0401 20:04:57.288172 1 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-172-20-82-30.ec2.internal", UID:"ac170374-50eb-4ce3-8a1d-7ee3ef94a127", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-172-20-82-30.ec2.internal event: Registered Node ip-172-20-82-30.ec2.internal in Controller
W0401 20:04:57.288316 1 node_lifecycle_controller.go:1058] Missing timestamp for Node ip-172-20-82-30.ec2.internal. Assuming now as a timestamp.
I0401 20:04:57.288354 1 node_lifecycle_controller.go:886] Node ip-172-20-82-30.ec2.internal is NotReady as of 2020-04-01 20:04:57.288344083 +0000 UTC m=+1312.605599849. Adding it to the Taint queue.
I0401 20:04:58.073065 1 route_controller.go:193] Creating route for node ip-172-20-82-30.ec2.internal 100.96.3.0/24 with hint ac170374-50eb-4ce3-8a1d-7ee3ef94a127, throttled 975ns
I0401 20:04:58.715747 1 route_controller.go:213] Created route for node ip-172-20-82-30.ec2.internal 100.96.3.0/24 with hint ac170374-50eb-4ce3-8a1d-7ee3ef94a127 after 642.678537ms
I0401 20:04:58.715895 1 route_controller.go:303] Patching node status ip-172-20-82-30.ec2.internal with true previous condition was:nil
I0401 20:05:07.296163 1 node_lifecycle_controller.go:910] Node ip-172-20-82-30.ec2.internal is healthy again, removing all taints
Notice the third line:
Set node PodCIDR to [100.96.3.0/24]
The controller has assigned that node the pod CIDR 100.96.3.0/24; new pods scheduled there will receive IP addresses from that range.
And also this line:
Creating route for node ip-172-20-82-30.ec2.internal 100.96.3.0/24
The kube-controller-manager has created a route in the AWS VPC route table, which sends traffic for 100.96.3.0/24 to that instance. You can view the new route in the AWS Console.
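You can also confirm it from the command line. The filter below assumes kops has tagged the route table with the KubernetesCluster tag, which it normally does:

aws ec2 describe-route-tables \
  --filters "Name=tag:KubernetesCluster,Values=kubernetes-cluster.example.com" \
  --query 'RouteTables[].Routes[].[DestinationCidrBlock,InstanceId]' \
  --output table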
5. View the node resource in kubernetes:
kubectl get nodes
kubectl edit node $NEW_NODENAME
It has this section:
spec:
  podCIDR: 100.96.3.0/24
  podCIDRs:
  - 100.96.3.0/24
Kubenet learns from this field which IP range to assign to pods on that node.
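A quicker way to read the same field, without opening an editor (this just prints the spec shown above for every node):

kubectl get nodes -o custom-columns='NAME:.metadata.name,PODCIDR:.spec.podCIDR'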
6. SSH into the new node and view the flags on the kubelet.
kubectl get nodes -o wide
ssh admin@$IP
ps -ef | grep kubelet
# results:
# /usr/local/bin/kubelet ... --network-plugin=kubenet ...
The kubelet process has no flag specifying the node's pod CIDR range. It must be reading the range that the kube-controller-manager wrote to the node object, as shown above. It is the kubelet's job to launch pods on the node. At this stage, we have reached this paragraph from the docs:
Kubenet creates a Linux bridge named cbr0 and creates a veth pair for each pod with the host end of each pair connected to cbr0. The pod end of the pair is assigned an IP address allocated from a range assigned to the node either through configuration or by the controller-manager. cbr0 is assigned an MTU matching the smallest MTU of an enabled normal interface on the host.
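You can see this for yourself on the new node with ordinary iproute2 commands. The exact output will vary, but once at least one pod is running there you should see the bridge, the host ends of the veth pairs, and a route for the node's pod CIDR:

ip addr show cbr0              # the kubenet bridge, addressed from the node's pod CIDR
ip link show master cbr0       # host ends of the veth pairs attached to the bridge
ip route | grep cbr0           # e.g. 100.96.3.0/24 dev cbr0 ...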
Pod traffic is routed unencapsulated, using the pods' real IP addresses; the VPC route table delivers the packets to the correct node. If the destination is outside the cluster (for example, the internet), the traffic is SNATed to the node's IP address.
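On the node you can inspect the SNAT rule that kubenet installs in the nat table; expect a MASQUERADE rule covering traffic that leaves the cluster's non-masquerade range (this is just an inspection command — the exact rule text depends on the kubelet configuration):

sudo iptables -t nat -S POSTROUTING | grep -i masquerade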
I hope this was informative. If you have questions, or any suggestions to improve this post, please let me know.