This post assumes that you have a basic understanding of Kubernetes terms like pods, deployments and nodes.
A Kubernetes cluster can have many nodes, and each node in turn can run multiple pods. By default, Kubernetes decides which pod runs on which node, and this is not something we usually need to worry about.
However, sometimes we want to ensure that certain pods do not run on the same node. For example, we have an application called wheel. We run both staging and production versions of this app, and we want to ensure that a production pod and a staging pod never end up on the same host.
To ensure that certain pods do not run on the same host, we can use the nodeSelector constraint in the PodSpec to schedule pods on specific nodes.
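To give an idea of what we are building towards, here is a minimal, hypothetical Pod manifest using nodeSelector. The label type: wheel-stg is an assumption; we will actually set it up on our nodes later in this post.

apiVersion: v1
kind: Pod
metadata:
  name: wheel-stg-demo          # hypothetical name, for illustration only
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
  nodeSelector:
    type: wheel-stg             # schedule only on nodes carrying this label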
Kubernetes cluster
We will use kops to provision the cluster. We can check the health of the cluster using kops validate cluster.
$ kops validate cluster
Using cluster from kubectl context: test-k8s.nodes-staging.com

Validating cluster test-k8s.nodes-staging.com

INSTANCE GROUPS
NAME               ROLE    MACHINETYPE  MIN  MAX  SUBNETS
master-us-east-1a  Master  m4.large     1    1    us-east-1a
master-us-east-1b  Master  m4.large     1    1    us-east-1b
master-us-east-1c  Master  m4.large     1    1    us-east-1c
nodes-wheel-stg    Node    m4.large     2    5    us-east-1a,us-east-1b
nodes-wheel-prd    Node    m4.large     2    5    us-east-1a,us-east-1b

NODE STATUS
NAME                             ROLE    READY
ip-192-10-110-59.ec2.internal    master  True
ip-192-10-120-103.ec2.internal   node    True
ip-192-10-42-9.ec2.internal      master  True
ip-192-10-73-191.ec2.internal    master  True
ip-192-10-82-66.ec2.internal     node    True
ip-192-10-72-68.ec2.internal     node    True
ip-192-10-182-70.ec2.internal    node    True

Your cluster test-k8s.nodes-staging.com is ready
Here we can see that there are two instance groups for nodes: nodes-wheel-stg and nodes-wheel-prd.
nodes-wheel-stg might run application pods like pod-wheel-stg-sidekiq, pod-wheel-stg-unicorn and pod-wheel-stg-redis. Similarly, nodes-wheel-prd might run application pods like pod-wheel-prd-sidekiq, pod-wheel-prd-unicorn and pod-wheel-prd-redis.
As we can see, the maximum number of nodes for the instance groups nodes-wheel-stg and nodes-wheel-prd is 5. This means that if new nodes are created in the future, they will automatically be labelled based on their instance group, and no manual work is required.
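We can also list the instance groups and their sizes at any time with kops; a quick check, assuming the same cluster context as above:

$ kops get ig    # lists each instance group with its role, machine type, min/max size and subnets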
Labelling a Node
We will use Kubernetes labels to label the nodes. To add a label, we need to edit the instance group using kops.
$ kops edit ig nodes-wheel-stg
This opens the instance group configuration file. We will add the following label to the instance group spec.
nodeLabels:
  type: wheel-stg
The complete instance group configuration looks like this.
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-12T06:24:53Z
  labels:
    kops.k8s.io/cluster: k8s.nodes-staging.com
  name: nodes-wheel-stg
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.large
  maxSize: 5
  minSize: 2
  nodeLabels:
    type: wheel-stg
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b
  - us-east-1c
Similarly, we can label the instance group nodes-wheel-prd with the label type: wheel-prd.
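The change for the production instance group is the same, just with a different label value; a sketch, assuming the instance group name from the kops validate cluster output above:

$ kops edit ig nodes-wheel-prd

# add under spec:
nodeLabels:
  type: wheel-prd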
After making the changes, update the cluster using kops rolling-update cluster --yes --force. This will roll the nodes so that they come up with the specified labels.
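Depending on the kops workflow, a rolling update alone may not push the edited instance group spec to the cloud provider; the usual sequence, which is what we assume here, is to apply the configuration first and then roll the nodes:

$ kops update cluster --yes
$ kops rolling-update cluster --yes --force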
Any nodes added in the future will get labels based on their respective instance groups.
Once the nodes are labelled, we can verify them using kubectl describe node.
$ kubectl describe node ip-192-10-82-66.ec2.internal
Name:     ip-192-10-82-66.ec2.internal
Roles:    node
Labels:   beta.kubernetes.io/arch=amd64
          beta.kubernetes.io/instance-type=m4.large
          beta.kubernetes.io/os=linux
          failure-domain.beta.kubernetes.io/region=us-east-1
          failure-domain.beta.kubernetes.io/zone=us-east-1a
          kubernetes.io/hostname=ip-192-10-82-66.ec2.internal
          kubernetes.io/role=node
          type=wheel-stg
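We can also list just the nodes that carry the new label, which is handy when the cluster has many nodes:

$ kubectl get nodes -l type=wheel-stg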
In this way, our nodes are labelled using kops.
Labelling nodes using kubectl
We can also label a node directly using kubectl.
$ kubectl label node ip-192-20-44-136.ec2.internal type=wheel-stg
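Labels added with kubectl can be changed or removed in the same way; for example, to overwrite the value or to drop the label entirely:

$ kubectl label node ip-192-20-44-136.ec2.internal type=wheel-prd --overwrite
$ kubectl label node ip-192-20-44-136.ec2.internal type-

Keep in mind that a label applied directly with kubectl is not managed by kops, so a node replaced by the instance group will come back without it.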
After labelling a node, we will add a nodeSelector field to the PodSpec in our deployment template. We will add the following block to the deployment manifest.
nodeSelector:
  type: wheel-stg
We can add this configuration to the original deployment manifest.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-staging-node
  labels:
    app: test-staging
  namespace: test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
      - image: <your-repo>/<your-image-name>:latest
        name: test-staging
        imagePullPolicy: Always
        env:
        - name: REDIS_HOST
          value: test-staging-redis
        - name: APP_ENV
          value: staging
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      nodeSelector:
        type: wheel-stg
      imagePullSecrets:
      - name: registrykey
Let's launch this deployment and check where the pod is scheduled.
$ kubectl apply -f test-deployment.yml
deployment "test-staging-node" created
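A quick way to see which node the pod landed on is the wide output of kubectl get pods:

$ kubectl get pods -o wide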
We can verify that our pod is running on a node labelled type=wheel-stg.
$ kubectl describe pod test-staging-2751555626-9sd4m
Name:           test-staging-2751555626-9sd4m
Namespace:      default
Node:           ip-192-10-82-66.ec2.internal/192.10.82.66
...
...
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
QoS Class:       Burstable
Node-Selectors:  type=wheel-stg
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
Similarly, we run the wheel production pods on nodes labelled with type: wheel-prd.
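The production deployment differs only in its nodeSelector value; a sketch of the relevant part of that manifest, assuming the wheel-prd label from earlier:

nodeSelector:
  type: wheel-prd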
Please note that when we specify a nodeSelector and no node matches the label, the pods remain in the Pending state because the scheduler cannot find a node with a matching label.
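When that happens, the pod shows up as Pending in kubectl get pods, and kubectl describe pod reports a FailedScheduling event mentioning the unmatched node selector. The output below is illustrative rather than copied from a real cluster:

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
test-staging-2751555626-9sd4m   0/1       Pending   0          1m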
In this way, we can schedule our pods to run on specific nodes for such use cases.