
Karpenter - A New Way to Manage Kubernetes Node Groups

Kubernetes | AWS
Jan 06, 2025

One of the most common discussions when adopting Kubernetes is autoscaling. You can autoscale your workloads horizontally or vertically, but the real challenge has always been the nodes.

The hypervisor has no visibility into what the containers inside a virtual machine are actually consuming, nor is it aware of the workload resource requirements, and without that information the cloud provider can’t reliably handle node autoscaling. The solution was to let something that does have that information handle it, and so we have the Cluster Autoscaler.

The Cluster Autoscaler automatically adjusts the size of an Auto Scaling group (ASG) when a pod fails to run in the cluster due to insufficient resources, or when nodes in the cluster have been underutilized for a set period of time and their pods can fit onto other existing nodes.

Looking at the above description, it seems like the Cluster Autoscaler is just fine, and in most cases it is. But what if you need a new type of node that isn’t available yet in your cluster’s node groups?

Most organizations will have their clusters deployed using some kind of infrastructure as code tool like Terraform or AWS CloudFormation, which means that updates to this codebase will be necessary whenever the node groups change. Configuring the details and restrictions of these node groups is not always a straightforward process either.

New nodes can also take a while to become available to Kubernetes, and once they are available you might still run into race conditions when scheduling pods onto them.

Recently, AWS released Karpenter to address these issues and bring a more native approach to managing your cluster nodes.

Let’s take a look at how both solutions work, along with their current pros and cons.

Cluster Autoscaler and Karpenter

How does the Cluster Autoscaler work?

  • We deploy a workload to the cluster

  • Kubernetes scheduler could not find a node that will fit our pod

  • Pod is marked as Pending and Unschedulable

  • Cluster Autoscaler looks for pods in a Pending state

  • It increases the ASG desired count if the pending pods do not fit in the current nodes

  • The ASG creates a new instance

  • Instance joins the cluster

  • Kubernetes scheduler finds the new node and, if the pod fits in it, assigns the pod to it

So the Cluster Autoscaler doesn’t really deal with the nodes themselves: it just adjusts the AWS ASG, lets AWS take care of everything else on the infrastructure side, and relies on the Kubernetes scheduler to assign the pod to a node.

While this works, it can introduce a number of failure modes, like a race condition where another pod gets assigned to your new node before your pending one, triggering the whole loop again and leaving your pod pending for a longer period.
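Either way, everything in this loop is driven by pods stuck in a Pending state. A quick, minimal way to spot them yourself (a sketch using plain kubectl, nothing specific to the Cluster Autoscaler; the pod name is a placeholder):

# List pods that are still Pending; these are the ones the autoscaler reacts to
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Check why a specific pod is Pending (look for Unschedulable in the events)
kubectl describe pod <pod-name> | grep -A5 Events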

What about Karpenter?

Karpenter does not manipulate ASGs; it handles the instances directly. Instead of writing code to deploy a new node group and then targeting your workload to that group, you just deploy your workload, and Karpenter creates an EC2 instance that matches your constraints, provided it has a matching Provisioner. A Provisioner in Karpenter is a manifest that describes a node group, and you can have multiple Provisioners for different needs, just like node groups.
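To give you a feel for it before the demo, here is a minimal Provisioner sketch (field names follow the v1alpha5 API used throughout this post; the instance profile name is illustrative). The complete Provisioners we actually deploy appear later.

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: example
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-my-cluster  # illustrative name
  ttlSecondsAfterEmpty: 30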

Ok, if it’s like node groups, what is the advantage? The catch is in the way Karpenter works. Let’s do the same exercise we did for the Cluster Autoscaler, but now with Karpenter.

  • We deploy a workload to the cluster

  • Kubernetes scheduler could not find a node that will fit our pod

  • Pod is marked as Pending and Unschedulable

  • Karpenter evaluates the resources and constraints of the Unschedulable pods against the available Provisioners and creates matching EC2 instances

  • Instance(s) join the cluster

  • Karpenter immediately binds the pods to the new node(s) without waiting for the Kubernetes scheduler

Just by not relying on ASGs and handling the nodes itself, Karpenter cuts the time needed to provision a new node: it doesn’t need to wait for the ASG to respond to a sizing change and can request a new instance in seconds.

In our tests, a pending pod got a node created for it in 2 seconds and was running in about 1 minute on average, versus 2 to 5 minutes with the Cluster Autoscaler.

The race condition we talked about before is not possible in this model, as the pods are immediately bound to the new nodes.

Another interesting thing the Provisioner can do is set a TTL for empty nodes, so a node that has no pods other than DaemonSet pods is terminated when the TTL is reached.

It can also keep nodes current by enforcing a TTL on the nodes themselves, meaning a node is recycled once that TTL is reached.
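Both behaviours are plain fields on the Provisioner spec. A minimal sketch (the expiry field name, ttlSecondsUntilExpired, comes from the v1alpha5 Provisioner API; the values are illustrative):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: example-ttls
spec:
  ttlSecondsAfterEmpty: 30        # terminate a node 30s after its last non-DaemonSet pod is gone
  ttlSecondsUntilExpired: 604800  # recycle every node after 7 days, keeping the fleet current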

Ok! So Karpenter is great, let’s dump the Cluster Autoscaler! Not so fast! There is one Cluster Autoscaler feature that Karpenter is missing: rebalancing nodes. The Cluster Autoscaler can drain a node when its utilization falls below a certain threshold and its pods fit on other nodes.
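For reference, that scale-down behaviour is controlled by flags on the cluster-autoscaler binary itself. A sketch of the relevant container arguments (flag names from the Cluster Autoscaler documentation; the image tag and values shown are illustrative defaults, not a recommendation):

# Fragment of a cluster-autoscaler Deployment spec
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0  # illustrative tag
    command:
      - ./cluster-autoscaler
      - --scale-down-enabled=true
      - --scale-down-utilization-threshold=0.5   # drain nodes below 50% requested utilization
      - --scale-down-unneeded-time=10m           # ...if they stay underutilized this long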

Talk is Cheap! Show me the demo!

Let’s get this running! We’re following the getting started guide from karpenter.sh with a couple of twists.

At the time this post was written Karpenter 0.5.2 was the latest version available.

First the good old warning for all demo code.

WARNING! This code is for use in testing only: broad permissions are given to Karpenter, and no effort was made to secure the cluster.

Now go and check out our git repository from https://github.com/ops-guru/karpenter-blog-post.

We will use Terraform and Helm to deploy:

  • a VPC and Subnets

  • an EKS cluster with one node (need to run Karpenter somewhere right?)

  • an IAM role to allow Karpenter to manipulate some AWS resources it needs to manage nodes for us (more details on those in the Getting Started with Terraform page at Karpenter’s website)

  • Karpenter, using its helm chart, with access to its IAM role through IAM Roles for Service Accounts

To that end we will first export a couple of environment variables.

  • AWS_PROFILE is our AWS CLI profile configured with our credentials (if yours are in your default profile you can skip this one)

  • AWS_DEFAULT_REGION to select which region to create resources in

  • CLUSTER_NAME to give our cluster a nice name

  • KUBECONFIG and KUBE_CONFIG_PATH to tell kubectl, helm and terraform where our kubeconfig file is (which will be created by terraform for us)

export AWS_PROFILE=opsguru
export AWS_DEFAULT_REGION=ca-central-1
export CLUSTER_NAME=opsguru-karpenter-test
export KUBECONFIG=${PWD}/kubeconfig_${CLUSTER_NAME}
export KUBE_CONFIG_PATH=${KUBECONFIG}

Let’s create our cluster and deploy Karpenter into it. Init terraform, then check the plan and confirm. EKS cluster creation takes around 10 minutes.

terraform init
terraform apply -var cluster_name=${CLUSTER_NAME} -var region=${AWS_DEFAULT_REGION}

Now that you’ve got some coffee let’s talk node groups.

Our demo assumes we want two node groups in the cluster: one using on-demand instances, another using spot instances.

How can we do this with Karpenter? We just need to define a Provisioner for each of these groups. Instead of rambling about it, let’s take a look at the Provisioner resources for our two node groups.

Our on-demand instances are for our cluster addons, so we want a taint to ensure only cluster addons are deployed there. We also want to restrict the node types to m5.large and m5.2xlarge instances across both our availability zones.

cat <<EOF > node_group_addons.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: addons-ondemand
spec:
  requirements:
    - key: node.kubernetes.io/instance-type # If not included, all instance types are considered
      operator: In
      values: ["m5.large", "m5.2xlarge"]
    - key: "topology.kubernetes.io/zone" # If not included, all zones are considered
      operator: In
      values: ["${AWS_DEFAULT_REGION}a", "${AWS_DEFAULT_REGION}b"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["on-demand"]
  labels: # Kubernetes labels
    managed-by: karpenter
    purpose: addons
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-${CLUSTER_NAME}
    tags: # AWS EC2 Tags
      managed-by: karpenter
  ttlSecondsAfterEmpty: 30 # If a node is empty of non daemonset pods for this ttl, it is removed
  taints:
    - key: opsguru.com/addons
      effect: NoSchedule
EOF

What are we looking at?

  • Any pod that has a nodeSelector matching managed-by: karpenter, tolerates our opsguru.com/addons taint, and can fit into an m5.large or m5.2xlarge node will have a node provisioned for it, if needed

  • The nodes will be on-demand type nodes

  • The nodes will be deployed in either our AZ a or b

  • If a node is empty for more than 30 seconds we terminate it

  • Kubernetes labels managed-by: karpenter and purpose: addons will be added to the nodes

  • An EC2 tag managed-by: karpenter will be applied to the nodes

Our spot instances are for any other workloads; we will not taint them and we will use c5 instances. Any workload that can’t fit on our initial cluster node (the one created with Terraform) and does not tolerate the opsguru.com/addons taint from the on-demand group should be scheduled on these nodes.

cat <<EOF > node_group_general_spot.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-general
spec:
  requirements:
    - key: node.kubernetes.io/instance-type # If not included, all instance types are considered
      operator: In
      values: ["c5.large", "c5.2xlarge"]
    - key: "topology.kubernetes.io/zone" # If not included, all zones are considered
      operator: In
      values: ["${AWS_DEFAULT_REGION}a", "${AWS_DEFAULT_REGION}b"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot"]
  labels: # Kubernetes labels
    managed-by: karpenter
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-${CLUSTER_NAME}
    tags: # AWS EC2 Tags
      managed-by: karpenter
  ttlSecondsAfterEmpty: 30 # If a node is empty of non daemonset pods for this ttl, it is removed
EOF

This one is quite similar to the first Provisioner, but we’re using spot instances instead of on-demand, c5 type nodes, and no taint.

Now that we have our provisioners defined, let’s install Karpenter using Helm.

helm repo add karpenter https://charts.karpenter.sh
helm repo update
helm install karpenter -n karpenter --create-namespace --version 0.5.2 \
  --set serviceAccount.annotations.eks\.amazonaws\.com/role-arn=$(terraform output -raw iam_role_arn) \
  --set controller.clusterName=${CLUSTER_NAME} \
  --set controller.clusterEndpoint=$(terraform output -raw cluster_endpoint) \
  --wait karpenter/karpenter

Ok! We’ve got almost everything we need to see this working; we’re just missing one little thing: actual workloads 😀

You can apply the workloads folder from our git repository; we have two manifests there:

  • addon.yaml – a deployment of a pause container, with a nodeSelector to the label purpose: addons, tolerating the taint defined in the Provisioner, with 1 replica (a sketch of it follows this list)

  • general.yaml – a deployment of a pause container, with a nodeSelector to the label managed-by: karpenter, with 20 replicas
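For reference, here is a sketch of what such an addon.yaml looks like (reconstructed from the description above rather than copied from the repository, so names and the image tag are illustrative; general.yaml differs only in its nodeSelector, the missing toleration, and the replica count):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: addon
spec:
  replicas: 1
  selector:
    matchLabels:
      app: addon
  template:
    metadata:
      labels:
        app: addon
    spec:
      nodeSelector:
        purpose: addons            # matches the label set by the addons-ondemand Provisioner
      tolerations:
        - key: opsguru.com/addons  # tolerate the taint the Provisioner adds to its nodes
          operator: Exists
          effect: NoSchedule
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.5  # illustrative pause image
          resources:
            requests:
              cpu: "1"             # the 1 core mentioned later in the post
              memory: 100Mi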

kubectl get pods -o=custom-columns="NAME:.metadata.name,STATUS:.status.conditions[*].reason,MESSAGE:.status.conditions[*].message,NODE:.spec.nodeName"
NAME                               STATUS          MESSAGE                                                                          NODE
addon-7fc784b5d-fg2dx              Unschedulable   0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.   <none>
general-workloads-5df49fcb-2hhqg   Unschedulable   0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.   <none>
general-workloads-5df49fcb-4mlqt   Unschedulable   0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.   <none>
general-workloads-5df49fcb-4zx4v   Unschedulable   0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.   <none>
general-workloads-5df49fcb-5788h   Unschedulable   0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.   <none>
general-workloads-5df49fcb-7b76r   Unschedulable   0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.   <none>
...

With these deployed you can see that all of our pods (the addon plus the 20 general workloads) are pending, their status says they’re Unschedulable because the one node we have in the cluster does not match their nodeSelector constraints, and they have no node assigned.

Let’s check the status of our nodes:

kubectl get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-1-47.ca-central-1.compute.internal   Ready    <none>   46h   v1.21.5-eks-bc4871b

kubectl describe node ip-10-0-1-47.ca-central-1.compute.internal
Name:               ip-10-0-1-47.ca-central-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ca-central-1
                    failure-domain.beta.kubernetes.io/zone=ca-central-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-1-47.ca-central-1.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5.large
                    topology.kubernetes.io/region=ca-central-1
                    topology.kubernetes.io/zone=ca-central-1a
...

The existing node indeed doesn’t have any of the labels our workloads are using in their nodeSelectors.

Now let’s deploy our first Provisioner, addons-ondemand.

kubectl apply -f node_group_addons.yaml
provisioner.karpenter.sh/addons-ondemand created

If you’re following the Karpenter controller logs, you will see a node being provisioned and the pod bound to it immediately.

kubectl logs -n karpenter -l karpenter=controller -f
2021-12-17T18:49:33.800Z INFO controller.provisioning Batched 1 pods in 1.000321584s {"commit": "870e2f6", "provisioner": "addons-ondemand"}
2021-12-17T18:49:33.804Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [m5.large m5.2xlarge] {"commit": "870e2f6", "provisioner": "addons-ondemand"}
2021-12-17T18:49:36.061Z INFO controller.provisioning Launched instance: i-03ffbc75bd75a68e7, hostname: ip-10-0-1-114.ca-central-1.compute.internal, type: m5.large, zone: ca-central-1a, capacityType: on-demand {"commit": "870e2f6", "provisioner": "addons-ondemand"}
2021-12-17T18:49:36.098Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-0-1-114.ca-central-1.compute.internal {"commit": "870e2f6", "provisioner": "addons-ondemand"}

If you check our pods again, you will see that it’s scheduled to a node.

kubectl get pods -o=custom-columns="NAME:.metadata.name,STATUS:.status.conditions[*].reason,MESSAGE:.status.conditions[*].message,NODE:.spec.nodeName"
NAME                               STATUS          MESSAGE                                                                                                                                                   NODE
addon-7fc784b5d-fg2dx              <none>          <none>                                                                                                                                                    ip-10-0-1-114.ca-central-1.compute.internal
general-workloads-5df49fcb-2hhqg   Unschedulable   0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {opsguru.com/addons: }, that the pod didn't tolerate.   <none>
general-workloads-5df49fcb-4mlqt   Unschedulable   0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {opsguru.com/addons: }, that the pod didn't tolerate.   <none>
general-workloads-5df49fcb-4zx4v   Unschedulable   0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {opsguru.com/addons: }, that the pod didn't tolerate.   <none>
...

You will also notice that our general workloads are still Unschedulable, but the message now says that 2 nodes don’t match: one doesn’t match the selector, and the other has a taint the workload doesn’t tolerate.

Let’s see our nodes now.

kubectl get nodes
NAME                                          STATUS   ROLES    AGE   VERSION
ip-10-0-1-114.ca-central-1.compute.internal   Ready    <none>   11m   v1.21.5-eks-bc4871b
ip-10-0-1-47.ca-central-1.compute.internal    Ready    <none>   46h   v1.21.5-eks-bc4871b

There is our new node! Let’s see what Karpenter got us.

kubectl describe node ip-10-0-1-114.ca-central-1.compute.internal
Name:               ip-10-0-1-114.ca-central-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ca-central-1
                    failure-domain.beta.kubernetes.io/zone=ca-central-1a
                    karpenter.sh/capacity-type=on-demand
                    karpenter.sh/provisioner-name=addons-ondemand
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-1-114.ca-central-1.compute.internal
                    kubernetes.io/os=linux
                    managed-by=karpenter
                    node.kubernetes.io/instance-type=m5.large
                    purpose=addons
                    topology.kubernetes.io/region=ca-central-1
                    topology.kubernetes.io/zone=ca-central-1a
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 17 Dec 2021 11:49:36 -0700
Taints:             opsguru.com/addons:NoSchedule
...
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7934464Ki
  pods:                        29
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           18242267924
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7244288Ki
  pods:                        29
...

Our addon requests 1 CPU core and 100MB of memory; it has a nodeSelector pointing to the label purpose with value addons and tolerates the opsguru.com/addons taint.

Our Provisioner addons-ondemand matches all these conditions, and among its instance type options we have m5.large, which can fit our pod (you can see that the node has 1930m allocatable, and our pod needs 1000m). Since the request matches a Provisioner’s settings, we got a node for the workload.
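If you want to double-check that math yourself, the allocatable CPU is right there on the node object (a quick sketch):

# Allocatable CPU on the node Karpenter just created
kubectl get node ip-10-0-1-114.ca-central-1.compute.internal -o jsonpath='{.status.allocatable.cpu}'
# 1930m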

What about our other pods? Well, let’s get their Provisioner up!

kubectl apply -f node_group_general_spot.yaml
provisioner.karpenter.sh/spot-general created

Once we apply the Provisioner you will see in Karpenter’s logs:

2021-12-17T21:53:22.009Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:34.896Z INFO controller.provisioning Batched 20 pods in 1.410203663s {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:34.906Z INFO controller.provisioning Computed packing of 3 node(s) for 20 pod(s) with instance type option(s) [c5.2xlarge] {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:38.533Z INFO controller.provisioning Launched instance: i-082db60871ae40c9d, hostname: ip-10-0-1-162.ca-central-1.compute.internal, type: c5.2xlarge, zone: ca-central-1a, capacityType: spot {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:38.533Z INFO controller.provisioning Launched instance: i-03d7f3f1d4bffdea4, hostname: ip-10-0-2-46.ca-central-1.compute.internal, type: c5.2xlarge, zone: ca-central-1b, capacityType: spot {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:38.533Z INFO controller.provisioning Launched instance: i-09dc16d84a292604c, hostname: ip-10-0-2-169.ca-central-1.compute.internal, type: c5.2xlarge, zone: ca-central-1b, capacityType: spot {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:38.591Z INFO controller.provisioning Bound 7 pod(s) to node ip-10-0-1-162.ca-central-1.compute.internal {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:38.666Z INFO controller.provisioning Bound 7 pod(s) to node ip-10-0-2-46.ca-central-1.compute.internal {"commit": "870e2f6", "provisioner": "spot-general"}
2021-12-17T21:53:38.830Z INFO controller.provisioning Bound 6 pod(s) to node ip-10-0-2-169.ca-central-1.compute.internal {"commit": "870e2f6", "provisioner": "spot-general"}

Our 20 pods were split across 3 nodes. We can confirm that they are all scheduled by rerunning our previous command to check their status:

kubectl get pods -o=custom-columns="NAME:.metadata.name,STATUS:.status.conditions[*].reason,MESSAGE:.status.conditions[*].message,NODE:.spec.nodeName"
NAME                               STATUS   MESSAGE   NODE
addon-7fc784b5d-fg2dx              <none>   <none>    ip-10-0-1-114.ca-central-1.compute.internal
general-workloads-5df49fcb-7f2mf   <none>   <none>    ip-10-0-1-162.ca-central-1.compute.internal
general-workloads-5df49fcb-7rls5   <none>   <none>    ip-10-0-2-46.ca-central-1.compute.internal
general-workloads-5df49fcb-9qs99   <none>   <none>    ip-10-0-2-169.ca-central-1.compute.internal
general-workloads-5df49fcb-bqnvc   <none>   <none>    ip-10-0-1-162.ca-central-1.compute.internal
general-workloads-5df49fcb-d775z   <none>   <none>    ip-10-0-2-169.ca-central-1.compute.internal
general-workloads-5df49fcb-g5kdd   <none>   <none>    ip-10-0-1-162.ca-central-1.compute.internal
general-workloads-5df49fcb-gxkn9   <none>   <none>    ip-10-0-1-162.ca-central-1.compute.internal
general-workloads-5df49fcb-jhq85   <none>   <none>    ip-10-0-1-162.ca-central-1.compute.internal
general-workloads-5df49fcb-jvnhl   <none>   <none>    ip-10-0-2-46.ca-central-1.compute.internal
general-workloads-5df49fcb-nfhq5   <none>   <none>    ip-10-0-2-169.ca-central-1.compute.internal
general-workloads-5df49fcb-qpkdb   <none>   <none>    ip-10-0-1-162.ca-central-1.compute.internal
general-workloads-5df49fcb-scmdp   <none>   <none>    ip-10-0-2-169.ca-central-1.compute.internal
general-workloads-5df49fcb-tgtct   <none>   <none>    ip-10-0-1-162.ca-central-1.compute.internal
general-workloads-5df49fcb-ts4pt   <none>   <none>    ip-10-0-2-46.ca-central-1.compute.internal
general-workloads-5df49fcb-v6cql   <none>   <none>    ip-10-0-2-46.ca-central-1.compute.internal
general-workloads-5df49fcb-wqhtl   <none>   <none>    ip-10-0-2-169.ca-central-1.compute.internal
general-workloads-5df49fcb-xpw52   <none>   <none>    ip-10-0-2-46.ca-central-1.compute.internal
general-workloads-5df49fcb-xzgkq   <none>   <none>    ip-10-0-2-169.ca-central-1.compute.internal
general-workloads-5df49fcb-z47dd   <none>   <none>    ip-10-0-2-46.ca-central-1.compute.internal
general-workloads-5df49fcb-zpd6s   <none>   <none>    ip-10-0-2-46.ca-central-1.compute.internal

We should now have 5 nodes: 1 original node from Terraform, 1 from our addons-ondemand Provisioner, and 3 from the spot-general Provisioner.

kubectl get nodes
NAME                                          STATUS   ROLES    AGE     VERSION
ip-10-0-1-114.ca-central-1.compute.internal   Ready    <none>   3h7m    v1.21.5-eks-bc4871b
ip-10-0-1-162.ca-central-1.compute.internal   Ready    <none>   3m57s   v1.21.5-eks-bc4871b
ip-10-0-1-47.ca-central-1.compute.internal    Ready    <none>   2d1h    v1.21.5-eks-bc4871b
ip-10-0-2-169.ca-central-1.compute.internal   Ready    <none>   3m57s   v1.21.5-eks-bc4871b
ip-10-0-2-46.ca-central-1.compute.internal    Ready    <none>   3m57s   v1.21.5-eks-bc4871b

Let’s dig a bit into our new nodes. Which instance types do we have now?

kubectl get nodes -l karpenter.sh/provisioner-name=spot-general -o jsonpath='{.items[*].metadata.labels.node\.kubernetes\.io/instance-type}'
c5.2xlarge c5.2xlarge c5.2xlarge

Our general-workloads deployment pods only differ from the addon deployment in their nodeSelector and the lack of a toleration for the opsguru.com/addons taint. Their nodeSelector label is set to managed-by: karpenter, which also matches the addons-ondemand Provisioner, but without the toleration they can only match the new Provisioner.

With the Provisioner matched, Karpenter now needs to decide which instance type to use between c5.large and c5.2xlarge. A c5.large has 2 vCPUs and 4GB of memory, so it should only be able to take one of our pods (2 vCPUs leave roughly 1900m allocatable, and we need 1000m per pod). That would require one instance per pod, which is quite a lot of waste (almost half of each instance would sit unused).

A c5.2xlarge, on the other hand, has 8 vCPUs and 16GB of memory, which should fit 7 of our pods per instance (8 vCPUs leave roughly 7900m allocatable). This matches what we’re seeing: 3 nodes, with 7 pods on one instance, 7 on another and 6 on the last, 20 pods scheduled in the best way our Provisioner allows.
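You can verify the packing yourself by counting general-workloads pods per node, a minimal sketch using the wide output (where the node name is the seventh column):

# Count how many general-workloads pods landed on each node
kubectl get pods -o wide --no-headers | grep general-workloads | awk '{print $7}' | sort | uniq -c
#   7 ip-10-0-1-162.ca-central-1.compute.internal
#   6 ip-10-0-2-169.ca-central-1.compute.internal
#   7 ip-10-0-2-46.ca-central-1.compute.internal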

Cleanup

Thanks for coming to our TED Talk! Errr... quick review of Karpenter.

Now let’s clean up and see one more feature of Karpenter.

In both our Provisioners we have a setting ttlSecondsAfterEmpty: 30, which means that if a node has no pods (other than DaemonSet pods) for more than 30 seconds, it will be terminated.

We won’t just take Karpenter’s word for it, let’s check!

Let’s delete our deployments:

kubectl delete deployment general-workloads addon
deployment.apps "general-workloads" deleted
deployment.apps "addon" deleted

In Karpenter’s logs we can see the nodes getting a TTL and then being cordoned, drained and terminated.

2021-12-17T22:29:23.877Z INFO controller.node Added TTL to empty node {"commit": "870e2f6", "node": "ip-10-0-1-162.ca-central-1.compute.internal"}
2021-12-17T22:29:23.932Z INFO controller.node Added TTL to empty node {"commit": "870e2f6", "node": "ip-10-0-2-46.ca-central-1.compute.internal"}
2021-12-17T22:29:24.031Z INFO controller.node Added TTL to empty node {"commit": "870e2f6", "node": "ip-10-0-2-169.ca-central-1.compute.internal"}
2021-12-17T22:29:24.239Z INFO controller.node Added TTL to empty node {"commit": "870e2f6", "node": "ip-10-0-1-114.ca-central-1.compute.internal"}
2021-12-17T22:29:53.889Z INFO controller.node Triggering termination after 30s for empty node {"commit": "870e2f6", "node": "ip-10-0-1-162.ca-central-1.compute.internal"}
2021-12-17T22:29:53.915Z INFO controller.termination Cordoned node {"commit": "870e2f6", "node": "ip-10-0-1-162.ca-central-1.compute.internal"}
2021-12-17T22:29:53.948Z INFO controller.node Triggering termination after 30s for empty node {"commit": "870e2f6", "node": "ip-10-0-2-46.ca-central-1.compute.internal"}
2021-12-17T22:29:53.970Z INFO controller.termination Cordoned node {"commit": "870e2f6", "node": "ip-10-0-2-46.ca-central-1.compute.internal"}
2021-12-17T22:29:54.042Z INFO controller.node Triggering termination after 30s for empty node {"commit": "870e2f6", "node": "ip-10-0-2-169.ca-central-1.compute.internal"}
2021-12-17T22:29:54.068Z INFO controller.termination Cordoned node {"commit": "870e2f6", "node": "ip-10-0-2-169.ca-central-1.compute.internal"}
2021-12-17T22:29:54.070Z INFO controller.termination Deleted node {"commit": "870e2f6", "node": "ip-10-0-1-162.ca-central-1.compute.internal"}
2021-12-17T22:29:54.147Z INFO controller.termination Deleted node {"commit": "870e2f6", "node": "ip-10-0-2-46.ca-central-1.compute.internal"}
2021-12-17T22:29:54.247Z INFO controller.termination Deleted node {"commit": "870e2f6", "node": "ip-10-0-2-169.ca-central-1.compute.internal"}
2021-12-17T22:29:54.261Z INFO controller.node Triggering termination after 30s for empty node {"commit": "870e2f6", "node": "ip-10-0-1-114.ca-central-1.compute.internal"}
2021-12-17T22:29:54.290Z INFO controller.termination Cordoned node {"commit": "870e2f6", "node": "ip-10-0-1-114.ca-central-1.compute.internal"}
2021-12-17T22:29:54.425Z INFO controller.termination Deleted node {"commit": "870e2f6", "node": "ip-10-0-1-114.ca-central-1.compute.internal"}

Without workloads and nodes, we are left with our initial cluster, which Terraform will gladly destroy.

terraform destroy -var cluster_name=${CLUSTER_NAME} -var region=${AWS_DEFAULT_REGION}

Conclusion

Pros

This same demo with the Cluster Autoscaler would be marginally slower (a couple of minutes' difference, which depending on your workloads might or might not be crucial), but at a larger scale (think several services with hundreds of pods each) this speed difference by itself is a major advantage.

Depending on how you manage tenancy in your clusters, you could even have the Provisioner deployed as part of your application through a helm chart, or just have an easier time managing node groups in general.

Cons

Karpenter still doesn’t have a mechanism for removing underutilized nodes if their workloads can fit elsewhere, which is a feature present in the Cluster Autoscaler. This could possibly be handled by Descheduler but that can be a whole other blog post 🙂

Cluster Autoscaler has been around for a good while, and is beyond battle tested, while Karpenter is relatively new and might be rough around the edges.

Karpenter only works on AWS right now, though it is designed to be extended to other cloud providers.

Final Thoughts

Karpenter is extremely promising, and its pros will outweigh the cons in most cases. It is not an all-or-nothing solution either: you can run it in parallel with the Cluster Autoscaler and have the best of both worlds.

There is a lot about Karpenter we didn’t cover here; take a look at the Related Links section at the bottom for some documentation and videos on it.

We are looking forward to seeing how this tool develops!

Related Links

  • Karpenter

  • Karpenter vs Cluster Autoscaler

  • Karpenter Provisioner API Docs

  • Karpenter Getting Started with Terraform

  • Containers from the Couch on Karpenter

  • Cluster Autoscaler

  • Karpenter on GitHub

Written by:

Fernando Battistella, Principal Architect at OpsGuru – Fernando has over two decades of experience in IT, with the last six years architecting cloud-native solutions for companies of all sizes. Specialized in Kubernetes and the Cloud Native ecosystem, he has helped multiple organizations design, build, migrate, operate and train their teams in cloud-native technologies and platforms.
