Horizontal Pod Autoscaler

Horizontal Pod Autoscaler

Horizontal Pod Autoscaling automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization.
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by user.





This document walks you through an example of enabling Horizontal Pod Autoscaler for the php-apache server.

Prerequisites:

This example requires a running Kubernetes cluster and kubectl, version 1.2 or later. metrics-server monitoring needs to be deployed in the cluster to provide metrics via the resource metrics API, as Horizontal Pod Autoscaler uses this API to collect metrics.

To demonstrate Horizontal Pod Autoscaler it uses a custom docker image based on the php-apache image. The procedure is as follows:

1. Run & expose php-apache server

$ kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --expose --port=80

2. Create Horizontal Pod Autoscaler

Now that the server is running, we will create the autoscaler using kubectl autoscale. The following command will create a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment we created in the first step of these instructions. Roughly speaking, HPA will increase and decrease the number of replicas (via the deployment) to maintain an average CPU utilization across all Pods of 50% (since each pod requests 200 milli-cores by kubectl run, this means average CPU usage of 100 milli-cores).

$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

3. Increase load

Now, we will see how the autoscaler reacts to increased load. We will start a container, and send an infinite loop of queries to the php-apache service (please run it in a different terminal):

$ kubectl run -i --tty load-generator --image=busybox /bin/sh

Hit enter for command prompt

$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

Within a minute or so, we should see the higher CPU load by executing (In another terminal):

$ kubectl get hpa

$ kubectl describe hpa

Gradually, the load will pile up and the number of pods will get auto-scaled. We then need to stop the load by 'Ctrl +C', after some time the pods will down-scale to the initial status.

4. Stop load

We will finish our example by stopping the user load.
In the terminal where we created the container with busybox image, terminate the load generation by typing <Ctrl> + C.
Then we will verify the result state (after a minute or so):

$ kubectl get hpa

$ kubectl get deployment php-apache

Here CPU utilization dropped to 0, and so HPA autoscaled the number of replicas back down to 1.


- by Anish Gupta
email - anish.techycardia@gmail.com

Comments

Popular Posts