StatefulSet

StatefulSet is an object used to manage stateful applications. A StatefulSet manages the deployment and scaling of a set of Pods, and guarantees the ordering and uniqueness of those Pods.

Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These Pods are created from the same spec, but are not interchangeable — each has a persistent identifier that is maintained across any rescheduling.

Simple Example

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web-stateful-set
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-stateful-set
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
```

The Service named nginx above is a headless Service (clusterIP: None). It is used to control the network domain of the StatefulSet's Pods.

spec.serviceName: a required field. It is the name of the Service that governs this StatefulSet; this Service must exist before the StatefulSet and is responsible for the network identity of the set. Pods get their DNS names/hostnames in the format pod-specific-string.serviceName.default.svc.cluster.local, where pod-specific-string is managed by the StatefulSet controller, default is the namespace and cluster.local is the cluster domain.

This manifest has been saved to GitHub: https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/stateful-set%20/nginx-stateful-set.yaml
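The DNS naming scheme described for spec.serviceName can be made concrete with a small Python sketch. It assumes the default namespace and the default cluster domain cluster.local (both are configurable), and the helper name pod_dns_names is hypothetical; it only builds strings and does not query a cluster.

```python
from typing import List


def pod_dns_names(statefulset: str, service: str, replicas: int,
                  namespace: str = "default",
                  cluster_domain: str = "cluster.local") -> List[str]:
    """Return the stable DNS name of each Pod in a StatefulSet (hypothetical helper)."""
    # The governing headless Service defines the domain shared by all Pods.
    service_domain = f"{service}.{namespace}.svc.{cluster_domain}"
    # Each Pod's hostname is $(StatefulSet name)-$(ordinal), for ordinals 0..N-1.
    return [f"{statefulset}-{ordinal}.{service_domain}" for ordinal in range(replicas)]


if __name__ == "__main__":
    for name in pod_dns_names("web-stateful-set", "nginx", replicas=3):
        print(name)
```

For the manifest above this prints web-stateful-set-0.nginx.default.svc.cluster.local, followed by the same name for ordinals 1 and 2.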

How to Create / Update / Delete a StatefulSet

Create

Run the command:

```shell
kubectl apply -f https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/stateful-set%20/nginx-stateful-set.yaml
```

Result: [screenshot: create]

How Pods Are Managed

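Unlike a Deployment, which manages its Pods through a ReplicaSet, a StatefulSet manages Pods directly: the ownerReferences entry of each Pod points to the StatefulSet itself. Shown below is the metadata of web-stateful-set-0.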

```yaml
metadata:
  creationTimestamp: "2020-02-20T12:52:31Z"
  generateName: web-stateful-set-
  labels:
    app: nginx
    controller-revision-hash: web-stateful-set-7579f5f4d9
    statefulset.kubernetes.io/pod-name: web-stateful-set-0
  name: web-stateful-set-0
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: web-stateful-set
    uid: c509a152-6096-44a9-aea9-b4b9f5168c86
  resourceVersion: "2219083"
  selfLink: /api/v1/namespaces/default/pods/web-stateful-set-0
  uid: 5da1a0a6-a952-40c5-96d5-f32048f4e5a6
```
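Which object directly manages a Pod can be read from the ownerReferences entry marked controller: true. A minimal Python sketch, assuming the Pod metadata has already been loaded into a dict (it is copied by hand from the output above and trimmed to the fields used); the helper controlling_owner is hypothetical:

```python
from typing import Optional, Tuple

# Metadata of web-stateful-set-0 from above, trimmed to the fields used here.
pod_metadata = {
    "name": "web-stateful-set-0",
    "labels": {
        "app": "nginx",
        "statefulset.kubernetes.io/pod-name": "web-stateful-set-0",
    },
    "ownerReferences": [
        {
            "apiVersion": "apps/v1",
            "blockOwnerDeletion": True,
            "controller": True,
            "kind": "StatefulSet",
            "name": "web-stateful-set",
            "uid": "c509a152-6096-44a9-aea9-b4b9f5168c86",
        }
    ],
}


def controlling_owner(metadata: dict) -> Optional[Tuple[str, str]]:
    """Return (kind, name) of the owner marked controller: true, or None."""
    for ref in metadata.get("ownerReferences", []):
        if ref.get("controller"):
            return ref["kind"], ref["name"]
    return None


if __name__ == "__main__":
    # For a StatefulSet Pod the controller is the StatefulSet itself;
    # for a Deployment's Pod it would be a ReplicaSet instead.
    print(controlling_owner(pod_metadata))
```

Run as a script, this prints ('StatefulSet', 'web-stateful-set').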

Sticky Identity

The names of Pods created by a ReplicaSet consist of the ReplicaSet’s name plus a random string, for example web-server-jqxfq. A Deployment manages Pods through a ReplicaSet, so it follows the same naming convention.

A StatefulSet Pod has a unique identity composed of an ordinal, a stable network identity, and stable storage. The identity remains attached to the Pod regardless of which node it is (re)scheduled onto.

  • Ordinal index. Each Pod is assigned an integer ordinal from 0 to N-1 (where N is the number of replicas), unique within the set.
  • Stable network identity. Each Pod in a StatefulSet derives its hostname from the StatefulSet’s name and the Pod’s ordinal. The pattern for the constructed hostname is $(StatefulSet name)-$(ordinal). The example above creates three Pods named web-stateful-set-0, web-stateful-set-1, and web-stateful-set-2.
  • Stable storage. For each entry in volumeClaimTemplates, every Pod receives its own PersistentVolumeClaim, named $(template name)-$(Pod name); in the example above these are www-web-stateful-set-0, www-web-stateful-set-1 and www-web-stateful-set-2, each requesting 1Gi. When a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes bound to its PersistentVolumeClaims. When Pods or the StatefulSet are deleted, the PersistentVolumeClaims and their PersistentVolumes are not deleted; this must be done manually.
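The three parts of the sticky identity are all derived from the StatefulSet name and the ordinal, never from the node. A minimal Python sketch of that derivation, assuming claims are named $(template name)-$(Pod name) as the controller does; the helper pod_identity is hypothetical:

```python
from typing import Dict, List


def pod_identity(statefulset: str, ordinal: int, claim_templates: List[str]) -> Dict[str, object]:
    """Derive the sticky identity of one StatefulSet Pod (hypothetical helper).

    Nothing here depends on the node the Pod runs on, which is why the
    identity survives rescheduling.
    """
    pod_name = f"{statefulset}-{ordinal}"  # $(StatefulSet name)-$(ordinal)
    return {
        "ordinal": ordinal,
        "hostname": pod_name,
        # One PersistentVolumeClaim per volumeClaimTemplates entry.
        "claims": [f"{template}-{pod_name}" for template in claim_templates],
    }


if __name__ == "__main__":
    for i in range(3):
        print(pod_identity("web-stateful-set", i, ["www"]))
```

By contrast, a ReplicaSet Pod such as web-server-jqxfq gets a random suffix, so a replacement Pod has a different name and no claim of its own to reattach to.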

Update

Change the container image in the manifest above from nginx:1.15 to nginx:1.17. The updated manifest is at: https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/stateful-set%20/nginx-stateful-set-update.yaml

Run the command:

```shell
kubectl apply -f https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/stateful-set%20/nginx-stateful-set-update.yaml
```

Result: [screenshot: update]
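With the default RollingUpdate strategy, the controller replaces Pods one at a time, from the largest ordinal to the smallest, waiting for each updated Pod to be Running and Ready before moving on. If spec.updateStrategy.rollingUpdate.partition is set, only Pods whose ordinal is greater than or equal to the partition are updated. A minimal Python sketch of that ordering; the helper rolling_update_order is hypothetical and only computes the order, it does not talk to a cluster:

```python
from typing import List


def rolling_update_order(statefulset: str, replicas: int, partition: int = 0) -> List[str]:
    """Pods replaced by a RollingUpdate, in order (hypothetical helper).

    Ordinals go from replicas-1 down to partition; Pods below the
    partition keep the old template.
    """
    return [f"{statefulset}-{i}" for i in range(replicas - 1, partition - 1, -1)]


if __name__ == "__main__":
    # Full update of the example: web-stateful-set-2, then -1, then -0.
    print(rolling_update_order("web-stateful-set", 3))
    # Canary-style update: only web-stateful-set-2 gets nginx:1.17.
    print(rolling_update_order("web-stateful-set", 3, partition=2))
```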

Delete

Run the command:

```shell
kubectl delete -f https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/stateful-set%20/nginx-stateful-set-update.yaml
```

or:

```shell
kubectl delete sts web-stateful-set
```

Result: [screenshot: delete]

Deployment and Scaling Guarantees

The create / update / delete operations demonstrated above help illustrate the following rules.

  • For a StatefulSet with N replicas, when Pods are deployed they are created sequentially in order from {0..N-1}.

  • When Pods are deleted, they are terminated in reverse order from {N-1..0}.

  • Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.

  • Before a Pod is terminated, all of its successors must be completely shut down.
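These rules describe the default podManagementPolicy, OrderedReady (the Parallel policy launches and terminates Pods without waiting). A minimal Python simulation of OrderedReady scaling, assuming every Pod becomes Running and Ready, or finishes shutting down, before the next step; a real controller would wait at those points. The helper ordered_ready_scale is hypothetical:

```python
from typing import List, Set


def ordered_ready_scale(current: int, target: int) -> List[str]:
    """Actions taken when scaling from `current` to `target` replicas (hypothetical helper)."""
    running: Set[int] = set(range(current))
    actions: List[str] = []
    # Scale up: ordinals are created in increasing order.
    for ordinal in range(current, target):
        # Guarantee: all predecessors 0..ordinal-1 are Running and Ready.
        assert all(p in running for p in range(ordinal))
        actions.append(f"create {ordinal}")
        running.add(ordinal)
    # Scale down: ordinals are terminated in decreasing order.
    for ordinal in range(current - 1, target - 1, -1):
        # Guarantee: all successors are completely shut down.
        assert not any(s in running for s in range(ordinal + 1, current))
        actions.append(f"terminate {ordinal}")
        running.discard(ordinal)
    return actions


if __name__ == "__main__":
    print(ordered_ready_scale(0, 3))  # initial deployment of the example
    print(ordered_ready_scale(3, 0))  # scaling down to 0
```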

When to Use

StatefulSets are useful for applications that require one or more of the following.

  • Stable, unique network identifiers.

  • Stable, persistent storage that survives Pod (re)scheduling.

  • Ordered, graceful deployment and scaling.

  • Ordered, automated rolling updates.

Limitations

  • The storage for a given Pod must either be provisioned by a PersistentVolume provisioner based on the requested storage class, or pre-provisioned by an administrator.

  • Deleting or scaling down a StatefulSet will not delete the volumes associated with it. This is done to ensure data safety, which is generally more valuable than automatically purging all related StatefulSet resources.

  • StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods.

  • When a StatefulSet is deleted, it provides no guarantees about the termination of Pods. To achieve ordered and graceful termination of Pods in a StatefulSet, it is possible to scale the StatefulSet down to 0 before deletion.

  • If rolling updates are used with the default Pod management policy (OrderedReady), it is possible to get into a broken state that requires manual intervention to repair.
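Because volumes survive scale-down and deletion, it helps to know exactly which claims are left behind. A minimal Python sketch, assuming claims follow the $(template name)-$(Pod name) pattern; the helper retained_claims is hypothetical and only computes names:

```python
from typing import List


def retained_claims(statefulset: str, claim_templates: List[str],
                    old_replicas: int, new_replicas: int) -> List[str]:
    """PVCs that stay behind after scaling down (hypothetical helper).

    Scaling from old_replicas to new_replicas removes Pods with ordinals
    new_replicas..old_replicas-1 but keeps their PersistentVolumeClaims.
    Deleting the StatefulSet leaves claims behind like new_replicas=0.
    """
    return [
        f"{template}-{statefulset}-{ordinal}"  # $(template name)-$(Pod name)
        for ordinal in range(new_replicas, old_replicas)
        for template in claim_templates
    ]


if __name__ == "__main__":
    # Scale the example from 3 replicas down to 1.
    print(retained_claims("web-stateful-set", ["www"], old_replicas=3, new_replicas=1))
```

This prints ['www-web-stateful-set-1', 'www-web-stateful-set-2']. Because the names are deterministic, scaling back up to 3 makes web-stateful-set-1 and web-stateful-set-2 reattach to these same claims; to reclaim the storage instead, delete them with kubectl delete pvc.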