ChaosToolKit Cluster Level Pod Delete Experiment Details in kube-system
Experiment Metadata
Type | Description | Tested K8s Platform |
---|---|---|
ChaosToolKit | ChaosToolKit Cluster Level Pod delete experiment | Kubeadm, Minikube |
Prerequisites
- Ensure that the Litmus Chaos Operator is running by executing `kubectl get pods` in the operator namespace (typically, `litmus`). If not, install from here.
- Ensure that the `k8-pod-delete` experiment resource is available in the cluster by executing `kubectl get chaosexperiments` in the desired namespace. If not, install from here.
- Ensure you have the default nginx application set up in the default namespace (if you are using a specific namespace, execute the steps below against that namespace). A quick sanity check for all three prerequisites is sketched after this list.
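For example, the prerequisites can be verified with commands like these (a sketch, assuming the operator runs in the `litmus` namespace and the experiment and sample application live in `default`):

```bash
# Verify the Litmus Chaos Operator is running
kubectl get pods -n litmus

# Verify the k8-pod-delete ChaosExperiment resource exists
kubectl get chaosexperiments -n default

# Deploy a sample nginx application if one is not already present
kubectl create deployment nginx --image=nginx -n default
```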
Entry Criteria
- Application replicas are healthy before chaos injection
- Service resolution works successfully, as determined by deploying a sample nginx application and a custom liveness app that queries the nginx application's health endpoint
- This experiment is executed against the kube-system namespace; a pre-chaos health check is sketched below
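For example, the target replicas can be confirmed healthy before injecting chaos (a sketch, assuming the `kiam` deployment in `kube-system` used throughout this page):

```bash
# All target pods should be Running and Ready before chaos is injected
kubectl get pods -n kube-system -l app=kiam
kubectl get deployment kiam -n kube-system
```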
Exit Criteria
- Application replicas are healthy after chaos injection
- Service resolution works successfully, as determined by deploying a sample nginx application and a custom liveness app that queries the nginx application's health endpoint
Details
- Causes graceful pod failure of application replicas, selected by the provided namespace and label, with an endpoint health check
- Tests deployment sanity with a steady-state hypothesis check before and after the pod failures
- Service resolution will fail if application replicas are not present
Use Cases for executing the experiment
Type | Experiment | Details | json |
---|---|---|---|
ChaosToolKit | ChaosToolKit single, random pod delete experiment with count | Executing via label name app=<> | pod-app-kill-count.json |
ChaosToolKit | ChaosToolKit single, random pod delete experiment | Executing via label name app=<> | pod-app-kill-health.json |
ChaosToolKit | ChaosToolKit single, random pod delete experiment with count | Executing via custom label name | pod-custom-kill-count.json |
ChaosToolKit | ChaosToolKit single, random pod delete experiment | Executing via custom label name | pod-custom-kill-health.json |
ChaosToolKit | ChaosToolKit all pod delete experiment with health validation | Executing via label name app=<> | pod-app-kill-all.json |
ChaosToolKit | ChaosToolKit all pod delete experiment with health validation | Executing via custom label name | pod-custom-kill-all.json |
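The label-based variants select target pods the same way kubectl label selectors do; for example (a sketch, where `<custom-key>=<custom-value>` is a placeholder for your own label):

```bash
# Pods matched by the default app label
kubectl get pods -n kube-system -l app=kiam

# Pods matched by a custom label
kubectl get pods -n kube-system -l <custom-key>=<custom-value>
```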
Integrations
- Pod failures can be effected using one of these chaos libraries: `litmus`
Steps to Execute the Chaos Experiment
This chaos experiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to provide in a ChaosEngine specification, refer to Getting Started.
Follow the steps in the sections below to create the chaosServiceAccount, prepare the ChaosEngine & execute the experiment.
Prepare chaosServiceAccount
- Based on your use case, pick one of the choices from here: https://github.com/sumitnagal/chaos-charts/tree/testing/charts/chaostoolkit/k8-pod-delete
  - Service owner use case
    - Install the RBAC for the cluster in the namespace from which you are executing the experiments: `kubectl apply -f Cluster/rbac-admin.yaml`
Sample RBAC Manifest for the Service Owner Use Case
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaos-admin
  labels:
    name: chaos-admin
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: chaos-admin
  labels:
    name: chaos-admin
rules:
- apiGroups: ["","apps","batch","extensions","litmuschaos.io","openebs.io","storage.k8s.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults","configmaps","cstorpools","cstorvolumereplicas","daemonsets","deployments","events","jobs","persistentvolumeclaims","persistentvolumes","pods","pods/eviction","pods/exec","pods/log","replicasets","secrets","services","statefulsets","storageclasses"]
  verbs: ["create","delete","get","list","patch","update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get","list","patch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: chaos-admin
  labels:
    name: chaos-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: chaos-admin
subjects:
- kind: ServiceAccount
  name: chaos-admin
  namespace: default
```
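To apply and verify the manifest above (a sketch, assuming it is saved locally as `rbac-admin.yaml`):

```bash
kubectl apply -f rbac-admin.yaml

# Confirm the ServiceAccount, ClusterRole, and ClusterRoleBinding exist
kubectl get serviceaccount chaos-admin -n default
kubectl get clusterrole,clusterrolebinding chaos-admin
```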
Prepare ChaosEngine
Provide the application info in `spec.appinfo`.
- By default it is:

  appinfo:
    appns: default
    applabel: 'app=kiam'
    appkind: deployment

Override the experiment tunables if desired in `experiments.spec.components.env`.
To understand the values to provide in a ChaosEngine specification, refer to ChaosEngine Concepts.
Supported Experiment Tunables
Variables | Description | Specify In ChaosEngine | Notes |
---|---|---|---|
NAME_SPACE | The chaos namespace; all infra chaos resources are created in this namespace | Mandatory | Defaults to kube-system |
LABEL_NAME | The label name of the target application | Mandatory | Defaults to kiam |
APP_ENDPOINT | Endpoint that ChaosToolKit calls to verify the application is healthy | Mandatory | Defaults to localhost |
FILE | The chaos experiment file to execute (see the use case table above) | Mandatory | Defaults to `pod-app-kill-health.json` |
REPORT | Whether a report of the execution is produced in JSON format | Optional | Defaults to `true` |
REPORT_ENDPOINT | Endpoint to which the JSON report is submitted | Optional | Defaults to a Kafka topic set up for chaos, but any reporting database can be supported |
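The tunables and their defaults can also be inspected directly on the installed experiment CR (a sketch; the exact field layout may vary across Litmus versions):

```bash
# List the env tunables defined on the ChaosExperiment resource
kubectl get chaosexperiment k8-pod-delete -n <namespace> -o jsonpath='{.spec.definition.env}'
```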
Sample ChaosEngine Manifest
```yaml
# chaosengine.yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: k8-kiam-health
  namespace: default
spec:
  # ex. values: ns1:name=percona,ns2:run=nginx
  appinfo:
    appns: kube-system
    # FYI, to see app labels, apply kubectl get pods --show-labels
    # applabel: "app=nginx"
    applabel: "app=kiam"
    appkind: deployment
  jobCleanUpPolicy: retain
  monitoring: false
  annotationCheck: 'false'
  engineState: 'active'
  chaosServiceAccount: chaos-admin
  experiments:
    - name: k8-pod-delete
      spec:
        components:
          env:
            - name: NAME_SPACE
              value: kube-system
            - name: LABEL_NAME
              value: kiam
            - name: APP_ENDPOINT
              value: 'localhost'
            - name: FILE
              value: 'pod-app-kill-health.json'
            - name: REPORT
              value: 'true'
            - name: REPORT_ENDPOINT
              value: 'none'
```
Create the ChaosEngine Resource
Apply the ChaosEngine manifest prepared in the previous step to trigger the chaos.

```bash
kubectl apply -f chaosengine.yaml
```
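To confirm the engine was created and the chaos runner has started (a sketch, assuming the engine name and namespace from the sample manifest):

```bash
kubectl get chaosengine k8-kiam-health -n default
kubectl get pods -n default | grep k8-kiam-health
```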
Watch Chaos progress
View pod terminations & recovery by setting up a watch on the pods in the application namespace (kube-system in this example):

```bash
watch kubectl get pods -n kube-system
```
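The experiment job and chaos runner pods run in the namespace where the ChaosEngine was created (default in the sample manifest) and can be watched separately (a sketch):

```bash
watch kubectl get pods -n default
```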
Check Chaos Experiment Result
Check whether the application is resilient to the pod failure once the experiment (job) is completed. The ChaosResult resource name is derived as `<ChaosEngine-Name>-<ChaosExperiment-Name>`, i.e. `k8-kiam-health-k8-pod-delete` for the sample manifest.

```bash
kubectl describe chaosresult k8-kiam-health-k8-pod-delete -n <chaos-namespace>
```
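To read just the verdict rather than the full description (a sketch; the status field path may vary across Litmus versions):

```bash
kubectl get chaosresult k8-kiam-health-k8-pod-delete -n <chaos-namespace> \
  -o jsonpath='{.status.experimentstatus.verdict}'
```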
Check Chaos Experiment logs
Check the logs and result for the experiment:

```bash
kubectl logs -f k8-pod-delete-<> -n <chaos-namespace>
```