ChaosToolKit AWS EC2 Experiment Details
Experiment Metadata
Type | Description | Tested K8s Platform |
---|---|---|
ChaosToolKit | ChaosToolKit AWS EC2 terminate experiment | Kubeadm, Minikube |
Prerequisites
- Ensure that the Litmus ChaosOperator is running by executing
kubectl get pods
in operator namespace (typically,litmus
). If not, install from here - Ensure that the
aws-ec2-terminate
experiment resource is available in the cluster by executingkubectl get chaosexperiments
in the desired namespace. If not, install from here - Ensure you have nginx default application setup on default namespace ( if you are using specific namespace please execute below on that namespace)
Entry Criteria
- Application replicas are healthy before chaos injection
- EC2 Instances are running on this namespace
- Service resolution works successfully as determined by deploying a sample nginx application and a custom liveness app querying the nginx application health end point
Exit Criteria
- Application replicas are healthy after chaos injection
- EC2 Instances are running on this namespace
- Service resolution works successfully as determined by deploying a sample nginx application and a custom liveness app querying the nginx application health end point
Details
- Causes EC2 Terminate instance failure of application replicas using ChaosToolKit based on provided namespace and Label while doing health checks against the endpoint
- Tests deployment sanity with steady state hypothesis executed pre and post pod failures, which is running on AWS EC2 instance
- Service resolution will fail if application is not able to launch on new EC2 instance.
Use Cases for executing the experiment
Type | Experiment | Details | json |
---|---|---|---|
ChaosToolKit | ChaosToolKit single, random EC2 terminate experiment with Application uptime | Executing via label name app=<> | ec2-delete.json |
Integrations
- AWS EC2 failures can be effected using one of these chaos libraries:
litmus
Steps to Execute the ChaosExperiment
This ChaosExperiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to provide in a ChaosEngine specification, refer Getting Started
Follow the steps in the sections below to create the chaosServiceAccount, prepare the ChaosEngine & execute the experiment.
Prepare chaosServiceAccount
- Based on your use case pick one of the choice from here
https://github.com/litmuschaos/chaos-charts/tree/v1.10.x/charts/kube-aws/k8-aws-ec2-terminate
Sample Rbac Manifest for Service Owner use case
apiVersion: v1
kind: ServiceAccount
metadata:
name: k8-aws-ec2-terminate-sa
labels:
name: k8-aws-ec2-terminate-sa
app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: k8-aws-ec2-terminate-sa
labels:
name: k8-aws-ec2-terminate-sa
app.kubernetes.io/part-of: litmus
rules:
- apiGroups: ["","apps","batch","extensions","litmuschaos.io","openebs.io","storage.k8s.io"]
resources: ["chaosengines","chaosexperiments","chaosresults","configmaps","cstorpools","cstorvolumereplicas","events","jobs","persistentvolumeclaims","persistentvolumes","pods","pods/exec","pods/log","secrets","storageclasses","chaosengines","chaosexperiments","chaosresults","configmaps","cstorpools","cstorvolumereplicas","daemonsets","deployments","events","jobs","persistentvolumeclaims","persistentvolumes","pods","pods/eviction","pods/exec","pods/log","replicasets","secrets","services","statefulsets","storageclasses"]
verbs: ["create","delete","get","list","patch","update"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get","list","patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: k8-aws-ec2-terminate-sa
labels:
name: k8-aws-ec2-terminate-sa
app.kubernetes.io/part-of: litmus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: k8-aws-ec2-terminate-sa
subjects:
- kind: ServiceAccount
name: k8-aws-ec2-terminate-sa
namespace: default
Prepare ChaosEngine
Provide the application info in
spec.appinfo
- It will be default as
appinfo: appns: default applabel: 'app=nginx' appkind: deployment
- It will be default as
Override the experiment tunables if desired in
experiments.spec.components.env
To understand the values to provided in a ChaosEngine specification, refer ChaosEngine Concepts
Supported Experiment Tunables
Variables | Description | Specify In ChaosEngine | Notes |
---|---|---|---|
NAME_SPACE | This is chaos namespace which will create all infra chaos resources in that namespace | Mandatory | Default to default |
LABEL_NAME | The default name of the label | Mandatory | Defaults to nginx |
APP_ENDPOINT | Endpoint where ChaosToolKit will make a call and ensure the application is healthy | Mandatory | Defaults to localhost |
FILE | Type of pod-delete chaos (in terms of steady state checks performed) we want to execute, represented by the ChaosToolKit json file | Mandatory | Default to `ec2-delete.json` |
AWS_ROLE | AWS role which will be use from this namespace | Mandatory | Default to `chaosec2access` |
AWS_ACCOUNT | Which AWS account the AWS role will be used | Mandatory | Default to `000000000000` |
AWS_REGION | AWS region where we want to terminate the ec2 | Mandatory | Default to `us-west-2` |
AWS_AZ | AWS Availability Zone where we want to terminate the ec2 | Mandatory | Default to `us-west-2c` |
AWS_RESOURCE | AWS Resource we want to terminate, this time used for AWS Kubernetes cluster | Mandatory | Default to `ec2-iks` |
AWS_SSL | AWS end point for making call, has SSL enbaledor not | Mandatory | Default to `false` |
REPORT | The Report of execution coming in json format | Optional | Defaults to is `true` |
REPORT_ENDPOINT | Report endpoint which can take the json format and submit it | Optional | Default to setup for Kafka topic for chaos, but can support any reporting database |
TEST_NAMESPACE | Place holder from where the ChaosExperiment is executed | Optional | Defaults to is `default` |
Sample ChaosEngine Manifest
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: k8-aws-ec2-terminate
namespace: default
spec:
appinfo:
appns: 'default'
applabel: 'app=nginx'
appkind: 'deployment'
# It can be delete/retain
jobCleanUpPolicy: 'retain'
engineState: 'active'
chaosServiceAccount: chaos-admin
components:
runner:
runnerannotation:
iam.amazonaws.com/role: "k8s-chaosec2access"
experiments:
- name: k8-aws-ec2-terminate
spec:
components:
experimentannotation:
iam.amazonaws.com/role: "k8s-chaosec2access"
env:
- name: NAME_SPACE
value: default
- name: LABEL_NAME
value: app=nginx
- name: APP_ENDPOINT
value: localhost
- name: FILE
value: 'ec2-delete.json'
- name: AWS_ROLE
value: 'chaosec2access'
- name: AWS_ACCOUNT
value: '0000000000'
- name: AWS_REGION
value: 'us-west-2'
- name: AWS_AZ
value: 'us-west-2c'
- name: AWS_RESOURCE
value: 'ec2-iks'
- name: AWS_SSL
value: 'false'
- name: REPORT
value: 'true'
- name: REPORT_ENDPOINT
value: 'none'
- name: TEST_NAMESPACE
value: 'default'
Create the ChaosEngine Resource
Create the ChaosEngine manifest prepared in the previous step to trigger the Chaos.
kubectl apply -f chaosengine.yml
Watch Chaos progress
View aws ec2 instance termination & recovery by setting up a watch on the nodes/verify in AWS console
watch kubectl get pods
Check ChaosExperiment Result
Check whether the application is resilient to the ChaosToolKit aws ec2 termination, once the experiment (job) is completed. The ChaosResult resource name is derived like this:
<ChaosEngine-Name>-<ChaosExperiment-Name>
.kubectl describe chaosresult k8-aws-ec2-terminate-k8-aws-ec2-terminate -n <chaos-namespace>
Check ChaosExperiment logs
Check the log and result for existing experiment
kubectl log -f k8-aws-ec2-terminate-<hax-value> -n <chaos-namespace>