Chaos Engineering in a Microservices Environment

The Litmus Demo is a quick way to introduce yourself to the world of Cloud Native Chaos Engineering. It helps familiarize oneself with running litmus chaos experiments in a realistic app environment running multiple services on a Kubernetes cluster. In other words, you will be able to create a cluster, install a sample application that will be subject to chaos, build the chaos infrastructure & run the chaos experiments, all in a matter of minutes, using a simple python script.

It is designed to help you experience chaos injection almost out of the box and is expected to serve as a precursor to a more involved exploration of LitmusChaos framework for your own business application/infrastructure chaos needs.

The Demo Environment

Three cluster types are supported:

KinD: A 3 node KinD cluster pre-installed with the sock-shop demo application, litmus chaos CRDs, operator, and a minimal set of prebuilt chaos experiment CRs is setup.
GKE: A 3 node GKE cluster pre-installed with the sock-shop demo application, litmus chaos CRDs, operator, and the full generic Kubernetes chaos experiment suite is setup.
EKS: A 3 node EKS cluster pre-installed with the sock-shop demo application, litmus chaos CRDs, operator, and the full generic Kubernetes chaos experiment suite is setup.

PreRequisites

Docker 18.09 or greater (When using containerized setup)
Docker, Kubectl & Python3.7+ (with the PyYaml package) are all you will need for running the KinD platform based chaos demo. If GKE/EKS is your platform choice, you may need to configure gcloud/aws (When using a non-containerized setup).

Getting started:

To get started with any of the above platforms we will follow the following steps.

Clone litmus demo repository in your system.

git clone https://github.com/litmuschaos/litmus-demo.git
cd litmus-demo

We can Setup litmus demo in two different ways:

Containerized Setup	All the dependencies are installed inside a container we just need to have docker installed to run the container. It has been tested for Kind platform.
Non-Containerized Setup	We need to make sure that all the prerequisites are installed manually before running the demo script.

Containerized Setup

You can setup & run the demo from a containerized environment by following the below-mentioned steps:

make build

docker build -t litmuschaos/litmus-demo .

Run docker container interactive, now you can run any commands mentioned in the usage section with python3.

make exec

docker run -v /var/run/docker.sock:/var/run/docker.sock --net="host" -it --entrypoint bash litmuschaos/litmus-demo

Now you can run the litmus demo script in the following ways:

Execing into the container: The make exec will exec into the litmus demo container where you can use the ./manage.py script directly to run the demo script.

$ make exec
------------------
--> Login to Litmus Demo container

bash-5.0# ./manage.py -h
usage: manage.py [-h] {start,test,list,stop} ...

Spin up a Demo Environment on Kubernetes.

positional arguments:
  {start,test,list,stop}
    start               Start a Cluster with the demo environment deployed.
    test                Run Litmus ChaosEngine Experiments inside litmus demo environment.
    list                List all available Litmus ChaosEngine Experiments available to run.
    stop                Shutdown the Cluster with the demo environment deployed.

optional arguments:
  -h, --help            show this help message and exit
bash-5.0#

Without execing into the container: You can run the demo script from outside the container using the runcmd script.


$ ./runcmd -h
running -h inside container

usage: manage.py [-h] {start,test,list,stop} ...

Spin up a Demo Environment on Kubernetes.

positional arguments:
  {start,test,list,stop}
    start               Start a Cluster with the demo environment deployed.
    test                Run Litmus ChaosEngine Experiments inside litmus demo environment.
    list                List all available Litmus ChaosEngine Experiments available to run.
    stop                Shutdown the Cluster with the demo environment deployed.

optional arguments:
  -h, --help            show this help message and exit

Non Containerized Setup

To get started with the non containerized setup we need to clone the litmus-demo repository and run the demo script as mentioned in the Usage section below.

git clone https://github.com/litmuschaos/litmus-demo.git
cd litmus-demo

Now we can use the manage.py python script to setup the litmus demo environment.

$ ./manage.py -h
usage: manage.py [-h] {start,test,list,stop} ...

Spin up a Demo Environment on Kubernetes.

positional arguments:
  {start,test,list,stop}
    start               Start a Cluster with the demo environment deployed.
    test                Run Litmus ChaosEngine Experiments inside litmus demo
                        environment.
    list                List all available Litmus ChaosEngine Experiments
                        available to run.
    stop                Shutdown the Cluster with the demo environment
                        deployed.

optional arguments:
  -h, --help            show this help message and exit

Usage

If you are using containerized setup. Follow one of the steps mentioned above to run the litmus demo. If you want to run the demo script without execing into the container replace ./manage.py with ./runcmd from the below commands. For a non-containerized setup you can directly run the commands mentioned below.

Install the demo environment using one of the platforms with the start argument

For KinD Cluster
- Install & bring-up the KinD cluster using the following command
```
./manage.py start --platform kind
```
- Wait for all the pods to get in a ready state. You can monitor this using

```bash
watch kubectl get pods --all-namespaces
```

Once all pods come into Running state we can access the sock-shop application through web-ui which will help us to 
visualize the impact of chaos on the application and whether the application persists after chaos injections.

Get the port of frontend deployment

kubectl get deploy front-end -n sock-shop -o jsonpath='{.spec.template.spec.containers[?(@.name == "front-end")].ports[0].containerPort}'

Perform port forwarding on the port obtained above

kubectl port-forward deploy/front-end -n sock-shop 3000:<port-number> (typically, 8079)

Access the web-ui (catalogue) of the sock-shop app via the address 127.0.0.1:3000 on your browser

For GKE Cluster

Create the GKE cluster (ensure you have set up access to your gcloud project)
```
./manage.py start --platform GKE
```
Wait for all the pods to get in a ready state. You can monitor this using

```bash
watch kubectl get pods --all-namespaces
```

Once all pods come into Running state we can access the sock-shop application through web-ui which will help us to 
visualize the impact of chaos on the application and whether the application persists after chaos injections.

After a few min, identify the ingress IP to access the web-ui
```
kubectl get ingress basic-ingress --namespace=sock-shop
```
You can access the web application in a few minutes at http://<ingress-ip>

For EKS Cluster

Create the EKS cluster (ensure you have setup access to your AWS project)
```
./manage.py start --platform EKS --name {EKS_CLUSTER_NAME}
```
Wait for all the pods to get in a ready state. You can monitor this using

```bash
watch kubectl get pods --all-namespaces
```

Once all pods come into Running state we can access the sock-shop application through web-ui which will help us to 
visualize the impact of chaos on the application and whether the application persists after chaos injections.

After a few min, identify the ingress IP to access the web-ui
```
kubectl get ingress basic-ingress --namespace=sock-shop
```
You can access the web application in a few minutes at http://<ingress-ip>

Running Chaos in Demo Environment

To find out the supported tests for a platform, execute the following command:
```
./manage.py list --platform <kind|gke>
```
For running all experiments, run:

./manage.py test --platform <platform-name>

For running selective experiments

./manage.py test  --platform <platform-name> --test <test-name>

Example: For running the pod-delete experiment.

./manage.py test  --platform kind --test pod-delete

Chaos Experiment Results

The experiment results (Pass/Fail) are derived based on the simple criteria of app availability post chaos & are summarized on the console once the execution completes. You are also encouraged to check the changes to the status of the web-ui & the respective microservices/pods as the experiment execute to get an idea of the failure injection process and subsequent recovery.

Get more details about the flags used to configure and run the chaos tests please refer to the parameter tables in the test section.

Generate PDF of the experiment result summary

We can also generate the pdf report of the experiment result summary using --report flag as follow:

./manage.py test --report=yes

It will generate a pdf report of the name chaos-report.pdf in the current location containing the ChaosResult summary.

Deleting Cluster / Cluster Clean Up

To shut down and destroy the cluster when you're finished, run the following commands:

For KinD cluster

./manage.py --platform kind stop

For GKE cluster

./manage.py --platform GKE stop --project {GC_PROJECT}

For EKS cluster

./manage.py --platform EKS stop --name {EKS_CLUSTER_NAME} --awsregion {EKS_REGION_NAME}

Litmus Docs

next

OpenShift

Rancher

Generic

Kube-AWS

AWS-SSM

GCP

VMWare

Azure

OpenEBS

Kafka

CoreDns

Cassandra