GCP VM Instance Stop Experiment Details
Experiment Metadata
Type | Description | Tested K8s Platform |
---|---|---|
GCP | Stops GCP VM instances and GKE nodes for a specified duration of time and later restarts them | GKE, Minikube |
WARNING
If the target GCP VM instance is part of a self-managed node group:
Drain the target node if any application is running on it, and cordon it before running the experiment so that the experiment pods are not scheduled on it.
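The cordon and drain steps in the warning above might be scripted as in the following sketch. The function name prepare_node is hypothetical, and the --ignore-daemonsets flag is an assumption about the workloads on the node; adjust the drain flags to whatever your pods require.

```shell
# Hypothetical helper: cordon first so nothing new is scheduled on the node,
# then drain it to evict the pods that are already running there.
prepare_node() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: prepare_node <node-name>" >&2
    return 1
  fi
  kubectl cordon "$node" &&
    kubectl drain "$node" --ignore-daemonsets
}
```

After the experiment, kubectl uncordon <node-name> makes the node schedulable again.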
Prerequisites
- Ensure that the Kubernetes version is > 1.16.
- Ensure that the Litmus Chaos Operator is running by executing kubectl get pods in the operator namespace (typically, litmus). If not, install from here.
- Ensure that the gcp-vm-instance-stop experiment resource is available in the cluster by executing kubectl get chaosexperiments in the desired namespace. If not, install from here.
- Ensure that you have sufficient GCP permissions to stop and start the GCP VM instances.
- Create a Kubernetes secret holding the GCP service account credentials in the default namespace. A sample secret file looks like:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  type:
  project_id:
  private_key_id:
  private_key:
  client_email:
  client_id:
  auth_uri:
  token_uri:
  auth_provider_x509_cert_url:
  client_x509_cert_url:
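The prerequisite checks above can be collected into one function, sketched below. The function name check_prerequisites and its default arguments are hypothetical; the resource names (gcp-vm-instance-stop, cloud-secret) and the litmus and default namespaces come from this page.

```shell
# Hypothetical helper that runs the prerequisite checks in order and
# stops at the first one that fails.
check_prerequisites() {
  local operator_ns="${1:-litmus}"
  local experiment_ns="${2:-default}"

  echo "Operator pods in namespace ${operator_ns}:"
  kubectl get pods -n "$operator_ns" || return 1

  echo "ChaosExperiment gcp-vm-instance-stop in namespace ${experiment_ns}:"
  kubectl get chaosexperiments gcp-vm-instance-stop -n "$experiment_ns" || return 1

  echo "Secret cloud-secret in namespace default:"
  kubectl get secret cloud-secret -n default || return 1
}
```

Calling check_prerequisites litmus default returns a non-zero status as soon as one of the lookups fails, which makes it usable as a guard before applying the ChaosEngine.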
Entry-Criteria
- VM instance is healthy before chaos injection.
Exit-Criteria
- VM instance is healthy post chaos injection.
Details
- Powers off a GCP VM instance, selected by instance name or by a list of instance names, and brings it back to the running state after the specified chaos duration.
- It helps to check the performance of the application/process running on the VM instance.
- When AUTO_SCALING_GROUP is set to enable, the experiment does not try to start the instance after chaos; instead, it checks that new node instances are added to the cluster.
Steps to Execute the Chaos Experiment
This Chaos Experiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to provide in a ChaosEngine specification, refer to Getting Started.
Follow the steps in the sections below to create the chaosServiceAccount, prepare the ChaosEngine & execute the experiment.
Prepare chaosServiceAccount
- Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example consists of the minimum necessary role permissions to execute the experiment.
Sample RBAC Manifest
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gcp-vm-instance-stop-sa
  namespace: default
  labels:
    name: gcp-vm-instance-stop-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gcp-vm-instance-stop-sa
  labels:
    name: gcp-vm-instance-stop-sa
    app.kubernetes.io/part-of: litmus
rules:
  - apiGroups: [""]
    resources: ["pods","events","secrets"]
    verbs: ["create","list","get","patch","update","delete","deletecollection"]
  - apiGroups: [""]
    resources: ["pods/exec","pods/log"]
    verbs: ["create","list","get"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create","list","get","delete","deletecollection"]
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines","chaosexperiments","chaosresults"]
    verbs: ["create","list","get","patch","update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get","list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gcp-vm-instance-stop-sa
  labels:
    name: gcp-vm-instance-stop-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gcp-vm-instance-stop-sa
subjects:
  - kind: ServiceAccount
    name: gcp-vm-instance-stop-sa
    namespace: default
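One way to apply the RBAC manifest and spot-check the resulting permissions is sketched below. The filename rbac.yaml and the function name verify_chaos_sa are hypothetical. The check relies on kubectl auth can-i with impersonation, so it assumes the caller is allowed to impersonate service accounts (for example, a cluster admin).

```shell
# Hypothetical helper: apply the manifest, then ask the API server whether
# the service account holds two of the permissions granted by the ClusterRole.
verify_chaos_sa() {
  local manifest="${1:-rbac.yaml}"
  local sa="system:serviceaccount:default:gcp-vm-instance-stop-sa"

  kubectl apply -f "$manifest" || return 1

  # `kubectl auth can-i` prints yes/no and exits non-zero on "no".
  kubectl auth can-i list nodes --as="$sa" &&
    kubectl auth can-i create jobs.batch --as="$sa" -n default
}
```

Only two rules are probed here to keep the sketch short; the same pattern extends to the other resources listed in the ClusterRole.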
Prepare ChaosEngine
- Provide the application info in spec.appinfo. It is an optional parameter for infra-level experiments.
- Provide the auxiliary applications info (ns & labels) in spec.auxiliaryAppInfo.
- Override the experiment tunables if desired in experiments.spec.components.env.
- To understand the values to provide in a ChaosEngine specification, refer to ChaosEngine Concepts.
Supported Experiment Tunables
Variables | Description | Specify In ChaosEngine | Notes |
---|---|---|---|
GCP_PROJECT_ID | GCP project ID to which the VM instances belong | Mandatory | All the VM instances must belong to a single GCP project |
VM_INSTANCE_NAMES | Names of the target VM instances | Mandatory | Multiple instance names can be provided as instance1,instance2,... |
INSTANCE_ZONES | The zones of the target VM instances | Mandatory | A zone has to be provided for every instance name as zone1,zone2,... in the same order as VM_INSTANCE_NAMES |
TOTAL_CHAOS_DURATION | The total time duration for chaos insertion (sec) | Optional | Defaults to 30s |
CHAOS_INTERVAL | The interval (in sec) between successive instance terminations | Optional | Defaults to 30s |
AUTO_SCALING_GROUP | Set to enable if the target instance is part of an auto-scaling group | Optional | Defaults to disable |
SEQUENCE | Defines the sequence of chaos execution for multiple instances | Optional | Default value: parallel. Supported: serial, parallel |
RAMP_TIME | Period to wait before injection of chaos (in sec) | Optional | Defaults to 0 sec |
INSTANCE_ID | A user-defined string that holds metadata/info about the current run/instance of chaos. Ex: 04-05-2020-9-00. This string is appended as a suffix to the chaosresult CR name | Optional | Ensure that the overall length of the chaosresult CR name is still < 64 characters |
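Because every entry in VM_INSTANCE_NAMES needs a matching entry in INSTANCE_ZONES, it can be worth checking the two lists before creating the ChaosEngine. The sketch below is one way to do that; count_csv and validate_targets are hypothetical helper names and are not part of Litmus.

```shell
# Count the comma-separated entries in a string (an empty string counts as zero).
count_csv() {
  if [ -z "$1" ]; then
    echo 0
  else
    # Turn commas into newlines and count the resulting lines.
    printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
  fi
}

# Fail unless there is at least one instance name and exactly one zone per name.
validate_targets() {
  local names="$1"
  local zones="$2"
  local n z
  n="$(count_csv "$names")"
  z="$(count_csv "$zones")"
  if [ "$n" -eq 0 ]; then
    echo "VM_INSTANCE_NAMES must not be empty" >&2
    return 1
  fi
  if [ "$n" -ne "$z" ]; then
    echo "got $n instance name(s) but $z zone(s)" >&2
    return 1
  fi
}
```

For example, validate_targets "instance1,instance2" "zone1,zone2" succeeds, while passing a single zone for two instances returns a non-zero status.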
Sample ChaosEngine Manifest
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: gcp-vm-chaos
spec:
  engineState: 'active'
  chaosServiceAccount: gcp-vm-instance-stop-sa
  experiments:
    - name: gcp-vm-instance-stop
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: '30'
            # Instance name of the target VM instance(s)
            # Multiple instance names can be provided as comma-separated values, ex: instance1,instance2
            - name: VM_INSTANCE_NAMES
              value: ''
            # GCP project ID to which the VM instances belong
            - name: GCP_PROJECT_ID
              value: ''
            # Instance zone(s) of the target VM instance(s)
            # If more than one instance is targeted, provide the zone for each in the order of their
            # respective instance names in VM_INSTANCE_NAMES as comma-separated values, ex: zone1,zone2
            - name: INSTANCE_ZONES
              value: ''
            # set to 'enable' if the target instance is part of a self-managed auto-scaling group
            - name: AUTO_SCALING_GROUP
              value: 'disable'
Create the ChaosEngine Resource
Apply the ChaosEngine manifest prepared in the previous step to trigger the chaos.
kubectl apply -f chaosengine.yml
If the chaos experiment is not executed, refer to the troubleshooting section to identify the root cause and fix the issues.
Watch Chaos progress
Monitor the VM Instance status using GCP Cloud SDK:
gcloud compute instances describe INSTANCE_NAME --zone=INSTANCE_ZONE
GCP console can also be used to monitor the instance status.
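Rather than running the describe command by hand, the status can be polled in a loop. The sketch below assumes the gcloud CLI is installed and authenticated; the function names, retry count and delay are hypothetical. A stopped Compute Engine instance reports the status TERMINATED, and a started one reports RUNNING.

```shell
# Print only the status field of an instance (for example RUNNING or TERMINATED).
instance_status() {
  gcloud compute instances describe "$1" --zone="$2" --format='value(status)'
}

# Poll until the instance reaches the wanted status or the attempts run out.
# Arguments: name, zone, wanted status, attempts (default 30), delay in sec (default 10).
wait_for_status() {
  local name="$1"
  local zone="$2"
  local want="$3"
  local attempts="${4:-30}"
  local delay="${5:-10}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(instance_status "$name" "$zone")" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "instance $name did not reach status $want" >&2
  return 1
}
```

During chaos, wait_for_status INSTANCE_NAME INSTANCE_ZONE TERMINATED should succeed; after the chaos duration has elapsed, the same call with RUNNING confirms recovery when AUTO_SCALING_GROUP is disabled.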
Abort/Restart the ChaosExperiment
To stop the gcp-vm-instance-stop experiment immediately, either delete the ChaosEngine resource or execute the following command:
kubectl patch chaosengine <chaosengine-name> -n <namespace> --type merge --patch '{"spec":{"engineState":"stop"}}'
To restart the experiment, either re-apply the ChaosEngine YAML or execute the following command:
kubectl patch chaosengine <chaosengine-name> -n <namespace> --type merge --patch '{"spec":{"engineState":"active"}}'
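The two patch commands differ only in the engineState value, so they can be wrapped in a single function. This is a sketch; set_engine_state is a hypothetical name, and it accepts only the two states shown above.

```shell
# Hypothetical wrapper around the patch commands above.
# Arguments: ChaosEngine name, namespace, desired state (active or stop).
set_engine_state() {
  local engine="$1"
  local namespace="$2"
  local state="$3"
  case "$state" in
    active|stop) ;;
    *)
      echo "engineState must be 'active' or 'stop'" >&2
      return 1
      ;;
  esac
  kubectl patch chaosengine "$engine" -n "$namespace" --type merge \
    --patch "{\"spec\":{\"engineState\":\"$state\"}}"
}
```

For example, set_engine_state gcp-vm-chaos default stop aborts the sample engine from this page, assuming it was created in the default namespace.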
Check Chaos Experiment Result
Once the experiment (job) is completed, check whether the application is resilient to gcp-vm-instance-stop. The ChaosResult resource name is derived as <ChaosEngine-Name>-<ChaosExperiment-Name>.
kubectl describe chaosresult gcp-vm-chaos-gcp-vm-instance-stop
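The naming rule above can be expressed as a small helper, sketched here. The function names are hypothetical, the default namespace is an assumption, and the sketch does not account for an INSTANCE_ID suffix.

```shell
# Build the ChaosResult name from the ChaosEngine and ChaosExperiment names.
chaosresult_name() {
  echo "$1-$2"
}

# Describe the ChaosResult for a given engine and experiment.
# Arguments: ChaosEngine name, ChaosExperiment name, namespace (default: default).
describe_chaosresult() {
  kubectl describe chaosresult "$(chaosresult_name "$1" "$2")" -n "${3:-default}"
}
```

With the sample engine from this page, chaosresult_name gcp-vm-chaos gcp-vm-instance-stop yields gcp-vm-chaos-gcp-vm-instance-stop, the name used in the describe command above.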
GCP VM Instance Stop Experiment Demo
- A sample recording of this experiment execution will be added soon.