Have you ever encountered any debugging or diagnostic issues while working on Kubernetes?
Yes, of course!
As an SRE, it can sometimes take time to identify and solve problems that arise in one of our many clusters. Therefore, we need a quick and effective way to diagnose issues and provide tailored solutions.
Since I often find myself in this situation, I decided to test a promising tool – and you may have already guessed its name – K8sGPT.
In this blog post, we will discuss K8sGPT, a simple tool that claims to grant SRE superpowers to everyone. I’ll provide practical guidance on how to install and use it, as well as my feedback on its effectiveness. But first, let’s take a look at what K8sGPT is.
What is K8sGPT?
K8sGPT is an analysis tool designed for your Kubernetes cluster. It simplifies the process of diagnosing and triaging issues by leveraging AI to provide guidance in plain English.
This tool is primarily intended for SREs and DevOps engineers, helping them understand their cluster’s activities and identify the root cause of issues.
With its help, you can easily navigate through logs and other tools to identify the main source of a problem.
How K8sGPT Works
K8sGPT uses analyzers to identify and diagnose problems within your cluster. It includes a range of built-in analyzers, and you can create custom ones based on your requirements.
Currently, K8sGPT has seven built-in analyzers enabled by default and two optional ones.
Here’s a breakdown of the analyzers:
Built-in analyzers:
- PodAnalyzer: Analyzes pod-related issues and helps optimize resource usage.
- PvcAnalyzer: Inspects Persistent Volume Claims (PVCs) for potential storage problems.
- RsAnalyzer: Examines ReplicaSets to identify issues related to scaling and replication.
- ServiceAnalyzer: Analyzes Kubernetes services to ensure proper load balancing and service discovery.
- EventAnalyzer: Monitors cluster events for any anomalies or potential problems.
- IngressAnalyzer: Evaluates Ingress configurations to optimize traffic routing and security.
- StatefulSetAnalyzer: Analyzes StatefulSets for issues concerning stateful applications and their data management.
Optional analyzers:
- HpaAnalyzer: Assesses Horizontal Pod Autoscalers to optimize the scaling of pods based on resource utilization and performance metrics.
- PdbAnalyzer: Evaluates Pod Disruption Budgets to ensure proper management of voluntary and involuntary disruptions in the cluster.
K8sGPT’s analyzers leverage NLP to examine logs, metrics, and other data within your cluster, enabling them to pinpoint potential issues.
When an issue is detected, K8sGPT provides an easy-to-understand explanation in natural language – even for someone unfamiliar with the technology.
K8sGPT is built on top of large language models – OpenAI’s GPT models by default – which provide the natural language understanding.
This means that you can ask questions such as “Why is my pod crashing?” and K8sGPT will use API calls to your configured AI provider to analyze your data and provide an explanation in natural language as to why the pod is crashing.
Note regarding privacy: K8sGPT is a read-only tool. This means it does not modify the data in your cluster, and it has options to anonymize data before sending it to the AI backend.
K8sGPT Installation
There are several ways to install K8sGPT. I chose the recommended method, which is using the Brew package manager:
$ brew tap k8sgpt-ai/k8sgpt
$ brew install k8sgpt
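If you install K8sGPT as part of a provisioning or bootstrap script rather than interactively, a guarded check like this small sketch lets the script fail early and clearly when the binary didn’t make it onto the PATH:

```shell
# Check that the k8sgpt binary is reachable before going any further.
if command -v k8sgpt >/dev/null 2>&1; then
  echo "k8sgpt found: $(k8sgpt version)"
else
  echo "k8sgpt not found - check your PATH or reinstall via brew"
fi
```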
To verify the installation, run:
$ k8sgpt version
k8sgpt version 0.2.1
Note: I use Ubuntu 20.04.5 LTS with WSL 2. It’s also available on Mac, Windows, and other Linux distributions.
K8sGPT Configuration
Once K8sGPT is installed, you need to configure the AI provider. In my case, I used OpenAI.
K8sGPT requires an API key from OpenAI. If you haven’t already, create one here and save it somewhere.
Note: Make sure to set your billing information. As a reminder, OpenAI uses a pay-as-you-go model. You are billed per API call. If you’d like to know more, check their pricing page here.
The next step is to register your API key with K8sGPT.
Run the following command:
$ k8sgpt auth -b openai
Using openai as backend AI provider
Enter openai Key:
You’ll be prompted to enter your API key; the input is hidden as you type. Once that’s done, you’re all set.
A Simple Use Case: When a Pod Crashes
Now, let’s see how it works with a simple use case: a crashing pod. Before digging in, let’s check which filters are activated by default.
To do that, run the command:
$ k8sgpt filters list
Active:
VulnerabilityReport (integration)
Unused:
Service
Ingress
StatefulSet
Pod
ReplicaSet
PersistentVolumeClaim
HorizontalPodAutoScaler
PodDisruptionBudget
By default, all filters are executed during analysis.
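The active set is adjustable: `k8sgpt filters add` and `k8sgpt filters remove` toggle individual analyzers. The sketch below assumes these subcommand names, which may differ slightly across versions – check `k8sgpt filters --help` on yours:

```shell
# Toggle individual analyzers before running an analysis.
# Guarded so the script degrades gracefully where k8sgpt is absent.
if command -v k8sgpt >/dev/null 2>&1; then
  k8sgpt filters add Pod        # make sure the Pod analyzer runs
  k8sgpt filters remove Ingress # skip Ingress checks this time
  k8sgpt filters list           # confirm the resulting set
else
  echo "k8sgpt not installed; skipping filter configuration"
fi
```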
Let’s create a pod with an incorrect image tag (note: k is an alias for kubectl):
$ k run mypod --image=nginx:v1
Then, check the status:
$ k get pod
NAME READY STATUS RESTARTS AGE
mypod 0/1 ImagePullBackOff 0 104m
When I run K8sGPT to analyze my cluster:
$ k8sgpt analyze
Here is the output:
0 ckad/mypod(mypod)
Error: Back-off pulling image "nginx:v1"
It detected an error in my pod.
Now, let’s ask OpenAI to explain in natural language what is going on with our pod. You need to run this command:
$ k8sgpt analyze --explain -o json
{
"status": "ProblemDetected",
"problems": 1,
"results": [
{
"kind": "Pod",
"name": "ckad/mypod",
"error": [
{
"Text": "Back-off pulling image \"nginx:v1\"",
"Sensitive": []
}
],
"details": "The Kubernetes system is experiencing difficulty fetching or downloading the \"nginx:v1\" image, causing it to back off and retry. \n\nTo solve this issue, you can try the following:\n\n1. Check your network connection to ensure it is stable and able to connect to the image repository.\n2. Verify that the Docker image tag \"v1\" exists within the repository and that it is accessible to Kubernetes.\n3. Check for any restrictive network policies or firewalls that may be blocking access to the image repository and update them accordingly. \n4. If the issue persists, try pulling a different version of the \"nginx\" image or try using a different image altogether.",
"parentObject": "mypod"
}
]
}
Note that you can use the ‘-a’ option to anonymize the data before sending it to the AI backend.
As you can see, the output is a JSON document with clear key–value pairs. The most important field is “details”, which explains why the pod is failing.
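Because the output is plain JSON, it’s easy to build scripts on top of it. Here’s a minimal sketch that recreates a trimmed version of the report above in a local file and extracts just the “details” field; against a live cluster you would generate report.json with `k8sgpt analyze --explain -o json` instead:

```shell
# Recreate a trimmed version of the sample report from this post.
# (With a real cluster: k8sgpt analyze --explain -o json > report.json)
cat > report.json <<'EOF'
{
  "status": "ProblemDetected",
  "problems": 1,
  "results": [
    {
      "kind": "Pod",
      "name": "ckad/mypod",
      "error": [{"Text": "Back-off pulling image \"nginx:v1\"", "Sensitive": []}],
      "details": "The Kubernetes system is experiencing difficulty fetching the \"nginx:v1\" image."
    }
  ]
}
EOF

# Print only the human-readable explanation for each result
# (jq works just as well here if you have it installed).
python3 -c 'import json; [print(r["details"]) for r in json.load(open("report.json"))["results"]]'
```

This kind of one-liner is handy for piping K8sGPT’s findings into alerts or chat notifications.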
I think the “promise” of K8sGPT is met, as the explanation is easy to understand and it also suggests actions to take to resolve this issue.
Yes, I admit this is a simple use case; in real-world incidents, however, the same workflow can be incredibly useful.
A potential downside of using K8sGPT is cost: each explanation involves an API call to the AI backend, and depending on your provider, this can add up.
Teams that are protective of their data or wary of AI services might also hesitate to use K8sGPT. On this point, it’s worth repeating that K8sGPT is a read-only tool and offers options to anonymize the data before it is processed by the AI.
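To illustrate the idea behind anonymization (this is only a sketch of the principle, not K8sGPT’s actual implementation), identifying names are replaced with placeholders before the text leaves your environment:

```shell
# Illustration only: mask namespace/pod names in an error message before
# sending it to an external AI API; k8sgpt's -a flag handles this for you.
error='Back-off pulling image "nginx:v1" in pod ckad/mypod'
echo "$error" | sed -e 's|ckad/mypod|namespace-xxxx/pod-xxxx|'
```

The AI backend still sees enough context to explain the failure, but the real resource names never leave the cluster.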
In my upcoming tutorials, I’ll explain how to integrate external tools like Trivy into K8sGPT and cover more complex use cases.
And you, have you already tried K8sGPT? I’d love to hear your feedback 🙂