TL;DR
If you’re interested in testing out locally hosted AI/LLM features, this article walks through a number of tools that let you do exactly that. It also covers the setup and configuration of K8sGPT, a command line tool that can leverage a locally hosted LLM service to diagnose and troubleshoot issues with Kubernetes clusters.
All of the resources, code, and instructions used for this project are available in the main project GitHub repository.
Introduction
Back in November 2023, I was made aware of a contest Hackster was running with AMD, relating to AI development. I had recently done a Hackathon at my job that was focused on AI, and had played around with a number of tools, including K8sGPT – a command line tool that uses AI to diagnose issues with Kubernetes clusters and return human-readable recommendations. During the Hackathon, we experimented with setting up our own LLM using Amazon SageMaker, but ultimately used our access to the OpenAI API to leverage GPT-3.5 Turbo to power K8sGPT.
For this project, I decided to focus on locally hosted large language models, as I wanted to be able to run K8sGPT against a Kubernetes cluster in my homelab without the need for cloud hosted services or third party APIs.
I submitted my proposal for a project at the end of November 2023, and was approved to participate.
On July 31, 2024, I submitted my final project on the Hackster website, and it is now awaiting judging.
Project Components
This project uses the following hardware and software components:
- AMD Instinct™ MI210 Accelerators – As part of this contest, I was provided access to the AMD Accelerator Cloud, which can launch Docker containers backed by AMD EPYC™ CPUs and AMD Instinct™ GPUs. I mainly used it for testing larger AI models, since the cloud had more resources than my local machine; it would also be useful when training existing models.
- AMD Radeon RX 7900 XT – For locally hosting services in the scope of this project, I used my own desktop PC GPU, which is a PowerColor Hellhound Radeon RX 7900 XT 20GB.
- LocalAI was used as the backend: a drop-in replacement REST API compatible with the OpenAI API specification, used for local inferencing (see the example request after this list).
- Open WebUI was used to provide a user-friendly web interface for testing the locally loaded LLMs.
- K8sGPT was used as a command line tool for scanning and diagnosing issues with Kubernetes clusters, powered by the LocalAI service.
- TrueNAS SCALE was used as a test environment for launching test Kubernetes deployments to use with K8sGPT. This can be replaced with any local or cloud-based Kubernetes cluster.
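As an illustration of that drop-in compatibility, a request to LocalAI looks just like an OpenAI API call. This is a minimal sketch, assuming LocalAI is listening on its default port of 8080; the model name is a placeholder for whatever alias your instance has loaded:

```bash
# Query LocalAI's OpenAI-compatible chat completions endpoint.
# "my-model" is a placeholder for whatever model alias your LocalAI
# instance is serving; the port assumes LocalAI's default of 8080.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model",
        "messages": [
          {"role": "user", "content": "Why would a pod be stuck in ImagePullBackOff?"}
        ]
      }'
```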
Resources
AMD Accelerator Cloud
Using this cloud resource was an interesting exercise, and I was able to create a script to install the LocalAI service inside a ROCm Ubuntu 22.04 Docker container. If you have access to the service, I’ve outlined all of the details on how to do this in this readme.
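The readme has the full script, but the general shape of launching a ROCm-enabled container is worth sketching here. This is a minimal example, with an illustrative image tag; the device mappings and security options are the standard ones AMD documents for exposing the GPU to a container:

```bash
# Launch a ROCm Ubuntu 22.04 container with access to the host GPU.
# /dev/kfd and /dev/dri expose the AMD GPU; the remaining flags are
# the standard ones AMD documents for ROCm containers. The image tag
# is illustrative -- use whatever ROCm base image your environment
# provides.
docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  rocm/dev-ubuntu-22.04:latest \
  /bin/bash
```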
AMD Accelerator Cloud is currently testing the ability to launch custom Docker images, and I was able to get access to evaluate it. I will add more details in this readme once the feature is ready.
Locally Hosted AI/LLM Setup
To set up LocalAI and Open WebUI on your local machine, I’ve outlined the steps, configuration instructions, and usage in this readme. I’ve provided a Docker Compose YAML file that can be adjusted to your hardware specifications and used to launch containers for both services.
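To give a rough idea of what that Compose file contains, here is a minimal sketch. The image tags, ports, and device mappings are assumptions to adapt to your own hardware; the actual file used for this project is in the repository:

```yaml
# Sketch of a Compose file for LocalAI + Open WebUI. Image tags,
# ports, and device mappings are illustrative; the file actually used
# for this project lives in the repository.
services:
  localai:
    image: localai/localai:latest-gpu-hipblas   # assumed ROCm/HipBLAS tag
    ports:
      - "8080:8080"
    devices:
      - /dev/kfd    # expose the AMD GPU to the container
      - /dev/dri
    volumes:
      - ./models:/build/models   # persist downloaded models

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"   # Open WebUI listens on 8080 inside the container
    environment:
      # Point Open WebUI at LocalAI's OpenAI-compatible endpoint.
      - OPENAI_API_BASE_URL=http://localai:8080/v1
    depends_on:
      - localai
```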
Setting Up K8sGPT
K8sGPT is a command-line utility that can be installed on your local machine regardless of operating system, or run as a Docker container. It integrates with your Kubernetes configuration, as well as LocalAI, to scan your Kubernetes clusters for issues, then diagnoses and triages those issues in simple English. It has SRE experience codified into its analyzers, and helps pull out the most relevant information to enrich with AI.
I’ve included steps to install K8sGPT and integrate it with Kubernetes and LocalAI in this readme.
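The LocalAI integration itself comes down to registering it as a K8sGPT backend. Roughly, with the model name and base URL as placeholders for your own setup:

```bash
# Register LocalAI as a K8sGPT backend. The model name and base URL
# are placeholders for whatever your LocalAI instance is serving.
k8sgpt auth add --backend localai \
  --model <model-name> \
  --baseurl http://localhost:8080/v1

# Scan the current cluster and have the backend explain what it finds.
k8sgpt analyze --explain --backend localai
```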
Kubernetes Troubleshooting
To test K8sGPT, I used some example scenarios from Abhishek Veeramalla’s Kubernetes From Zero to Hero YouTube series, specifically on Kubernetes Troubleshooting. You can watch the first video at this link. I included the GitHub repository as a submodule of my project so it can be used for testing.
Details on my troubleshooting examples and steps can be found in this readme.
As an example, I purposely deployed a Kubernetes Deployment with an incorrect image name, which sent its pods into ImagePullBackOff. K8sGPT was able to detect the error, understand the problem, and suggest a corrected image name.
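A hypothetical manifest along these lines reproduces that scenario (the actual scenarios I used come from the submodule mentioned above):

```yaml
# Hypothetical Deployment with a misspelled image name ("ngnix"
# instead of "nginx"), which sends its pods into ImagePullBackOff.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: broken-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: broken-nginx
  template:
    metadata:
      labels:
        app: broken-nginx
    spec:
      containers:
        - name: nginx
          image: ngnix:latest   # typo: should be nginx:latest
```

Running `k8sgpt analyze --explain --backend localai` against the cluster then surfaces the failing pods along with the suggested fix.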
Accomplishments
This project is a great proof of concept for getting locally hosted, AI-powered tools working to make troubleshooting Kubernetes easier. K8sGPT also supports an in-cluster operator that makes it easier to scale troubleshooting and eliminates single points of failure.
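For reference, the operator ships as an official Helm chart; a minimal install sketch, with the release and namespace names as illustrative choices:

```bash
# Install the K8sGPT operator from its official Helm chart.
# Release and namespace names here are illustrative.
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
  -n k8sgpt-operator-system --create-namespace
```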
As part of this project, I was able to help the LocalAI development team test their latest revision, which added support for AMD ROCm 6.1. The team does not have access to AMD hardware that supports the HipBLAS library, which is needed for building AMD-specific Docker containers for LocalAI. I was able to work with the developers to test and help approve this pull request.
I also worked with the LocalAI developers to recommend an improvement to the ROCm packages for Debian/Ubuntu, as they currently do not follow the standard policy when installing shared libraries. I submitted this issue to the ROCm GitHub page, which outlines the problem and the solution. It is currently under investigation.
Future Improvements
As part of this project, I wanted to build a fully code-defined Kubernetes cluster on top of Proxmox to run in my homelab. This proved very challenging, and there are a number of software and network limitations I have to resolve before I can do so. Instead, I used my existing TrueNAS SCALE system to test the implementation, which was sufficient to complete this project. For anyone else using this setup, the Kubernetes cluster(s) should be interchangeable.
I would also like to explore Retrieval-Augmented Generation (RAG), and introduce more training data for my local LLM to improve its inference capabilities.
Conclusion
I’m very thankful to have been able to participate in this contest and finish this project. I want to thank all the AMD employees, such as Prithvi Mattur and Javier Lázaro Jareño, who made this possible and provided resources and support in the Discord. It was a great experience collaborating with other contestants and joining the office hours to ask questions.
I also want to thank Jinger Zeng from Hackster for all of her work on this contest. It definitely could not have been easy coordinating resources for all of us.
I hope you will find this project useful, and please let me know in the comments if you have any recommendations or improvements I should explore.
References
- Hackster Contest Project Submission
- Hackster Project Proposal
- K8sGPT Project Website
- LocalAI Project Website
- Open WebUI Project Website – loads slowly; the docs page is much faster