Self-Hosted AI with Ollama and Open-WebUI

Preamble

So the CachyOS experiment from my previous post lasted a grand total of… two days! As great as it is to be on a rolling release with the freshest kernel and the latest packages, it does come with certain caveats. The big deal-breaker for me was Docker; specifically, Docker Compose and the NVIDIA Container Toolkit. No matter what I did or which method I used to install each package, I just could not get a container to successfully utilise GPU acceleration. Unfortunately, that's kind of a necessity when working with AI. So off I went, back to a platform I know has worked great for me in the past: Debian! …I mean, Fedora!

Why not Debian/Ubuntu?
Debian

I use Debian for the majority of my servers and love it, but as a desktop OS it has its limitations, especially for NVIDIA users. Most of the software in the Debian repository is quite dated. Given that I use Plasma 6 as my main DE (at least until Hyprland is more polished), that takes Debian out of the race, as it still ships KDE Plasma 5. For Radeon users this may not be an issue; however, NVIDIA REALLY doesn't play nicely with Wayland/XWayland under KDE, especially for Electron-based applications and some Flatpaks.

Ubuntu

I really don't like the direction Canonical has taken with Ubuntu in recent years, both with their UI/UX design choices and with Snap packages as the default choice for application isolation. This review of Ubuntu 22.04 from The Average Linux User pretty accurately reflects my Ubuntu Desktop experience, and not much has changed from 22.04 to 24.04. I'll just say this: there's a reason Linux Mint Debian Edition exists…

I will note that Ubuntu Server, on the other hand, is pretty stellar.

Other alternatives

While a myriad of distros exist, I consistently find that forks of the big players end up with minor issues that cascade over time; documentation can be lacking, and community support varies. For this reason I tend to stick with the major players and tweak as needed: usually Debian for stability, Arch for bleeding-edge software, or Fedora for a balance of both. openSUSE is another viable alternative.

Requirements

CPU

  • Older CPUs and DDR3/DDR4 memory will work fine for smaller models. AVX-512 support is more beneficial to performance than core count, provided memory bandwidth allows sufficient data throughput. If budget allows, aim for both.
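If you're unsure what your CPU supports, you can list its AVX-related instruction-set flags with a quick one-liner (a sketch using /proc/cpuinfo, which is available on any Linux system):

    # Print the unique AVX flags this CPU advertises (e.g. avx, avx2, avx512f)
    grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u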

RAM
  • Preferably DDR5. DDR4 and DDR3 will work, but the higher MT/s of DDR5 is beneficial.
  • Recommended capacity depends on various factors, such as model size and quantisation; use the table below as a rough guide:
Parameters (b)    RAM (GB)
<7b               8GB*
>7b – 14b         ~16GB
>14b – 32b        ~32GB
>32b – 70b        64GB+
*7b models at Q4 may be acceptable with 8GB of RAM depending on other system load, but this is generally not recommended.
Storage
  • ~50GB for Docker, Ollama, Open-WebUI and model files
GPU (optional)
  • Recommended for enhanced performance. Requirements vary by model and configuration. You can use the chart below as a rough guide for running distilled models:
Parameters (b)    VRAM (GB)
<1.5b             ~1GB
>1.5b – 7b        ~4GB
>7b – 14b         ~8GB
>14b – 32b        ~16GB
>32b – 70b        34GB+
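To see how much VRAM you have to work with, nvidia-smi can report it directly (NVIDIA driver required; AMD users would use rocm-smi instead):

    # Report GPU model and total VRAM
    nvidia-smi --query-gpu=name,memory.total --format=csv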

Installation

Docker

Install Docker Engine for your distribution according to the Docker Docs instructions.
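If you just want something quick, Docker's convenience script works on most distributions, though the per-distro repository method in the docs is the better long-term choice:

    # Download and run Docker's convenience install script
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh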

NVIDIA Container Toolkit

The NVIDIA Container Toolkit is required for CUDA support inside Docker containers.

Fedora
  1. Add NVIDIA's repository
  2. (optional) Enable experimental packages
  3. Install the NVIDIA Container Toolkit
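A sketch of those three steps, based on NVIDIA's install guide (verify the repo URL and the config-manager flag against the current docs, as they vary between dnf versions):

    # 1. Add NVIDIA's repository
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
      sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

    # 2. (optional) Enable experimental packages (--set-enabled on dnf4)
    sudo dnf config-manager --set-enabled nvidia-container-toolkit-experimental

    # 3. Install the NVIDIA Container Toolkit
    sudo dnf install -y nvidia-container-toolkit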
Debian
  1. Add NVIDIA's production repository
  2. (optional) Enable experimental packages
  3. Install the NVIDIA Container Toolkit
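Again as a sketch of NVIDIA's documented apt steps (double-check against the current install guide):

    # 1. Add NVIDIA's production repository (signing key + apt source)
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
      sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # 2. (optional) Enable experimental packages by uncommenting them
    sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # 3. Install the NVIDIA Container Toolkit
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit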

Arch

Install the NVIDIA Container Toolkit
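At the time of writing the toolkit is packaged in Arch's extra repository (it previously lived in the AUR), so a plain pacman install should do:

    sudo pacman -S nvidia-container-toolkit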

Configure Docker
Restart the Docker Daemon
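These last two steps are the same on all three distros; the nvidia-ctk helper ships with the toolkit:

    # Register the NVIDIA runtime in Docker's daemon config
    sudo nvidia-ctk runtime configure --runtime=docker

    # Restart the daemon so the new runtime is picked up
    sudo systemctl restart docker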

Ollama + Open-WebUI

Open-WebUI keeps a handy Docker Quick Start guide on their GitHub page. There's a one-liner we can use to start an Open-WebUI Docker container with Ollama integration.

To add GPU acceleration:
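Per the Quick Start (the :ollama tag bundles Ollama in the same container; check the guide for the current form of the command):

    docker run -d -p 3000:8080 --gpus=all \
      -v ollama:/root/.ollama -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:ollama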

For CPU only:
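The same command, minus the --gpus flag:

    docker run -d -p 3000:8080 \
      -v ollama:/root/.ollama -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:ollama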

When Docker finishes pulling the image from the GitHub Container Registry (ghcr.io), your Open-WebUI instance will be accessible at http://localhost:3000.
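A quick sanity check if the page doesn't load (the second command should eventually return 200 once the UI has finished starting):

    # Confirm the container is running
    docker ps --filter name=open-webui

    # Probe the web UI and print the HTTP status code
    curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000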

Configuration

Greeting Page
Click the arrow over “Get Started” when ready.
Account Creation Screen
Enter your name, email and a secure password.
Changelog
List of newly added features and fixes. Click “Okay, Let’s Go!” when you’re ready.
Main Interface
LLM interactions happen here. First click your initials in the top right.
Drop-Down Menu
Select "Admin Panel" from the drop-down menu.
User Settings
The admin panel opens on the Users screen by default. Click the Settings tab at the top.
Administrator Settings
Click “Connections” in the menu on the left.
Ollama Settings
1. Browse Ollama's model library to view all the available models.
2. Select a model you're interested in and compare your hardware against the requirements of the distilled version you want to run. Refer to the table above if you need a rough guide.
3. Enter the model name and parameter count separated by a colon (e.g. deepseek-r1:14b).
4. Click download and wait until it completes. Once the checksum is verified, return to the main page. (You can also pull models from the command line, as sketched below.)
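If you prefer the terminal, the bundled :ollama image exposes the ollama CLI inside the container, so you can pull models without the web UI (assuming the container name from the run command above):

    # Pull the distilled 14b DeepSeek-R1 model via the bundled Ollama CLI
    docker exec -it open-webui ollama pull deepseek-r1:14b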
Main Page
1. Starts a new chat
2. Switches between downloaded models
3. Chat history log
4. Prompt entry field
5. Microphone – voice to text input
6. Call – back-and-forth voice interaction using your microphone.
7. Dynamically refine Python code by executing it within a sandboxed Pyodide environment.
8. Prompt suggestions that dynamically adjust based on your prompt.
Upload Options
1. Captures a screenshot from a desktop or application window.
2. Lets you select a file to upload for your model to interact with.

Results

Test Prompt

After selecting a model (deepseek-r1:14b) I ran a quick test prompt to ensure everything works as intended. Deepseek arrived at the correct answer from a mathematical perspective, but an incorrect one when analysed logically: 30 is the number of ANIMAL legs, but I asked how many legs I have, and as a human I have two. AI's challenges with trick questions and simple maths mistakes can be attributed to several key factors:

  1. Lack of Real-World Experience: AI systems are trained on datasets without real-world context, making them unable to infer hidden meanings or nuances in language.
  2. Literal Processing: AI relies on literal interpretations of text, missing wordplay or jokes that humans understand through contextual understanding.
  3. Training Data Limitations: Simple errors may stem from training data containing inaccuracies or a lack of diverse examples, leading the AI to learn incorrect patterns.
  4. Focus and Optimisation: AI models might be optimised for complex tasks, neglecting the precision needed for basic arithmetic, resulting in simple calculation mistakes.
  5. Statistical Approach: AI uses statistical methods that may not perfectly align with the precise logic required for riddles, leading to errors when problems are framed unexpectedly.

In summary, while AI excels at pattern recognition and processing vast data, it struggles with tasks requiring contextual understanding or intricate logical precision beyond its training scope.

Updating Open-WebUI

As new features and bugfixes are added, you'll see an "update available" notification.

You can use Watchtower to update your Docker container automatically:
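This is the one-shot form from Open-WebUI's docs; drop --run-once if you want Watchtower to keep polling in the background:

    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower --run-once open-webui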

If Watchtower fails, you can install the update manually in three steps:

  1. Stop and remove the current container
  2. Pull the latest Docker image
  3. Start the container again with the updated image and existing volumes attached (add --gpus=all after the -p 3000:8080 parameter for GPU acceleration), as shown below
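Spelled out, reusing the bundled :ollama image from earlier (named volumes survive the container's removal, so your chats and models are kept):

    # 1. Stop and remove the current container
    docker rm -f open-webui

    # 2. Pull the latest Docker image
    docker pull ghcr.io/open-webui/open-webui:ollama

    # 3. Start again with the existing volumes attached
    docker run -d -p 3000:8080 --gpus=all \
      -v ollama:/root/.ollama -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:ollama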

Next Steps

What you do with your AI toolkit from here is up to you! There are many, many features and integrations you could explore.

For me, I aim to do the following:

  • Integrate my self-hosted search engine with Open-WebUI's SearXNG integration
  • Integrate Ollama with VSCode using Continue.dev's VS Code extension
  • Run UnslothAI's full DeepSeek 671b-parameter model in its 1.58-bit quantised form with Llama.cpp and compare its performance against its distilled cousins
  • Test various text-to-speech engines to interact more naturally
  • Install and test various image generation tools

I have found a utility called Harbor that will automate the setup for a lot of these tools. I’ll outline my experience with it in a future post.
