Self-Hosted AI with Ollama and Open-WebUI

Preamble

So the CachyOS experiment from my previous post lasted a grand total of… two days! As great as it is to be on a rolling release with the freshest kernel and the latest packages, it does come with certain caveats. The big deal-breaker for me was Docker; specifically, Docker Compose and the NVIDIA Container Toolkit. No matter what I did or which method I used to install each package, I just could not get a container to successfully utilise GPU acceleration. Unfortunately, that's kind of a necessity when working with AI. So off I went, back to a platform I know has worked great for me in the past: Debian! …I mean, Fedora!

Why not Debian/Ubuntu?
Debian

I use Debian for the majority of my servers and love it, but as a desktop OS it has its limitations, especially for NVIDIA users. Most of the software in the Debian repository is quite dated. Given that I use Plasma 6 as my main DE (at least until Hyprland is more polished), that takes Debian out of the race, as it still ships KDE Plasma 5. For Radeon users this may not be an issue; however, NVIDIA REALLY doesn't play nicely with Wayland/XWayland under KDE, especially for Electron-based applications and some Flatpaks.

Ubuntu

I really don't like the direction Canonical has taken with Ubuntu in recent years, both with their UI/UX design choices and with Snap packages as the default choice for application isolation. This review of Ubuntu 22.04 from The Average Linux User pretty accurately reflects my Ubuntu Desktop experience, and not much has changed from 22.04 to 24.04. I'll just say this: there's a reason Linux Mint Debian Edition exists…

I will note that Ubuntu Server, on the other hand, is pretty stellar.

Other alternatives

While a myriad of distros exist, I consistently find that forks of the big players end up with minor issues that cascade over time; documentation can be lacking, and community support varies. For this reason I tend to stick with the major players and tweak as needed: usually Debian for stability, Arch for bleeding-edge software, or Fedora for a balance of both. openSUSE is another viable alternative.

Requirements

CPU

  • Older CPUs and DDR3/DDR4 memory will work fine for smaller models. AVX-512 support is more beneficial to performance than core count, provided memory bandwidth allows sufficient data throughput. If budget allows, aim for both.
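If you're unsure what your CPU supports, you can list its AVX-related instruction-set flags with a quick one-liner (a sketch using /proc/cpuinfo, which is available on any Linux system):

    # Print the unique AVX flags this CPU advertises (e.g. avx, avx2, avx512f)
    grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u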

RAM
  • Preferably DDR5. DDR4 and DDR3 will work, but the higher MT/s of DDR5 is beneficial.
  • Recommended capacity depends on various factors, such as model size and quantisation; use the table below as a rough guide:
Parameters (b)    RAM (GB)
<7b               8GB*
>7b – 14b         ~16GB
>14b – 32b        ~32GB
>32b – 70b        64GB+
*7b models at Q4 may be acceptable with 8GB of RAM depending on other system load, but this is generally not recommended.
Storage
  • ~50GB for Docker, Ollama, Open-WebUI and model files
GPU (optional)
  • Recommended for enhanced performance. Requirements vary by model and configuration. You can use the chart below as a rough guide for running distilled models:
Parameters (b)    VRAM (GB)
<1.5b             ~1GB
>1.5b – 7b        ~4GB
>7b – 14b         ~8GB
>14b – 32b        ~16GB
>32b – 70b        34GB+
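To see how much VRAM you have to work with, nvidia-smi can report it directly (NVIDIA driver required; AMD users would use rocm-smi instead):

    # Report GPU model and total VRAM
    nvidia-smi --query-gpu=name,memory.total --format=csv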

Installation

Docker

Install Docker Engine for your distribution according to the Docker Docs instructions.
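If you just want something quick, Docker's convenience script works on most distributions, though the per-distro repository method in the docs is the better long-term choice:

    # Download and run Docker's convenience install script
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh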

NVIDIA Container Toolkit

The NVIDIA Container Toolkit is required for CUDA support inside Docker containers.

Fedora
  1. Add NVIDIA's repository
  2. (optional) Enable experimental packages
  3. Install the NVIDIA Container Toolkit
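A sketch of those three steps, based on NVIDIA's install guide (verify the repo URL and the config-manager flag against the current docs, as they vary between dnf versions):

    # 1. Add NVIDIA's repository
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
      sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

    # 2. (optional) Enable experimental packages (--set-enabled on dnf4)
    sudo dnf config-manager --set-enabled nvidia-container-toolkit-experimental

    # 3. Install the NVIDIA Container Toolkit
    sudo dnf install -y nvidia-container-toolkit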
Debian
  1. Add NVIDIA's production repository
  2. (optional) Enable experimental packages
  3. Install the NVIDIA Container Toolkit
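Again as a sketch of NVIDIA's documented apt steps (double-check against the current install guide):

    # 1. Add NVIDIA's production repository (signing key + apt source)
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
      sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # 2. (optional) Enable experimental packages by uncommenting them
    sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # 3. Install the NVIDIA Container Toolkit
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit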

Arch

Install the NVIDIA Container Toolkit
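At the time of writing the toolkit is packaged in Arch's extra repository (it previously lived in the AUR), so a plain pacman install should do:

    sudo pacman -S nvidia-container-toolkit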

Configure Docker
Restart the Docker Daemon
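These last two steps are the same on all three distros; the nvidia-ctk helper ships with the toolkit:

    # Register the NVIDIA runtime in Docker's daemon config
    sudo nvidia-ctk runtime configure --runtime=docker

    # Restart the daemon so the new runtime is picked up
    sudo systemctl restart docker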

Ollama + Open-WebUI

Open-WebUI keeps a handy Docker Quick Start guide on their GitHub page. There's a one-liner we can use to start an Open-WebUI Docker container with Ollama integration.

To add GPU acceleration:
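Per the Quick Start (the :ollama tag bundles Ollama in the same container; check the guide for the current form of the command):

    docker run -d -p 3000:8080 --gpus=all \
      -v ollama:/root/.ollama -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:ollama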

For CPU only:
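The same command, minus the --gpus flag:

    docker run -d -p 3000:8080 \
      -v ollama:/root/.ollama -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:ollama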

When Docker finishes pulling the image from the GitHub Container Registry (ghcr.io), your Open-WebUI instance will be accessible at http://localhost:3000.
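A quick sanity check if the page doesn't load (the second command should eventually return 200 once the UI has finished starting):

    # Confirm the container is running
    docker ps --filter name=open-webui

    # Probe the web UI and print the HTTP status code
    curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000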

Configuration

Greeting Page
Click the arrow over “Get Started” when ready.
Account Creation Screen
Enter your name, email and a secure password.
Changelog
List of newly added features and fixes. Click “Okay, Let’s Go!” when you’re ready.
Main Interface
LLM interactions happen here. First click your initials in the top right.
Drop-Down Menu
Select "Admin Panel" from the drop-down menu.
User Settings
The admin panel opens on the Users screen by default. Click the Settings tab at the top.
Administrator Settings
Click “Connections” in the menu on the left.
Ollama Settings
1. Browse Ollama's model library to view all the available models.
2. Select a model you're interested in and compare your hardware against the requirements of the distilled version you want to run. Refer to the table above if you need a rough guide.
3. Enter the model name and parameter count separated by a colon (e.g. deepseek-r1:14b).
4. Click download and wait until it completes. Once the checksum is verified, return to the main page. (You can also pull models from the command line, as sketched below.)
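If you prefer the terminal, the bundled :ollama image exposes the ollama CLI inside the container, so you can pull models without the web UI (assuming the container name from the run command above):

    # Pull the distilled 14b DeepSeek-R1 model via the bundled Ollama CLI
    docker exec -it open-webui ollama pull deepseek-r1:14b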
Main Page
1. Starts a new chat
2. Switches between downloaded models
3. Chat history log
4. Prompt entry field
5. Microphone – voice to text input
6. Call – back-and-forth voice interaction using your microphone.
7. Dynamically refine Python code by executing it within a sandboxed Pyodide environment.
8. Prompt suggestions that dynamically adjust based on your prompt.
Upload Options
1. Captures a screenshot from a desktop or application window.
2. Lets you select a file to upload for your model to interact with.

Results

Test Prompt

After selecting a model (deepseek-r1:14b) I ran a quick test prompt to ensure everything works as intended. Deepseek arrived at the correct answer from a mathematical perspective, but an incorrect one when analysed logically: 30 is the number of ANIMAL legs, but I asked how many legs I have, and as a human I have two. AI's challenges with trick questions and simple maths mistakes can be attributed to several key factors:

  1. Lack of Real-World Experience: AI systems are trained on datasets without real-world context, making them unable to infer hidden meanings or nuances in language.
  2. Literal Processing: AI relies on literal interpretations of text, missing wordplay or jokes that humans understand through contextual understanding.
  3. Training Data Limitations: Simple errors may stem from training data containing inaccuracies or a lack of diverse examples, leading the AI to learn incorrect patterns.
  4. Focus and Optimisation: AI models might be optimised for complex tasks, neglecting the precision needed for basic arithmetic, resulting in simple calculation mistakes.
  5. Statistical Approach: AI uses statistical methods that may not perfectly align with the precise logic required for riddles, leading to errors when problems are framed unexpectedly.

In summary, while AI excels at pattern recognition and processing vast data, it struggles with tasks requiring contextual understanding or intricate logical precision beyond its training scope.

Updating Open-WebUI

As new features and bugfixes are added, you'll see an "update available" notification.

You can use Watchtower to update your Docker container automatically:
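This is the one-shot form from Open-WebUI's docs; drop --run-once if you want Watchtower to keep polling in the background:

    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower --run-once open-webui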

If Watchtower fails, you can install the update manually in three steps:

  1. Stop and remove the current container
  2. Pull the latest Docker image
  3. Start the container again with the updated image and existing volumes attached (add --gpus=all after the -p 3000:8080 parameter for GPU acceleration), as shown below
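Spelled out, reusing the bundled :ollama image from earlier (named volumes survive the container's removal, so your chats and models are kept):

    # 1. Stop and remove the current container
    docker rm -f open-webui

    # 2. Pull the latest Docker image
    docker pull ghcr.io/open-webui/open-webui:ollama

    # 3. Start again with the existing volumes attached
    docker run -d -p 3000:8080 --gpus=all \
      -v ollama:/root/.ollama -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:ollama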

Next Steps

What you do with your AI toolkit from here is up to you! There are many, many features and integrations you could explore.

For me, I aim to do the following:

  • Integrate my self-hosted search engine with Open-WebUI's SearXNG integration
  • Integrate Ollama with VSCode using Continue.dev's VS Code extension
  • Run UnslothAI's full DeepSeek 671b-parameter model in its 1.58-bit quantised form with Llama.cpp and compare its performance against its distilled cousins
  • Test various text-to-speech engines to interact more naturally
  • Install and test various image generation tools

I have found a utility called Harbor that will automate the setup for a lot of these tools. I’ll outline my experience with it in a future post.
