Running Ollama and Open WebUI on Ubuntu 22.04 with an NVIDIA GPU
1. Prerequisites
- Ubuntu 22.04 server
- NVIDIA GPU with at least 16 GB VRAM
- Root or sudo access
- Stable internet connection
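Before starting, you can sanity-check the machine from a shell. A minimal sketch (output will vary by system; the GPU must show up under lspci even before drivers are installed):
# Confirm OS release, GPU presence, and free disk space for model downloads
lsb_release -ds
lspci | grep -i nvidia
df -h /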
2. Install NVIDIA Drivers
Make sure your GPU drivers are installed and working.
sudo apt update
sudo apt install -y ubuntu-drivers-common
ubuntu-drivers devices # check recommended drivers
sudo ubuntu-drivers autoinstall
Reboot and confirm GPU availability:
sudo reboot
nvidia-smi
You should see details of your GPU.
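For a scripted check, nvidia-smi can print just the fields that matter here, for example:
# Driver version, GPU name, and total VRAM in CSV form
nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv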
3. Install CUDA Toolkit
Ollama uses CUDA for GPU acceleration. Install CUDA 12.x (recommended).
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.2-535.104.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.2-535.104.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install -y cuda
Check CUDA installation:
nvcc --version
3.1 Troubleshooting: if nvcc is not found after installing the cuda package, the toolkit binaries are probably not on your PATH (see the sketch after this step).
3.2 Alternative: install the toolkit from Ubuntu's own repositories instead (simpler, but usually an older CUDA version):
sudo apt install -y nvidia-cuda-toolkit
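If you used the NVIDIA .deb above, nvcc lives under /usr/local/cuda. A sketch of adding it to your PATH (the version directory must match the CUDA release you installed; 12.2 is assumed here from step 3):
# Adjust /usr/local/cuda-12.2 to match your installed version
echo 'export PATH=/usr/local/cuda-12.2/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
nvcc --version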
4. Install Ollama
Download and install Ollama runtime:
curl -fsSL https://ollama.com/install.sh | sh
Verify service is running:
systemctl status ollama
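You can also confirm the API answers locally (Ollama listens on port 11434 by default):
# Should return a small JSON blob with the installed version
curl http://localhost:11434/api/version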
4.1 Test Ollama
Run a quick model test:
ollama run tinyllama
Other models available:
ollama pull gpt-oss:20b
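Models can also be exercised without the interactive prompt, which is handy for scripting. A sketch using Ollama's REST API (assumes tinyllama was pulled above):
# One-shot, non-streaming generation against the local API
curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Why is the sky blue?",
  "stream": false
}'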
5. Install Docker
Open WebUI runs inside Docker. Follow these steps to install Docker and Docker Compose:
# Remove any old versions
sudo apt remove -y docker docker-engine docker.io containerd runc
# Install required packages
sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
# Add Docker’s official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Add Docker repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker and Docker Compose
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Enable and test Docker:
sudo systemctl enable --now docker
sudo docker run hello-world
Allow your user to run Docker without sudo (newgrp applies the group change to the current shell only; log out and back in to make it permanent):
sudo usermod -aG docker $USER
newgrp docker
Check Docker Compose version:
docker compose version
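To confirm the group change took effect, run a container without sudo:
# Should print the hello-world message with no permission error on /var/run/docker.sock
docker run --rm hello-world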
6. Install Open WebUI
Clone and run Open WebUI:
git clone https://github.com/open-webui/open-webui.git
cd open-webui
docker compose up -d
Access the interface at:
http://<server-ip>:3000
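If you prefer a single container over the compose file, the Open WebUI README documents a docker run invocation along these lines (image tag and flags may have changed since this was written):
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main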
7. Connect Open WebUI to Ollama
In the WebUI:
- Go to Settings → Backends → Ollama
- Set the API URL:
http://host.docker.internal:11434
Or, if accessing remotely:
http://<server-ip>:11434
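Note that inside a container, "localhost" is not the host machine, which is why host.docker.internal is used above. For remote access, Ollama must also listen on more than loopback; a sketch using a systemd override (OLLAMA_HOST is the environment variable Ollama's own FAQ documents for this):
# Opens an editor for an override file on the ollama unit
sudo systemctl edit ollama
# Add these lines in the override, then save:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama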
8. Enable GPU Support in Docker (Optional)
If you want Docker containers (like Open WebUI) to access GPU directly:
# Install NVIDIA Container Toolkit (the old nvidia-docker repo and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
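A quick way to confirm containers can now see the GPU (the CUDA image tag is just an example; any recent base tag works):
# nvidia-smi output here proves GPU passthrough into containers
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi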
Edit docker-compose.yml for Open WebUI:
services:
  open-webui:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
Restart:
docker compose down
docker compose up -d
9. Verify GPU Usage
Run a model and check GPU utilization:
ollama run gpt-oss:20b
In another terminal:
nvidia-smi
You should see ollama using GPU memory.
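Ollama can also report placement itself; if the model fits in VRAM, the PROCESSOR column should read 100% GPU:
# Lists loaded models and whether they run on CPU or GPU
ollama ps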
10. Start Chatting with any Model You Like
Now you can:
- Pull models with Ollama:
ollama pull gpt-oss
ollama pull deepseek-r1
ollama pull llama3
ollama pull llama4
ollama pull gemma3
ollama pull phi4
ollama pull codellama
- Select the model in Open WebUI.
- Chat through your browser!
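Beyond the browser, anything that speaks the OpenAI chat API can talk to your server, since Ollama exposes an OpenAI-compatible endpoint. A sketch with curl (the model name assumes llama3 was pulled above):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'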