Ollama

1. Prerequisites 

 

 Ubuntu 22.04 server 

 NVIDIA GPU with at least 16 GB VRAM 

 Root or sudo access 

 Stable internet connection 

 

 

 2. Install NVIDIA Drivers 

 Make sure your GPU drivers are installed and working. 

 

 sudo apt update

sudo apt install -y ubuntu-drivers-common

ubuntu-drivers devices # check recommended drivers

sudo ubuntu-drivers autoinstall 

 

 Reboot and confirm GPU availability: 

 

 sudo reboot

nvidia-smi 

 

 You should see details of your GPU. 

 

 3. Install CUDA Toolkit 

 Ollama uses CUDA for GPU acceleration. Install CUDA 12.x (recommended). 

 

 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin

sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.2-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.2-1_amd64.deb

sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/

sudo apt update

sudo apt install -y cuda 

 

 Check CUDA installation: 

 

 nvcc --version 

 

 

 3.1  

 

 

 

 CUDA Toolkit Installer 

 

 

 

 Installation Instructions: 

 

 

 

 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin

sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

wget https://developer.download.nvidia.com/compute/cuda/13.0.2/local_installers/cuda-repo-ubuntu2204-13-0-local_13.0.2-580.95.05-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu2204-13-0-local_13.0.2-580.95.05-1_amd64.deb

sudo cp /var/cuda-repo-ubuntu2204-13-0-local/cuda-*-keyring.gpg /usr/share/keyrings/sudo apt-get update

sudo apt-get -y install cuda-toolkit-13-0 

 

 

 

 

 3.2 

 sudo apt install nvidia-cuda-toolkit 

   

 4. Install Ollama 

 Download and install Ollama runtime: 

 

 curl -fsSL https://ollama.com/install.sh | sh 

 

 Verify service is running: 

 

 systemctl status ollama 

 

 Test Ollama 

 Run a quick model test: 

 

 ollama run tinyllama 

 

 Other models available: 

 

 ollama pull gpt-oss:20b 

 

 

 5. Install Docker 

 Open WebUI runs inside Docker. Follow these steps to install Docker and Docker Compose: 

 

 # Remove any old versions

sudo apt remove -y docker docker-engine docker.io containerd runc

# Install required packages

sudo apt update

sudo apt install -y ca-certificates curl gnupg lsb-release

# Add Docker’s official GPG key

sudo mkdir -p /etc/apt/keyrings

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Add Docker repository

echo \"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable\" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker and Docker Compose

sudo apt update

sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin 

 

 Enable and test Docker: 

 

 sudo systemctl enable --now docker

sudo docker run hello-world 

 

 Allow your user to run Docker without sudo: 

 

 sudo usermod -aG docker $USER

newgrp docker 

 

 Check Docker Compose version: 

 

 docker compose version 

 

 

 6. Install Open WebUI 

 Clone and run Open WebUI: 

 

 git clone https://github.com/open-webui/open-webui.git

cd open-webui

docker compose up -d 

 

 Access the interface at: 

 

 http://<server-ip>:3000 

 

 

 7. Connect Open WebUI to Ollama 

 In the WebUI: 

 

 

 Go to  Settings → Backends → Ollama 

 

 

 Set API URL: 

 

 http://host.docker.internal:11434 

 

 Or, if accessing remotely: 

 

 http://<server-ip>:11434 

 

 

 

 

 8. Enable GPU Support in Docker (Optional) 

 If you want Docker containers (like Open WebUI) to access GPU directly: 

 

 # Install NVIDIA Container Toolkit

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt update

sudo apt install -y nvidia-container-toolkit

sudo systemctl restart docker 

 

 Edit  docker-compose.yml  for Open WebUI: 

 

 services:

 open-webui:

 deploy:

 resources:

 reservations:

 devices:

 - driver: nvidia

 count: all

 capabilities: [gpu] 

 

 Restart: 

 

 docker compose down

docker compose up -d 

 

 

 9. Verify GPU Usage 

 Run a model and check GPU utilization: 

 

 ollama run gpt-oss:20b 

 

 In another terminal: 

 

 nvidia-smi 

 

 You should see  ollama  using GPU memory. 

 

 10. Start Chatting with any Model You Like 

 Now you can: 

 

 

 Pull models with Ollama: 

 

 ollama pull gpt-oss

ollama pull deepseek-r1

ollama pull llama3

ollama pull llama4

ollama pull gemma3

ollama pull phi4

ollama pull codellama 

 

 

 

 Select the model in  Open WebUI . 

 

 

 Chat through your browser!