Ollama

1. Prerequisites

- Ubuntu 22.04 server
- NVIDIA GPU with at least 16 GB VRAM
- Root or sudo access
- Stable internet connection

2. Install NVIDIA Drivers

Make sure your GPU drivers are installed and working.

    sudo apt update
    sudo apt install -y ubuntu-drivers-common
    ubuntu-drivers devices   # check recommended drivers
    sudo ubuntu-drivers autoinstall

Reboot and confirm GPU availability:

    sudo reboot
    # after the server is back up:
    nvidia-smi

You should see details of your GPU.

3. Install CUDA Toolkit

Ollama uses CUDA for GPU acceleration. Install CUDA 12.x (recommended).

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.2-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.2-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt update
    sudo apt install -y cuda

Check CUDA installation:

    nvcc --version

3.1 CUDA Toolkit Installer

Installation instructions for CUDA 13.0.2:

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/13.0.2/local_installers/cuda-repo-ubuntu2204-13-0-local_13.0.2-580.95.05-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu2204-13-0-local_13.0.2-580.95.05-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2204-13-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-13-0

3.2 Ubuntu Repository Package

    sudo apt install nvidia-cuda-toolkit

4. Install Ollama

Download and install the Ollama runtime:

    curl -fsSL https://ollama.com/install.sh | sh

Verify that the service is running:

    systemctl status ollama

Test Ollama

Run a quick model test:

    ollama run tinyllama

Other models are available, for example:

    ollama pull gpt-oss:20b

5. Install Docker

Open WebUI runs inside Docker. Follow these steps to install Docker and Docker Compose:

    # Remove any old versions
    sudo apt remove -y docker docker-engine docker.io containerd runc

    # Install required packages
    sudo apt update
    sudo apt install -y ca-certificates curl gnupg lsb-release

    # Add Docker's official GPG key
    sudo mkdir -p /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

    # Add Docker repository
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

    # Install Docker and Docker Compose
    sudo apt update
    sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Enable and test Docker:

    sudo systemctl enable --now docker
    sudo docker run hello-world

Allow your user to run Docker without sudo:

    sudo usermod -aG docker $USER
    newgrp docker

Check the Docker Compose version:

    docker compose version

6. Install Open WebUI

Clone and run Open WebUI:

    git clone https://github.com/open-webui/open-webui.git
    cd open-webui
    docker compose up -d

Access the interface at:

    http://<server-ip>:3000

7. Connect Open WebUI to Ollama

In the WebUI:

- Go to Settings → Backends → Ollama
- Set API URL: http://host.docker.internal:11434
- Or, if accessing remotely: http://<server-ip>:11434

8. Enable GPU Support in Docker (Optional)

If you want Docker containers (like Open WebUI) to access the GPU directly:

    # Install NVIDIA Container Toolkit
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    sudo apt update
    sudo apt install -y nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker

Edit docker-compose.yml for Open WebUI:

    services:
      open-webui:
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]

Restart:

    docker compose down
    docker compose up -d

9. Verify GPU Usage

Run a model and check GPU utilization:

    ollama run gpt-oss:20b

In another terminal:

    nvidia-smi

You should see ollama using GPU memory.

10. Start Chatting with Any Model You Like

Now you can:

Pull models with Ollama:

    ollama pull gpt-oss
    ollama pull deepseek-r1
    ollama pull llama3
    ollama pull llama4
    ollama pull gemma3
    ollama pull phi4
    ollama pull codellama

Select the model in Open WebUI.

Chat through your browser!
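
Once all steps are done, a small script can confirm that the tools from this guide are installed and that the Ollama API answers. The following is only a minimal sketch and not part of the setup itself: the function names check_tool and check_ollama_api are made up for illustration, and it assumes that Ollama listens on its default port 11434 on localhost and that curl is installed.

```shell
#!/bin/sh
# Post-install sanity check (illustrative sketch, hypothetical helper names).

# Print "ok <name>" if the command is on PATH, "missing <name>" otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "missing $1"
    fi
}

# Query the Ollama HTTP API; /api/tags lists the locally pulled models.
# The base URL is an assumption (default port on the local machine).
check_ollama_api() {
    url="${1:-http://localhost:11434}"
    if curl -fsS --max-time 5 "$url/api/tags" >/dev/null 2>&1; then
        echo "ok ollama-api"
    else
        echo "unreachable ollama-api"
    fi
}

# Tools installed in steps 2 to 5 of this guide.
for tool in nvidia-smi nvcc ollama docker; do
    check_tool "$tool"
done

check_ollama_api
```

Any line starting with "missing" or "unreachable" points back to the step that installs that component. To test a remote server, pass its base URL as the first argument to check_ollama_api.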