How to install Ollama on Ubuntu with a vGPU on the ITS Private Cloud

Run large language models (LLMs) with Ollama on Ubuntu Linux, using vGPU resources provided by the U of T ITS Private Cloud infrastructure. The VSS-CLI commands below walk through the process.

Ollama is an LLM serving platform written in Go. It makes Llama-family and other open models easy to run and exposes them through an API.

Steps

Virtual Machine Deployment

  1. Download the ubuntu-llm-ollama.yaml VM specification file and update the following attributes:

    1. machine.folder: target logical folder. List available folders with vss-cli compute folder ls

    2. metadata.client: your department client.

    3. metadata.inform: email address for automated notifications
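
    For reference, a minimal sketch of how these attributes might look in ubuntu-llm-ollama.yaml, assuming the dotted names map to nested YAML keys; all values below are placeholders:

      machine:
        folder: <target-logical-folder>
      metadata:
        client: <department-client>
        inform: <your-email@utoronto.ca>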

  2. Deploy the VM from the file as follows:

    vss-cli --wait compute vm mk from-file ubuntu-llm-ollama.yaml
  3. Add a 16 GB virtual GPU using the 16q profile. For more information on the profile used, see https://eis-vss.atlassian.net/wiki/spaces/VSSPublic/pages/1443233794

    vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
  4. Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.

  5. Power on the virtual machine:

    vss-cli compute vm set ubuntu-llm state on

NVIDIA Driver and Licensing

  1. Log in to the server via SSH. Note that the username may differ if you further customized the VM with cloud-init:
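
    For example, assuming the default vss-admin user and the IP address from the confirmation email:

      ssh vss-admin@XXX.XXX.XXX.XXX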

  2. Download the NVIDIA drivers from VKSEY-STOR:
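
    A hypothetical example; the actual share path and driver file name on VKSEY-STOR will differ:

      scp <user>@vksey-stor:/<driver-share>/nvidia-linux-grid-<version>_amd64.deb .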

  3. Install the drivers as privileged user:
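
    A sketch assuming the driver is distributed as a .deb package:

      sudo apt install ./nvidia-linux-grid-<version>_amd64.deb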

  4. Create the NVIDIA token file:
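
    A sketch assuming a client configuration token obtained from the ITS Private Cloud team; NVIDIA's standard location for it is /etc/nvidia/ClientConfigToken:

      sudo mkdir -p /etc/nvidia/ClientConfigToken
      sudo cp client_configuration_token.tok /etc/nvidia/ClientConfigToken/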

  5. Set permissions to the NVIDIA token:
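
    NVIDIA's client licensing documentation recommends making the token readable by all users:

      sudo chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token.tok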

  6. Set the FeatureType to 2 for “NVIDIA RTX Virtual Workstation” in /etc/nvidia/gridd.conf with the following command:
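
    One way to do this is with sed (a sketch; verify the resulting line afterwards):

      sudo sed -i 's/^#\?FeatureType=.*/FeatureType=2/' /etc/nvidia/gridd.conf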

  7. Restart nvidia-gridd service to pick up the new license token:
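
    Assuming systemd:

      sudo systemctl restart nvidia-gridd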

  8. Check the service log for errors or successful license activation:
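
    For example, via the service log:

      sudo journalctl -u nvidia-gridd --no-pager | grep -Ei 'license|error'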

    output:

  9. Verify GPU status with nvidia-smi:
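
    The vGPU, driver version, and memory usage should be listed:

      nvidia-smi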

  10. You can also monitor GPU usage in the console with nvtop:
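
    nvtop is packaged in the standard Ubuntu repositories:

      sudo apt install -y nvtop
      nvtop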



Install the Ollama service

The following installation is performed as the vss-admin user.

Prerequisites

  1. Download the Anaconda installer:
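
    A sketch pinning a specific installer version as an example; check https://repo.anaconda.com/archive/ for the current release:

      curl -O https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh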

  2. Install Anaconda:
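
    Run the installer and follow the prompts, matching the file name you downloaded:

      bash Anaconda3-2023.09-0-Linux-x86_64.sh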



Install and configure Ollama service

  1. Download & install ollama

  2. For testing purposes, open a terminal and start the Ollama server:
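
    A sketch; if the install script already registered Ollama as a systemd service, the server may already be running and this can be skipped:

      ollama serve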

    1. In a second terminal, run the following command:
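
      Assuming the llama2 model, which is also used in the Web UI section below:

        ollama run llama2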

      1. You can now test by asking questions:
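
        For example, at the interactive prompt (illustrative only):

          >>> Why is the sky blue?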

    2. When finished testing, cancel the running process and close the terminal.

  3. You can exit the model prompt by pressing Ctrl+D.

Install & configure Ollama Web UI

Prerequisites

  1. Download and install nvm
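
    The upstream install script, pinned to a specific version as an example; see https://github.com/nvm-sh/nvm for the latest:

      curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash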

  2. Load the environment or execute the command below:
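
    From the nvm README:

      export NVM_DIR="$HOME/.nvm"
      [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"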

  3. Install nodejs
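
    For example, the current LTS release:

      nvm install --lts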

  4. Install python 3
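
    python3 is available from the standard Ubuntu repositories:

      sudo apt install -y python3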


Install and Configure Ollama Web UI

 

  1. Download and install ollama-webui:
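
    A sketch assuming the project's GitHub repository:

      git clone https://github.com/ollama-webui/ollama-webui.git
      cd ollama-webui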

  2. Create ollama-webui environment file: .env
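
    A sketch assuming the repository ships an example environment file; the exact file name may differ between revisions:

      cp -RPp .env.example .env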

  3. Install libraries and build the ollama-webui project
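
    A sketch using the Node.js toolchain installed earlier; the project README is authoritative:

      npm install
      npm run build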

  4. Install python3-pip and python3-venv:
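
    Both packages are available from the standard Ubuntu repositories:

      sudo apt install -y python3-pip python3-venv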

  5. Create virtual environment to isolate dependencies:
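
    A sketch that creates and activates an environment named venv in the project directory:

      python3 -m venv venv
      source venv/bin/activate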

  6. Install libraries and run the ollama backend
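
    A sketch assuming the repository's backend directory and its start script:

      cd backend
      pip install -r requirements.txt
      sh start.sh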

  7. Test the UI application by opening a browser to your server's IP address on port 8080:
    http://XXX.XXX.XXX.XXX:8080

  8. Select a model (e.g., llama2) and send it a message.

Troubleshooting

  1. How do I verify that the NVIDIA license has been installed properly?
    Solution: run the following command as root:
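
    A sketch; on a licensed vGPU guest, nvidia-smi includes the license status in its full query output:

      nvidia-smi -q | grep -i license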

    output:

 
