In December 2023, we announced the launch of virtual GPU capabilities on the ITS Private Cloud, as detailed in our blog post Introducing Virtual GPUs for Virtual Machines. We are now working on practical examples that harness the power, affordability, security and privacy of the ITS Private Cloud to run Large Language Models (LLMs).
This How-To focuses on using the vss-cli to deploy a virtual machine running Ubuntu with a 16GB vGPU
to host PrivateGPT, an open-source Artificial Intelligence project that allows you to ask questions about documents using the power of LLMs, without data leaving the runtime environment.
Virtual Machine Deployment
Deploy a virtual machine from file:
Code Block
vss-cli --wait compute vm mk from-file ubuntu-llm.yaml
Add a virtual GPU of 16GB, specifically the 16q profile. For more information on the profile used, check the following document: How to Request a Virtual GPU.
Code Block
vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.
Power on the virtual machine:
Code Block
vss-cli compute vm set ubuntu-llm state on
NVIDIA Driver and Licensing
Login to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
Code Block
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Download the NVIDIA drivers from VSKEY-STOR:
Code Block
scp {vss-user}@vskey-stor.eis.utoronto.ca:ut-vss-lib/nvidia-grid-vpshere-7.0-525.147.01-525.147.05-529.19/Guest_Drivers/nvidia-linux-grid-525_525.147.05_amd64.deb /tmp/
Install the drivers as a privileged user:
Code Block
apt install dkms nvtop
dpkg -i /tmp/nvidia-linux-grid-525_525.147.05_amd64.deb
Create the NVIDIA token file:
Code Block
echo -n -e $(vmware-rpctool "info-get guestinfo.ut.vss.nvidia_token") > /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
Set permissions to the NVIDIA token:
Code Block
chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
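Note that the token step above silently produces an empty file if vmware-rpctool returned nothing. A quick sanity check, sketched below (the token path is the one created above; the TOKEN variable is overridable only for illustration):

```shell
# Verify the client configuration token exists, is non-empty, and has the
# expected permissions (path from the steps above; override TOKEN to try it).
TOKEN="${TOKEN:-/etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok}"

if [ -s "$TOKEN" ]; then
    echo "token OK: $(stat -c '%s bytes, mode %a' "$TOKEN")"
else
    echo "token missing or empty: $TOKEN" >&2
fi
```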
Set the FeatureType to 2 for NVIDIA RTX Virtual Workstation in /etc/nvidia/gridd.conf with the following command:
Code Block
sed -i 's/FeatureType=0/FeatureType=2/g' /etc/nvidia/gridd.conf
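Since sed -i edits the file in place and prints nothing, it is worth confirming the change actually landed. The same substitution can be exercised safely on a scratch copy, for example:

```shell
# Run the FeatureType substitution from above against a scratch file,
# then confirm the resulting value with grep.
conf=$(mktemp)
printf 'FeatureType=0\n' > "$conf"

sed -i 's/FeatureType=0/FeatureType=2/g' "$conf"
grep '^FeatureType=' "$conf"   # prints: FeatureType=2

rm -f "$conf"
```

On the VM, the same grep against /etc/nvidia/gridd.conf confirms the real file.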
Restart the nvidia-gridd service to pick up the new license token:
Code Block
systemctl restart nvidia-gridd
Check for errors or successful activation:
Code Block
journalctl -u nvidia-gridd
Output:
Code Block
Dec 13 11:23:20 ubu-llm systemd[1]: Stopped NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm systemd[1]: Starting NVIDIA Grid Daemon...
Dec 13 11:23:20 ubu-llm systemd[1]: Started NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Started (2017)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: vGPU Software package (0)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Ignore service provider and node-locked licensing
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: NLS initialized
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Acquiring license. (Info: vss-nvidia-ls.eis.utoronto.ca; NVIDIA RTX Virtual Workstation)
Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 16:23:22 GMT)
Dec 13 14:59:24 ubu-llm nvidia-gridd[2017]: License renewed successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 19:59:23 GMT)
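The full journal grows quickly on a busy system; filtering for just the licensing lines makes the check scriptable. A sketch (the sample line is copied from the output above; on the VM you would pipe journalctl -u nvidia-gridd into the same grep):

```shell
# Extract only licensing events from nvidia-gridd journal output.
# On the VM:
#   journalctl -u nvidia-gridd | grep -E 'Acquiring license|License (acquired|renewed)'
# Here a sample line from the output above stands in for the live journal.
sample='Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully.'

echo "$sample" | grep -E 'Acquiring license|License (acquired|renewed)'
```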
Verify GPU status with nvidia-smi:
Code Block
user@test:~$ nvidia-smi
Wed Dec 13 14:18:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P6-16Q          On  | 00000000:02:00.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3458      C   ...ld/cuda/bin/ollama-runner     6426MiB |
+-----------------------------------------------------------------------------+
You can also monitor GPU usage in the console with nvtop:
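For scripted monitoring, nvidia-smi also offers a query mode that emits plain CSV, which is easier to parse than the banner above. A sketch, using the memory figures from the sample output (6426 of 16384 MiB) as stand-in input:

```shell
# On the VM, machine-readable usage comes from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Here the values from the sample output above stand in for the live query.
smi_csv='6426, 16384'

echo "$smi_csv" | awk -F', ' '{ printf "GPU memory: %s/%s MiB (%.1f%% used)\n", $1, $2, 100 * $1 / $2 }'
# prints: GPU memory: 6426/16384 MiB (39.2% used)
```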
Install PrivateGPT
Dependencies
Login to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
Code Block
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Install python3:
Code Block
sudo apt install build-essential cmake python3 python3-dev -y
Install the NVIDIA CUDA Toolkit, needed to recompile llama-cpp-python later:
Code Block
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3
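Before moving on, it helps to confirm the build tools are actually reachable; a missing nvcc is a common reason the CUDA rebuild of llama-cpp-python later silently falls back to CPU. A small sketch (the need helper is hypothetical, purely for illustration):

```shell
# Hypothetical helper: warn about any required build tool missing from PATH.
need() { command -v "$1" >/dev/null 2>&1 || echo "missing: $1 -- check your PATH" >&2; }

# After the installs above these should all be silent. nvcc lives in
# /usr/local/cuda/bin, which may not be on PATH yet (the "Enable GPU
# support" section below exports it).
need nvcc
need cmake
need python3
```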
Install PrivateGPT
Clone the source repository:
Code Block
git clone https://github.com/imartinez/privateGPT
Create and activate a virtual environment:
Code Block
cd privateGPT
python3.10 --version
python3.10 -m venv .venv && source .venv/bin/activate
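If a later poetry or pip step behaves oddly, the usual cause is that the virtual environment is no longer active; activation sets VIRTUAL_ENV and puts .venv/bin first on PATH. A quick check:

```shell
# After `source .venv/bin/activate`, VIRTUAL_ENV points at the .venv
# directory and python3 resolves inside it; outside a venv this prints
# "<not active>".
echo "venv: ${VIRTUAL_ENV:-<not active>}"
command -v python3
```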
Install poetry to get all Python dependencies installed:
Code Block
curl -sSL https://install.python-poetry.org | python3 -
Update pip and poetry, then install the PrivateGPT dependencies:
Code Block
pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
Install llama-cpp-python:
Code Block
pip install llama-cpp-python
Enable GPU support
Export the following environment variables:
Code Block
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
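These exports apply only to the current shell session. To make them survive a logout, append them to the shell's startup file; a sketch assuming bash (~/.bashrc), with RC_FILE overridable purely for illustration:

```shell
# Persist the CUDA environment variables across sessions by appending
# them to the shell rc file (bash assumed; override RC_FILE to try it).
rc="${RC_FILE:-$HOME/.bashrc}"

cat >> "$rc" <<'EOF'
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
EOF
```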
Reinstall llama-cpp-python:
Code Block
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
Run PrivateGPT
Run python3.10 -m private_gpt to start:
Code Block
cd privateGPT && source .venv/bin/activate
python3.10 -m private_gpt
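PrivateGPT takes a little while to load the model before the UI starts answering. A small wait loop avoids refreshing a dead page; this is a sketch using bash's /dev/tcp, with port 8001 as configured in this guide:

```shell
# Wait until a TCP port accepts connections (bash-only: /dev/tcp).
wait_for_port() {
    host=$1; port=$2; tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example: wait_for_port 127.0.0.1 8001 && echo "PrivateGPT UI is up"
```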
Open a web browser and navigate to the assigned IP address on port 8001:
http://XXX.XXX.XXX.XXX:8001
Upload a few documents and start asking questions: