In December 2023, we announced the launch of virtual GPU capabilities on the ITS Private Cloud, as detailed in our blog post (Introducing Virtual GPUs for Virtual Machines). We are now working on practical examples that harness the power, affordability, security, and privacy of the ITS Private Cloud to run Large Language Models (LLMs).
This How-To covers deploying an Ubuntu virtual machine with a 16GB vGPU using the vss-cli
to host PrivateGPT, an open-source Artificial Intelligence project that lets you ask questions about documents using the power of LLMs, without data leaving the runtime environment.
Virtual Machine Deployment
Download the vss-cli configuration spec and update the following attributes:
machine.folder: target logical folder. List available folders with: vss-cli compute folder ls
metadata.client: your department client.
metadata.inform: email address for automated notifications.
Deploy with the following command:
vss-cli --wait compute vm mk from-file ubuntu-llm-privategpt.yaml
Add a 16GB virtual GPU, specifically the 16q profile. For more information on the profile used, check the following document: How to Request a Virtual GPU.
vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.
Power on the virtual machine:
vss-cli compute vm set ubuntu-llm state on
NVIDIA Driver and Licensing
Log in to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Download the NVIDIA drivers from VKSEY-STOR:
scp {vss-user}@vskey-stor.eis.utoronto.ca:ut-vss-lib/nvidia-grid-vpshere-7.0-525.147.01-525.147.05-529.19/Guest_Drivers/nvidia-linux-grid-525_525.147.05_amd64.deb /tmp/
Install the drivers as privileged user:
apt install dkms nvtop
dpkg -i /tmp/nvidia-linux-grid-525_525.147.05_amd64.deb
Create the NVIDIA token file:
echo -n -e $(vmware-rpctool "info-get guestinfo.ut.vss.nvidia_token") > /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
Set permissions to the NVIDIA token:
chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
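The permission step above can be sketched and verified as follows. This uses a throwaway file in /tmp; on the VM, point the variable at the real client_configuration_token_*.tok path instead.

```shell
# Sketch: confirm the token file carries the expected 744 permissions
# (owner rwx, group/other read-only), using a throwaway file in /tmp.
tok=/tmp/example_token.tok
echo -n "dummy-token" > "$tok"
chmod 744 "$tok"
perms=$(stat -c '%a' "$tok")   # numeric permission bits
echo "$perms"
rm -f "$tok"
```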
Set the FeatureType to 2 for NVIDIA RTX Virtual Workstation in /etc/nvidia/gridd.conf with the following command:
sed -i 's/FeatureType=0/FeatureType=2/g' /etc/nvidia/gridd.conf
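To see what that sed invocation does before touching the real config, here is the same substitution run against a throwaway copy of gridd.conf:

```shell
# Sketch: the FeatureType edit, applied to a throwaway copy of gridd.conf.
printf 'FeatureType=0\n' > /tmp/gridd.conf
sed -i 's/FeatureType=0/FeatureType=2/g' /tmp/gridd.conf
result=$(cat /tmp/gridd.conf)
echo "$result"
rm -f /tmp/gridd.conf
```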
Restart the nvidia-gridd service to pick up the new license token:
systemctl restart nvidia-gridd
Check for errors or successful activation:
journalctl -u nvidia-gridd
output:
Dec 13 11:23:20 ubu-llm systemd[1]: Stopped NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm systemd[1]: Starting NVIDIA Grid Daemon...
Dec 13 11:23:20 ubu-llm systemd[1]: Started NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Started (2017)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: vGPU Software package (0)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Ignore service provider and node-locked licensing
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: NLS initialized
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Acquiring license. (Info: vss-nvidia-ls.eis.utoronto.ca; NVIDIA RTX Virtual Workstation)
Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 16:23:22 GMT)
Dec 13 14:59:24 ubu-llm nvidia-gridd[2017]: License renewed successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 19:59:23 GMT)
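A quick way to filter the journal for license events is a grep over its output. The sketch below runs the grep against a saved sample line from the log above; on the VM, pipe journalctl -u nvidia-gridd into the same grep:

```shell
# Sketch: count license-related lines in nvidia-gridd output.
# A single saved sample line stands in for the journal here.
sample='Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully.'
matches=$(printf '%s\n' "$sample" | grep -cE 'License (acquired|renewed) successfully|ERROR')
echo "$matches"
```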
Verify GPU status with nvidia-smi:
user@test:~$ nvidia-smi
Wed Dec 13 14:18:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P6-16Q          On  | 00000000:02:00.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3458      C   ...ld/cuda/bin/ollama-runner     6426MiB |
+-----------------------------------------------------------------------------+
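If you want just the memory-usage figure from that table (for scripting or monitoring), a grep over the output works. The sketch below uses one saved line from the table above; on the VM, pipe nvidia-smi itself:

```shell
# Sketch: extract the memory-usage column from nvidia-smi output.
line='| N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |'
mem=$(printf '%s\n' "$line" | grep -oE '[0-9]+MiB / [0-9]+MiB')
echo "$mem"
```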
You can also monitor GPU usage in the console with nvtop:
Install PrivateGPT
Dependencies
Log in to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Install OS dependencies:
sudo apt install build-essential cmake
Install Python 3.11, either from source or via ppa:deadsnakes/ppa:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11-full python3.11-dev -y
Install the NVIDIA CUDA Toolkit, needed to recompile llama-cpp-python later:
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3
Install PrivateGPT
Clone the source repository:
git clone https://github.com/imartinez/privateGPT
Create and activate the virtual environment, using the Python 3.11 installed above:
cd privateGPT
python3.11 --version
python3.11 -m venv .venv && source .venv/bin/activate
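The venv mechanics can be sketched in isolation. This uses a throwaway environment under /tmp (created --without-pip so it works on minimal systems); the guide's real environment lives in privateGPT/.venv:

```shell
# Sketch: create and activate a virtual environment, then confirm the
# interpreter now resolves inside it via sys.prefix.
python3 -m venv --without-pip /tmp/demo-venv
source /tmp/demo-venv/bin/activate
prefix=$(python -c 'import sys; print(sys.prefix)')
echo "$prefix"
deactivate
rm -rf /tmp/demo-venv
```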
Install poetry to get all Python dependencies installed:
curl -sSL https://install.python-poetry.org | python3 -
Update pip and poetry, then install the PrivateGPT dependencies:
pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
Install llama-cpp-python:
pip install llama-cpp-python
Enable GPU support
Export the following environment variables:
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
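These exports simply append the CUDA locations to your search paths. A small sketch, run in a subshell so it does not pollute your session, shows the resulting last PATH entry:

```shell
# Sketch: after the exports, the CUDA bin directory is the final PATH entry.
cuda_path=$(
  export CUDA_HOME=/usr/local/cuda
  export PATH=$PATH:$CUDA_HOME/bin
  printf '%s\n' "$PATH" | tr ':' '\n' | tail -n 1
)
echo "$cuda_path"
```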
Reinstall llama-cpp-python with CUDA support:
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
Run PrivateGPT
Run python3.11 -m private_gpt to start:
cd privateGPT && source .venv/bin/activate
python3.11 -m private_gpt
Open a web browser with the IP address assigned on port 8001:
http://XXX.XXX.XXX.XXX:8001
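Before opening the browser, you can confirm something is answering on port 8001 with curl. In this sketch a throwaway Python http.server stands in for PrivateGPT on localhost; replace 127.0.0.1 with the VM's assigned IP for the real check:

```shell
# Sketch: probe a web service on port 8001 and print the HTTP status code.
python3 -m http.server 8001 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
code=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8001/)
echo "$code"
kill $srv 2>/dev/null
```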
Upload a few documents and start asking questions: