In December 2023, we announced the launch of virtual GPU capabilities on the ITS Private Cloud, as detailed in our blog post (Introducing Virtual GPUs for Virtual Machines). We are now working on practical examples that harness the power, affordability, security, and privacy of the ITS Private Cloud to run Large Language Models (LLMs).
This How-To covers deploying an Ubuntu virtual machine with a 16GB vGPU using the vss-cli
to host PrivateGPT, an open-source Artificial Intelligence project that lets you ask questions about documents using the power of LLMs, without data leaving the runtime environment.
Virtual Machine Deployment
Download the vss-cli configuration spec and update the following attributes:
machine.folder: target logical folder. List available folders with: vss-cli compute folder ls
metadata.client: your department client.
metadata.inform: email address for automated notifications.
Deploy with the following command:
vss-cli --wait compute vm mk from-file ubuntu-llm-privategpt.yaml
Add a 16GB virtual GPU, specifically the 16q profile. For more information on the profile used, check the following document: How to Request a Virtual GPU.
vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.
Power on the virtual machine:
vss-cli compute vm set ubuntu-llm state on
NVIDIA Driver and Licensing
Log in to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Download the NVIDIA drivers from VKSEY-STOR:
scp {vss-user}@vskey-stor.eis.utoronto.ca:ut-vss-lib/nvidia-grid-vpshere-7.0-525.147.01-525.147.05-529.19/Guest_Drivers/nvidia-linux-grid-525_525.147.05_amd64.deb /tmp/
Install the drivers as privileged user:
apt install dkms nvtop
dpkg -i /tmp/nvidia-linux-grid-525_525.147.05_amd64.deb
Create the NVIDIA token file:
echo -n -e $(vmware-rpctool "info-get guestinfo.ut.vss.nvidia_token") > /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
Set permissions to the NVIDIA token:
chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
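The permission step above can be sketched and verified as follows. This uses a throwaway file in /tmp; on the VM, point the variable at the real client_configuration_token_*.tok path instead.

```shell
# Sketch: confirm the token file carries the expected 744 permissions
# (owner rwx, group/other read-only), using a throwaway file in /tmp.
tok=/tmp/example_token.tok
echo -n "dummy-token" > "$tok"
chmod 744 "$tok"
perms=$(stat -c '%a' "$tok")   # numeric permission bits
echo "$perms"
rm -f "$tok"
```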
Set the FeatureType to 2 for NVIDIA RTX Virtual Workstation in /etc/nvidia/gridd.conf with the following command:
sed -i 's/FeatureType=0/FeatureType=2/g' /etc/nvidia/gridd.conf
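To see what that sed invocation does before touching the real config, here is the same substitution run against a throwaway copy of gridd.conf:

```shell
# Sketch: the FeatureType edit, applied to a throwaway copy of gridd.conf.
printf 'FeatureType=0\n' > /tmp/gridd.conf
sed -i 's/FeatureType=0/FeatureType=2/g' /tmp/gridd.conf
result=$(cat /tmp/gridd.conf)
echo "$result"
rm -f /tmp/gridd.conf
```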
Restart the nvidia-gridd service to pick up the new license token:
systemctl restart nvidia-gridd
Check for errors or successful activation:
journalctl -u nvidia-gridd
output:
Dec 13 11:23:20 ubu-llm systemd[1]: Stopped NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm systemd[1]: Starting NVIDIA Grid Daemon...
Dec 13 11:23:20 ubu-llm systemd[1]: Started NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Started (2017)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: vGPU Software package (0)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Ignore service provider and node-locked licensing
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: NLS initialized
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Acquiring license. (Info: vss-nvidia-ls.eis.utoronto.ca; NVIDIA RTX Virtual Workstation)
Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 16:23:22 GMT)
Dec 13 14:59:24 ubu-llm nvidia-gridd[2017]: License renewed successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 19:59:23 GMT)
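A quick way to filter the journal for license events is a grep over its output. The sketch below runs the grep against a saved sample line from the log above; on the VM, pipe journalctl -u nvidia-gridd into the same grep:

```shell
# Sketch: count license-related lines in nvidia-gridd output.
# A single saved sample line stands in for the journal here.
sample='Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully.'
matches=$(printf '%s\n' "$sample" | grep -cE 'License (acquired|renewed) successfully|ERROR')
echo "$matches"
```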
Verify GPU status with nvidia-smi:
user@test:~$ nvidia-smi
Wed Dec 13 14:18:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P6-16Q          On  | 00000000:02:00.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3458      C   ...ld/cuda/bin/ollama-runner     6426MiB |
+-----------------------------------------------------------------------------+
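If you want just the memory-usage figure from that table (for scripting or monitoring), a grep over the output works. The sketch below uses one saved line from the table above; on the VM, pipe nvidia-smi itself:

```shell
# Sketch: extract the memory-usage column from nvidia-smi output.
line='| N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |'
mem=$(printf '%s\n' "$line" | grep -oE '[0-9]+MiB / [0-9]+MiB')
echo "$mem"
```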
You can also monitor GPU usage in the console with nvtop:
Install PrivateGPT
Dependencies
Log in to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Install OS dependencies:
sudo apt install build-essential cmake
Install Python 3.11, either from source or via ppa:deadsnakes/ppa:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11-full python3.11-dev -y
Install the NVIDIA CUDA Toolkit, needed to recompile llama-cpp-python later:
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3
Install PrivateGPT
Clone the source repository:
git clone https://github.com/imartinez/privateGPT
Create and activate the virtual environment, using the Python 3.11 installed above:
cd privateGPT
python3.11 --version
python3.11 -m venv .venv && source .venv/bin/activate
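The venv mechanics can be sketched in isolation. This uses a throwaway environment under /tmp (created --without-pip so it works on minimal systems); the guide's real environment lives in privateGPT/.venv:

```shell
# Sketch: create and activate a virtual environment, then confirm the
# interpreter now resolves inside it via sys.prefix.
python3 -m venv --without-pip /tmp/demo-venv
source /tmp/demo-venv/bin/activate
prefix=$(python -c 'import sys; print(sys.prefix)')
echo "$prefix"
deactivate
rm -rf /tmp/demo-venv
```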
Install poetry to get all Python dependencies installed:
curl -sSL https://install.python-poetry.org | python3 -
Update pip and poetry, then install the PrivateGPT dependencies:
pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
Install llama-cpp-python:
pip install llama-cpp-python
Enable GPU support
Export the following environment variables:
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
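These exports simply append the CUDA locations to your search paths. A small sketch, run in a subshell so it does not pollute your session, shows the resulting last PATH entry:

```shell
# Sketch: after the exports, the CUDA bin directory is the final PATH entry.
cuda_path=$(
  export CUDA_HOME=/usr/local/cuda
  export PATH=$PATH:$CUDA_HOME/bin
  printf '%s\n' "$PATH" | tr ':' '\n' | tail -n 1
)
echo "$cuda_path"
```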
Reinstall llama-cpp-python with CUDA support:
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
Run PrivateGPT
Run python3.11 -m private_gpt to start:
cd privateGPT && source .venv/bin/activate
python3.11 -m private_gpt
Open a web browser with the IP address assigned on port 8001:
http://XXX.XXX.XXX.XXX:8001
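Before opening the browser, you can confirm something is answering on port 8001 with curl. In this sketch a throwaway Python http.server stands in for PrivateGPT on localhost; replace 127.0.0.1 with the VM's assigned IP for the real check:

```shell
# Sketch: probe a web service on port 8001 and print the HTTP status code.
python3 -m http.server 8001 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
code=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8001/)
echo "$code"
kill $srv 2>/dev/null
```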
Upload a few documents and start asking questions: