In December 2023, we announced the launch of virtual GPU capabilities on the ITS Private Cloud, as detailed in our blog post Introducing Virtual GPUs for Virtual Machines. We are now working on practical examples that harness the power, affordability, security and privacy of the ITS Private Cloud to run Large Language Models (LLMs).
This How-To focuses on using the vss-cli to deploy a virtual machine running Ubuntu with a 16GB vGPU
to host PrivateGPT, an open-source Artificial Intelligence project that allows you to ask questions about documents using the power of LLMs, without data leaving the runtime environment.
Virtual Machine Deployment
Deploy a virtual machine from file:
Code Block
vss-cli --wait compute vm mk from-file ubuntu-llm.yaml
Add a virtual GPU of 16GB, specifically the 16q profile. For more information on the profile used, check the following document: How to Request a Virtual GPU.
Code Block
vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.
Power on the virtual machine:
Code Block
vss-cli compute vm set ubuntu-llm state on
NVIDIA Driver and Licensing
Login to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
Code Block
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Download the NVIDIA drivers from VSKEY-STOR:
Code Block
scp {vss-user}@vskey-stor.eis.utoronto.ca:ut-vss-lib/nvidia-grid-vpshere-7.0-525.147.01-525.147.05-529.19/Guest_Drivers/nvidia-linux-grid-525_525.147.05_amd64.deb /tmp/
Install the drivers as a privileged user:
Code Block
apt install dkms nvtop
dpkg -i /tmp/nvidia-linux-grid-525_525.147.05_amd64.deb
Create the NVIDIA token file:
Code Block
echo -n -e $(vmware-rpctool "info-get guestinfo.ut.vss.nvidia_token") > /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
Set permissions to the NVIDIA token:
Code Block
chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
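Note that the token step above silently produces an empty file if vmware-rpctool returned nothing. A quick sanity check, sketched below (the token path is the one created above; the TOKEN variable is overridable only for illustration):

```shell
# Verify the client configuration token exists, is non-empty, and has the
# expected permissions (path from the steps above; override TOKEN to try it).
TOKEN="${TOKEN:-/etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok}"

if [ -s "$TOKEN" ]; then
    echo "token OK: $(stat -c '%s bytes, mode %a' "$TOKEN")"
else
    echo "token missing or empty: $TOKEN" >&2
fi
```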
Set the FeatureType to 2 for NVIDIA RTX Virtual Workstation in /etc/nvidia/gridd.conf with the following command:
Code Block
sed -i 's/FeatureType=0/FeatureType=2/g' /etc/nvidia/gridd.conf
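Since sed -i edits the file in place and prints nothing, it is worth confirming the change actually landed. The same substitution can be exercised safely on a scratch copy, for example:

```shell
# Run the FeatureType substitution from above against a scratch file,
# then confirm the resulting value with grep.
conf=$(mktemp)
printf 'FeatureType=0\n' > "$conf"

sed -i 's/FeatureType=0/FeatureType=2/g' "$conf"
grep '^FeatureType=' "$conf"   # prints: FeatureType=2

rm -f "$conf"
```

On the VM, the same grep against /etc/nvidia/gridd.conf confirms the real file.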
Restart the nvidia-gridd service to pick up the new license token:
Code Block
systemctl restart nvidia-gridd
Check for errors or successful activation:
Code Block
journalctl -u nvidia-gridd
Output:
Code Block
Dec 13 11:23:20 ubu-llm systemd[1]: Stopped NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm systemd[1]: Starting NVIDIA Grid Daemon...
Dec 13 11:23:20 ubu-llm systemd[1]: Started NVIDIA Grid Daemon.
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Started (2017)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: vGPU Software package (0)
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Ignore service provider and node-locked licensing
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: NLS initialized
Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Acquiring license. (Info: vss-nvidia-ls.eis.utoronto.ca; NVIDIA RTX Virtual Workstation)
Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 16:23:22 GMT)
Dec 13 14:59:24 ubu-llm nvidia-gridd[2017]: License renewed successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 19:59:23 GMT)
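The full journal grows quickly on a busy system; filtering for just the licensing lines makes the check scriptable. A sketch (the sample line is copied from the output above; on the VM you would pipe journalctl -u nvidia-gridd into the same grep):

```shell
# Extract only licensing events from nvidia-gridd journal output.
# On the VM:
#   journalctl -u nvidia-gridd | grep -E 'Acquiring license|License (acquired|renewed)'
# Here a sample line from the output above stands in for the live journal.
sample='Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully.'

echo "$sample" | grep -E 'Acquiring license|License (acquired|renewed)'
```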
Verify GPU status with nvidia-smi:
Code Block
user@test:~$ nvidia-smi
Wed Dec 13 14:18:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P6-16Q          On  | 00000000:02:00.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3458      C   ...ld/cuda/bin/ollama-runner     6426MiB |
+-----------------------------------------------------------------------------+
You can also monitor GPU usage in the console with nvtop:
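For scripted monitoring, nvidia-smi also offers a query mode that emits plain CSV, which is easier to parse than the banner above. A sketch, using the memory figures from the sample output (6426 of 16384 MiB) as stand-in input:

```shell
# On the VM, machine-readable usage comes from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Here the values from the sample output above stand in for the live query.
smi_csv='6426, 16384'

echo "$smi_csv" | awk -F', ' '{ printf "GPU memory: %s/%s MiB (%.1f%% used)\n", $1, $2, 100 * $1 / $2 }'
# prints: GPU memory: 6426/16384 MiB (39.2% used)
```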
Install PrivateGPT
Dependencies
Login to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:
Code Block
ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
Install python3:
Code Block
sudo apt install build-essential cmake python3 python3-dev -y
Install the NVIDIA CUDA Toolkit, needed to recompile llama-cpp-python later:
Code Block
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3
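Before moving on, it helps to confirm the build tools are actually reachable; a missing nvcc is a common reason the CUDA rebuild of llama-cpp-python later silently falls back to CPU. A small sketch (the need helper is hypothetical, purely for illustration):

```shell
# Hypothetical helper: warn about any required build tool missing from PATH.
need() { command -v "$1" >/dev/null 2>&1 || echo "missing: $1 -- check your PATH" >&2; }

# After the installs above these should all be silent. nvcc lives in
# /usr/local/cuda/bin, which may not be on PATH yet (the "Enable GPU
# support" section below exports it).
need nvcc
need cmake
need python3
```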
Install PrivateGPT
Clone the source repository:
Code Block
git clone https://github.com/imartinez/privateGPT
Create and activate a virtual environment:
Code Block
cd privateGPT
python3.10 --version
python3.10 -m venv .venv && source .venv/bin/activate
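If a later poetry or pip step behaves oddly, the usual cause is that the virtual environment is no longer active; activation sets VIRTUAL_ENV and puts .venv/bin first on PATH. A quick check:

```shell
# After `source .venv/bin/activate`, VIRTUAL_ENV points at the .venv
# directory and python3 resolves inside it; outside a venv this prints
# "<not active>".
echo "venv: ${VIRTUAL_ENV:-<not active>}"
command -v python3
```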
Install poetry to get all Python dependencies installed:
Code Block
curl -sSL https://install.python-poetry.org | python3 -
Update pip and poetry, then install the PrivateGPT dependencies:
Code Block
pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
Install llama-cpp-python:
Code Block
pip install llama-cpp-python
Enable GPU support
Export the following environment variables:
Code Block
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
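These exports apply only to the current shell session. To make them survive a logout, append them to the shell's startup file; a sketch assuming bash (~/.bashrc), with RC_FILE overridable purely for illustration:

```shell
# Persist the CUDA environment variables across sessions by appending
# them to the shell rc file (bash assumed; override RC_FILE to try it).
rc="${RC_FILE:-$HOME/.bashrc}"

cat >> "$rc" <<'EOF'
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
EOF
```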
Reinstall llama-cpp-python:
Code Block
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
Run PrivateGPT
Run python3.10 -m private_gpt to start:
Code Block
cd privateGPT && source .venv/bin/activate
python3.10 -m private_gpt
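PrivateGPT takes a little while to load the model before the UI starts answering. A small wait loop avoids refreshing a dead page; this is a sketch using bash's /dev/tcp, with port 8001 as configured in this guide:

```shell
# Wait until a TCP port accepts connections (bash-only: /dev/tcp).
wait_for_port() {
    host=$1; port=$2; tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example: wait_for_port 127.0.0.1 8001 && echo "PrivateGPT UI is up"
```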
Open a web browser and navigate to the assigned IP address on port 8001:
http://XXX.XXX.XXX.XXX:8001
Upload a few documents and start asking questions: