How-to deploy PrivateGPT in Ubuntu with vGPU on the ITS Private Cloud

In December 2023, we announced the launch of virtual GPU capabilities on the ITS Private Cloud, as detailed in our blog post ( Introducing Virtual GPUs for Virtual Machines ). We are now working on practical examples that harness the power, affordability, security, and privacy of the ITS Private Cloud to run Large Language Models (LLMs).

This How-To focuses on deploying a virtual machine running Ubuntu with a 16 GB vGPU, using the vss-cli, to host PrivateGPT, an open-source Artificial Intelligence project that allows you to ask questions about documents using the power of LLMs, without data leaving the runtime environment.

 

Virtual Machine Deployment

  1. Download the vss-cli configuration spec (ubuntu-llm-privategpt.yaml) and update the following attributes:

    1. machine.folder: target logical folder. List available folders with vss-cli compute folder ls.

    2. metadata.client: your department client.

    3. metadata.inform: email address for automated notifications.

  2. Deploy with the following command:

    vss-cli --wait compute vm mk from-file ubuntu-llm-privategpt.yaml
  3. Add a 16 GB virtual GPU, specifically the 16q profile. For more information on the profile used, check the following document: How-to Request a Virtual GPU

    vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
  4. Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.

  5. Power on the virtual machine:

    vss-cli compute vm set ubuntu-llm state on
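The deployment steps above can be combined into a single sketch script. The VM name ubuntu-llm, the run wrapper, and the DRY_RUN switch are illustrative assumptions for this guide; set DRY_RUN=0 on a workstation with a configured vss-cli to actually execute the commands.

```shell
#!/usr/bin/env bash
# Sketch of the full deployment flow. DRY_RUN=1 (default) only prints the
# commands; set DRY_RUN=0 to execute them with a configured vss-cli.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
VM_NAME="ubuntu-llm"                      # assumed VM name from the YAML spec
SPEC_FILE="ubuntu-llm-privategpt.yaml"
GPU_PROFILE="16q"

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# 1. Deploy the VM from the configuration spec and wait for completion
run vss-cli --wait compute vm mk from-file "$SPEC_FILE"
# 2. Attach a 16 GB vGPU using the 16q profile
run vss-cli compute vm set "$VM_NAME" gpu mk --profile "$GPU_PROFILE"
# 3. Power on the VM
run vss-cli compute vm set "$VM_NAME" state on
```

Running the script with the default DRY_RUN=1 first is a cheap way to review the exact vss-cli invocations before submitting them.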

NVIDIA Driver and Licensing


The currently supported driver is nvidia-linux-grid-535_535.183.01_amd64.deb.

  1. Log in to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:

    ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
  2. Download the NVIDIA drivers from VSKEY-STOR:

    scp {vss-user}@vskey-stor.eis.utoronto.ca:/ut-vss-lib/nvidia-grid-vsphere-7.0-535.183.04-535.183.01-538.67/Guest_Drivers/nvidia-linux-grid-535_535.183.01_amd64.deb /tmp/
  3. Install the drivers as privileged user:

    apt install dkms nvtop
    apt install /tmp/nvidia-linux-grid-535_535.183.01_amd64.deb
  4. Create the ClientConfigToken directory:

    mkdir /etc/nvidia/ClientConfigToken
  5. Create the NVIDIA token file:

    echo -n -e $(vmware-rpctool "info-get guestinfo.ut.vss.nvidia_token") > /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
  6. Set permissions to the NVIDIA token:

    chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
  7. Set the FeatureType to 2 for “NVIDIA RTX Virtual Workstation” in /etc/nvidia/gridd.conf with the following command:

    sed -i 's/FeatureType=0/FeatureType=2/g' /etc/nvidia/gridd.conf
  8. Restart nvidia-gridd service to pick up the new license token:

    systemctl restart nvidia-gridd
  9. Check the journal for errors or successful activation:

    journalctl -u nvidia-gridd

    output:

    Dec 13 11:23:20 ubu-llm systemd[1]: Stopped NVIDIA Grid Daemon.
    Dec 13 11:23:20 ubu-llm systemd[1]: Starting NVIDIA Grid Daemon...
    Dec 13 11:23:20 ubu-llm systemd[1]: Started NVIDIA Grid Daemon.
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Started (2017)
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: vGPU Software package (0)
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Ignore service provider and node-locked licensing
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: NLS initialized
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Acquiring license. (Info: vss-nvidia-ls.eis.utoronto.ca; NVIDIA RTX Virtual Workstation)
    Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 16:23:22 GMT)
    Dec 13 14:59:24 ubu-llm nvidia-gridd[2017]: License renewed successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 19:59:23 GMT)
  10. Verify GPU status with nvidia-smi:

    user@test:~$ nvidia-smi
    Wed Dec 13 14:18:27 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GRID P6-16Q          On  | 00000000:02:00.0 Off |                  N/A |
    | N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      3458      C   ...ld/cuda/bin/ollama-runner     6426MiB |
    +-----------------------------------------------------------------------------+
  11. You can also monitor GPU usage in the console with nvtop:
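As a quick scripted sanity check, the nvidia-gridd journal output shown above can be grepped for the license success message. The check_license helper below is an illustrative sketch for this guide, not part of the NVIDIA tooling.

```shell
# Reads nvidia-gridd log lines on stdin and reports whether a vGPU license
# was acquired or renewed (helper name is an assumption for this guide).
check_license() {
  if grep -Eq 'License (acquired|renewed) successfully'; then
    echo licensed
  else
    echo unlicensed
  fi
}

# Usage on the VM:
#   journalctl -u nvidia-gridd --no-pager | check_license
```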

Install PrivateGPT

Dependencies

  1. Log in to the server via ssh. Note that the username may change if you further customized the VM with cloud-init:

    ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
  2. Install OS dependencies:

    sudo apt install build-essential cmake
  3. Install Python 3.11, either from source or via the deadsnakes PPA:

    sudo add-apt-repository ppa:deadsnakes/ppa
    sudo apt update
    sudo apt install python3.11-full python3.11-dev -y
  4. Install the NVIDIA CUDA Toolkit, needed to recompile llama-cpp-python later:

    wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo add-apt-repository contrib
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-12-3
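Once the toolkit is installed, you can confirm the CUDA compiler is present and extract its release number. The cuda_release helper is an illustrative sketch for this guide; nvcc lives under /usr/local/cuda/bin, which is added to PATH in the "Enable GPU support" section below.

```shell
# Extracts the release number from `nvcc --version` output read on stdin
# (helper name is an assumption for this guide).
cuda_release() {
  sed -n 's/.*release \([0-9.]*\),.*/\1/p'
}

# Usage on the VM, once /usr/local/cuda/bin is on PATH:
#   nvcc --version | cuda_release
```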

Install PrivateGPT

  1. Clone the source repository:

    git clone https://github.com/imartinez/privateGPT
  2. Create and activate virtual environment:

    cd privateGPT
    python3.11 --version
    python3.11 -m venv .venv && source .venv/bin/activate
  3. Install poetry, which will install all Python dependencies:

    curl -sSL https://install.python-poetry.org | python3 -
  4. Update pip and poetry, then install the PrivateGPT dependencies:

    pip install --upgrade pip poetry
    poetry install --with ui,local
    ./scripts/setup
  5. Install llama-cpp-python

    pip install llama-cpp-python

Enable GPU support

  1. Export the following environment variables:

    export CUDA_HOME=/usr/local/cuda
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
    export PATH=$PATH:$CUDA_HOME/bin
  2. Reinstall llama-cpp-python:

    CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
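To confirm the rebuild actually linked against CUDA, you can inspect the shared library that llama-cpp-python installed. The links_cublas helper and the library path in the usage comment are illustrative assumptions for this guide.

```shell
# Reads `ldd <library>` output on stdin and reports whether the library
# links against cuBLAS (helper name is an assumption for this guide).
links_cublas() {
  if grep -q 'libcublas'; then echo gpu-enabled; else echo cpu-only; fi
}

# Usage on the VM (adjust the path to your virtual environment):
#   ldd .venv/lib/python3.11/site-packages/llama_cpp/libllama.so | links_cublas
```

If the result is cpu-only, re-run the reinstall step above and check that the CMAKE_ARGS variable was actually picked up during the build.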

 

Run PrivateGPT

  1. Activate the virtual environment and run python3.11 -m private_gpt to start:

    cd privateGPT && source .venv/bin/activate
    python3.11 -m private_gpt
  2. Open a web browser to the assigned IP address on port 8001: http://XXX.XXX.XXX.XXX:8001

  3. Upload a few documents and start asking questions.

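Before opening the browser, you can wait for the service to come up from the command line. The wait_for_ui helper below is an illustrative sketch; the retry count and polling interval are assumptions.

```shell
# Polls the PrivateGPT UI until it answers or the retry budget runs out
# (helper name, retry count, and interval are assumptions for this guide).
wait_for_ui() {
  local url="$1" tries="${2:-30}" i
  for i in $(seq "$tries"); do
    if curl -fsS -o /dev/null "$url"; then echo up; return 0; fi
    sleep 1
  done
  echo down
  return 1
}

# Usage:
#   wait_for_ui http://XXX.XXX.XXX.XXX:8001 && echo "PrivateGPT is ready"
```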

 

