In December 2023, we announced virtual GPU capabilities on the ITS Private Cloud (see our blog post Introducing Virtual GPUs for Virtual Machines). We are now working on practical examples that harness the power, affordability, security and privacy of the ITS Private Cloud to run Large Language Models (LLMs).

This How-To focuses on using the vss-cli to deploy a virtual machine running Ubuntu with a 16GB vGPU to host PrivateGPT, an open source Artificial Intelligence project that lets you ask questions about your documents using the power of LLMs, without data leaving the runtime environment.

Virtual Machine Deployment

  1. Deploy a virtual machine from file:

    vss-cli --wait compute vm mk from-file ubuntu-llm.yaml
  2. Add a 16GB virtual GPU, specifically the 16q profile. For more information on the profile used, see the document How to Request a Virtual GPU.

    vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
  3. Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.

  4. Power on the virtual machine:

    vss-cli compute vm set ubuntu-llm state on
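
The steps above can be chained in a small script. The following is a minimal sketch, not an official tool: the function name deploy_llm_vm and the VSS_CLI override variable are hypothetical, and the function only wraps the three vss-cli commands shown in this section.

```shell
#!/bin/sh
# Sketch: wrap the three vss-cli commands from this section in one function.
# VSS_CLI is a hypothetical override (handy for dry runs); it defaults to vss-cli.
VSS_CLI="${VSS_CLI:-vss-cli}"

# Usage: deploy_llm_vm NAME SPEC_FILE PROFILE
deploy_llm_vm() {
    vm_name="$1"
    spec_file="$2"
    profile="$3"

    # 1. Deploy the VM from the spec file and wait for completion.
    $VSS_CLI --wait compute vm mk from-file "$spec_file" || return 1
    # 2. Attach the vGPU profile (16q for the 16GB profile used here).
    $VSS_CLI compute vm set "$vm_name" gpu mk --profile "$profile" || return 1
    # 3. Power on the VM.
    $VSS_CLI compute vm set "$vm_name" state on
}

# Example (uncomment to run against your account):
# deploy_llm_vm ubuntu-llm ubuntu-llm.yaml 16q
```

Setting VSS_CLI=echo before calling the function prints the commands instead of running them, which is a cheap way to review what would be executed.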

NVIDIA Driver and Licensing

  1. Log in to the server via ssh. Note that the username may differ if you further customized the VM with cloud-init:

    ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
  2. Download the NVIDIA drivers from VSKEY-STOR:

    scp {vss-user}@vskey-stor.eis.utoronto.ca:ut-vss-lib/nvidia-grid-vpshere-7.0-525.147.01-525.147.05-529.19/Guest_Drivers/nvidia-linux-grid-525_525.147.05_amd64.deb /tmp/
  3. Install the drivers as a privileged user:

    apt install dkms nvtop
    dpkg -i /tmp/nvidia-linux-grid-525_525.147.05_amd64.deb
  4. Create the NVIDIA token file:

    echo -n -e $(vmware-rpctool "info-get guestinfo.ut.vss.nvidia_token") > /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
  5. Set permissions on the NVIDIA token file:

    chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
  6. Set the FeatureType to 2 for NVIDIA RTX Virtual Workstation in /etc/nvidia/gridd.conf with the following command:

    sed -i 's/FeatureType=0/FeatureType=2/g' /etc/nvidia/gridd.conf
  7. Restart the nvidia-gridd service to pick up the new license token:

    systemctl restart nvidia-gridd
  8. Check the service log for errors or a successful activation:

    journalctl -u nvidia-gridd

    output:

    Dec 13 11:23:20 ubu-llm systemd[1]: Stopped NVIDIA Grid Daemon.
    Dec 13 11:23:20 ubu-llm systemd[1]: Starting NVIDIA Grid Daemon...
    Dec 13 11:23:20 ubu-llm systemd[1]: Started NVIDIA Grid Daemon.
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Started (2017)
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: vGPU Software package (0)
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Ignore service provider and node-locked licensing
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: NLS initialized
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Acquiring license. (Info: vss-nvidia-ls.eis.utoronto.ca; NVIDIA RTX Virtual Workstation)
    Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 16:23:22 GMT)
    Dec 13 14:59:24 ubu-llm nvidia-gridd[2017]: License renewed successfully. (Info: vss-nvidia-ls.eis.utoronto.ca, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 19:59:23 GMT)
  9. Verify GPU status with nvidia-smi:

    user@test:~$ nvidia-smi
    Wed Dec 13 14:18:27 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GRID P6-16Q         On   | 00000000:02:00.0 Off |                  N/A |
    | N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      3458      C   ...ld/cuda/bin/ollama-runner     6426MiB |
    +-----------------------------------------------------------------------------+
    
    
  10. You can also monitor GPU usage from the console with nvtop.
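
The checks in steps 8 and 9 can also be scripted. This is a minimal sketch under stated assumptions: the helper names license_acquired and gpu_summary are hypothetical, and the license check simply searches the log text for the "License acquired successfully" message shown in the sample output above.

```shell
#!/bin/sh
# Sketch: report whether nvidia-gridd log text shows a successful license.
# license_acquired is a hypothetical helper; it reads log lines on stdin
# and returns 0 only if the success message is present.
license_acquired() {
    grep -q "License acquired successfully"
}

# gpu_summary (hypothetical name) prints name, used memory and total memory
# as CSV, one line per GPU, using nvidia-smi's query mode.
gpu_summary() {
    nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader
}

# Example usage on the VM:
# journalctl -u nvidia-gridd --no-pager | license_acquired && echo "licensed"
# gpu_summary
```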

Install PrivateGPT

Dependencies

  1. Log in to the server via ssh. Note that the username may differ if you further customized the VM with cloud-init:

    ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
  2. Install OS dependencies:

    sudo apt install build-essential cmake
  3. Install Python 3.11, either from source or via ppa:deadsnakes/ppa:

    sudo add-apt-repository ppa:deadsnakes/ppa
    sudo apt update
    sudo apt install python3.11-full python3.11-dev -y
  4. Install the NVIDIA CUDA Toolkit, which is needed to recompile llama-cpp-python later:

    wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo add-apt-repository contrib
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-12-3
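
Before moving on, it can help to confirm that the tools installed above are actually available. The following is a small sketch, assuming a POSIX shell; missing_tools is a hypothetical helper name, and the nvcc path assumes the toolkit is linked at /usr/local/cuda, as in the environment variables used later in this guide.

```shell
#!/bin/sh
# Sketch: print each of the given commands that cannot be resolved.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the command.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: check the build dependencies installed in this section.
# nvcc is given by full path because /usr/local/cuda/bin may not be on PATH yet.
# missing_tools gcc make cmake python3.11 /usr/local/cuda/bin/nvcc
```

An empty result means every listed tool was found.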

Install PrivateGPT

  1. Clone the source repository:

    git clone https://github.com/imartinez/privateGPT
  2. Create and activate virtual environment:

    cd privateGPT
    python3.11 --version
    python3.11 -m venv .venv && source .venv/bin/activate
  3. Install Poetry to manage the Python dependencies:

    curl -sSL https://install.python-poetry.org | python3 -
  4. Update pip and Poetry, then install the PrivateGPT dependencies:

    pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
  5. Install llama-cpp-python:

    pip install llama-cpp-python

Enable GPU support

  1. Export the following environment variables:

    export CUDA_HOME=/usr/local/cuda
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
    export PATH=$PATH:$CUDA_HOME/bin
  2. Reinstall llama-cpp-python with cuBLAS (CUDA) support enabled:

    CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
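
The rebuild in step 2 fails if the CUDA compiler cannot be found, so the exports from step 1 can be wrapped in a guard. This is a sketch only: use_cuda is a hypothetical helper, and its default path is the /usr/local/cuda location used in step 1.

```shell
#!/bin/sh
# Sketch: export the CUDA variables from step 1, but only if nvcc is present.
# use_cuda is a hypothetical helper. Usage: use_cuda [CUDA_HOME]
use_cuda() {
    cuda_home="${1:-/usr/local/cuda}"
    if [ ! -x "$cuda_home/bin/nvcc" ]; then
        echo "nvcc not found under $cuda_home; is the CUDA Toolkit installed?" >&2
        return 1
    fi
    export CUDA_HOME="$cuda_home"
    # Append to LD_LIBRARY_PATH without leaving a leading ':' when it is empty.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64"
    export PATH="$PATH:$CUDA_HOME/bin"
}

# Example: rebuild only when the toolkit is actually available.
# use_cuda && CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```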

Run PrivateGPT

  1. Run python3.11 -m private_gpt to start:

    cd privateGPT && source .venv/bin/activate
    python3.11 -m private_gpt
  2. Open a web browser and go to the assigned IP address on port 8001: http://XXX.XXX.XXX.XXX:8001

  3. Upload a few documents and start asking questions.
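
The server may need some time after launch before the web interface in step 2 responds. The following is a minimal polling sketch, assuming curl is installed; wait_for_url is a hypothetical helper, and the URL is the same placeholder address used in step 2.

```shell
#!/bin/sh
# Sketch: poll a URL until it answers or the attempts run out.
# wait_for_url is a hypothetical helper.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f fails on HTTP errors, -s silences progress, -o discards the body.
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: replace the placeholder with the VM's assigned IP address.
# wait_for_url http://XXX.XXX.XXX.XXX:8001 && echo "PrivateGPT UI is up"
```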
