In December 2023, we announced the launch of virtual GPU capabilities on the ITS Private Cloud, as detailed in our blog post (Introducing Virtual GPUs for Virtual Machines). We are now working on practical examples that harness the power, affordability, security, and privacy of the ITS Private Cloud to run Large Language Models (LLMs).

This How-To focuses on using the vss-cli to deploy a virtual machine running Ubuntu with a 16GB vGPU to host PrivateGPT, an open-source Artificial Intelligence project that lets you ask questions about your documents using LLMs, without data leaving the runtime environment.

Virtual Machine Deployment

  1. Deploy a virtual machine from file:

    Code Block
    vss-cli --wait compute vm mk from-file ubuntu-llm.yaml
  2. Add a 16GB virtual GPU, specifically the 16q profile. For more information on the profile used, see How to Request a Virtual GPU:

    Code Block
    vss-cli compute vm set ubuntu-llm gpu mk --profile 16q
  3. Once the VM has been deployed, a confirmation email will be sent with the assigned IP address and credentials.

  4. Power on the virtual machine:

    Code Block
    vss-cli compute vm set ubuntu-llm state on
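
The deployment steps above can also be chained in a small wrapper so that a failure in one step stops the rest. This is only a sketch: it reuses the three vss-cli commands exactly as shown in the steps, while the names `VM_NAME`, `VM_SPEC`, `GPU_PROFILE`, `DRY_RUN`, `run` and `deploy_llm_vm` are illustrative and not part of the vss-cli.

```shell
#!/usr/bin/env bash
# Sketch: run the deployment steps in order and stop at the first failure.
# VM_NAME, VM_SPEC, GPU_PROFILE and DRY_RUN are illustrative names, not vss-cli settings.
VM_NAME="${VM_NAME:-ubuntu-llm}"
VM_SPEC="${VM_SPEC:-ubuntu-llm.yaml}"
GPU_PROFILE="${GPU_PROFILE:-16q}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Deploy the VM, attach the vGPU profile, then power the VM on.
deploy_llm_vm() {
  run vss-cli --wait compute vm mk from-file "$VM_SPEC" &&
  run vss-cli compute vm set "$VM_NAME" gpu mk --profile "$GPU_PROFILE" &&
  run vss-cli compute vm set "$VM_NAME" state on
}
```

Setting `DRY_RUN=1` before calling `deploy_llm_vm` prints the three commands without running them, which is a convenient way to review them first.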

NVIDIA Driver and Licensing

  1. Log in to the server via ssh. Note that the username may differ if you further customized the VM with cloud-init:

    Code Block
    ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
  2. Download the NVIDIA drivers from VSKEY-STOR:

    Code Block
    scp {vss-user} /tmp/
  3. Install the drivers as a privileged user:

    Code Block
    apt install dkms nvtop
    dpkg -i /tmp/nvidia-linux-grid-525_525.147.05_amd64.deb
  4. Create the NVIDIA token file:

    Code Block
    echo -n -e $(vmware-rpctool "info-get guestinfo.ut.vss.nvidia_token") > /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
  5. Set permissions on the NVIDIA token file:

    Code Block
    chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_12-05-2023-11-26-05.tok
  6. Set the FeatureType to 2 for NVIDIA RTX Virtual Workstation in /etc/nvidia/gridd.conf with the following command:

    Code Block
    sed -i 's/FeatureType=0/FeatureType=2/g' /etc/nvidia/gridd.conf
  7. Restart the nvidia-gridd service to pick up the new license token:

    Code Block
    systemctl restart nvidia-gridd
  8. Check the service log for errors or a successful activation:

    Code Block
    journalctl -u nvidia-gridd


    Code Block
    Dec 13 11:23:20 ubu-llm systemd[1]: Stopped NVIDIA Grid Daemon.
    Dec 13 11:23:20 ubu-llm systemd[1]: Starting NVIDIA Grid Daemon...
    Dec 13 11:23:20 ubu-llm systemd[1]: Started NVIDIA Grid Daemon.
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Started (2017)
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: vGPU Software package (0)
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Ignore service provider and node-locked licensing
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: NLS initialized
    Dec 13 11:23:20 ubu-llm nvidia-gridd[2017]: Acquiring license. (Info:; NVIDIA RTX Virtual Workstation)
    Dec 13 11:23:22 ubu-llm nvidia-gridd[2017]: License acquired successfully. (Info:, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 16:23:22 GMT)
    Dec 13 14:59:24 ubu-llm nvidia-gridd[2017]: License renewed successfully. (Info:, NVIDIA RTX Virtual Workstation; Expiry: 2023-12-14 19:59:23 GMT)
  9. Verify GPU status with nvidia-smi:

    Code Block
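    user@test:~$ nvidia-smi
    Wed Dec 13 14:18:27 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GRID P6-16Q         On   | 00000000:02:00.0 Off |                  N/A |
    | N/A   N/A    P8    N/A /  N/A |   6426MiB / 16384MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      3458      C   ...ld/cuda/bin/ollama-runner     6426MiB |
    +-----------------------------------------------------------------------------+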
  10. You can also monitor GPU usage from the console with nvtop.
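
The FeatureType and license checks in the steps above can be scripted instead of read by eye. The following is a minimal sketch: the function names `gridd_feature_type` and `license_state` are illustrative, the success pattern is taken from the sample log output above, and treating any line containing "error" or "failed" as a problem is an assumption rather than a documented list of nvidia-gridd messages.

```shell
# Print the FeatureType value configured in a gridd.conf-style file.
# The path argument defaults to the file edited in step 6.
gridd_feature_type() {
  sed -n 's/^FeatureType=\([0-9][0-9]*\).*/\1/p' "${1:-/etc/nvidia/gridd.conf}" | tail -n 1
}

# Read nvidia-gridd log lines on stdin and report the most recent license event.
# "licensed" matches the acquired/renewed messages shown in the sample log;
# the error patterns are an assumption and may need adjusting.
license_state() {
  awk '
    /License (acquired|renewed) successfully/ { state = "licensed" }
    /[Ee]rror|[Ff]ailed/                      { state = "error" }
    END { print (state == "" ? "unknown" : state) }
  '
}
```

For example, `gridd_feature_type` should print 2 after step 6, and `journalctl -u nvidia-gridd --no-pager | license_state` summarizes the log from step 8.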

Install Prerequisites


  1. Log in to the server via ssh. Note that the username may differ if you further customized the VM with cloud-init:

    Code Block
    ssh -p 2222 vss-admin@XXX.XXX.XXX.XXX
  2. Install Python 3 and the build tools:

    Code Block
    sudo apt install build-essential cmake python3 python3-dev -y
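  3. Install the NVIDIA CUDA Toolkit, which is needed to recompile llama-cpp-python later. The first command assumes that cuda-keyring_1.1-1_all.deb has already been downloaded to the current directory: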

    Code Block
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo add-apt-repository contrib
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-12-3

Install PrivateGPT

  1. Clone the source repository:

    Code Block
    git clone
  2. Create and activate virtual environment:

    Code Block
    cd privateGPT
    python3.10 --version
    python3.10 -m venv .venv && source .venv/bin/activate 
  3. Install Poetry, which manages the Python dependencies:

    Code Block
    curl -sSL | python3 -
  4. Update pip and Poetry, then install the PrivateGPT dependencies:

    Code Block
    pip install --upgrade pip poetry && poetry install --with ui,local && ./scripts/setup
  5. Install llama-cpp-python:

    Code Block
    pip install llama-cpp-python

Enable GPU support

  1. Export the following environment variables:

    Code Block
    export CUDA_HOME=/usr/local/cuda
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
    export PATH=$PATH:$CUDA_HOME/bin
  2. Reinstall llama-cpp-python:

    Code Block
    CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
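
Because the rebuild only picks up GPU support when the CUDA toolchain is visible in the current shell, it can help to check the environment before reinstalling. The sketch below assumes that `nvcc` (shipped with the CUDA Toolkit) being on `PATH` and `CUDA_HOME` pointing to an existing directory are sufficient signs of a usable setup; the function name `cuda_env_ok` is illustrative.

```shell
# Return 0 when the CUDA toolchain needed for the rebuild is visible in this shell.
cuda_env_ok() {
  # CUDA_HOME must be set and point to an existing directory.
  if [ -z "${CUDA_HOME:-}" ] || [ ! -d "$CUDA_HOME" ]; then
    echo "CUDA_HOME is unset or not a directory" >&2
    return 1
  fi
  # nvcc must be reachable through PATH.
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "nvcc not found on PATH" >&2
    return 1
  fi
  echo "CUDA toolchain found: $(command -v nvcc)"
}
```

Running `cuda_env_ok` after exporting the variables in step 1 and before the reinstall in step 2 gives an early warning if one of the exports was missed.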

Run PrivateGPT

  1. Activate the virtual environment and start PrivateGPT with python3.10 -m private_gpt:

    Code Block
    cd privateGPT && source .venv/bin/activate
    python3.10 -m private_gpt
  2. Open a web browser and go to the assigned IP address on port 8001: http://XXX.XXX.XXX.XXX:8001

  3. Upload a few documents and start asking questions:

    [Video attachment: CleanShot 2024-02-01 at 12.44.24.mp4]
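
Before opening the browser, the web UI can be polled from a terminal until it answers. This is a sketch under a few assumptions: `curl` is installed, the UI replies to a plain HTTP GET on port 8001 as in step 2, and the retry count and delay are arbitrary defaults. The function names `privategpt_url` and `wait_for_privategpt` are illustrative.

```shell
# Build the web UI URL for a host; 8001 is the port used in step 2.
privategpt_url() {
  echo "http://$1:${2:-8001}"
}

# Poll the URL until it answers or the attempts run out.
# Arguments: host, attempts (default 30), delay in seconds (default 2).
wait_for_privategpt() {
  url="$(privategpt_url "$1")"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors, -s keeps it quiet.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "PrivateGPT is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No answer from $url after $attempts attempts" >&2
  return 1
}
```

Call it as `wait_for_privategpt XXX.XXX.XXX.XXX`, replacing the placeholder with the IP address from the confirmation email.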
