Running HuggingChat locally (VM)

Learn how to run HuggingChat, an open-source ChatGPT alternative, locally (on a VM) and interact with the Open Assistant model, or with any other Large Language Model (LLM), in two variants.

Variant 1: Run just the Chat-UI locally and utilize a remote inference endpoint from Hugging Face

Variant 2: Run the whole stack, the Chat-UI, the Text Generation Inference Server and the (Open Assistant) LLM on your Virtual Machine

Installing HuggingChat with the Installation Scripts created in this video

If you want to get the HuggingChat Installation Scripts that we created in the course of this video feel free to purchase and download our HuggingChat Installation Scripts.

Alternatively, if you want to get your hands dirty, you can find the scripts at the bottom of this page.

NEW! Installing HuggingChat with aitom8 and the HuggingChat aitom8 plugin

New: In the meantime we have created aitom8, a professional AI automation tool that automates a variety of open source projects (optionally in virtual environments like conda). For HuggingChat there is an aitom8 plugin that allows you to install HuggingChat with just one command.

aitom8 huggingchat install

You can get aitom8 and the HuggingChat aitom8 plugin here:

NEW! Code Llama 34B model with Inference and HuggingChat | Local Setup Guide (VM) and Live Demo

New: This video shows a third variant, which is required for downloading Llama models with your local inference server.

NEW! Talk to your documents with HuggingChat and the aitomChat extension

Learn everything, from Chat UI to Inference and Retrieval Augmented Generation (RAG) in the YouTube video below:

Get aitomChat here:

Installing HuggingChat manually

If you want to get your hands dirty, feel free to set up HuggingChat with the instructions and scripts below.

Prepare your Linux VM

Install Curl:

sudo apt install curl

Install NVM (Node Version manager):

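curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
# Reload your shell configuration (or open a new terminal) so that the nvm command is available
source ~/.bashrc
nvm -v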

Install the latest LTS release of Node.js and npm:

nvm install --lts
node -v
npm -v
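
Before continuing, it can help to confirm that every tool used in the rest of this guide is actually on your PATH. The following is a minimal sketch, not part of the original scripts; the function name check_prereqs is hypothetical, and the list of tools in the example call (curl, git, node, npm, docker) is taken from the commands used on this page.

```shell
# check_prereqs: report which of the given commands are missing from PATH.
# Returns 0 when all commands are present, 1 otherwise.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok:      $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example call for this guide:
#   check_prereqs curl git node npm docker
```

Docker is not installed by any step on this page, so a "missing: docker" line here means you need to install it before starting MongoDB or the inference server.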

Install and run the HuggingChat UI locally

Create new npm project (AI):

mkdir -p ~/dev/AI
cd ~/dev/AI
npm init

Update package.json:

{
  "name": "ai",
  "version": "1.0.0",
  "description": "Start Apps",
  "main": "index.js",
  "scripts": {
    "start-mongodb": "docker run --rm --name mongodb  -p 27017:27017 -d -v ~/dev/mongo:/data/db mongo",
    "stop-mongodb": "docker stop mongodb",
    "install-chat-ui": "cd ./scripts && ./install-chat-ui.sh",
    "update-chat-ui": "cd ../chat-ui && git pull",
    "start-chat-ui": "cd ../chat-ui && npm run dev -- --host 127.0.0.1",
    "list-mongodb-collections": "docker exec -i mongodb sh -c 'mongosh --eval \"db.getCollectionNames()\" chat-ui'",
    "list-conversations": "docker exec -i mongodb sh -c 'mongosh --eval \"db.conversations.find()\" chat-ui'",
    "drop-database": "docker exec -i mongodb sh -c 'mongosh --eval \"db.dropDatabase()\" chat-ui'",
    "start-inference": "cd ./scripts && ./start-text-generation-inference.sh",
    "show-filesystem": "sudo df -Th && echo && sudo lsblk && echo && docker system df"
  },
  "author": "",
  "license": "ISC"
}

Create scripts directory:

mkdir ~/dev/AI/scripts

Create this script in the scripts directory:

install-chat-ui.sh

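#!/usr/bin/env bash
# Run from ~/dev/AI/scripts (npm run install-chat-ui does this); all paths are relative to it.
sudo apt-get install -y git-lfs
# Remove any previous checkout so that the clone starts clean
sudo rm -Rf ../../chat-ui
cd ../.. && git clone https://huggingface.co/spaces/huggingchat/chat-ui
cd ./chat-ui && npm install
# Restore the saved configuration, if one exists
if [[ -f "../AI/data/chat-ui.env" ]]; then
  cp -v ../AI/data/chat-ui.env .env.local
fi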
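
The script above uses relative paths, so it only behaves as intended when started from the scripts directory (the npm script does this with cd ./scripts). If you prefer a script that can be started from anywhere, one common approach is to resolve the script's own directory first. The helper below is a sketch under that assumption; the name script_dir_of is hypothetical and not part of the original scripts.

```shell
# script_dir_of: print the absolute directory that contains the given file path.
# The subshell keeps the caller's working directory unchanged.
script_dir_of() {
  (cd "$(dirname "$1")" && pwd)
}

# Inside a script you would typically pass "$0" and change into the result:
#   cd "$(script_dir_of "$0")"
```

With that line at the top of install-chat-ui.sh, the relative paths would resolve the same way regardless of where the script is called from.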

Make the script executable:

chmod u+x ~/dev/AI/scripts/install-chat-ui.sh

Install the Chat-UI:

npm run install-chat-ui

Copy .env file to .env.local:

cp ~/dev/chat-ui/.env ~/dev/chat-ui/.env.local

Start the MongoDB container (with npm and Docker):

npm run start-mongodb
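
Because the container is started in detached mode (-d), the command returns immediately and does not tell you whether MongoDB stayed up. A small check like the following can confirm it; this is a sketch, and the function name container_running is hypothetical. It only relies on docker ps with the --format option.

```shell
# container_running: succeed if a running container has exactly the given name.
container_running() {
  # List the names of running containers, one per line, and look for an exact match.
  docker ps --format '{{.Names}}' | grep -qxF "$1"
}

# Example:
#   container_running mongodb && echo "MongoDB is up"
```

The same check works for the inference container later on, using the name text-generation-inference.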

Adapt ~/dev/chat-ui/.env.local file to your needs:

MONGODB_URL=mongodb://localhost:27017/

HF_ACCESS_TOKEN=#hf_<token> from https://huggingface.co/settings/token
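
If you would rather set these two values from the command line than edit the file by hand, a small helper can do it. The following is a minimal sketch; set_env_var is a hypothetical name, and it only handles single-line KEY=value entries, so it should not be used for multi-line values.

```shell
# set_env_var FILE KEY VALUE
# Replace the single-line assignment KEY=... in FILE, or append it if absent.
# Limitation: not suitable for values that span several lines.
set_env_var() {
  env_file=$1
  env_key=$2
  env_value=$3
  touch "$env_file"
  # Keep every line that does not assign this key (grep exits 1 if nothing is left).
  grep -v "^${env_key}=" "$env_file" > "${env_file}.tmp" || true
  printf '%s=%s\n' "$env_key" "$env_value" >> "${env_file}.tmp"
  mv "${env_file}.tmp" "$env_file"
}

# Example:
#   set_env_var ~/dev/chat-ui/.env.local MONGODB_URL "mongodb://localhost:27017/"
```

The new assignment is always written at the end of the file, so the order of entries may change.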

Copy your .env.local file as chat-ui.env file into the ~/dev/AI/data directory (to allow fully automated reinstalls):

mkdir -p ~/dev/AI/data
cp ~/dev/chat-ui/.env.local ~/dev/AI/data/chat-ui.env

Run the Chat-UI:

npm run start-chat-ui

Install and run the Text Generation Inference Server locally

Create this script in the scripts directory:

start-text-generation-inference.sh (Important: if you are not running an Nvidia A100 GPU, you need to pass the parameter --disable-custom-kernels)

#!/usr/bin/env bash
#model=bigscience/bloom-560m
model=OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5
num_shard=2 # number of GPU shards to split the model across
volume="$PWD/../../inference-data" # share a volume with the Docker container to avoid downloading weights every run
name="text-generation-inference"
docker run --rm --name "$name" --gpus all --shm-size 1g -p 8081:80 \
    -v "$volume":/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id "$model" --num-shard "$num_shard" \
    --disable-custom-kernels
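
The script hard-codes num_shard=2, which presumes two GPUs are available to the container. If your VM has a different number, you could derive the value instead. This is a sketch under the assumption that nvidia-smi is installed on the host; gpu_count is a hypothetical helper name.

```shell
# gpu_count: print the number of GPUs reported by nvidia-smi.
# "nvidia-smi --list-gpus" prints one line per GPU, so counting lines is enough.
gpu_count() {
  nvidia-smi --list-gpus | wc -l | tr -d ' '
}

# In the start script one could then write:
#   num_shard=$(gpu_count)
```

Whether sharding across all GPUs is the best choice depends on the model size and the memory per GPU, so treat the result as a starting point.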

Make the script executable:

chmod u+x ~/dev/AI/scripts/start-text-generation-inference.sh

Run the Inference Server:

npm run start-inference
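
On the first start the server downloads the model weights, so it can take a while before it accepts requests. Rather than retrying by hand, you can poll it from a second terminal. The helper below is a sketch: wait_for_http is a hypothetical name, the default of 60 attempts with a 10 second pause is arbitrary, and the example assumes the server exposes a /health route on the mapped port 8081.

```shell
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Poll URL until it answers with a successful HTTP status, or give up.
wait_for_http() {
  wait_url=$1
  wait_attempts=${2:-60}
  wait_delay=${3:-10}
  i=0
  while [ "$i" -lt "$wait_attempts" ]; do
    # -f makes curl fail on HTTP errors, -s silences the progress output.
    if curl -fs -o /dev/null "$wait_url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$wait_delay"
  done
  return 1
}

# Example:
#   wait_for_http http://127.0.0.1:8081/health && echo "inference server is ready"
```

If the function returns a non-zero status, check the container output with docker logs text-generation-inference.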

Test the Inference Server:

docker exec -it text-generation-inference text-generation-launcher --help
docker exec -it text-generation-inference text-generation-launcher --env
docker exec -it text-generation-inference text-generation-launcher --version

curl 127.0.0.1:8081/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}' \
    -H 'Content-Type: application/json'

curl 127.0.0.1:8081/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}' \
    -H 'Content-Type: application/json'
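
Both requests go to the host port that the start script maps to the container (8081). Keeping the base URL in one variable avoids port mismatches between calls. The wrapper below is a sketch; TGI_URL and tgi_post are hypothetical names, not part of the original scripts.

```shell
# Base URL of the local inference server; override it by exporting TGI_URL.
TGI_URL=${TGI_URL:-http://127.0.0.1:8081}

# tgi_post ROUTE JSON_BODY: POST a JSON body to a route of the inference server.
tgi_post() {
  curl -s "${TGI_URL}$1" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$2"
}

# Examples:
#   tgi_post /generate '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}'
#   tgi_post /generate_stream '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}'
```

The first call returns a single JSON document, while the second streams the tokens as they are generated.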

Add a new model to the MODELS json array in your ~/dev/AI/data/chat-ui.env file:

MODELS=`[{"name": "...", "endpoints": [{"url": "http://127.0.0.1:8081/generate_stream"}]}]`

Re-Install the Chat-UI:

npm run install-chat-ui

Re-Run the Chat-UI:

npm run start-chat-ui

Need further support or consulting?

Please check out our Consulting hours.

6 thoughts on “Running HuggingChat locally (VM)”

Dan says: June 7, 2023 at 6:28 am

You’re missing a step. After creating the install-chat-ui.sh file you must give it execute permission:
chmod u+x ~/dev/AI/scripts/install-chat-ui.sh

- Robert says: June 7, 2023 at 1:12 pm
  
  Hi Dan, thanks for the heads up! I’ve updated it accordingly.
  
  - Dan says: June 7, 2023 at 2:22 pm
    
    Thanks! Another issue. The last line of install-chat-ui.sh is referencing a directory / file you don’t create until a few steps later (AI/data/chat-ui.env). I guess you’re creating that copy to have it outside of the chat-ui install to persist it? Not clear if it needs to be recopied after the “Re-Install the Chat-UI” step when using your own model.
    
    - Robert says: June 10, 2023 at 3:55 pm
      
      Dan, sorry for the late reply. Yes, I create the chat-ui.env file outside of the chat-ui directory to allow fully automated reinstallations. It is copied into the chat-ui directory automatically in the “Re-Install the Chat-UI” step, so no manual action is required on your end when reinstalling the chat-ui. However, I now check for the existence of chat-ui.env in the install-chat-ui.sh file (also note the added bash shebang) before copying it, and I changed the order of the steps a little bit to make the process more understandable and immediately applicable.
      
daniel says: September 27, 2023 at 12:27 am

Do you need the HF API key even for the local model and inference server? (I guess not.) So can it be run without internet if everything is pre-downloaded?

- Robert says: September 27, 2023 at 7:32 am
  
  You need the HF API key if you want to use the remote HF inference endpoint (Variant 1 in my video).
  If you want to run your own inference server (Variant 2 in my video), then the HF API key is not required (for most models) and the model gets downloaded when you start the server.
  However, for certain models, like Llama 2 from Meta for example, the HF API key is required even for downloading the model with your inference server (Variant 3). You can see this in my video about running Code Llama locally here: https://youtu.be/mhq6BQX0_P0
