It is important that you review the Main Concepts before you start the installation process.

## Base requirements to run PrivateGPT

* Clone PrivateGPT repository, and navigate to it:

```bash
git clone https://github.com/imartinez/privateGPT
cd privateGPT
```

* Install Python `3.11` (*if you do not have it already*). Ideally through a python version manager like `pyenv`.
  Earlier python versions are not supported.
    * osx/linux: [pyenv](https://github.com/pyenv/pyenv)
    * windows: [pyenv-win](https://github.com/pyenv-win/pyenv-win)

```bash
pyenv install 3.11
pyenv local 3.11
```
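
As a quick sanity check (the exact patch version may differ on your machine), confirm that the expected interpreter is active before installing dependencies:

```bash
# Should report a Python 3.11.x version if pyenv picked up the local version
python --version
pyenv version
```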

* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management (see the sketch after this list for one way to install it).

* Install `make` to be able to run the different scripts:
    * osx: (Using homebrew): `brew install make`
    * windows: (Using chocolatey): `choco install make`
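
If you don't have Poetry yet, one way to install it is the official installer linked above; afterwards confirm both tools are on your PATH (a sketch; see the Poetry docs for alternatives such as pipx):

```bash
# Official Poetry installer (see the link above), then verify the tools
curl -sSL https://install.python-poetry.org | python3 -
poetry --version
make --version
```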

## Install and run your desired setup

PrivateGPT allows you to customize the setup, from fully local to cloud-based, by deciding which modules to use.
Here are the different options available:

- LLM: "llama-cpp", "ollama", "sagemaker", "openai", "openailike"
- Embeddings: "huggingface", "openai", "sagemaker"
- Vector stores: "qdrant", "chroma", "postgres"
- UI: whether or not to enable UI (Gradio) or just go with the API

In order to install only the required dependencies, PrivateGPT offers different `extras` that can be combined during the installation process:

```bash
poetry install --extras "<extra1> <extra2>..."
```

Where `<extra>` can be any of the following:

- ui: adds support for UI using Gradio
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- llms-nvidia-tensorrt: adds support for Nvidia TensorRT LLM
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
- vector-stores-qdrant: adds support for Qdrant vector store
- vector-stores-chroma: adds support for Chroma DB vector store
- vector-stores-postgres: adds support for Postgres vector store
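
For example, the fully local, Ollama-backed setup with the Gradio UI used in the Recommended Setups below combines four extras:

```bash
poetry install --extras "ui llms-ollama embeddings-huggingface vector-stores-qdrant"
```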

## Recommended Setups

These are just some examples of recommended setups. You can mix and match the different options to fit your needs.
You'll find more information in the Manual section of the documentation.

> **Important for Windows**: In the examples below showing how to run PrivateGPT with `make run`, the `PGPT_PROFILES` env var is set inline following Unix command line syntax (works on MacOS and Linux).
> If you are using Windows, you'll need to set the env var in a different way, for example:

```powershell
# Powershell
$env:PGPT_PROFILES="ollama"
make run
```

or

```cmd
# CMD
set PGPT_PROFILES=ollama
make run
```

### Local, Ollama-powered setup

The easiest way to run PrivateGPT fully locally is to depend on Ollama for the LLM. Ollama provides a local LLM that is easy to install and use.

Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
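
You may also want to pull the model that your Ollama profile will serve. The model name below (`mistral`) is an assumption; check `settings-ollama.yaml` for the model actually configured in your setup:

```bash
# Pull a model for Ollama to serve (model name is an assumption; see settings-ollama.yaml)
ollama pull mistral

# List the models available locally
ollama list
```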

Once done, you can install PrivateGPT dependencies with the following command:

```bash
poetry install --extras "ui llms-ollama embeddings-huggingface vector-stores-qdrant"
```

We are installing the "embeddings-huggingface" dependency to support local embeddings, because Ollama doesn't support embeddings just yet (but they are working on it!).
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:

```bash
poetry run python scripts/setup
```

Once installed, you can run PrivateGPT. Make sure you have a working Ollama instance running locally before running the following command.
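
A quick way to check that Ollama is up (assuming its default port, 11434):

```bash
# Should reply "Ollama is running" if the server is up on the default port
curl http://localhost:11434
```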

```bash
PGPT_PROFILES=ollama make run
```

PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, different Ollama port, etc.).

The UI will be available at http://localhost:8001

### Private, Sagemaker-powered setup

If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.

You need to have access to Sagemaker inference endpoints for the LLM and/or the embeddings, and have AWS credentials properly configured.
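
As a quick sanity check (assuming you have the AWS CLI installed), you can confirm which identity and region will be used before starting PrivateGPT:

```bash
# Confirm the active AWS identity and region (requires the AWS CLI)
aws sts get-caller-identity
aws configure get region
```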

Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.

Then, install PrivateGPT dependencies with the following command:

```bash
poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
```

Once installed, you can run PrivateGPT. Make sure your Sagemaker endpoints are up and your AWS credentials are available before running the following command.

```bash
PGPT_PROFILES=sagemaker make run
```

PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file, which is already configured to use Sagemaker LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

### Local, TensorRT-powered setup

To get the most out of NVIDIA GPUs, you can set up a fully local PrivateGPT using TensorRT as its LLM provider. For more information about Nvidia TensorRT, check the [official documentation](https://github.com/NVIDIA/TensorRT-LLM).

Follow these steps to set up a local TensorRT-powered PrivateGPT:

- Nvidia CUDA 12.2 or higher is currently required to run TensorRT-LLM.

- For this example we will use Llama2. The Llama2 model files need to be created via scripts following the instructions [here](https://github.com/NVIDIA/trt-llm-rag-windows/blob/release/1.0/README.md#building-trt-engine).
  The following files will be created by following the steps in the link:

    * `Llama_float16_tp1_rank0.engine`: The main output of the build script, containing the executable graph of operations with the model weights embedded.

    * `config.json`: Includes detailed information about the model, like its general structure and precision, as well as information about which plug-ins were incorporated into the engine.

    * `model.cache`: Caches some of the timing and optimization information from model compilation, making successive builds quicker.

- Create a folder inside `models` called `tensorrt`, and move all of the files mentioned above to that directory, as shown in the sketch after this list.
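
A minimal sketch of that last step, assuming the engine files were generated in the current directory (adjust the source paths to wherever the build script actually wrote them):

```bash
# Hypothetical source paths: adjust to wherever the build script wrote the files
mkdir -p models/tensorrt
mv Llama_float16_tp1_rank0.engine config.json model.cache models/tensorrt/
```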

Once done, you can install PrivateGPT dependencies with the following command:

```bash
poetry install --extras "ui llms-nvidia-tensorrt embeddings-huggingface vector-stores-qdrant"
```

We are installing the "embeddings-huggingface" dependency to support local embeddings, because TensorRT only covers the LLM.
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:

```bash
poetry run python scripts/setup
```

Once installed, you can run PrivateGPT.

```bash
PGPT_PROFILES=tensorrt make run
```

PrivateGPT will use the already existing `settings-tensorrt.yaml` settings file, which is already configured to use Nvidia TensorRT LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, etc.).

The UI will be available at http://localhost:8001

### Non-Private, OpenAI-powered test setup

If you want to test PrivateGPT with OpenAI's LLM and Embeddings (taking into account that your data is going to OpenAI!) you can use this setup.

You need an OpenAI API key to run this setup.

Edit the `settings-openai.yaml` file to include the correct API KEY. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the env var OPENAI_API_KEY.
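
For example, on MacOS/Linux you can export the key for the current shell session before starting PrivateGPT (the value below is a placeholder):

```bash
# Placeholder value: substitute your real OpenAI API key
export OPENAI_API_KEY="<your-api-key>"
```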

Then, install PrivateGPT dependencies with the following command:

```bash
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
```

Once installed, you can run PrivateGPT.

```bash
PGPT_PROFILES=openai make run
```

PrivateGPT will use the already existing `settings-openai.yaml` settings file, which is already configured to use OpenAI LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

### Local, Llama-CPP powered setup

If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command to install its dependencies:

```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```

In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:

```bash
poetry run python scripts/setup
```

Once installed, you can run PrivateGPT with the following command:

```bash
PGPT_PROFILES=local make run
```

PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.

The UI will be available at http://localhost:8001

#### Llama-CPP support

For PrivateGPT to run fully locally without Ollama, Llama.cpp is required and in
particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
is used.

You'll need to have a valid C++ compiler like gcc installed. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.

> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.

##### Llama-CPP OSX GPU support

You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with metal support.

To do that, you need to install the `llama.cpp` python bindings (`llama-cpp-python`) through pip, with the compilation flag
that activates `METAL`: you have to pass `-DLLAMA_METAL=on` to the CMake command that `pip` runs for you (see below).

In other words, one should simply run:

```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```

The above command will force the re-installation of `llama-cpp-python` with `METAL` support by compiling
`llama.cpp` locally with your `METAL` libraries (shipped by default with your macOS).

More information is available in the documentation of the libraries themselves:

* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration)
* [llama-cpp-python's documentation](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-hardware-acceleration)
* [llama.cpp](https://github.com/ggerganov/llama.cpp#build)

##### Llama-CPP Windows NVIDIA GPU support

Windows GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
dependencies.

Some tips to get it working with an NVIDIA card and CUDA (tested on Windows 10 with CUDA 11.5 and an RTX 3070):

* Install the latest VS2022 (and build tools) https://visualstudio.microsoft.com/vs/community/
* Install the CUDA toolkit https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
  date and your GPU is detected.
* [Optional] Install CMake to troubleshoot building issues by compiling llama.cpp directly https://cmake.org/download/

If you have all required dependencies properly configured, running the
following powershell command should succeed.

```powershell
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If your installation was correct, you should see a message similar to the following the next
time you start the server: `BLAS = 1`.

```console
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```

Note that llama.cpp offloads matrix calculations to the GPU, but the performance is
still hit heavily due to latency between CPU and GPU communication. You might need to tweak
batch sizes and other parameters to get the best performance for your particular system.

##### Llama-CPP Linux NVIDIA GPU support and Windows-WSL

Linux GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
external dependencies.

Some tips:

* Make sure you have an up-to-date C++ compiler
* Install the CUDA toolkit https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
  date and your GPU is detected.

After that, running the following command in the repository will install llama.cpp with GPU support:

```bash
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If your installation was correct, you should see a message similar to the following the next
time you start the server: `BLAS = 1`.

```console
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```

##### Llama-CPP Known issues and Troubleshooting

Execution of LLMs locally still has a lot of sharp edges, especially when running on non-Linux platforms.
You might encounter several issues:

* Performance: RAM or VRAM usage is very high; your computer might experience slowdowns or even crashes.
* GPU Virtualization on Windows and OSX: Simply not possible with docker desktop, you have to run the server directly on
  the host.
* Building errors: Some of PrivateGPT's dependencies need to build native code, and they might fail on some platforms.
  Most likely you are missing some dev tools on your machine (updated C++ compiler, CUDA is not on PATH, etc.).

If you encounter any of these issues, please open an issue and we'll try to help.

One of the first reflexes to adopt is: get more information.
If, during your installation, something does not go as planned, retry in *verbose* mode, and see what goes wrong.

For example, when installing packages with `pip install`, you can add the option `-vvv` to show the details of the installation.
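
For instance, if the `llama-cpp-python` build fails, re-running it in verbose mode surfaces the underlying compiler or CMake error. This just combines the install command used elsewhere in this guide with `-vvv`:

```bash
# Verbose re-install of llama-cpp-python to inspect build errors
CMAKE_ARGS="-DLLAMA_CUBLAS=on" poetry run pip install -vvv --force-reinstall --no-cache-dir llama-cpp-python
```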

##### Llama-CPP Troubleshooting: C++ Compiler

If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
compiler on your computer.

**For Windows 10/11**

To install a C++ compiler on Windows 10/11, follow these steps:

1. Install Visual Studio 2022.
2. Make sure the following components are selected:
    * Universal Windows Platform development
    * C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.

**For OSX**

1. Check if you have a C++ compiler installed; `Xcode` should have done it for you. To install Xcode, go to the App
   Store, search for Xcode and install it. **Or** you can install the command line tools by running `xcode-select --install`.
2. If not, you can install clang or gcc with homebrew: `brew install gcc`.

##### Llama-CPP Troubleshooting: Mac Running Intel

When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.

If so, set your archflags during pip install, e.g.: _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_
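
If you hit this while building `llama-cpp-python` in this Poetry-based setup, the same flag can be applied to the install command used earlier in this guide (a sketch, assuming an Intel Mac):

```bash
# Force an x86_64 build of llama-cpp-python on Intel Macs
ARCHFLAGS="-arch x86_64" poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```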