feat: Upgrade LlamaIndex to 0.10 (#1663)

* Extract optional dependencies

* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity

* Support Ollama embeddings

* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine

* Fix vector retriever filters
Iván Martínez 2024-03-06 17:51:30 +01:00 committed by GitHub
parent 12f3a39e8a
commit 45f05711eb
43 changed files with 1474 additions and 1396 deletions


@@ -25,6 +25,30 @@ When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API
using Swagger UI.
#### Customizing low level parameters
Currently, not all of the `llama.cpp` and `llama-cpp-python` parameters are exposed in PrivateGPT's `settings.yaml` file.
If you need to customize parameters such as the number of layers loaded into the GPU, you can change
them directly in `private_gpt/components/llm/llm_component.py`.
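As a rough illustration, here is a minimal sketch of tuning GPU layer offloading when the component builds the local LLM. It assumes the component constructs a LlamaIndex `LlamaCPP` instance; the model path and `n_gpu_layers` value shown are illustrative, not PrivateGPT defaults.
```python
# Sketch only: assumes llm_component.py builds the local LLM with LlamaIndex's LlamaCPP wrapper.
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="models/your-model.gguf",  # illustrative path
    max_new_tokens=256,                   # mirrors the `max_new_tokens` setting
    context_window=3900,                  # illustrative context size
    # llama.cpp options not exposed in settings.yaml go here,
    # e.g. how many layers to offload to the GPU:
    model_kwargs={"n_gpu_layers": 20},    # illustrative value; -1 offloads all layers
)
```
After editing the component, restart PrivateGPT so the new `llama.cpp` options take effect.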
##### Available LLM config options
The `llm` section of the settings allows for the following configurations:
- `mode`: how to run your LLM
- `max_new_tokens`: the maximum number of new tokens the LLM will generate and add to the context window (`llama.cpp` defaults to `256`)
Example:
```yaml
llm:
mode: local
max_new_tokens: 256
```
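For reference, here is a minimal sketch of how this `llm` section could be modeled in the application's configuration layer, assuming a Pydantic-style settings class; the class name and field descriptions are illustrative, only the `mode` and `max_new_tokens` keys come from the YAML above.
```python
# Sketch only: a Pydantic-style model for the `llm` section shown above.
# Class name and defaults are illustrative, not PrivateGPT's actual settings code.
from pydantic import BaseModel, Field


class LLMSettings(BaseModel):
    mode: str = Field("local", description="How to run the LLM, e.g. 'local'.")
    max_new_tokens: int = Field(
        256, description="Maximum number of new tokens the LLM will generate."
    )


settings = LLMSettings(mode="local", max_new_tokens=256)
```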
If you are getting an out-of-memory error, you might also try a smaller model, or stick to the
recommended models instead of custom-tuning these parameters.
### Using OpenAI
If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may