# Ingesting & Managing Documents
The ingestion of documents can be done in several ways:

* Using the `/ingest` API
* Using the Gradio UI
* Using the Bulk Local Ingestion functionality (see the next section)

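As a minimal sketch of the first option, a single file can be uploaded to the ingestion API as multipart form data. The endpoint path below is an assumption based on a default local server listening on port 8001, and the actual HTTP call is left commented out so the sketch stands alone:

```python
# Hypothetical sketch of an upload to the ingestion endpoint of a local
# PrivateGPT server; the file is sent as multipart/form-data.
url = "http://localhost:8001/v1/ingest"
files = {"file": ("report.pdf", b"%PDF-1.4 ...", "application/pdf")}

# In a live setup you would send it with a library such as `requests`:
# import requests
# response = requests.post(url, files=files)
# print(response.json())
print(f"would POST {list(files)} to {url}")
```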
## Bulk Local Ingestion

When you are running PrivateGPT in a fully local setup, you can conveniently ingest a complete
folder (containing PDFs, text files, etc.) and optionally watch it for changes with the command:

|
```bash
make ingest /path/to/folder -- --watch
```

To log the processed and failed files to an additional file, use:

```bash
make ingest /path/to/folder -- --watch --log-file /path/to/log/file.log
```

After ingestion is complete, you should be able to chat with your documents
by navigating to http://localhost:8001 and using the `Query documents` option,
or by using the completions / chat API.
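As a sketch of the API route, a chat request could be built like the following. The endpoint path and field names are assumptions modelled on an OpenAI-style API (check your server's API docs for the exact shape), and the HTTP call itself is commented out:

```python
import json

# Hypothetical payload for the chat endpoint of a local PrivateGPT server.
payload = {
    "messages": [
        {"role": "user", "content": "What does my document say about pricing?"}
    ],
    "use_context": True,  # ask the server to answer using the ingested documents
}
body = json.dumps(payload)

# In a live setup you would POST `body`, for example with `requests`:
# import requests
# r = requests.post("http://localhost:8001/v1/chat/completions", data=body,
#                   headers={"Content-Type": "application/json"})
print(body)
```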
## Ingestion troubleshooting
Are you running out of memory when ingesting files?

To avoid running out of memory, you should ingest your documents without the LLM loaded in your (video) memory.
To do so, change your configuration to set `llm.mode: mock`.

In other words, update your `settings.yaml` (or your custom configuration file) to set the
following **before** ingesting your documents:

```yaml
llm:
  mode: mock
```
Once your documents are ingested, you can set the `llm.mode` value back to `local` (or your previous custom value).
You can also use the existing `PGPT_PROFILES=mock` profile, which will set `llm.mode` to `mock` for you.
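Profiles work by overlaying a profile-specific settings file on top of the base settings. The sketch below illustrates that overlay with a plain recursive dictionary merge; this is a simplified assumption about the behavior, not privateGPT's actual loader, and the field values are illustrative:

```python
# Simplified sketch: a profile file (e.g. settings-mock.yaml) is merged on
# top of the base settings.yaml, overriding only the keys it defines.
def merge(base: dict, override: dict) -> dict:
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # recurse into nested sections
        else:
            out[key] = value
    return out

base = {"llm": {"mode": "local", "max_new_tokens": 256}}
mock_profile = {"llm": {"mode": "mock"}}
print(merge(base, mock_profile))
# -> {'llm': {'mode': 'mock', 'max_new_tokens': 256}}
```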
## Supported file formats
By default, privateGPT supports all file formats that contain clear text (for example, `.txt`, `.html`, etc.).
However, these text-based formats are only treated as plain text and are not pre-processed in any other way.

It also supports the following file formats:

* `.hwp`
* `.pdf`
* `.docx`
* `.pptx`
* `.ppt`
* `.pptm`
* `.jpg`
* `.png`
* `.jpeg`
* `.mp3`
* `.mp4`
* `.csv`
* `.epub`
* `.md`
* `.mbox`
* `.ipynb`
* `.json`

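Before a bulk ingestion, it can be handy to check which files in a folder actually match one of these formats. The helper below is not part of privateGPT itself; it is a small standalone sketch built from the list above:

```python
# Standalone helper (not part of privateGPT) to list the files in a folder
# whose extension matches one of the supported formats above.
from pathlib import Path

SUPPORTED_EXTENSIONS = {
    ".hwp", ".pdf", ".docx", ".pptx", ".ppt", ".pptm",
    ".jpg", ".png", ".jpeg", ".mp3", ".mp4", ".csv",
    ".epub", ".md", ".mbox", ".ipynb", ".json",
    ".txt", ".html",  # clear-text formats, ingested as plain text
}

def ingestible_files(folder: str) -> list[Path]:
    """Return every file under `folder` whose extension is supported."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Anything the helper filters out would still be ingested by privateGPT if it happens to contain clear text, just without format-specific pre-processing.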
**Please note the following nuance**: while `privateGPT` supports these file formats, ingesting them **might** require
additional dependencies to be installed in your Python virtual environment.
For example, if you try to ingest `.epub` files, `privateGPT` might fail to do so, and will instead display an
explanatory error asking you to install the dependencies needed to support that file format.

**Other file formats might work**, but they will be treated as plain text
files (in other words, they will be ingested as `.txt` files).