Refactor documentation architecture (#1264)

* Refactor documentation architecture

  Split into several `tab` and sections

* Fix Fern's docs.yml after PR review

  Thank you Danny!

  Co-authored-by: dannysheridan <danny@buildwithfern.com>

* Re-add quickstart in the overview tab

  It went missing after a refactoring of the doc architecture

* Documentation writing

* Adapt Makefile to fern documentation

* Do not create overlapping page names in fern documentation

  This is causing 500. Thank you to @dsinghvi for the troubleshooting and the help!

* Add a readme to help to understand how fern documentation work and how to add new pages

* Rework the welcome view

  Redirects directly users to installation guide with links for people that are not familiar with documentation browsing.

* Simplify the quickstart guide

* PR feedback on installation guide

  A ton of refactoring can still be made there

* PR feedback on ingestion

* PR feedback on ingestion splitting

* Rename section on LLM

* Fix missing word in list of LLMs

---------

Co-authored-by: dannysheridan <danny@buildwithfern.com>
2025-12-22 10:45:42 +01:00 · 2023-11-19 18:46:09 +01:00 · 2023-11-19 18:46:09 +01:00 · 36f69eed0f
commit 36f69eed0f
parent 57a829a8e8
18 changed files with 399 additions and 151 deletions
--- a/fern/docs/pages/manual/ingestion.mdx
+++ b/fern/docs/pages/manual/ingestion.mdx
@@ -0,0 +1,78 @@
+# Ingesting & Managing Documents
+
+The ingestion of documents can be done in different ways:
+
+* Using the `/ingest` API
+* Using the Gradio UI
+* Using the Bulk Local Ingestion functionality (see the next section)
+
+## Bulk Local Ingestion
+
+When you are running PrivateGPT in a fully local setup, you can ingest an entire folder (containing
+PDF files, text files, etc.) for convenience,
+and optionally watch it for changes, with the command:
+
+```bash
+make ingest /path/to/folder -- --watch
+```
+
+To log the processed and failed files to an additional file, use:
+
+```bash
+make ingest /path/to/folder -- --watch --log-file /path/to/log/file.log
+```
+
+After ingestion is complete, you should be able to chat with your documents
+by navigating to http://localhost:8001 and using the option `Query documents`,
+or using the completions / chat API.
+
+## Ingestion troubleshooting
+
+Are you running out of memory when ingesting files?
+
+To avoid running out of memory, ingest your documents without the LLM loaded in your (video) memory.
+To do so, change your configuration to set `llm.mode: mock`.
+
+In other words, you should update your `settings.yaml` (or your custom configuration file) to set the
+following **before** ingesting your documents:
+```yaml
+llm:
+  mode: mock
+```
+
+Once your documents are ingested, you can set the `llm.mode` value back to `local` (or your previous custom value).
+
+You can also use the existing `PGPT_PROFILES=mock` profile, which sets `llm.mode` to `mock` for you.
+
+## Supported file formats
+
+PrivateGPT by default supports all file formats that contain clear text (for example, `.txt` files, `.html`, etc.).
+However, these text-based file formats are only treated as plain text files, and are not pre-processed in any other way.
+
+It also supports the following file formats:
+* `.hwp`
+* `.pdf`
+* `.docx`
+* `.pptx`
+* `.ppt`
+* `.pptm`
+* `.jpg`
+* `.png`
+* `.jpeg`
+* `.mp3`
+* `.mp4`
+* `.csv`
+* `.epub`
+* `.md`
+* `.mbox`
+* `.ipynb`
+* `.json`
+
+**Please note the following nuance**: while `privateGPT` supports these file formats, it **might** require additional
+dependencies to be installed in your Python virtual environment.
+For example, if you try to ingest `.epub` files, `privateGPT` might fail to do so, and will instead display an
+explanatory error asking you to install the dependencies needed to support this file format.
+
+
+**Other file formats might work**, but they will be considered as plain text
+files (in other words, they will be ingested as `.txt` files).
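
The format rule described in the "Supported file formats" section can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not PrivateGPT's actual code: the helper name `ingestion_mode` and the labels `"dedicated"` and `"plain-text"` are hypothetical, and the extension set is copied from the list in the page above.

```python
from pathlib import Path

# Extensions the page lists as supported beyond plain text.
# Copied from the documentation; not read from PrivateGPT itself.
DEDICATED_EXTENSIONS = frozenset({
    ".hwp", ".pdf", ".docx", ".pptx", ".ppt", ".pptm",
    ".jpg", ".png", ".jpeg", ".mp3", ".mp4", ".csv",
    ".epub", ".md", ".mbox", ".ipynb", ".json",
})


def ingestion_mode(path: str) -> str:
    """Hypothetical helper: classify a file the way the page describes.

    Listed formats get "dedicated" handling (which may need extra
    dependencies); everything else falls back to "plain-text".
    """
    # Compare case-insensitively so "REPORT.PDF" matches ".pdf".
    suffix = Path(path).suffix.lower()
    return "dedicated" if suffix in DEDICATED_EXTENSIONS else "plain-text"
```

For instance, `ingestion_mode("book.epub")` returns `"dedicated"`, while `ingestion_mode("notes.txt")` and `ingestion_mode("server.log")` both return `"plain-text"`, matching the statement that other formats are ingested as if they were `.txt` files.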