mirror of https://github.com/zylon-ai/private-gpt.git (synced 2025-12-22 10:45:42 +01:00)
More loaders, generic method

- Update the README with extra formats
- Add Powerpoint, requested in #138
- Add ePub requested in #138 comment - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
parent 23d24c88e9
commit bf3bddfbb6
3 changed files with 70 additions and 16 deletions
README.md (21 changed lines)
@@ -31,7 +31,22 @@ This repo uses a [state of the union transcript](https://github.com/imartinez/pr
 
 ## Instructions for ingesting your own dataset
 
-Put any and all of your .txt, .pdf, or .csv files into the source_documents directory
+Put any and all your files into the `source_documents` directory
+
+The supported extensions are:
+
+- `.csv`: CSV,
+- `.docx`: Word Document,
+- `.enex`: EverNote,
+- `.eml`: Email,
+- `.epub`: EPub,
+- `.html`: HTML File,
+- `.md`: Markdown,
+- `.msg`: Outlook Message,
+- `.odt`: Open Document Text,
+- `.pdf`: Portable Document Format (PDF),
+- `.pptx` : PowerPoint Document,
+- `.txt`: Text file (UTF-8),
 
 Run the following command to ingest all the data.
 
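The extension list added in this hunk suggests a lookup keyed on file suffix, in line with the commit title's "generic method". The sketch below only illustrates that idea with the Python standard library. `SUPPORTED_EXTENSIONS` and `find_ingestible_files` are hypothetical names, not code from this commit, and the recursive scan and case-insensitive suffix match are assumptions.

```python
from pathlib import Path

# Extensions and format names copied from the README hunk above.
# The dictionary name and the helper below are hypothetical illustrations,
# not identifiers taken from the commit itself.
SUPPORTED_EXTENSIONS = {
    ".csv": "CSV",
    ".docx": "Word Document",
    ".enex": "EverNote",
    ".eml": "Email",
    ".epub": "EPub",
    ".html": "HTML File",
    ".md": "Markdown",
    ".msg": "Outlook Message",
    ".odt": "Open Document Text",
    ".pdf": "Portable Document Format (PDF)",
    ".pptx": "PowerPoint Document",
    ".txt": "Text file (UTF-8)",
}


def find_ingestible_files(source_dir):
    """Return (supported, skipped) lists of file paths under source_dir."""
    supported, skipped = [], []
    # Assumption: subdirectories are scanned recursively.
    for path in sorted(Path(source_dir).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: suffixes match case-insensitively ("A.PDF" counts as ".pdf").
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped


if __name__ == "__main__":
    ok, ignored = find_ingestible_files("source_documents")
    for p in ok:
        print(f"{SUPPORTED_EXTENSIONS[p.suffix.lower()]}: {p}")
    print(f"{len(ignored)} file(s) skipped (unsupported extension)")
```

Running a check like this before ingestion would show which files in `source_documents` fall outside the documented formats. How the project itself treats such files is not shown in this diff.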
@@ -54,7 +69,7 @@ python privateGPT.py
 
 And wait for the script to require your input.
 
-```shell
+```plaintext
 > Enter a query:
 ```
 
@@ -87,7 +102,7 @@ To install a C++ compiler on Windows 10/11, follow these steps:
 * Universal Windows Platform development
 * C++ CMake tools for Windows
 3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
-4. Run the installer and select the "gcc" component.
+4. Run the installer and select the `gcc` component.
 
 # Disclaimer
 This is a test project to validate the feasibility of a fully private solution for question answering using LLMs and Vector embeddings. It is not production ready, and it is not meant to be used in production. The models selection is not optimized for performance, but for privacy; but it is possible to use different models and vectorstores to improve performance.