mirror of https://github.com/zylon-ai/private-gpt.git, synced 2025-12-22 17:05:41 +01:00
feat(ingest): Created a faster ingestion mode - pipeline (#1750)
* Unify pgvector and postgres connection settings
* Remove local changes
* Update file pgvector->postgres
* postgresql should be postgres
* Adding pipeline ingestion mode
* disable hugging face parallelism. Continue on file to doc transform failure
* Semaphore to limit docq async workers. ETA reporting
parent 1efac6a3fe
commit 134fc54d7d
5 changed files with 301 additions and 2 deletions
@@ -62,6 +62,7 @@ The following ingestion mode exist:
 * `simple`: historic behavior, ingest one document at a time, sequentially
 * `batch`: read, parse, and embed multiple documents using batches (batch read, and then batch parse, and then batch embed)
 * `parallel`: read, parse, and embed multiple documents in parallel. This is the fastest ingestion mode for local setup.
+* `pipeline`: Alternative to parallel.
 To change the ingestion mode, you can use the `embedding.ingest_mode` configuration value. The default value is `simple`.

 To configure the number of workers used for parallel or batched ingestion, you can use
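
The difference between the `simple`, `batch`, and `parallel` modes described in the hunk above can be illustrated with a minimal sketch. This is not the project's ingestion code: `read`, `parse`, `embed`, the three `ingest_*` functions, and the `workers` parameter are hypothetical stand-ins chosen for illustration, and a thread pool is only one possible way to run documents in parallel. The `pipeline` mode is left out because the documentation describes it only as an alternative to `parallel`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real read / parse / embed steps.
def read(path: str) -> str:
    return f"contents of {path}"

def parse(text: str) -> list[str]:
    return text.split()

def embed(tokens: list[str]) -> list[int]:
    return [len(token) for token in tokens]

def ingest_simple(paths):
    # One document at a time, each taken through all three steps in turn.
    return [embed(parse(read(path))) for path in paths]

def ingest_batch(paths):
    # Finish each step for the whole batch before starting the next step.
    texts = [read(path) for path in paths]
    parsed = [parse(text) for text in texts]
    return [embed(tokens) for tokens in parsed]

def ingest_parallel(paths, workers=2):
    # Take several documents through all three steps concurrently.
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda path: embed(parse(read(path))), paths))

paths = ["a.txt", "b.txt", "c.txt"]
results = ingest_simple(paths)
assert results == ingest_batch(paths) == ingest_parallel(paths)
```

All three functions produce the same result for the same input; only the order in which the work is scheduled differs, which is the property the mode descriptions rely on.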