Support n_batch to improve inference performance

2025-12-22 04:30:11 +01:00 · 2023-06-11 21:33:35 +02:00 · 2023-06-11 21:33:35 +02:00 · ad661933cb
commit ad661933cb
parent 52eb020256
3 changed files with 5 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -26,6 +26,7 @@ MODEL_TYPE: supports LlamaCpp or GPT4All
 PERSIST_DIRECTORY: is the folder you want your vectorstore in
 MODEL_PATH: Path to your GPT4All or LlamaCpp supported LLM
 MODEL_N_CTX: Maximum token limit for the LLM model
+MODEL_N_BATCH: Number of tokens in the prompt that are fed into the model at a time. Optimal value differs a lot depending on the model (8 works well for GPT4All, and 1024 is better for LlamaCpp)
 EMBEDDINGS_MODEL_NAME: SentenceTransformers embeddings model name (see https://www.sbert.net/docs/pretrained_models.html)
 TARGET_SOURCE_CHUNKS: The amount of chunks (sources) that will be used to answer a question
 ```
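The README describes `MODEL_N_BATCH` as the number of prompt tokens fed into the model at a time. The sketch below shows what that means in practice. It reads the value from the environment and splits a prompt into batches of at most `n_batch` tokens, so you can see how many evaluation calls are needed to ingest the prompt. The helpers `read_n_batch`, `batch_prompt` and `num_eval_calls` are hypothetical illustrations. They do not come from this project or from llama.cpp. The fallback of 8 is simply the GPT4All value the README suggests.

```python
import math
import os


def read_n_batch(default: int = 8) -> int:
    """Read MODEL_N_BATCH from the environment (hypothetical helper).

    Falls back to `default` when the variable is unset or empty; 8 is the
    value the README suggests for GPT4All, chosen here only for illustration.
    """
    raw = os.environ.get("MODEL_N_BATCH")
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value < 1:
        raise ValueError("MODEL_N_BATCH must be a positive integer")
    return value


def batch_prompt(tokens: list[int], n_batch: int) -> list[list[int]]:
    """Split prompt tokens into consecutive batches of at most n_batch tokens."""
    return [tokens[i:i + n_batch] for i in range(0, len(tokens), n_batch)]


def num_eval_calls(n_prompt_tokens: int, n_batch: int) -> int:
    """Number of batched evaluation calls needed to ingest the whole prompt."""
    return math.ceil(n_prompt_tokens / n_batch)


if __name__ == "__main__":
    prompt = list(range(2000))  # stand-in for 2000 prompt token ids
    for n in (8, 1024):
        batches = batch_prompt(prompt, n)
        # Prints the batch size, the number of batches, and the call count.
        print(n, len(batches), num_eval_calls(len(prompt), n))
```

For a 2000-token prompt, a batch size of 8 needs 250 calls and 1024 needs only 2. A larger batch generally costs more memory per call, though. That is why the best setting depends on the model and the hardware, as the README notes.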