Added max_new_tokens as a config option to llm yaml block (#1317)
* added max_new_tokens as a configuration option to the llm block in settings
* Update fern/docs/pages/manual/settings.mdx
  Co-authored-by: lopagela <lpglm@orange.fr>
* Update private_gpt/settings/settings.py: add default value for max_new_tokens = 256
  Co-authored-by: lopagela <lpglm@orange.fr>
* Addressed location of docs comment
* reformatting from running 'make check'
* remove default config value from settings.yaml

---------

Co-authored-by: lopagela <lpglm@orange.fr>
parent baf29f06fa
commit 9c192ddd73
3 changed files with 20 additions and 0 deletions
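Of the changed files, the commit message points to `private_gpt/settings/settings.py`, where `max_new_tokens` gains a default of 256. A minimal sketch of what such a pydantic settings field could look like; the class name and the `mode` field are assumptions for illustration, not taken from the repo:

```python
# Hypothetical sketch of the settings field described in the commit
# message; only the name max_new_tokens and its default of 256 are
# confirmed. The class name and surrounding fields are illustrative.
from pydantic import BaseModel, Field

class LLMSettings(BaseModel):
    mode: str = "local"  # illustrative; the real field enumerates the supported modes
    max_new_tokens: int = Field(
        256,
        description="Maximum number of new tokens the LLM will generate.",
    )
```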
@@ -89,6 +89,21 @@ Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are avai
In case you need to customize parameters such as the number of layers loaded into the GPU, you can change them in `llm_component.py`, located at `private_gpt/components/llm/llm_component.py`.
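For orientation, here is a heavily trimmed, hypothetical sketch of how such a component might pass the setting through to the llama-index `LlamaCPP` wrapper; apart from `max_new_tokens`, the argument values are placeholders rather than the project's actual defaults:

```python
# Hypothetical, trimmed sketch of a local LLM component; only
# max_new_tokens is confirmed by this commit, everything else is
# illustrative.
from llama_index.llms import LlamaCPP

MAX_NEW_TOKENS = 256  # in private-gpt this would come from the llm settings block

llm = LlamaCPP(
    model_path="models/your-model.gguf",  # placeholder path to a local GGUF model
    max_new_tokens=MAX_NEW_TOKENS,        # the option this commit makes configurable
    context_window=3900,                  # placeholder context size
    model_kwargs={"n_gpu_layers": -1},    # e.g. the number of layers offloaded to the GPU
)
```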
##### Available LLM config options
The `llm` section of the settings allows for the following configurations:
- `mode`: how to run your LLM
- `max_new_tokens`: the maximum number of new tokens the LLM will generate and add to the context window (by default, Llama.cpp uses `256`)
Example:
```yaml
llm:
  mode: local
  max_new_tokens: 256
```
If you are getting an out-of-memory error, you might also try a smaller model or stick to the recommended models instead of custom-tuning the parameters.