Draft: Resolve "Llama3.2 Konfigurationstest"
@@ -115,12 +115,104 @@ OLLAMA_MODELS=<model-path> ; <install-path>/./bin/ollama serve &
5. Ollama is now ready to receive requests for `<model>` from CHESS. We can also start a chat session with `ollama run <model>` to check that everything works. To stop the web service, bring the job to the foreground with `fg`, then stop it with `Ctrl` + `C`. To restart the web service, simply run (only) step 3 again. To remove/uninstall, run `rm <download-path>/ollama-linux-amd64.tgz`, `rm -r <install-path>/*` and `rm -r <model-path>/*`. The lifecycle commands are collected in the sketch after this list.
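For convenience, here are the service lifecycle commands from the steps above in one place. The angle-bracket paths are the placeholders from the installation steps; the env-prefix form of setting `OLLAMA_MODELS` is used so the variable reaches the `ollama` process.

```sh
# Start the web service in the background, storing models in <model-path>
OLLAMA_MODELS=<model-path> <install-path>/bin/ollama serve &

# Sanity check: open an interactive chat session with the model
<install-path>/bin/ollama run <model>

# Stop: bring the background job to the foreground, then press Ctrl+C
fg

# Uninstall: remove the downloaded archive, the installation, and the models
rm <download-path>/ollama-linux-amd64.tgz
rm -r <install-path>/*
rm -r <model-path>/*
```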
The CHESS framework uses the [langchain python package](https://python.langchain.com/docs/introduction/) to connect to an LLM via the API of a web service. The Ollama integration for langchain was added to the `requirements.txt` file as `langchain_ollama==0.1.3` (version 0.1.3 because of its compatibility with the existing requirements).
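The change to `requirements.txt` is a single added line; re-running the usual dependency installation picks it up:

```sh
# Append the pinned Ollama integration and reinstall the requirements
echo 'langchain_ollama==0.1.3' >> requirements.txt
pip install -r requirements.txt
```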
The preprocessing step calls the LLM to embed the database and column descriptions, so the file [CHESS/src/database_utils/db_catalog/preprocessing.py](CHESS/src/database_utils/db_catalog/preprocessing.py) was edited: the import `from langchain_ollama import OllamaEmbeddings` was added, and the existing `EMBEDDING_FUNCTION` was commented out and replaced with `EMBEDDING_FUNCTION = OllamaEmbeddings(model="llama3.2")`.
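A minimal sketch of that edit, showing only the changed lines (the commented-out original is assumed to be the OpenAI embedding function, analogous to the one in `retrieve_entity.py` below):

```python
from langchain_ollama import OllamaEmbeddings

# EMBEDDING_FUNCTION = OpenAIEmbeddings(model="text-embedding-3-small")  # original, commented out
EMBEDDING_FUNCTION = OllamaEmbeddings(model="llama3.2")
```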
- The `num_ctx` parameter sets the context size used by the model. Ollama defaults to a context size of 2048 tokens. We observed context sizes of about 15,000 tokens in the warnings from Ollama; therefore, we set a context of about twice that. Note that Llama3.2 allows a context size of up to 128,000 tokens, whereas Llama3-70B only allows a context size of 8192 tokens. Check the limit of the model you would like to run with Ollama.
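One way to set this, assuming the chat model is constructed through langchain's Ollama integration (`num_ctx` is the parameter name on `langchain_ollama`'s `ChatOllama`; the concrete value 32768 is our reading of "about twice" the observed size):

```python
from langchain_ollama import ChatOllama

# Raise the context window from Ollama's 2048-token default to ~32k tokens,
# roughly twice the ~15,000-token prompts seen in Ollama's warnings.
llm = ChatOllama(model="llama3.2", num_ctx=32768)
```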
To configure the agents, a `.yaml` configuration file and a shell script are needed. For testing purposes, the authors of the replication copied the [CHESS/run/configs/CHESS_IR_CG_UT.yaml](CHESS/run/configs/CHESS_IR_CG_UT.yaml) config file to [CHESS/run/configs/CHESS_IR_CG_UT_LLAMA3-2.yaml](CHESS/run/configs/CHESS_IR_CG_UT_LLAMA3-2.yaml) and the [CHESS/run/configs/CHESS_IR_SS_CG.yaml](CHESS/run/configs/CHESS_IR_SS_CG.yaml) config file to [CHESS/run/configs/CHESS_IR_SS_CG_LLAMA3-2.yaml](CHESS/run/configs/CHESS_IR_SS_CG_LLAMA3-2.yaml), then replaced every `engine` and `engine_name` entry with the `meta-llama/llama3-2` model configured above.
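Schematically, the replacement looks like this; the agent names and nesting below are illustrative rather than the exact keys of the copied files, the point being that every `engine`/`engine_name` value becomes `meta-llama/llama3-2`:

```yaml
# CHESS_IR_SS_CG_LLAMA3-2.yaml (illustrative excerpt)
information_retriever:
  engine: 'meta-llama/llama3-2'
candidate_generator:
  engine_name: 'meta-llama/llama3-2'
```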
Similarly, the authors of the replication copied the shell scripts that run the agents for testing purposes: [CHESS/run/run_main_ir_cg_ut.sh](CHESS/run/run_main_ir_cg_ut.sh) to [CHESS/run/run_main_ig_cg_ut_llama3.2.sh](CHESS/run/run_main_ig_cg_ut_llama3.2.sh) and [CHESS/run/run_main_ir_ss_cg.sh](CHESS/run/run_main_ir_ss_cg.sh) to [CHESS/run/run_main_ir_ss_cg_llama3.2.sh](CHESS/run/run_main_ir_ss_cg_llama3.2.sh) in `CHESS/run`. The `config` variable was changed to the appropriate path of the agent configuration file:
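For example, in the copied script (the exact relative path is an assumption and depends on where the script is invoked from):

```sh
# run_main_ir_ss_cg_llama3.2.sh (excerpt)
config="./run/configs/CHESS_IR_SS_CG_LLAMA3-2.yaml"
```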
In the information retriever agent (IR) there is another call to the `embed` API that is not covered by the configuration from the previous steps. In the `retrieve_entity` tool, in the file [CHESS/src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py](CHESS/src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py), the replication authors added the import `from langchain_ollama import OllamaEmbeddings` and adapted the `embedding_function` property of the `RetrieveEntity` class (line 34, `self.embedding_function = OpenAIEmbeddings(model="text-embedding-3-small")`) to use `OllamaEmbeddings`: `self.embedding_function = OllamaEmbeddings(model="llama3.2")`
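Sketched in context (the constructor signature and base class are assumptions; only the two marked lines correspond to the described edit):

```python
from langchain_ollama import OllamaEmbeddings  # added import

class RetrieveEntity(Tool):  # base class assumed from CHESS's tool kit
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # was: self.embedding_function = OpenAIEmbeddings(model="text-embedding-3-small")
        self.embedding_function = OllamaEmbeddings(model="llama3.2")  # adapted line 34
```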