5. Ollama is now ready to receive requests for `<model>` from CHESS. We can also start a chat session with `ollama run <model>` to check that everything works. To stop the web service, bring the job to the foreground with `fg`, then stop it with `Ctrl` + `C`. To restart the web service, simply run (only) step 3 again. To remove/uninstall, run `rm <download-path>/ollama-linux-amd64.tgz`, `rm -r <install-path>/*` and `rm -r <model-path>/*`.
## Setting up the venv
1. We change to the `CHESS` directory with `cd CHESS`. There we create a python virtual environment named `venv` with the command (using the appropriate python version on our system):
```bash
python -m venv venv
```
2. We activate the environment with the following command, which adds `(venv) ` as a prefix to our command prompt:
```bash
source venv/bin/activate
```
3. We install the required python packages with:
```bash
pip install -r requirements.txt
```
4. We are ready to execute any further steps below. Once we are finished, we can deactivate the virtual environment (which removes the `(venv) ` prefix) by running:
```bash
deactivate
```
## Configuring the preprocessing
The CHESS framework uses the [langchain python package](https://python.langchain.com/docs/introduction/) to connect to an LLM via the API of a web service. The Ollama integration for langchain was added to the `requirements.txt` file as `langchain_ollama==0.1.3` (version 0.1.3 because of its compatibility with the existing requirements).
The preprocessing calls the LLM to embed the database and column descriptions. The file [CHESS/src/database_utils/db_catalog/preprocessing.py](CHESS/src/database_utils/db_catalog/preprocessing.py) was therefore edited: the import `from langchain_ollama import OllamaEmbeddings` was added, and the existing `EMBEDDING_FUNCTION` was commented out and replaced with `EMBEDDING_FUNCTION = OllamaEmbeddings(model="llama3.2")`.
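The resulting lines look like this (a sketch; the commented-out assignment stands in for whatever embedding function the file originally used):
```python
# CHESS/src/database_utils/db_catalog/preprocessing.py
from langchain_ollama import OllamaEmbeddings

# EMBEDDING_FUNCTION = OpenAIEmbeddings(...)  # original embedding function, commented out
EMBEDDING_FUNCTION = OllamaEmbeddings(model="llama3.2")
```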
1. To use a different model with Ollama for embedding, the parameter `model="llama3.2"` must be changed to `model="<model>"`.
2. Ensure that the Ollama web service is running (`OLLAMA_MODELS=<model-path> ollama/bin/ollama serve &`; note that `OLLAMA_MODELS` must be set in the environment of the `ollama serve` process, so it is prefixed to the command rather than set separately).
3. Run the preprocessing by changing into the `CHESS` directory with `cd CHESS`, assuming you are in the replication's repository root. Then run:
```bash
./run/run_preprocess.sh
```
## Configuring the agents
To configure the model for the agents, we need to add a new engine/model configuration to [CHESS/src/llm/engine_configs.py](CHESS/src/llm/engine_configs.py). For Llama3.2 with Ollama, we added the following configuration to the `ENGINE_CONFIGS` dictionary (this also requires the import `from langchain_ollama import ChatOllama` in that file):
```python
"meta-llama/llama3-2": {
"constructor": ChatOllama,
"params": {
"model": "llama3.2",
"temperature": 0,
"model_kwargs": {
"stop": [""],
},
"num_ctx": 32768
}
}
```
- The `constructor` is the langchain constructor `ChatOllama` for API calls to the LLM.
- The `params` are the constructor parameters.
- The `model` is the model used by Ollama.
- The `temperature` is the default model temperature. It gets overwritten by the config of the agents.
- The `model_kwargs` are copied from the existing Llama config, including the `stop` entry.
- The `num_ctx` is the context size used by the model. Ollama defaults to a context size of 2048 tokens, but we observed context sizes of about 15 000 tokens in the warnings from Ollama; therefore, we set a context of about twice that. Note that Llama3.2 allows for a context size of up to 128 000 tokens, whereas Llama3-70B only allows for a context size of 8192 tokens. Check the context limit of the model you would like to run with Ollama.
- For any other parameters, check the [langchain_ollama documentation](https://python.langchain.com/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html#langchain_ollama.chat_models.ChatOllama) on `ChatOllama`.
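As a quick sanity check that these parameters are accepted, the configured engine can also be constructed directly in a Python shell (a minimal sketch, not CHESS code; it assumes the Ollama web service is running and the `llama3.2` model is available):
```python
from langchain_ollama import ChatOllama

# Same core parameters as the ENGINE_CONFIGS entry above
llm = ChatOllama(model="llama3.2", temperature=0, num_ctx=32768)
print(llm.invoke("Reply with one word: ready?").content)
```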
1. Another model for Ollama can be added with a configuration analogous to the one above.
To configure the agents, a `.yaml` configuration file and a shell script are needed. For testing purposes, the authors of the replication copied the [CHESS/run/configs/CHESS_IR_CG_UT.yaml](CHESS/run/configs/CHESS_IR_CG_UT.yaml) config file to [CHESS/run/configs/CHESS_IR_CG_UT_LLAMA3-2.yaml](CHESS/run/configs/CHESS_IR_CG_UT_LLAMA3-2.yaml) and the [CHESS/run/configs/CHESS_IR_SS_CG.yaml](CHESS/run/configs/CHESS_IR_SS_CG.yaml) config file to [CHESS/run/configs/CHESS_IR_SS_CG_LLAMA3-2.yaml](CHESS/run/configs/CHESS_IR_SS_CG_LLAMA3-2.yaml), then replaced every `engine` and `engine_name` entry with the `meta-llama/llama3-2` model configured above.
2. Copying the appropriate `.yaml` file in `CHESS/run/configs` and setting all `engine` and `engine_name` parameters to the model of choice configures the agents for one of the two workflows from the original CHESS paper, just with the model of choice (see the sketch below).
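A minimal shell sketch of this step (assuming the `engine`/`engine_name` settings appear as plain `key: value` lines in the copied YAML file; check the result before running):
```bash
cd CHESS
cp run/configs/CHESS_IR_SS_CG.yaml run/configs/CHESS_IR_SS_CG_LLAMA3-2.yaml
# Point every engine/engine_name entry at the new engine config key:
sed -i -E "s#(engine_name|engine): .*#\1: 'meta-llama/llama3-2'#" \
  run/configs/CHESS_IR_SS_CG_LLAMA3-2.yaml
```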
Similarly, the shell scripts to run the agents were copied for testing purposes by the authors of the replication: [CHESS/run/run_main_ir_cg_ut.sh](CHESS/run/run_main_ir_cg_ut.sh) to [CHESS/run/run_main_ig_cg_ut_llama3.2.sh](CHESS/run/run_main_ig_cg_ut_llama3.2.sh) and [CHESS/run/run_main_ir_ss_cg.sh](CHESS/run/run_main_ir_ss_cg.sh) to [CHESS/run/run_main_ir_ss_cg_llama3.2.sh](CHESS/run/run_main_ir_ss_cg_llama3.2.sh), both in `CHESS/run`. In each copy, the `config` variable was changed to the path of the corresponding agent configuration file. In [CHESS/run/run_main_ig_cg_ut_llama3.2.sh](CHESS/run/run_main_ig_cg_ut_llama3.2.sh), for example, the line now reads:
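```bash
# Assumed form of the assignment; the exact quoting follows the original script:
config="./run/configs/CHESS_IR_CG_UT_LLAMA3-2.yaml"
```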
3. Copying a run script in the directory `CHESS/run` and adjusting the `config` variable to the appropriate config file created in step 2 makes the framework runnable with a custom configuration.
In the information retriever agent (IR), there is another call to the `embed` API that is not covered by the config in the previous steps. In the `retrieve_entity` tool, in the file [CHESS/src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py](CHESS/src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py), the replication authors added the import `from langchain_ollama import OllamaEmbeddings` and adapted the `embedding_function` property of the class `RetrieveEntity` (line 34, `self.embedding_function = OpenAIEmbeddings(model="text-embedding-3-small")`) to use `OllamaEmbeddings`: `self.embedding_function = OllamaEmbeddings(model="llama3.2")`.
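In context, the edit looks like this (a sketch; the surrounding code is elided):
```python
# CHESS/src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py
from langchain_ollama import OllamaEmbeddings

# ... inside RetrieveEntity, at line 34:
# self.embedding_function = OpenAIEmbeddings(model="text-embedding-3-small")  # original
self.embedding_function = OllamaEmbeddings(model="llama3.2")
```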
4. Changing the model for the embedding calls of the `retrieve_entity` tool in the information retriever agent to the model of choice will ensure all API calls are directed to the appropriate LLMs.
5. Run the shell script for your configuration from the `CHESS` directory, e.g., for the Llama3.2 testing config of the replication authors:
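```bash
# From the CHESS directory (use `bash <script>` if the copy is not executable):
./run/run_main_ig_cg_ut_llama3.2.sh
```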