Resolve "Link raw data and documentation in README.md"
- Erik Jonas Hartnick authored
+ 145
− 1
@@ -8,4 +8,148 @@ Paper: [Link](https://arxiv.org/abs/2405.16755)
\ No newline at end of file
To set the `sampling_count` of the revise tool for the BIRD SDS (subsampled dev set), switch to branch [3-konfigurieren-fur-llama3-70b-und-slrum](https://gitlab.informatik.uni-halle.de/aktxt/re-chess/-/tree/3-konfigurieren-fur-llama3-70b-und-slrum), then edit line 63 of [CHESS/run/configs/CHESS_IR_SS_CG_BIRD_OSS.yaml](https://gitlab.informatik.uni-halle.de/aktxt/re-chess/-/blob/3-konfigurieren-fur-llama3-70b-und-slrum/CHESS/run/configs/CHESS_IR_SS_CG_BIRD_OSS.yaml?ref_type=heads#L63) (at the end of the file, under `revise` > `sampling_count`) and set it to the desired value.
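
The edit described above can also be scripted. The following is a minimal sketch, not part of the repository: `set_sampling_count` is a hypothetical helper, it assumes the target line has the form `sampling_count: <integer>`, and the value `3` in the example call is arbitrary.

```shell
# Hypothetical helper (not part of the repository): set sampling_count on one line.
# Assumes that line has the form "sampling_count: <integer>".
set_sampling_count() {
  config_file="$1"   # path to the YAML config
  line_number="$2"   # 63 for the config referenced above
  new_value="$3"     # desired sampling count
  # Only the given line is changed; sed keeps a backup as <config_file>.bak.
  sed -i.bak -E "${line_number}s/sampling_count:[[:space:]]*[0-9]+/sampling_count: ${new_value}/" "$config_file"
}

# Example call, run from the repository root (the value 3 is arbitrary):
# set_sampling_count CHESS/run/configs/CHESS_IR_SS_CG_BIRD_OSS.yaml 63 3
```

Restricting the substitution to a single line number avoids touching any other `sampling_count` key that may exist elsewhere in the file. If the file layout changes, check the line number before running it.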
1. During preprocessing: In `CHESS/src/database_utils/db_catalog/preprocess.py`, we used `mxbai-embed-large` for embedding. We also tried `Llama3-70B`, which is left commented out. To switch models, uncomment the one you want to use and comment out the others (lines 31-35). Note that Ollama must have downloaded each model before it can be used.
2. In retrieve_entity: In `CHESS/src/workflow/agents/information_retriever/tool_kit/retrieve_entity.py`, we used `nomic-embed-text` for embedding. We also tried `Llama3-70B`, which is left commented out. To switch models, uncomment the one you want to use and comment out the others (lines 35-38). Note that Ollama must have downloaded each model before it can be used.
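
The two embedding models named in the steps above can be downloaded with `ollama pull`. This is a minimal sketch, assuming the `ollama` CLI is installed and its server is reachable. The `EMBEDDING_MODELS` variable is only an illustrative name, and `Llama3-70B` is left out because the exact Ollama tag used is not stated here.

```shell
# Embedding models named in the two steps above (variable name is illustrative).
EMBEDDING_MODELS="mxbai-embed-large nomic-embed-text"

for model in $EMBEDDING_MODELS; do
  if command -v ollama >/dev/null 2>&1; then
    # Download the model so Ollama can serve it later.
    ollama pull "$model" || echo "pull failed for $model" >&2
  else
    echo "ollama not found; skipping $model" >&2
  fi
done
```

Running `ollama list` afterwards shows which models are available locally.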
The [3-konfigurieren-fur-llama3-70b-und-slrum](https://gitlab.informatik.uni-halle.de/aktxt/re-chess/-/tree/3-konfigurieren-fur-llama3-70b-und-slrum) branch is set up for the subsampled BIRD dev set. To use the full BIRD dev set with our Slurm script, set `DATA_PATH` to `"./data/BIRD/dev/dev.json"`.
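
In the Slurm script, the change for the full BIRD dev set might look like the sketch below. Only the `DATA_PATH` value comes from the text above. The existence check is an optional addition and not part of the original script.

```shell
# Full BIRD dev set (path as given above).
DATA_PATH="./data/BIRD/dev/dev.json"

# Optional sanity check (an addition, not from the original script).
if [ ! -f "$DATA_PATH" ]; then
  echo "warning: $DATA_PATH not found; check the working directory" >&2
fi
```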
The [5-konfiguration-an-spider-datensatz-anpassen](https://gitlab.informatik.uni-halle.de/aktxt/re-chess/-/tree/5-konfiguration-an-spider-datensatz-anpassen) branch is set up for the full Spider test set. To use the subsampled Spider test set with our Slurm script, set `DATA_PATH` to `"./data/Spider/spider_data/sub_sampled_spider_test_set.json"`.
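
For the subsampled Spider test set, the corresponding line in the Slurm script might look like the sketch below. Only the `DATA_PATH` value comes from the text above. The existence check is an optional addition and not part of the original script.

```shell
# Subsampled Spider test set (path as given above).
DATA_PATH="./data/Spider/spider_data/sub_sampled_spider_test_set.json"

# Optional sanity check (an addition, not from the original script).
if [ ! -f "$DATA_PATH" ]; then
  echo "warning: $DATA_PATH not found; check the working directory" >&2
fi
```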