# Training Script Documentation

## Overview

This script facilitates model training with various configurations. Users can specify multiple parameters, including the training mode, optimizer, model type, and other training settings.

## Usage

```bash
python train.py --mode <mode> --optimizer <optimizer> --model <model> [other options]
```

## Input Parameters

### Required Parameters

| Parameter | Type | Choices | Description |
|---------------|--------|---------------------------------------------------------------|------------------------------|
| `--mode` | string | `pretraining`, `finetuning` | Specifies the training mode. |
| `--optimizer` | string | `lora`, `galore`, `galore8bit`, `lora+galore8bit`, `baseline` | Selects the optimizer type. |
| `--model` | string | `llama_60m`, `llama_1b`, `llama_7b`, `roberta`, `gpt2` | Defines the model to train. |

### Optional Parameters

| Parameter | Type | Default | Choices | Description |
|-----------------|------|----------|---------|-------------|
| `--batch_size` | int | `16` | N/A | Number of samples per batch. |
| `--num_epochs` | int | `30` | N/A | Number of training epochs. |
| `--max_length` | int | `512` | N/A | Maximum token length per input. |
| `--num_training_tokens` | int | `1e9` | N/A | Number of training tokens (pretraining only). |
| `--shuffle` | string | `true` | `true`, `false` | Whether to shuffle the training data (not applicable in streaming mode). |
| `--dtype` | string | `fp16` | `bf16`, `fp16` | Data type for training (currently only `bf16` works). |
| `--lr` | float | `4e-4` | N/A | Learning rate for the optimizer. |
| `--weight_decay` | float | `0.01` | N/A | Weight decay for the optimizer. |
| `--tmax` | int | `30` | N/A | Tmax for the learning-rate scheduler. |
| `--lora_config` | string | `config/lora_config.json` | N/A | Path to the LoRA configuration file. |
| `--galore_config` | string | `config/galore_config.json` | N/A | Path to the GaLore configuration file. |
| `--test` | string | `false` | `true`, `false` | Whether to enable test mode: uses only 1000 tokens of the dataset for pretraining and runs the accelerator without `bf16` (`bf16` is useful only on A100 GPUs). |

## Example Command

```bash
python train.py --mode pretraining --optimizer lora --model llama_1b --batch_size 32 --num_epochs 20 --shuffle false --lr 3e-4
```

This command runs the script in pretraining mode using the LoRA optimizer on the `llama_1b` model with a batch size of 32, 20 epochs, no data shuffling, and a learning rate of `3e-4`.
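
A finetuning run follows the same pattern. The command below is an illustrative sketch using only the parameters documented above; the specific values are arbitrary examples, not recommended settings:

```bash
# Illustrative example: finetune RoBERTa with the 8-bit GaLore optimizer,
# pointing at the default GaLore configuration file.
python train.py --mode finetuning --optimizer galore8bit --model roberta \
    --batch_size 16 --num_epochs 10 --dtype bf16 \
    --galore_config config/galore_config.json
```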
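
For a quick sanity check of a new setup, the `--test` flag can be combined with the smallest model. Again, this is only a sketch built from the documented flags:

```bash
# Illustrative example: smoke test on the smallest model; test mode limits
# pretraining to 1000 tokens and runs the accelerator without bf16.
python train.py --mode pretraining --optimizer baseline --model llama_60m --test true
```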