# Training Script Documentation

## Overview

This script facilitates model training with various configurations. Users can specify multiple parameters, including the training mode, optimizer, model type, and other training settings.
## Usage

```bash
python train.py --mode <mode> --optimizer <optimizer> --model <model> [other options]
```
## Input Parameters

### Required Parameters

| Parameter | Type | Choices | Description |
|---|---|---|---|
| `--mode` | string | `pretraining`, `finetuning` | Specifies the training mode. |
| `--optimizer` | string | `lora`, `galore`, `galore8bit`, `lora+galore8bit`, `baseline` | Selects the optimizer type. |
| `--model` | string | `llama_60m`, `llama_1b`, `llama_7b`, `roberta`, `gpt2` | Defines the model to train. |
### Optional Parameters

| Parameter | Type | Default | Choices | Description |
|---|---|---|---|---|
| `--batch_size` | int | 16 | N/A | Number of samples per batch. |
| `--num_epochs` | int | 30 | N/A | Number of training epochs. |
| `--max_length` | int | 512 | N/A | Maximum token length per input. |
| `--num_training_tokens` | int | 1e9 | N/A | Number of training tokens (pretraining only). |
| `--shuffle` | string | true | `true`, `false` | Whether to shuffle the training data (not applicable in streaming mode). |
| `--dtype` | string | fp16 | `bf16`, `fp16` | Data type for training (currently only `bf16` works). |
| `--lr` | float | 4e-4 | N/A | Learning rate for the optimizer. |
| `--weight_decay` | float | 0.01 | N/A | Weight decay for the optimizer. |
| `--tmax` | int | 30 | N/A | T_max for the learning-rate scheduler. |
| `--lora_config` | string | config/lora_config.json | N/A | Path to the LoRA configuration file. |
| `--galore_config` | string | config/galore_config.json | N/A | Path to the GaLore configuration file. |
| `--test` | string | false | `true`, `false` | Whether to enable test mode: uses only 1000 tokens of the dataset for pretraining and runs the accelerator without bf16 (useful only for A100 GPUs). |
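As a quick sanity check before a long run, the optional flags can be combined as in the sketch below. The combination is illustrative; the config path shown is simply the documented default, and `--test true` restricts pretraining to 1000 tokens as described in the table above:

```bash
# Smoke test: small model, default GaLore config path, test mode enabled.
python train.py --mode pretraining --optimizer galore --model llama_60m \
    --galore_config config/galore_config.json \
    --test true
```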
## Example Command

```bash
python train.py --mode pretraining --optimizer lora --model llama_1b --batch_size 32 --num_epochs 20 --shuffle false --lr 3e-4
```

This command runs the script in pretraining mode using the LoRA optimizer on the `llama_1b` model with a batch size of 32, 20 epochs, no data shuffling, and a learning rate of 3e-4.
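A finetuning run can be launched analogously. The command below is an illustrative combination of flags documented in the tables above (here using the combined `lora+galore8bit` optimizer on `roberta`), not a tuned or recommended setup:

```bash
# Illustrative finetuning run; unspecified parameters keep their defaults.
python train.py --mode finetuning --optimizer lora+galore8bit --model roberta \
    --batch_size 16 --num_epochs 10 --lr 1e-4 --dtype bf16
```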