
Training Script Documentation

Overview

This script runs model training under a variety of configurations. Users specify the training mode, optimizer, model type, and other training settings via command-line parameters.

Usage

```bash
python train.py --mode <mode> --optimizer <optimizer> --model <model> [other options]
```

Input Parameters

Required Parameters

| Parameter | Type | Choices | Description |
|---|---|---|---|
| --mode | string | pretraining, finetuning | Specifies the training mode. |
| --optimizer | string | lora, galore, galore8bit, lora+galore8bit, baseline | Selects the optimizer type. |
| --model | string | llama_60m, llama_1b, llama_7b, roberta, gpt2 | Defines the model to train. |

Optional Parameters

| Parameter | Type | Default | Choices | Description |
|---|---|---|---|---|
| --batch_size | int | 16 | N/A | Number of samples per batch. |
| --num_epochs | int | 30 | N/A | Number of training epochs. |
| --max_length | int | 512 | N/A | Maximum token length per input. |
| --num_training_tokens | int | 1e9 | N/A | Number of training tokens (pretraining only). |
| --shuffle | string | true | true, false | Whether to shuffle the training data (not applicable in streaming mode). |
| --dtype | string | fp16 | bf16, fp16 | Data type for training (currently only bf16 works). |
| --lr | float | 4e-4 | N/A | Learning rate for the optimizer. |
| --weight_decay | float | 0.01 | N/A | Weight decay for the optimizer. |
| --tmax | int | 30 | N/A | T_max for the learning-rate scheduler. |
| --lora_config | string | config/lora_config.json | N/A | Path to the LoRA configuration file. |
| --galore_config | string | config/galore_config.json | N/A | Path to the GaLore configuration file. |
| --test | string | false | true, false | Whether to enable test mode: uses only 1000 tokens of the dataset for pretraining and runs the accelerator without bf16 (useful only for A100 GPUs). |
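
For reference, the snippet below sketches how the documented flags could be declared with argparse. It is illustrative only: the actual argument handling lives in train.py, and the names, types, and defaults shown here are taken solely from the tables above.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented train.py options."""
    p = argparse.ArgumentParser(description="Model training entry point")

    # Required parameters
    p.add_argument("--mode", required=True, choices=["pretraining", "finetuning"])
    p.add_argument("--optimizer", required=True,
                   choices=["lora", "galore", "galore8bit", "lora+galore8bit", "baseline"])
    p.add_argument("--model", required=True,
                   choices=["llama_60m", "llama_1b", "llama_7b", "roberta", "gpt2"])

    # Optional parameters (defaults taken from the table above)
    p.add_argument("--batch_size", type=int, default=16)
    p.add_argument("--num_epochs", type=int, default=30)
    p.add_argument("--max_length", type=int, default=512)
    p.add_argument("--num_training_tokens", type=int, default=int(1e9))
    p.add_argument("--shuffle", choices=["true", "false"], default="true")
    p.add_argument("--dtype", choices=["bf16", "fp16"], default="fp16")
    p.add_argument("--lr", type=float, default=4e-4)
    p.add_argument("--weight_decay", type=float, default=0.01)
    p.add_argument("--tmax", type=int, default=30)
    p.add_argument("--lora_config", default="config/lora_config.json")
    p.add_argument("--galore_config", default="config/galore_config.json")
    p.add_argument("--test", choices=["true", "false"], default="false")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```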

Example Command

```bash
python train.py --mode pretraining --optimizer lora --model llama_1b --batch_size 32 --num_epochs 20 --shuffle false --lr 3e-4
```

This command runs the script in pretraining mode with the LoRA optimizer on the llama_1b model, using a batch size of 32, 20 epochs, no data shuffling, and a learning rate of 3e-4.
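
To launch several runs programmatically (for example, a small learning-rate sweep), a plain subprocess call is enough. The sketch below assumes train.py is in the current working directory and accepts the flags documented above; the chosen learning rates are purely illustrative.

```python
import subprocess

# Hypothetical learning-rate sweep over the documented train.py flags.
for lr in ["1e-4", "3e-4", "4e-4"]:
    cmd = [
        "python", "train.py",
        "--mode", "pretraining",
        "--optimizer", "lora",
        "--model", "llama_1b",
        "--batch_size", "32",
        "--num_epochs", "20",
        "--shuffle", "false",
        "--lr", lr,
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails
```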