Map:  23%|██▎       | 1000/4358 [00:00<00:02]
Map:   0%|          | 1000/1801350 [00:00<22:…]
Map:  27%|██▋       | 1000/3760 [00:00<00:01]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Testing GPT-2 Small with standard attention...