Question and fine-tuning script
Hi NickyNicky,
Thanks for sharing this!
I have a question:
If you fine-tune the togethercomputer-LLaMA-2-7B-32K base model on a dataset with short contexts (short input lengths), would you then expect it to perform well when given longer inputs?
(I am assuming you trained it on a dataset in which the inputs were shorter than 3000 tokens.)
Thanks in advance for your reply.
P.S.: Also, would it be possible to share the script you used (I guess QLoRA)?
I tried it on one RTX 4090 (24GB) but got out-of-memory errors even with batch_size 1.
I then shortened the sequence length to 6k. It is still training, but I can already see a very strange loss curve :-(
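For reference, this is roughly how I set things up on the 4090 (a minimal sketch of my own attempt, not your script; the quantization and LoRA values are just what I tried):

```python
# Rough sketch of my 24GB setup: 4-bit (QLoRA) base model, gradient
# checkpointing, and a LoRA adapter. All hyperparameters here are my own
# guesses, not NickyNicky's script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "togethercomputer/LLaMA-2-7B-32K"
max_seq_length = 6144  # shortened from the full 32k/16k down to ~6k to avoid OOM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# activation memory grows with sequence length, so checkpointing is essential here
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # attention projections only, to save memory
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```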
Update:
It did not work.
When I tried it on inputs longer than the maximum number of tokens I trained it on, it gives nonsensical replies.
I guess one would need a bigger GPU to exhaust the full length (or at least get something like 16k).
How much RAM did you use?
I saw in the thread below that the gentleman needed 2 A6000s (2 x 48GB) to fine-tune xgen_7B_8k with QLoRA:
https://www.reddit.com/r/LocalLLaMA/comments/1546kiv/xgen_7b_8k_context_finetuned_on_guanaco/
credits to:
- https://www.philschmid.de/instruction-tune-llama-2
- https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/instruction-tune-llama-2-int4.ipynb
The models togethercomputer-LLaMA-2-7B-32K-open-Orca-v1 and togethercomputer-LLaMA-2-7B-32K-open-Orca-v2 were trained with QLoRA, PEFT and flash-attention for about 4 hours (v1) and 5 hours (v2) on 1 A100 GPU (Google Colab).
I really wanted to train it longer, but that was out of budget.
Values used for training (see the sketch below):
per_device_train_batch_size=14
trust_remote_code=False
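Roughly, the trainer call looked like this (a sketch following philschmid's notebook and the older trl SFTTrainer API; only per_device_train_batch_size=14 and trust_remote_code=False are my actual values, while the dataset slice, LoRA settings and other hyperparameters below are placeholders):

```python
# Sketch of the trainer setup (trl's SFTTrainer, 2023-era API as in philschmid's
# notebook). Only per_device_train_batch_size=14 and trust_remote_code=False are
# real values from my run; everything else is a placeholder.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_id = "togethercomputer/LLaMA-2-7B-32K"
dataset = load_dataset("Open-Orca/OpenOrca", split="train[:1%]")  # placeholder slice

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=False,
)

peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1,   # placeholder LoRA settings
    bias="none", task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="llama-2-7b-32k-open-orca",
    per_device_train_batch_size=14,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    num_train_epochs=1,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
)

def format_example(example):
    # placeholder prompt template for the OpenOrca columns
    return (f"{example['system_prompt']}\n\n### Question:\n{example['question']}"
            f"\n\n### Answer:\n{example['response']}")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=2048,   # raise this if you want longer contexts seen during training
    tokenizer=tokenizer,
    packing=True,
    formatting_func=format_example,
    args=args,
)
trainer.train()
trainer.save_model()
```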
After training and merging the weights, you can enable flash attention.
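By "merging the weights" I mean folding the LoRA adapter back into the base model; something like this (paths are placeholders, and the flash-attention flag assumes a recent transformers with the flash-attn package installed):

```python
# Merge the LoRA adapter into the base weights, save the merged model, then
# reload it with flash attention. Paths here are placeholders.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM

# 1) load the adapter checkpoint and merge it into the base model
adapter_dir = "llama-2-7b-32k-open-orca"  # output_dir from training (placeholder)
peft_model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_dir,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("llama-2-7b-32k-open-orca-merged", safe_serialization=True)

# 2) reload the merged model with flash attention enabled
model = AutoModelForCausalLM.from_pretrained(
    "llama-2-7b-32k-open-orca-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # requires flash-attn and a recent transformers
)
```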
Thank you!
Great resources.
I will try your model to see how it behaves when given a long input (I see that philschmid's script uses max_seq_length = 2048).
Yes, more tokens -> more training time.