* initial commit
* update ACKNOWLEDGMENTS.md
* adding OLMoE to training
* clean up
* faster generation
* remove sanitize method
* more clean ups
* adding SwitchGLU
* clean up
* a little faster and adding norm_topk_prob
* formatted
- Optional completion-only fine-tuning with `--mask-prompt` (see the loss-masking sketch below)
- Collections of Hugging Face datasets
---------
Co-authored-by: Awni Hannun <awni@apple.com>
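
The `--mask-prompt` option above enables completion-only training by dropping prompt tokens from the loss. A minimal sketch of that idea, assuming a simple batch layout with per-example prompt lengths; it is illustrative, not mlx-lm's actual trainer code:

```python
# Sketch: completion-only loss that masks out prompt tokens (illustrative, not mlx-lm's trainer).
import mlx.core as mx
import mlx.nn as nn

def completion_only_loss(model, inputs, targets, prompt_lengths):
    # inputs, targets: (batch, seq_len) token ids; prompt_lengths: (batch,)
    logits = model(inputs)
    per_token = nn.losses.cross_entropy(logits, targets, reduction="none")
    # Positions that fall inside the prompt contribute nothing to the loss.
    positions = mx.arange(targets.shape[1])[None, :]
    mask = (positions >= prompt_lengths[:, None]).astype(per_token.dtype)
    return (per_token * mask).sum() / mask.sum()
```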
* Generalize prompt_feature and completion_feature for local datasets, so many more training dataset formats are compatible (see the sketch below).
* Persist configured prompt/completion key
* rebase + nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
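
The prompt_feature/completion_feature generalization above lets a local dataset keep its own field names. A tiny sketch of the idea; the record fields and the helper name are hypothetical:

```python
# Sketch: selecting prompt/completion text via configurable keys (helper and fields are hypothetical).
def to_prompt_completion(record, prompt_feature="prompt", completion_feature="completion"):
    return record[prompt_feature], record[completion_feature]

record = {"question": "What is MLX?", "answer": "An array framework for Apple silicon."}
prompt, completion = to_prompt_completion(
    record, prompt_feature="question", completion_feature="answer"
)
```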
* Adds EXAONE architecture.
* nits + format
* format
* clean up and fix rope
* clean up and fix rope
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* feat: QDoRA with tests and a small bug fix for the recalculation of self.m (see the sketch below)
* some simplifications and fixes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
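
For context on `self.m` above: DoRA keeps a per-output magnitude vector alongside the LoRA update, and that magnitude is the norm of the merged weight. A hedged sketch following the DoRA paper's formulation, not the repository's exact code:

```python
# Sketch: recomputing the DoRA magnitude vector from the merged weight (paper-style, not repo code).
import mlx.core as mx

def dora_magnitude(weight, lora_a, lora_b, scale):
    # weight: (out_features, in_features), lora_a: (rank, in_features), lora_b: (out_features, rank)
    merged = weight + scale * (lora_b @ lora_a)
    # L2 norm over the input dimension, one magnitude per output feature.
    return mx.sqrt(mx.sum(merged * merged, axis=1))
```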
* Adding full model weights finetuning
* Updating the LORA.md and ACKNOWLEDGMENTS.md files.
* removing --use-dora and --full-training and adding --fine-tune-type (see the sketch below)
* some clean up
* reformatting and fixing DoRA training
* updated CONFIG_DEFAULTS
* update config example
* update the config example file
* Update LORA.md
* merge and commit
* adding argument for dora linear layer
* clean up
* clean up in the example yaml file
* fix
* final fix before sending
* small addition to the md file
* fix for loading the fully trained model by saving all the files and configs correctly
* clean up
* removing the unnecessary files
* changing LoRA layers back to 16
* removed max file size
* nits
* resolve merge
* some consistency changes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
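
The `--fine-tune-type` flag above folds the old `--use-dora` and full-training switches into a single choice. A minimal sketch of that kind of dispatch, with illustrative names only:

```python
# Sketch: one flag selecting the training mode (illustrative, not mlx-lm's argument parser).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--fine-tune-type",
    choices=["lora", "dora", "full"],
    default="lora",
    help="Train LoRA adapters, DoRA adapters, or all model weights.",
)
args = parser.parse_args()

if args.fine_tune_type == "full":
    pass  # keep every model parameter trainable
else:
    pass  # freeze the base model and attach LoRA/DoRA adapter layers
```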
* LoRA: support fine-tuning tools datasets
* LoRA: Split small function
* LoRA: add tools format to lora docs
* LoRA: pre-commit fix
* Revert "LoRA: pre-commit fix"
This reverts commit b94b7e0fe7.
* Revert "LoRA: Split small function"
This reverts commit 3f6a5f19fd.
* LoRA: remove ToolsDataset
In a JSONL file, not every record is required to include the tools field (see the example below).
* nit in readme
* nit in readme
* nit in readme
---------
Co-authored-by: Awni Hannun <awni@apple.com>
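
As noted above, a tools-style record can carry an optional `tools` field next to its chat messages. A hypothetical example record in that spirit; the exact schema expected by the trainer may differ:

```python
# Hypothetical chat record with an optional "tools" field; the schema is illustrative.
record_with_tools = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
        {"role": "assistant", "content": "Let me check the forecast."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# A record without a "tools" field is also valid.
record_without_tools = {
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there!"},
    ]
}
```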
* initial commit
* initial commit
* Adding first lines
* adding x and dt projection layers
* adding the clamping mechanism
* First successful inference
* last commit for today - added a custom generate function and it works as expected; will try training and then loading a model from the hub
* clean up
* save up
* almost
* update
* update
* fixed cache handling
* fixed loading
* added a separate generate_step method in the model and also in the utils to automatically use the generate_step method in the model class
* quick update
* still not working
* save
* still not working
* initial commit
* utils.py: `logits = logits[:, -1, :]` raised `TypeError: tuple indices must be integers or slices, not tuple`
* update
* update
* Fixing the batching of the depthwise convolution and multi-token input
* fixing generate and logits outputs
* Done!
* Fixing the cache handling; generation works, now trying training
* update ACKNOWLEDGMENTS
* removing the model_type checks in the _step loop of generate_step, adding MambaCache in base.py for easier training and generation, and removing mamba from tuner/utils (see the cache sketch below).
* quick clean up
* update trainer/utils for the right initialisation of the LoRA layers, but not working yet.
* clean up
* Further update to trainer/utils for correct layer selection. Successful training
* removing extra mamba-infer.py file
* clean up, reformatting will come later
* reformat and big clean up, final commit
* some speedups and cleanups
* fix test
* nits
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
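
For context on the `MambaCache` mentioned above: Mamba decoding carries two pieces of per-layer state between steps, the rolling input window for the depthwise convolution and the SSM hidden state. A minimal sketch with assumed attribute names, not the actual class in base.py:

```python
# Sketch: per-layer recurrent state for Mamba-style decoding (attribute names are assumptions).
class MambaCacheSketch:
    def __init__(self):
        self.conv_state = None  # last (kernel_size - 1) inputs for the depthwise convolution
        self.ssm_state = None   # hidden state of the selective state-space recurrence

    @property
    def state(self):
        return self.conv_state, self.ssm_state
```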
* feat: Nemotron
https://huggingface.co/nvidia/Minitron-4B-Base
This is basically Llama with partial RoPE and LayerNorm instead of
RMSNorm. Also they add 1 to the LayerNorm weight (a sketch follows below).
* fixup! feat: Nemotron
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
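
A sketch of the "+1" LayerNorm described above: the stored weight is zero-centered and the offset is added at apply time (partial RoPE, which rotates only a fraction of each head's dimensions, is not shown). This is a hedged reimplementation for illustration, assuming standard LayerNorm statistics, not the repository's module:

```python
# Sketch: LayerNorm whose stored weight is zero-centered; 1 is added when it is applied.
import mlx.core as mx
import mlx.nn as nn

class LayerNorm1P(nn.Module):
    def __init__(self, dims, eps=1e-5):
        super().__init__()
        self.weight = mx.zeros((dims,))
        self.bias = mx.zeros((dims,))
        self.eps = eps

    def __call__(self, x):
        mu = mx.mean(x, axis=-1, keepdims=True)
        var = mx.var(x, axis=-1, keepdims=True)
        x_hat = (x - mu) * mx.rsqrt(var + self.eps)
        # Scale by (1 + weight) rather than weight, matching the zero-centered convention.
        return (1.0 + self.weight) * x_hat + self.bias
```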