llm-scratch (LLM from Scratch)

ariG23498

posted an update about 1 month ago

I have always advocated for writing technical stories without using LLMs.

The following one-page editorial really drives the point home.
https://www.nature.com/articles/s44222-025-00323-4

ariG23498

posted an update 3 months ago

🚨 Implement KV Cache from scratch in pure PyTorch. 🚨

We have documented everything we learned while adding a KV cache to nanoVLM. Joint work with @kashif @lusxvr @andito @pcuenq

Blog: hf.co/blog/kv-cache
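
For readers who want the gist before opening the blog, here is a minimal sketch of the idea behind a KV cache. It is deliberately written in dependency-free Python rather than PyTorch, uses a single attention head, and is not the nanoVLM implementation; every name in it (`KVCache`, `decode_with_cache`, the toy weights) is a hypothetical choice made for illustration. It shows that appending each new token's key and value to a cache gives the same attention outputs as recomputing keys and values for the whole prefix at every step.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all keys/values so far.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


class KVCache:
    # Hypothetical helper: stores one key and one value per decoded token.
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_without_cache(tokens, Wq, Wk, Wv):
    # Baseline: recompute keys and values for the full prefix at every step.
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        outputs.append(attend(matvec(Wq, tokens[t]), keys, values))
    return outputs


def decode_with_cache(tokens, Wq, Wk, Wv):
    # Cached: project only the newest token and reuse earlier keys/values.
    cache = KVCache()
    outputs = []
    for x in tokens:
        cache.append(matvec(Wk, x), matvec(Wv, x))
        outputs.append(attend(matvec(Wq, x), cache.keys, cache.values))
    return outputs


# Toy weights and token embeddings, chosen arbitrarily for the demo.
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.1], [0.2, 0.3]]
Wv = [[1.0, 2.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

full = decode_without_cache(tokens, Wq, Wk, Wv)
cached = decode_with_cache(tokens, Wq, Wk, Wv)
```

The two functions return the same outputs, but the cached version performs one key projection and one value projection per step instead of one per prefix token, which is where the savings come from during autoregressive decoding.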

ayut

updated a dataset 6 months ago

llm-scratch/wmt14-de-en-split

Viewer • Updated Feb 28 • 4.51M • 11

ariG23498

updated a dataset 6 months ago

llm-scratch/wmt14-de-en-split

Viewer • Updated Feb 28 • 4.51M • 11

ariG23498

published a dataset 6 months ago

llm-scratch/wmt14-de-en-split

Viewer • Updated Feb 28 • 4.51M • 11

ariG23498

posted an update 7 months ago

Tried my hand at simplifying the derivations of Direct Preference Optimization.

I cover how one can reformulate RLHF into DPO. The idea of implicit reward modeling is chef's kiss.

Blog: https://huggingface.co/blog/ariG23498/rlhf-to-dpo
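
As a rough summary of that reformulation, the standard steps can be sketched as below. The notation is an assumption on my part and may differ from the blog: $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ the frozen reference policy, $\beta$ the KL penalty weight, $Z(x)$ the partition function, and $(y_w, y_l)$ the preferred and rejected responses.

```latex
% KL-regularized RLHF objective
\max_{\pi} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)} \big[ r(x, y) \big]
  - \beta \, \mathrm{KL}\!\big( \pi(\cdot \mid x) \,\|\, \pi_{\text{ref}}(\cdot \mid x) \big)

% Its closed-form optimum
\pi^{*}(y \mid x) = \frac{1}{Z(x)} \, \pi_{\text{ref}}(y \mid x) \, \exp\!\Big( \tfrac{1}{\beta} r(x, y) \Big)

% Solving for the reward gives the implicit reward
r(x, y) = \beta \log \frac{\pi^{*}(y \mid x)}{\pi_{\text{ref}}(y \mid x)} + \beta \log Z(x)

% Substituting into the Bradley-Terry model cancels Z(x) and yields the DPO loss
\mathcal{L}_{\text{DPO}}(\theta) = - \, \mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
  \right) \right]
```

The implicit reward is the third line: the policy's log-ratio against the reference stands in for a separately trained reward model, because the $\beta \log Z(x)$ term depends only on the prompt and drops out of the difference between the two responses.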

ariG23498

posted an update 7 months ago

Timm ❤️ Transformers

With the latest version of transformers, you can now use any timm model with the familiar transformers API.

Blog Post: https://huggingface.co/blog/timm-transformers
Repository with examples: https://github.com/ariG23498/timm-wrapper-examples
Collection: ariG23498/timmwrapper-6777b85f1e8d085d3f1374a1

ariG23498

updated a Space 8 months ago

README

🚀

Understanding LLMs from scratch

ariG23498

posted an update 9 months ago

We are blessed with another iteration of PaliGemma: Google has launched PaliGemma 2.

google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

merve/paligemma2-vqav2

ariG23498

posted an update 9 months ago

Qwen/qwen25-66e81a666513e518adb90d9e

Qwen/Qwen2.5-Coder-Artifacts

Qwen/Qwen2.5-Coder-demo

ariG23498

posted an update 10 months ago

Cohere drops two new multilingual models!

https://huggingface.co/CohereForAI/aya-expanse-8b
https://huggingface.co/CohereForAI/aya-expanse-32b

Try them out here

https://huggingface.co/spaces/CohereForAI/aya_expanse

ariG23498

posted an update 12 months ago

You can now use DoRA for your embedding layers!

PR: https://github.com/huggingface/peft/pull/2006

I have documented my journey with this PR in a blog post for everyone to read. The highlight was when the first author of DoRA reviewed my code.

Blog Post: https://huggingface.co/blog/ariG23498/peft-dora

Huge thanks to @BenjaminB for all the help I needed.

ariG23498

authored a paper almost 2 years ago

G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling

Paper • 2009.12007 • Published Sep 25, 2020
