DistAya Community
AI & ML interests
Knowledge Distillation, Pruning, Quantization, KV Cache Compression, Latency, Inference Speed
Organization Card
Multilingual language models face significant deployment challenges. Can we engineer multilingual language models that match the capabilities of their larger counterparts while being more compact, faster at inference, and able to handle large batches in real-time production environments?
Techniques:
Pruning
Unstructured Pruning
Structured Pruning
Semi-Structured Pruning
Methods Used
- SparseGPT | GitHub
- ShortGPT | KLD-Based Pruning & Perplexity Sensitivities
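To make the sparsity patterns above concrete, here is a minimal PyTorch sketch of unstructured magnitude pruning and a 2:4 semi-structured mask. It is not SparseGPT or ShortGPT; the function names and tensor shapes are purely illustrative.

```python
# Minimal sketch of unstructured and 2:4 semi-structured magnitude pruning.
# This is NOT SparseGPT or ShortGPT; it only illustrates the sparsity patterns
# those methods target.
import torch

def unstructured_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

def semi_structured_2_4_prune(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every contiguous group of 4 (2:4 sparsity)."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "in_features must be divisible by 4 for 2:4 sparsity"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Indices of the 2 smallest entries per group of 4 -> zero them out.
    smallest = groups.abs().topk(2, dim=-1, largest=False).indices
    mask = torch.ones_like(groups)
    mask.scatter_(-1, smallest, 0.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
print(unstructured_prune(w, 0.5).eq(0).float().mean())    # ~0.5 sparsity
print(semi_structured_2_4_prune(w).eq(0).float().mean())  # exactly 0.5 sparsity
```

Semi-structured 2:4 sparsity is attractive in practice because recent GPUs can accelerate it directly, unlike arbitrary unstructured masks.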
Knowledge Distillation
- Hidden State-Based Distillation ~ DistillKit | GitHub
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
- On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
- Minitron: Compact Language Models via Pruning & Knowledge Distillation
- DistiLLM: Towards Streamlined Distillation for Large Language Models
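As a rough illustration of what these objectives optimize, below is a minimal sketch of a combined logit-level KL-divergence loss with a hidden-state matching term, in the spirit of hidden-state-based distillation. The temperature, weighting, and projection details of the listed methods differ; all names here are illustrative.

```python
# Minimal sketch of a combined logit + hidden-state distillation loss.
# Names (temperature, alpha) are illustrative; this is not the exact objective
# of DistillKit, Distil-Whisper, Minitron, or DistiLLM.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      temperature: float = 2.0, alpha: float = 0.5):
    # Soft-label KD: KL divergence between temperature-scaled distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hidden-state matching: MSE between student and teacher hidden states
    # (a learned projection is usually needed when the hidden sizes differ).
    hidden = F.mse_loss(student_hidden, teacher_hidden)

    return alpha * kd + (1 - alpha) * hidden

# Example with dummy tensors (batch=2, seq=4, vocab=32000, hidden=1024).
s_logits, t_logits = torch.randn(2, 4, 32000), torch.randn(2, 4, 32000)
s_hidden, t_hidden = torch.randn(2, 4, 1024), torch.randn(2, 4, 1024)
print(distillation_loss(s_logits, t_logits, s_hidden, t_hidden))
```

On-policy variants replace the fixed training inputs with sequences the student itself generates, so the teacher corrects the student's own mistakes rather than only ground-truth continuations.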
Quantization
- Quantization-Aware Training (QAT)
- Post-Training Quantization (PTQ)
- KV Cache Quantization
- Weight & Activation Quantization
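The sketch below shows the simplest form of post-training weight quantization: symmetric per-channel int8 with a quantize/dequantize round trip. Real PTQ, QAT, KV-cache, and activation quantization pipelines add calibration data, fake-quantization during training, and outlier handling that are not shown here.

```python
# Minimal sketch of symmetric per-channel int8 post-training weight quantization.
import torch

def quantize_per_channel_int8(weight: torch.Tensor):
    # One scale per output channel (row), mapping the max |w| of that row to 127.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(16, 64)
q, scale = quantize_per_channel_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())
```

KV cache quantization applies the same idea to the cached attention keys and values at inference time, which is where memory and batch-size limits usually bite in production serving.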
Low-Rank Factorization
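A minimal sketch of the idea, assuming a single linear layer's weight matrix is factorized with a truncated SVD; the rank and matrix sizes are illustrative.

```python
# Minimal sketch of low-rank factorization via truncated SVD: replace one
# weight matrix W (out x in) with two smaller factors B @ A of rank r.
import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    # Keep only the top-`rank` singular components.
    B = U[:, :rank] * S[:rank]   # (out, r)
    A = Vh[:rank, :]             # (r, in)
    return B, A

W = torch.randn(1024, 4096)
B, A = low_rank_factorize(W, rank=64)
print("relative error:", ((W - B @ A).norm() / W.norm()).item())
# Parameter count drops from 1024*4096 to 64*(1024+4096).
```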
Fine-Tuning | GitHub
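Below is a hedged sketch of one possible parameter-efficient fine-tuning setup using LoRA via the peft library; the base model ID, target modules, and hyperparameters are placeholders and may differ from the project's actual code in the linked GitHub repo.

```python
# Hedged sketch of parameter-efficient fine-tuning with LoRA via `peft`.
# The base model ID, target modules, and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = "CohereForAI/aya-23-8B"  # placeholder; swap in the actual checkpoint
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```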
Datasets:
Seven initial datasets were unified into a single corpus of 6.62M rows, comprising the following (see the sketch after this list):
- Bangla_Alpaca_Orca: Bangla
- Urdu_Instruct_News_Article_Generation: Urdu
- Urdu_Instruct_News_Headline_Generation: Urdu
- Urdu_Instruct_News_Category_Classification: Urdu
- cidar: Arabic
- Six_Millions_Instruction_Dataset_For_Arabic_Llm_Ft: Arabic
- instructv3: English
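As a rough sketch of the unification step, the snippet below concatenates several instruction datasets with the datasets library. The Hub repo IDs and the shared (instruction, response) schema are assumptions for illustration, not the exact sources or columns used.

```python
# Rough sketch of unifying several instruction datasets with the `datasets`
# library. The Hub repo IDs below are placeholders, and every source is assumed
# to expose a shared (instruction, response) schema.
from datasets import load_dataset, concatenate_datasets

sources = [
    "org/Bangla_Alpaca_Orca",  # placeholder repo IDs, not confirmed Hub paths
    "org/cidar",
    "org/instructv3",
]

parts = []
for repo_id in sources:
    ds = load_dataset(repo_id, split="train")
    # Drop source-specific columns so the schemas match before concatenation.
    extra = [c for c in ds.column_names if c not in ("instruction", "response")]
    parts.append(ds.remove_columns(extra))

unified = concatenate_datasets(parts)
print(unified.num_rows)
```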
Get in touch with the team:
- Mayank Bhaskar -> [email protected]
- Ahmad Anis -> [email protected]
- Drishti Sharma -> [email protected]
- Vishnu Vardhan -> [email protected]
- Yaya -> [email protected]
- Shayekh Bin Islam -> [email protected]