Pramudito

Ditot

AI & ML interests

None yet

Recent Activity

replied to AdinaY's post 5 days ago

Let's check out the latest releases from the Chinese community in March!
👉 https://huggingface.co/collections/zh-ai-community/march-2025-releases-from-the-chinese-community-67c6b479ebb87abbdf8e2e76

✨MLLM
> R1 Omni by Alibaba Tongyi - 0.5B
> Qwen2.5 Omni by Alibaba Qwen - 7B with Apache2.0

🖼️Video
> CogView-4 by ZhipuAI - Apache2.0
> HunyuanVideo-I2V by TencentHunyuan
> Open Sora2.0 - 11B with Apache2.0
> Stepvideo TI2V by StepFun AI - 30B with MIT license

🎵Audio
> DiffRhythm - Apache2.0
> Spark TTS by SparkAudio - 0.5B

⚡️Image/3D
> Hunyuan3D 2mv/2mini (0.6B) by @TencentHunyuan
> FlexWorld by ByteDance - MIT license
> Qwen2.5-VL-32B-Instruct by Alibaba Qwen - Apache2.0
> Tripo SG (1.5B)/SF by VastAIResearch - MIT license
> InfiniteYou by ByteDance
> LHM by Alibaba AIGC team - Apache2.0
> Spatial LM by ManyCore

🧠Reasoning
> QwQ-32B by Alibaba Qwen - Apache2.0
> Skywork R1V - 38B with MIT license
> RWKV G1 by RWKV AI - 0.1B pure RNN reasoning model with Apache2.0
> Fin R1 by SUFE AIFLM Lab - financial reasoning

🔠LLM
> DeepSeek v3 0324 by DeepSeek - MIT license
> Babel by Alibaba DAMO - 9B/83B/25 languages

Organizations

None yet

Ditot's activity

reacted to tomaarsen's post with 🔥 5 days ago

‼️Sentence Transformers v4.0 is out! You can now train and finetune reranker models with multi-GPU training, bf16 support, loss logging, callbacks & much more. I also prove that finetuning on your domain helps much more than you might think.

1️⃣ Reranker Training Refactor
Reranker models can now be trained using an extensive trainer with a lot of powerful features:
- MultiGPU Training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
- bf16 training support; loss logging
- Evaluation datasets + evaluation loss
- Improved callback support + an excellent Weights & Biases integration
- Gradient checkpointing, gradient accumulation
- Model card generation
- Resuming from a training checkpoint without performance loss
- Hyperparameter Optimization
and much more!

Read my detailed blogpost to learn about the components that make up this new training approach: https://huggingface.co/blog/train-reranker
Notably, the release is fully backwards compatible: all deprecations are soft, meaning that they still work but emit a warning informing you how to upgrade.

2️⃣ New Reranker Losses
- 11 new losses:
  - 2 traditional losses: BinaryCrossEntropy and CrossEntropy
  - 2 distillation losses: MSE and MarginMSE
  - 2 in-batch negatives losses: MNRL (a.k.a. InfoNCE) and CMNRL
  - 5 learning to rank losses: Lambda, p-ListMLE, ListNet, RankNet, ListMLE
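
To make one of the distillation losses listed above concrete, here is a minimal plain-Python sketch of the general margin-MSE idea: the student reranker is pushed to reproduce the teacher's score gap between a positive and a negative passage. This is an illustration under stated assumptions, not the Sentence Transformers implementation; the function name `margin_mse` and its argument names are hypothetical.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument is a list of floats with one entry per
    (query, positive, negative) triplet. Hypothetical helper for
    illustration only; not part of any library API.
    """
    n = len(student_pos)
    if n == 0:
        raise ValueError("at least one triplet is required")
    if not (len(student_neg) == len(teacher_pos) == len(teacher_neg) == n):
        raise ValueError("all score lists must have the same length")

    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        student_margin = sp - sn  # how much the student prefers the positive
        teacher_margin = tp - tn  # how much the teacher prefers the positive
        total += (student_margin - teacher_margin) ** 2
    return total / n


# One triplet: student margin 1.0, teacher margin 2.0, squared gap 1.0
loss = margin_mse([2.0], [1.0], [3.0], [1.0])
```

Only the margins are compared, so the student is free to use a different score scale from the teacher as long as the gaps match.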

3️⃣ New Reranker Documentation
- New Training Overview, Loss Overview, API Reference docs
- 5 new, 1 refactored training examples docs pages
- 13 new, 6 refactored training scripts
- Migration guides (2.x -> 3.x, 3.x -> 4.x)

4️⃣ Blogpost
Alongside the release, I've written a blogpost where I finetune ModernBERT on a generic question-answer dataset. My finetunes easily outperform all general-purpose reranker models, even models 4x as big. Finetuning on your domain is definitely worth it: https://huggingface.co/blog/train-reranker

See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/v4.0.1

replied to AdinaY's post 5 days ago

I'm going in!

reacted to Yehor's post with 😎 5 days ago

Are you interested in different runtimes for AI models?

Check out IREE (iree.dev): it converts models to MLIR and then executes them on different platforms.

I have tested it in Rust on CPU and CUDA: https://github.com/egorsmkv/eerie-yolo11

replied to AdinaY's post 5 days ago

Nice, and thank you!

reacted to AdinaY's post with 🔥 5 days ago

AReal-Boba 🔥 a fully open RL framework released by Ant Group, an affiliate company of Alibaba.
inclusionAI/areal-boba-67e9f3fa5aeb74b76dcf5f0a
✨ 7B/32B - Apache2.0
✨ Outperforms on math reasoning
✨ Replicates QwQ-32B with 200 data samples for under $200
✨ All-in-one: weights, datasets, code & tech report
