
Pala Tej Deep

Tej3

AI & ML interests

None yet

Recent Activity

Organizations

Walled AI

Tej3's activity

replied to RishabhBhardwaj's post 10 months ago

The backbone refers to the pretrained model used as the base model for fine-tuning the expert model.

For example, in the case of the Wizard models:

  • WizardLM-13B and WizardMath-13B are both fine-tuned from the Llama-2-13B model. Because they share the same backbone, they can be merged effectively using DELLA, DARE, or TIES.

  • WizardCoder-13B, on the other hand, is fine-tuned from the CodeLlama-13B-Python model. Since WizardCoder-13B uses a different base model (backbone) than WizardLM-13B and WizardMath-13B, merging all three models effectively with DELLA, DARE, or TIES is not feasible.

To summarize, the backbone is the underlying pretrained model that serves as the starting point for fine-tuning. It is crucial in the merging process because models fine-tuned from different backbones may not merge effectively, owing to differences in their initial pretrained weights and configurations.
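
To make the role of the backbone concrete, below is a minimal sketch of a simplified DARE-style drop-and-rescale merge, assuming both experts share the Llama-2-13B backbone, architecture, and tokenizer. The repository IDs, drop rate, and output path are illustrative placeholders; real DARE/TIES/DELLA merges are normally run through a tool such as mergekit rather than written by hand.

```python
# Minimal sketch of why a shared backbone matters when merging: a simplified
# DARE-style drop-and-rescale merge of task vectors. Model IDs, the drop rate,
# and the output path are illustrative placeholders, not a definitive recipe.
import torch
from transformers import AutoModelForCausalLM

BACKBONE = "meta-llama/Llama-2-13b-hf"        # shared base model for both experts
EXPERTS = [
    "WizardLMTeam/WizardLM-13B-V1.2",         # placeholder repo IDs; both experts
    "WizardLMTeam/WizardMath-13B-V1.0",       # are assumed to share the Llama-2-13B backbone
]
DROP_RATE = 0.9                               # fraction of each task vector to drop

base = AutoModelForCausalLM.from_pretrained(BACKBONE, torch_dtype=torch.float16)
base_sd = base.state_dict()
merged_delta = {k: torch.zeros_like(v) for k, v in base_sd.items()}

for expert_id in EXPERTS:
    expert_sd = AutoModelForCausalLM.from_pretrained(
        expert_id, torch_dtype=torch.float16
    ).state_dict()
    for k, v in base_sd.items():
        if k not in expert_sd or expert_sd[k].shape != v.shape:
            continue  # skip tensors that do not line up (e.g., resized embeddings)
        # Task vector: expert weights minus backbone weights. This difference is
        # only meaningful because both experts start from the same pretrained weights.
        delta = expert_sd[k] - v
        # DARE: randomly drop most entries of the task vector and rescale the rest.
        keep = torch.rand_like(delta.float()) > DROP_RATE
        merged_delta[k] += (delta * keep) / (1.0 - DROP_RATE) / len(EXPERTS)

# Add the averaged, sparsified task vectors back onto the shared backbone.
base.load_state_dict({k: base_sd[k] + merged_delta[k] for k in base_sd})
base.save_pretrained("merged-wizard-13b")
```

The key step is `delta = expert_sd[k] - v`: if the experts were fine-tuned from different backbones, those deltas would be differences between unrelated weight sets, and combining them would not produce a coherent model.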

updated a Space over 1 year ago