[FEEDBACK] Daily Papers

#32 opened by kramp (HF staff) · edited Jul 25, 2024

Note that this is not a post for adding new papers; it is for feedback on the Daily Papers community update feature.

How do you submit a paper to the Daily Papers, like @akhaliq (AK)?

  • Submitting is available to paper authors
  • Only recent papers (less than 7 days old) can be featured on the Daily

Then drop the arXiv ID in the form at https://huggingface.co/papers/submit

  • Add media (images, videos) to the paper when relevant
  • You can start a discussion to engage with the community

Please check out the documentation.

We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".

Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset

Hugging Face org

@Yiwen-ntu for now we support only videos as paper covers in the Daily.


We are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models": https://arxiv.org/abs/2406.12644

How about making papers searchable from any search bar? Today, we have to navigate to Daily Papers to find one by arXiv ID. I often forget that and fail by first trying the main search bar (often from the homepage).

Paper title: Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Link: https://arxiv.org/pdf/2502.11357

Dear AK and HF Team,

We are excited to share our recent work on comprehensive finance text embeddings. We also developed a SoTA LLM-based embedding model for the finance domain. 🤗

Title: FinMTEB: Finance Massive Text Embedding Benchmark
Link: https://arxiv.org/abs/2502.10990
Github: https://github.com/yixuantt/FinMTEB
Leaderboard: https://huggingface.co/spaces/FinanceMTEB/FinMTEB
Model: yixuantt/Fin-e5

Dear AK and HF Team,

We are thrilled to present our recent research, which investigates and benchmarks various inference-time computation strategies to enhance reasoning performance in large language models (LLMs). With the growing interest in solving complex reasoning tasks, methods such as Best-of-N and beam search have shown promise in improving reasoning capabilities without requiring modifications to model parameters or additional training. However, challenges remain in their implementation, with many existing approaches still in the proof-of-concept stage, hindered by computational complexity and task-specific limitations.

In this work, we focus on optimizing both the candidate solution generation and the reward mechanisms that underpin these inference-time strategies. By exploring the impact of different prompting techniques, hyperparameters like temperature and top-p, and reward types such as self-evaluation and RLHF rewards, we uncover previously overlooked strategies that significantly enhance reasoning performance. Our extensive experiments—spanning over 20,000 A100-80G GPU hours and 1,000+ experiments—cover various models from the Llama, Qwen, and Mistral families. These findings demonstrate that careful tuning of hyperparameters like temperature can lead to performance gains of up to 5% in reasoning tasks.

Furthermore, we establish a standardized benchmark for evaluating inference-time computation techniques, assessing six representative methods across eight different reasoning tasks. Our work provides a robust foundation for advancing future research in this area, setting the stage for more practical and scalable applications of LLM-based reasoning systems.
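
To make "inference-time computation" concrete, here is a minimal Best-of-N sketch using the transformers library. The model name, prompt, and reward function below are illustrative placeholders, not the exact setup used in our benchmark:

```python
# Minimal Best-of-N sketch: sample N candidate solutions with a tuned
# temperature/top-p, score them with a reward function, and keep the best one.
# Model name, prompt, and the reward below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Q: If a train travels 60 km in 45 minutes, what is its speed in km/h?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample N candidates; temperature and top-p are the hyperparameters we study.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    num_return_sequences=8,  # N in Best-of-N
    max_new_tokens=256,
)
candidates = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

def reward(candidate: str) -> float:
    """Placeholder reward. In practice this is self-evaluation by the LLM,
    a trained (e.g. RLHF-style) reward model, or a task-specific verifier."""
    return float(len(candidate.split()))  # dummy heuristic for illustration only

best = max(candidates, key=reward)
print(best)
```

The same generate call with num_beams set (and sampling disabled) gives a beam-search variant of the candidate-generation step.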

Title: Bag of Tricks for Inference-time Computation of LLM Reasoning

Link: https://arxiv.org/abs/2502.07191

Github: https://github.com/usail-hkust/benchmark_inference_time_computation_LLM

Dear AK and HF Team,

We are excited to share our work on Text-to-SQL. The information for the paper we submitted is as follows:

Title: SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL
Link: https://arxiv.org/abs/2502.11741
Github: https://github.com/ShuaiLyu0110/SQL-o1

Dear AK and HF Team,

Buckle up for a wild ride into the world of large language models! 🚀 Ever wished you could fine-tune massive LLMs without needing a full-blown data center? Well, dream no more! Our new approach, LoRAM, is here to train small and infer large—bringing you memory-efficient LoRA training without sacrificing performance.

Imagine turning a 70-billion-parameter beast into a nimble, memory-efficient marvel—like transforming an elephant into a sleek race car! 🐘➡️🏎️ We take the classic LoRA method, give it a trendy haircut by pruning away those underutilized neurons 💇‍♂️, and then recover the pruned low-rank matrices to supercharge the full model during inference.

The Challenge 🤯

While LoRA offers a cost-effective fine-tuning solution, the memory footprint remains dominated by the original model parameters. Training a 70B model traditionally demands an A100-80G GPU or even a fleet of 15 GPUs. Yikes!

The LoRAM Magic 🪄

LoRAM turns this challenge on its head by:

  • Tiny Yet Mighty: Training on a pruned (small) model with just 20G HBM—no need for heavyweight GPUs! 🎉
  • Wallet-Friendly Wizardry: Using structured pruning combined with 4-bit quantization (QLoRAM) slashes storage costs by up to 16.95×, proving that efficiency and performance can indeed dance together! 💃💸
  • Seamless Sync: Minimal-cost continual pre-training aligns the knowledge between the pruned and original models, ensuring no magic is lost in translation. 🔗✨
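
To make the train-small, infer-large idea concrete, here is a rough conceptual sketch using peft-style LoRA. The checkpoint paths and the recover_lora_to_full helper are hypothetical placeholders, not our actual implementation:

```python
# Conceptual LoRAM-style workflow sketch (not the actual implementation):
# 1) train LoRA adapters on a pruned copy of the model, 2) recover the learned
# low-rank matrices to full-model dimensions, 3) use them with the full model at inference.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Step 1: LoRA training happens on a pruned (small) model so it fits in ~20G of HBM.
pruned_model = AutoModelForCausalLM.from_pretrained("path/to/pruned-70b")  # hypothetical checkpoint
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
lora_model = get_peft_model(pruned_model, lora_cfg)
# ... standard LoRA fine-tuning loop goes here (optionally with 4-bit quantized
# frozen weights, i.e. the QLoRAM variant) ...

# Steps 2-3: recover the pruned low-rank matrices and apply them to the full model.
def recover_lora_to_full(lora_model, full_model):
    """Hypothetical recovery step: expand each low-rank update learned on the
    pruned model back to the corresponding full-model layer shapes (e.g. by
    re-inserting the rows/columns removed by structured pruning), then add the
    result to the full model's weights for inference."""
    raise NotImplementedError("Illustrative placeholder; see the paper for details.")

full_model = AutoModelForCausalLM.from_pretrained("path/to/full-70b")  # loaded only for inference
# recover_lora_to_full(lora_model, full_model)
```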

The Results 🤯🚀

With LoRAM, we not only achieve dominant performance gains over both the original 70B model and smaller LoRA-trained models but also make massive model training accessible—running on a single 20G GPU!

Curious to see the magic in action? Check out our paper and code:

We can’t wait for you to join us on this exhilarating journey where smart engineering meets a splash of neural magic! 😄🌟

Cheers,
The LoRAM Team

Dear AK and HF team,

We are excited to share our new paper estimating the hallucination rates of 11 large multilingual language models across 30 languages.
The paper comes with two open-source datasets ready to be used by the community. The figure below shows hallucination rates across the 11 LLMs for 30 languages.

[Figure: hallucination rates of 11 LLMs across 30 languages]

Summary of our findings:

  1. Within an LLM family, smaller models hallucinate more than larger variants.
  2. A larger number of supported languages correlates significantly with a higher number of hallucinations.
  3. A smaller digital representation of a language does not necessarily mean higher hallucination rates.

Resources:
The paper releases two datasets covering 30 languages:

  1. Multilingual Hallucination Detection: https://huggingface.co/datasets/WueNLP/mHallucination_Detection
  2. Multilingual Hallucination Evaluation: https://huggingface.co/datasets/WueNLP/mHallucination_Evaluation
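
For convenience, here is a minimal sketch of loading both datasets with the datasets library; splits, columns, and any per-language configurations are not shown here, so check the dataset cards for the actual structure:

```python
# Minimal sketch: load the two hallucination datasets from the Hugging Face Hub.
# If a dataset exposes multiple (e.g. per-language) configurations, pass the
# configuration name as the second argument to load_dataset.
from datasets import load_dataset

detection = load_dataset("WueNLP/mHallucination_Detection")
evaluation = load_dataset("WueNLP/mHallucination_Evaluation")

print(detection)   # shows available splits and columns
print(evaluation)
```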

Paper, Dataset, and Code:

  1. arXiv paper: How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
  2. Hugging Face collection: https://huggingface.co/collections/WueNLP/mhallucinations-llm-67b5aedb0e7fed1190e148d8
  3. GitHub: https://github.com/WorldHellow/mHallucinations-LLM

We hope the community enjoys reading and using our work.

Cheers
