

KimRina


KimRina-ai

AI & ML interests

Multimodal (CV & LLM)

Recent Activity

liked a dataset 8 days ago

yale-nlp/MMVU

liked a dataset 8 days ago

huggingface/policy-docs

reacted to chansung's post with 👍 8 days ago

Simple summarization of Evolving Deeper LLM Thinking (Google DeepMind)

The process starts by posing a question.
1) The LLM generates initial responses.
2) These generated responses are evaluated according to specific criteria (program-based checker).
3) The LLM critiques the evaluated results.
4) The LLM refines the responses based on the evaluation, critique, and original responses.

The refined response is then fed back into step 2). If it meets the criteria, the process ends. Otherwise, the algorithm generates more responses based on the refined ones (with some being discarded, some remaining, and some responses potentially being merged).

Through this process, it demonstrated excellent performance in complex scheduling problems (travel planning, meeting scheduling, etc.). It's a viable method for finding highly effective solutions in specific scenarios.

However, there are two major drawbacks:
🤔 An excessive number of API calls are required. (While the cost might not be very high, it leads to significant latency.)
🤔 The evaluator is program-based. (This limits its use as a general method. It could potentially be modified/implemented using LLM as Judge, but that would introduce additional API costs for evaluation.)

https://arxiv.org/abs/2501.09891
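
The generate, evaluate, critique, refine loop summarized in the post can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `evolve` and every `toy_*` function are hypothetical names, the "LLM" is replaced by deterministic stand-ins, the toy scheduling rule (hours within 9 to 17, strictly increasing) is invented for the example, and the discard/merge step is reduced to dropping duplicate candidates. The call counter is there only to show how quickly model calls add up, which is the first drawback the post mentions.

```python
from typing import Callable, List, Optional, Tuple


def evolve(
    question: str,
    generate: Callable[[str, int], List[str]],
    check: Callable[[str], Tuple[bool, str]],
    critique: Callable[[str, str, str], str],
    refine: Callable[[str, str, str, str], str],
    population: int = 3,
    max_rounds: int = 4,
) -> Tuple[Optional[str], int]:
    """Return (first candidate that passes the checker or None, model-call count)."""
    candidates = generate(question, population)  # step 1: initial responses
    calls = 1
    for _ in range(max_rounds):
        # Step 2: program-based evaluation of every candidate.
        results = [(cand, *check(cand)) for cand in candidates]
        for cand, ok, _ in results:
            if ok:
                return cand, calls
        next_gen = []
        for cand, _, feedback in results:
            note = critique(question, cand, feedback)  # step 3: critique
            next_gen.append(refine(question, cand, feedback, note))  # step 4: refine
            calls += 2  # one critique call plus one refine call
        # Simplified stand-in for discard/merge: drop duplicates, keep order.
        candidates = list(dict.fromkeys(next_gen))
    return None, calls


# Hypothetical stand-ins for the LLM calls and the checker.
def toy_generate(question: str, n: int) -> List[str]:
    return ["14,9,11", "8,12,20", "10,10,15"][:n]


def toy_check(plan: str) -> Tuple[bool, str]:
    hours = [int(h) for h in plan.split(",")]
    if any(h < 9 or h > 17 for h in hours):
        return False, "hour outside 9-17"
    if any(a >= b for a, b in zip(hours, hours[1:])):
        return False, "hours not strictly increasing"
    return True, "ok"


def toy_critique(question: str, plan: str, feedback: str) -> str:
    return f"Plan {plan} fails because: {feedback}"


def toy_refine(question: str, plan: str, feedback: str, note: str) -> str:
    # Clamp each hour into 9..17, remove repeats, and sort.
    hours = sorted({min(17, max(9, int(h))) for h in plan.split(",")})
    return ",".join(str(h) for h in hours)


solution, calls = evolve(
    "Pick three meeting start hours", toy_generate, toy_check, toy_critique, toy_refine
)
print(solution, calls)
```

Even in this toy run, one round over three candidates costs seven model calls before a plan passes the checker. Swapping `toy_check` for an LLM judge, as the post suggests, would add at least one more call per candidate per round.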


Organizations

KimRina's activity

upvoted a paper 9 days ago

GameFactory: Creating New Games with Generative Interactive Videos

Paper • 2501.08325 • Published 16 days ago • 61

upvoted 2 papers about 1 month ago

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Paper • 2412.09645 • Published Dec 10, 2024 • 35

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models

Paper • 2412.12606 • Published Dec 17, 2024 • 41

upvoted 3 papers about 2 months ago

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 90

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Paper • 2412.04455 • Published Dec 5, 2024 • 37

ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting

Paper • 2411.17176 • Published Nov 26, 2024 • 23

upvoted 2 papers 2 months ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 79

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 73

upvoted 6 papers 3 months ago

SAMPart3D: Segment Any Part in 3D Objects

Paper • 2411.07184 • Published Nov 11, 2024 • 26

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Paper • 2411.04999 • Published Nov 7, 2024 • 17

Personalization of Large Language Models: A Survey

Paper • 2411.00027 • Published Oct 29, 2024 • 31

Survey of Cultural Awareness in Language Models: Text and Beyond

Paper • 2411.00860 • Published Oct 30, 2024 • 23

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 66

CLEAR: Character Unlearning in Textual and Visual Modalities

Paper • 2410.18057 • Published Oct 23, 2024 • 200

upvoted a collection 8 months ago

Ko-BioMistral-7B

A Korean Language Model for Biomedical Text • 3 items • Updated Jun 2, 2024 • 1