spandanagella

AI & ML interests

None yet

Recent Activity

liked a model about 2 months ago

ServiceNow-AI/Apriel-H1-15b-Thinker-SFT

authored a paper about 2 months ago

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

authored a paper about 2 months ago

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

authored 6 papers about 2 months ago

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

Paper • 2508.16763 • Published Aug 22, 2025 • 2

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

Paper • 2510.03230 • Published Oct 3, 2025 • 3

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning

Paper • 2508.09804 • Published Aug 13, 2025

DRBench: A Realistic Benchmark for Enterprise Deep Research

Paper • 2510.00172 • Published Sep 30, 2025 • 1

Grounding Computer Use Agents on Human Demonstrations

Paper • 2511.07332 • Published Nov 10, 2025 • 105

ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval

Paper • 2511.00903 • Published Nov 2, 2025

authored 11 papers 7 months ago

Using In-Context Learning to Improve Dialogue Safety

Paper • 2302.00871 • Published Feb 2, 2023 • 1

Multimodal Abstractive Summarization for How2 Videos

Paper • 1906.07901 • Published Jun 19, 2019

TEACh: Task-driven Embodied Agents that Chat

Paper • 2110.00534 • Published Oct 1, 2021

DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines

Paper • 2212.10557 • Published Dec 20, 2022

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 13

SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6, 2025 • 21

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Paper • 2503.15661 • Published Mar 19, 2025 • 2

FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering

Paper • 2412.07030 • Published Dec 9, 2024

Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA

Paper • 2505.16293 • Published May 22, 2025 • 2

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Paper • 2505.20793 • Published May 27, 2025 • 13

StarFlow: Generating Structured Workflow Outputs From Sketch Images

Paper • 2503.21889 • Published Mar 27, 2025 • 2

authored a paper 11 months ago

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Paper • 2502.01341 • Published Feb 3, 2025 • 39