Phys

AI & ML interests

None defined yet.

Recent Activity

wchai authored a paper 14 days ago

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

wchai authored a paper 14 days ago

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models

wchai authored a paper 14 days ago

Science-T2I: Addressing Scientific Illusions in Image Synthesis

wchai

authored 7 papers 14 days ago

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Paper • 2404.04910 • Published Apr 7, 2024

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models

Paper • 2503.04240 • Published Mar 6

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Paper • 2504.13129 • Published Apr 17 • 3

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark

Paper • 2504.14693 • Published Apr 20

EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments

Paper • 2503.08604 • Published Mar 11

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Paper • 2505.23606 • Published May 29 • 14

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Paper • 2506.11928 • Published 17 days ago • 22

sainx

authored a paper 14 days ago

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Paper • 2506.11928 • Published 17 days ago • 22

sainx

authored 2 papers about 2 months ago

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Paper • 2505.10046 • Published May 15 • 9

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 94

wchai

authored a paper about 2 months ago

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

Paper • 2505.01583 • Published May 2 • 9

sainx

authored a paper 2 months ago

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published Apr 14 • 21

wchai

authored a paper 3 months ago

An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published Apr 8 • 63

sainx

authored a paper 3 months ago

Scaling Language-Free Visual Representation Learning

Paper • 2504.01017 • Published Apr 1 • 31

Jialuo21

published a dataset 4 months ago

Phys111111/combined_data

Viewer • Updated Oct 6, 2024 • 1.93k • 327

wchai

authored a paper 4 months ago

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think

Paper • 2502.20172 • Published Feb 27 • 28

sainx

authored 2 papers 5 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 72

Fiaa

authored a paper 6 months ago

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9 • 15

sainx

authored a paper 6 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

Team members 6

Phys111111's activity