125 213

NG

SirRa1zel

AI & ML interests

Text-to-Speech, Translation, Object Detection

Recent Activity

liked a model 2 days ago

deepseek-ai/Janus-Pro-1B

upvoted a collection 2 days ago

liked a model 5 days ago

HKUSTAudio/Llasa-3B

Organizations

None yet

SirRa1zel's activity

upvoted a collection 2 days ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 3 items • Updated 3 days ago • 277

upvoted a paper 6 days ago

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Paper • 2501.12909 • Published 8 days ago • 62

upvoted a paper 7 days ago

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published 9 days ago • 79

upvoted a paper 10 days ago

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Paper • 2501.10045 • Published 13 days ago • 8

upvoted a paper 12 days ago

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

Paper • 2501.09756 • Published 14 days ago • 19

upvoted a collection 14 days ago

OuteTTS 0.3

4 items • Updated 15 days ago • 18

upvoted a paper 14 days ago

MangaNinja: Line Art Colorization with Precise Reference Following

Paper • 2501.08332 • Published 16 days ago • 55

upvoted a collection 14 days ago

Visual Document Retrieval

A collection of models, datasets, and spaces in the VDR series • 5 items • Updated 20 days ago • 8

upvoted a paper 14 days ago

UnCommon Objects in 3D

Paper • 2501.07574 • Published 17 days ago • 13

upvoted a paper 17 days ago

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 20 days ago • 59

upvoted a paper 18 days ago

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published 23 days ago • 42

upvoted a collection 23 days ago

Cosmos

The collection of Cosmos models • 31 items • Updated 13 days ago • 251

upvoted a paper about 2 months ago

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45

upvoted a collection about 2 months ago

[MASK] is All You Need

Code, dataset, and pretrained model • 5 items • Updated Nov 29, 2024 • 9

upvoted a paper about 2 months ago

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

Paper • 2412.01819 • Published Dec 2, 2024 • 35

upvoted a paper 3 months ago

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

Paper • 2407.03648 • Published Jul 4, 2024 • 18

upvoted 2 collections 3 months ago

MelodyFlow

MelodyFlow: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching • 7 items • Updated Oct 23, 2024 • 16

LayerSkip

Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710 • 8 items • Updated Nov 21, 2024 • 47

upvoted 2 papers 4 months ago

Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Paper • 2410.03017 • Published Oct 3, 2024 • 27

Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20, 2024 • 41