Application
- Paper • 2309.05519 • Published • 78
Large Language Model for Science: A Study on P vs. NP
Paper • 2309.05689 • Published • 20
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper • 2309.06126 • Published • 16
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 23
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Paper • 2310.08529 • Published • 17
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 19
Ghost in the Minecraft: Generally Capable Agents for Open-World Enviroments via Large Language Models with Text-based Knowledge and Memory
Paper • 2305.17144 • Published • 2
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Paper • 2305.10601 • Published • 10
UI Layout Generation with LLMs Guided by UI Grammar
Paper • 2310.15455 • Published • 2
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Paper • 2310.17796 • Published • 16
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
Paper • 2310.19061 • Published • 8
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper • 2310.19512 • Published • 15
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Paper • 2310.19019 • Published • 9
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 8
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Paper • 2311.00430 • Published • 57
Controllable Music Production with Diffusion Models and Guidance Gradients
Paper • 2311.00613 • Published • 24
MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription
Paper • 2108.02625 • Published • 1
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Paper • 2311.01455 • Published • 28
FLAP: Fast Language-Audio Pre-training
Paper • 2311.01615 • Published • 16
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion
Paper • 2311.01767 • Published • 18
Fast View Synthesis of Casual Videos
Paper • 2312.02135 • Published • 8
Generative Powers of Ten
Paper • 2312.02149 • Published • 4
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47
Proactive Detection of Voice Cloning with Localized Watermarking
Paper • 2401.17264 • Published • 17
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
Paper • 2402.01118 • Published • 29
K-Level Reasoning with Large Language Models
Paper • 2402.01521 • Published • 17
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Paper • 2402.01831 • Published • 13
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 89
Magic-Me: Identity-Specific Video Customized Diffusion
Paper • 2402.09368 • Published • 26
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Paper • 2402.10009 • Published • 18
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 53
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • Published • 10
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Paper • 2402.11450 • Published • 20
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Paper • 2402.04253 • Published
Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
Paper • 2403.05185 • Published • 20
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Paper • 2403.17694 • Published • 10
m-a-p/ChatMusician
Text Generation • Updated • 262 • 116
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper • 2404.07616 • Published • 15
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
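Paper • 2406.08451 • Published • 23
Note: GUI Odyssey provides a comprehensive data resource for research on cross-app navigation agents and, through the development of OdysseyAgent, demonstrates significant performance gains on cross-app navigation tasks. This work not only fills a gap in existing datasets but also provides strong support for future automated navigation technology on mobile devices.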
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
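Paper • 2406.10227 • Published • 9
Note: VideoGUI provides a new multimodal benchmark focused on evaluating vision-centric GUI tasks. The study shows that even state-of-the-art models struggle on these tasks, particularly in high-level planning, highlighting the need for further research and improvement.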
AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis
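Paper • 2406.08920 • Published • 7
Note: This paper proposes a novel AV-GS model that improves the quality of novel-view acoustic synthesis by learning holistic scene geometry and material information. Experimental results show that the model outperforms existing methods on both synthetic and real-world datasets, offering a new direction for future audio synthesis research.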
Training-free Camera Control for Video Generation
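Paper • 2406.10126 • Published • 12
Note: This study proposes a training-free camera motion control method that can be readily applied to existing video diffusion models, and multiple experiments demonstrate its effectiveness and robustness. The method offers a simple and efficient solution for camera motion control in video generation.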
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors
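Paper • 2406.10111 • Published • 6
Note: The GaussianSR method proposed in this paper introduces 2D generative priors and optimizes 3DGS by reducing interference from randomness, achieving high-quality HRNVS and significantly outperforming existing state-of-the-art methods. The work offers a new approach to high-resolution view synthesis and has significant practical value.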
MaskLID: Code-Switching Language Identification through Iterative Masking
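Paper • 2406.06263 • Published • 5
Note: MaskLID effectively improves language identification in code-switching (CS) scenarios by masking the features of the dominant language, and it performs especially well on sentences that mix multiple languages. The method not only improves identification accuracy but is also broadly applicable and can handle large volumes of web data, which matters for future natural language processing applications.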
Depth Anything V2
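Paper • 2406.09414 • Published • 92
Note: Depth Anything V2 combines the strengths of discriminative and generative models and, through an innovative data strategy and large-scale training, significantly improves the accuracy and generalization of monocular depth estimation. The newly proposed DA-2K evaluation benchmark also provides an important reference for future research.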
Bass Accompaniment Generation via Latent Diffusion
Paper • 2402.01412 • Published
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Paper • 2406.13457 • Published • 16
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework
Paper • 2406.14783 • Published • 16
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Paper • 2406.16855 • Published • 54
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?
Paper • 2406.16772 • Published • 2
Video-to-Audio Generation with Hidden Alignment
Paper • 2407.07464 • Published • 16
GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 23
Stable Audio Open
Paper • 2407.14358 • Published • 23
Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video
Paper • 2309.04814 • Published
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
Paper • 2407.15060 • Published • 9
LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels
Paper • 2407.18054 • Published • 10
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Paper • 2407.18129 • Published • 11
The FIGNEWS Shared Task on News Media Narratives
Paper • 2407.18147 • Published • 8
Text-Driven Neural Collaborative Filtering Model for Paper Source Tracing
Paper • 2407.17722 • Published • 8
DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction
Paper • 2407.16988 • Published • 7
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 32
Matting by Generation
Paper • 2407.21017 • Published • 22
Taming Data and Transformers for Audio Generation
Paper • 2406.19388 • Published
Open-Vocabulary Audio-Visual Semantic Segmentation
Paper • 2407.21721 • Published • 8
Fast Sprite Decomposition from Animated Graphics
Paper • 2408.03923 • Published • 7
SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
Paper • 2408.06697 • Published • 14