gary109's Collections

Audio

updated Sep 25, 2024

  • PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

    Paper • 2408.07547 • Published Aug 14, 2024 • 8

  • DeepSpeak Dataset v1.0

    Paper • 2408.05366 • Published Aug 9, 2024 • 14

  • Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

    Paper • 2408.15998 • Published Aug 28, 2024 • 88

  • Zero-shot Cross-lingual Voice Transfer for TTS

    Paper • 2409.13910 • Published Sep 20, 2024 • 10