nyuuzyou

🎵 Introducing the Suno Music Generation Dataset - nyuuzyou/suno

Dataset highlights:

- 659,788 AI-generated music samples with comprehensive metadata from suno.com
- Multilingual content, primarily in English, with Japanese and other languages also represented
- Each entry contains rich metadata including:
  - Unique song ID, audio/video URLs, and thumbnail images
  - AI model version and generation parameters
  - Song metadata (tags, prompts, duration)
  - Creator information and engagement metrics
- Released into the public domain under the Creative Commons Zero (CC0) license

The dataset structure includes detailed information about each generated piece, from technical parameters to user engagement metrics, making it particularly valuable for:
- Music generation model training
- Cross-modal analysis (text-to-audio relationships)
- User engagement studies
- Audio classification tasks
- Music style and genre analysis
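As a rough illustration of the cross-modal and genre-analysis uses listed above, the sketch below builds prompt/audio pairs and counts tags over a few in-memory rows. The column names (`prompt`, `audio_url`, `duration`, `tags`), the comma-separated tag format, durations in seconds, the 30-second cutoff, and the sample rows are all assumptions for illustration, not the dataset's documented schema. Real rows would come from something like the Hugging Face `datasets` library's `load_dataset("nyuuzyou/suno")`, after checking the dataset card for the actual field names and split.

```python
from collections import Counter

# NOTE: field names below are hypothetical; check the dataset card for the real schema.

def text_audio_pairs(records, min_duration=30.0):
    """Return (prompt, audio_url) pairs for clips at least min_duration long (assumed seconds)."""
    pairs = []
    for rec in records:
        prompt = (rec.get("prompt") or "").strip()
        audio_url = rec.get("audio_url")
        duration = rec.get("duration") or 0.0
        # Skip rows without a text prompt, without audio, or that are too short.
        if prompt and audio_url and duration >= min_duration:
            pairs.append((prompt, audio_url))
    return pairs

def tag_frequencies(records):
    """Count style/genre tags, assuming tags are stored as one comma-separated string."""
    counts = Counter()
    for rec in records:
        for tag in (rec.get("tags") or "").split(","):
            tag = tag.strip().lower()  # normalize case so "Chill" and "chill" merge
            if tag:
                counts[tag] += 1
    return counts

# Made-up sample rows with placeholder URLs.
sample = [
    {"id": "a1", "prompt": "lofi beat for studying", "audio_url": "https://example.com/a1.mp3",
     "duration": 120.0, "tags": "lofi, chill"},
    {"id": "b2", "prompt": "", "audio_url": "https://example.com/b2.mp3",
     "duration": 95.0, "tags": "J-pop, upbeat"},
    {"id": "c3", "prompt": "short jingle", "audio_url": "https://example.com/c3.mp3",
     "duration": 12.0, "tags": "Chill, jingle"},
]

print(text_audio_pairs(sample))
print(tag_frequencies(sample).most_common(2))
```

Only the first sample row survives the pairing filter: the second has an empty prompt and the third falls under the duration cutoff.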
🎓 Introducing the Kompy.info Uzbek Educational Dataset - nyuuzyou/kompy

Dataset highlights:
- 584,648 pages of educational content extracted from kompy.info, an educational resource website
- Content exclusively in Uzbek, focusing on technical and scientific topics
- Each entry contains: URL, page title, and extracted main text content
- Main text extracted from the HTML pages using the trafilatura extraction tool
- Covers a wide range of academic and educational materials
- Released into the public domain under the Creative Commons Zero (CC0) license
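Because each entry carries only a URL, a page title, and the extracted text, a first preprocessing pass can stay simple. The following is a minimal sketch, assuming hypothetical column names `url`, `title`, and `text` and an arbitrary character threshold; it keeps the first occurrence of each URL, collapses runs of whitespace into single spaces, and drops very short pages. The sample URLs and Uzbek snippets are made up.

```python
import re

# NOTE: "url", "title", and "text" are assumed column names, not confirmed by the post.

def clean_pages(records, min_chars=200):
    """Deduplicate by URL (first occurrence wins), normalize whitespace, drop short texts."""
    seen = set()
    cleaned = []
    for rec in records:
        url = rec.get("url")
        if not url or url in seen:
            continue
        seen.add(url)
        # Collapse newlines, tabs, and repeated spaces into single spaces.
        text = re.sub(r"\s+", " ", rec.get("text") or "").strip()
        if len(text) < min_chars:
            continue
        cleaned.append({"url": url, "title": (rec.get("title") or "").strip(), "text": text})
    return cleaned

# Hypothetical sample rows.
pages = [
    {"url": "https://kompy.info/page-1", "title": "Informatika ",
     "text": "Kompyuter   tarmoqlari\nhaqida ma'lumot"},
    {"url": "https://kompy.info/page-1", "title": "Duplicate", "text": "Takroriy sahifa matni"},
    {"url": "https://kompy.info/page-2", "title": "Qisqa", "text": "ok"},
]

print(clean_pages(pages, min_chars=10))
```

The threshold is a judgment call: a low value keeps short definition pages, while a higher one favors full lecture-style texts.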

The dataset is a resource for natural language processing in Uzbek, particularly in the educational and technical domains. It can be used for text classification, topic modeling, and content analysis of educational materials. Its scale makes it useful for developing educational technology applications and for studying pedagogical approaches in Uzbek-language instruction, while its monolingual focus provides a concentrated corpus for studying Uzbek technical and scientific terminology.
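For the topic-modeling and terminology use cases, one lightweight baseline is to rank each page's terms by TF-IDF. This is a generic technique chosen for illustration, not a method described in the post. The sketch uses only the Python standard library; the tokenizer (Unicode letters only, keeping in-word apostrophes as in Uzbek Latin spellings such as "ma'lumot") and the toy sentences are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Unicode letters only (no digits or underscores); keeps in-word apostrophes like "ma'lumot".
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

def top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document (ties broken alphabetically)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Term frequency times inverse document frequency (natural log).
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results.append([t for t, _ in ranked[:k]])
    return results

# Toy Uzbek sentences standing in for page texts.
docs = [
    "Kompyuter tarmoqlari: tarmoq protokollari va tarmoq xavfsizligi.",
    "Algoritm va dasturlash: algoritm murakkabligi.",
]

print(top_terms(docs, k=1))
```

Terms that occur in every document get an IDF of zero, so common function words such as "va" ("and") sink to the bottom of each ranking without a hand-built stopword list.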