nyuuzyou PRO

AI & ML interests

None yet

Organizations

Social Post Explorers, AI Starter Pack

nyuuzyou's activity

posted an update 1 day ago
🦅 EagleSFT Dataset - nyuuzyou/EagleSFT

Collection of 536,231 question-answer pairs featuring:

- Human-posed questions and machine-generated responses for SFT
- Bilingual content in Russian and English with linked IDs
- Derived from 739k+ real user queries, primarily educational topics
- Includes unique IDs and machine-generated category labels

This dataset provides a resource for supervised fine-tuning (SFT) of large language models, cross-lingual research, and understanding model responses to diverse user prompts. It is released to the public domain under the CC0 1.0 license.
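
A minimal sketch of how linked IDs could be used to pair the Russian and English versions of the same question. The field names (`id`, `linked_id`, `lang`, `question`) and the sample records are hypothetical and used only for illustration; the dataset card is the place to check the actual schema.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical records: the real EagleSFT column names may differ.
SAMPLE: List[Dict[str, Optional[str]]] = [
    {"id": "ru-1", "linked_id": "en-1", "lang": "ru", "question": "Что такое фотосинтез?"},
    {"id": "en-1", "linked_id": "ru-1", "lang": "en", "question": "What is photosynthesis?"},
    {"id": "ru-2", "linked_id": None, "lang": "ru", "question": "Сколько будет 2 + 2?"},
]


def pair_by_linked_id(records: List[Dict[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Return (russian_question, english_question) pairs joined on linked_id."""
    by_id = {r["id"]: r for r in records}
    pairs = []
    for record in records:
        if record["lang"] != "ru":
            continue  # start from the Russian side so each pair is emitted once
        other = by_id.get(record["linked_id"])
        if other is not None and other["lang"] == "en":
            pairs.append((record["question"], other["question"]))
    return pairs


pairs = pair_by_linked_id(SAMPLE)
```

Records without a counterpart (such as `ru-2` above) are simply skipped, which keeps the result usable as a parallel corpus.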
reacted to neph1's post with 👍 1 day ago
posted an update 2 days ago
🎨 Artgram Dataset - nyuuzyou/artgram

Collection of approximately 7,200 professional artworks and metadata featuring:
- High-quality artwork images from artist portfolios on artgram.co.
- Comprehensive metadata including project titles, artist details, descriptions, dates, and software used.
- Artwork images provided in ZIP archives (~1k images per archive) linked via an index CSV.
- Corresponding metadata available in a single JSONL file (7,185 entries).
- Monolingual dataset focused primarily on English (en) language content in descriptions and metadata.

This dataset offers a valuable resource for image classification, image-to-text generation, multimodal model training, and analysis of digital art.
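
As a rough illustration of the layout described above, the sketch below joins an index CSV (image to ZIP archive) with JSONL metadata. The column names (`filename`, `archive`), the metadata keys (`title`, `software`, `image`), and the sample rows are assumptions made for this example, not the dataset's documented schema.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical index CSV: which ZIP archive holds which image.
INDEX_CSV = """filename,archive
a1.jpg,images_000.zip
a2.jpg,images_000.zip
b1.jpg,images_001.zip
"""

# Hypothetical JSONL metadata: one JSON object per line.
METADATA_JSONL = """{"title": "Forest study", "software": ["Blender"], "image": "a1.jpg"}
{"title": "City at night", "software": ["Photoshop"], "image": "b1.jpg"}
"""


def build_lookup(index_text: str) -> Dict[str, str]:
    """Map each image filename to the archive that contains it."""
    reader = csv.DictReader(io.StringIO(index_text))
    return {row["filename"]: row["archive"] for row in reader}


def attach_archives(metadata_text: str, lookup: Dict[str, str]) -> List[dict]:
    """Parse JSONL metadata and record where each image can be found."""
    entries = []
    for line in metadata_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        entry = json.loads(line)
        entry["archive"] = lookup.get(entry["image"])  # None if not indexed
        entries.append(entry)
    return entries


entries = attach_archives(METADATA_JSONL, build_lookup(INDEX_CSV))
```

With the archive name attached to each entry, only the ZIP files that are actually needed have to be opened.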
replied to merve's post 2 days ago
Multimodal reasoning is the way to go, especially with open source. Congrats to the Moonshot team!

posted an update 5 days ago
🇷🇺 Russian Forum Messages Dataset - nyuuzyou/ruforum

Collection of approximately 58 million Russian forum messages featuring:

- Complete message content from Russian online forums spanning 2010-2025
- Comprehensive metadata including unique message IDs and timestamps
- Full text content preserving original user discussions and interactions
- Monolingual dataset focused exclusively on Russian language content

This dataset offers a unique textual archive of Russian online conversations suitable for text generation, sentiment analysis, and language modeling research. It is released to the public domain under the CC0 1.0 license.
replied to piper2024's post 5 days ago
Hello, Hugging Face uses Git (or the backwards-compatible Xet) for storing files. When you upload a new version of a file to a Git repository, it doesn't overwrite the old file. Instead, Git stores both versions: the new version becomes the current one, while the old version remains accessible in your history. That is why repositories grow over time.
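
The effect can be sketched with a toy content-addressed store. This is a deliberately simplified model written for illustration, not how Git, LFS, or Xet are actually implemented (real Git also compresses and delta-packs objects); the class and method names are made up for this example.

```python
import hashlib
from typing import Dict, List, Tuple


class ToyObjectStore:
    """Toy model of version history: every uploaded version is kept."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}        # content hash -> stored bytes
        self.history: List[Tuple[str, str]] = []   # (path, content hash) per upload

    def commit(self, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.objects[digest] = data                # old versions are never removed
        self.history.append((path, digest))
        return digest

    def total_bytes(self) -> int:
        return sum(len(blob) for blob in self.objects.values())


store = ToyObjectStore()
store.commit("model.bin", b"a" * 1000)   # first upload
store.commit("model.bin", b"b" * 1000)   # "overwrite" with a new version
size_after_two_uploads = store.total_bytes()
```

Both 1,000-byte versions remain in the store, so the repository holds 2,000 bytes even though only one `model.bin` is visible in the current revision.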

replied to piper2024's post 6 days ago
@piper2024 Yes. Open the repository you want to delete files from, go to settings and under "Storage Usage" click "List LFS files". There you can select multiple files and delete them at once.

replied to piper2024's post 6 days ago
replied to etemiz's post 7 days ago
It's long been my view that LMArena isn't a fully reliable measure of real-world LLM performance. I suspect many users might click somewhat randomly, perhaps favoring answers based on superficial qualities like length, formatting, or speed, rather than deeper assessment.

Since all the Arena dialogues are publicly available on Hugging Face, a crowdsourced evaluation system utilizing that data seems like it could be quite valuable. It would also be interesting to see more development in automated evaluation systems, perhaps along the lines of "Arena-Hard-Auto" (though keeping such systems updated and robust is a challenge). However, building an effective automated evaluator would likely require training a specialized model on a large corpus, because I'm fairly certain that using a current powerful model like GPT-4-Turbo (or any other) for evaluation would introduce bias, favoring responses that align with its own style.

reacted to Steven10429's post with 👀 7 days ago
I got rejected from llama4.
So that means I can use a quantized model without following their TOS.
Interesting.
reacted to jsulz's post with 🔥 9 days ago
The Llama 4 release - meta-llama/llama-4-67f0c30d9fe03840bc9d0164 - was a big one for the xet-team with every model backed by the storage infrastructure of the future for the Hub.

It's been a wild few days, and especially 🤯 to see every tensor file with a Xet logo next to it instead of LFS.

The attached graph shows requests per second to our content-addressed store (CAS) right as the release went live.

yellow = GETs; dashed line = launch time.

You can definitely tell when the community started downloading 👀

h/t to @rajatarya for the graph, to the entire Xet crew for bringing us to this point, and a special shoutout to Rajat, @port8080, @brianronan, @seanses, and @znation, who made sure the bytes kept flying all weekend ⚡️
reacted to DawnC's post with 🔥 9 days ago
New in PawMatchAI 🐾: Turn Your Dog Photos into Art!

I'm excited to introduce a brand-new creative feature: Dog Style Transfer is now live on PawMatchAI!

Just upload your dog's photo and transform it into 5 artistic styles:
🌸 Japanese Anime
📚 Classic Cartoon
🖼️ Oil Painting
🎨 Watercolor
🌆 Cyberpunk

All powered by Stable Diffusion and enhanced with smart prompt tuning to preserve your dog's unique traits and breed identity, so the artwork stays true to your furry friend.

Whether you're creating a custom portrait or just having fun, this feature brings your pet photos to life in completely new ways.

And here's a little secret: although it's designed with dogs in mind, it actually works on any photo, including cats, plush toys, and even humans. Feel free to experiment!

Results may not always be perfectly accurate; sometimes your photo might come back looking a little different, or even beyond your imagination. But that's part of the fun! It's all about creative surprises and letting the AI do its thing.

Try it now: DawnC/PawMatchAI

If this new feature made you smile, a ❤️ for this space would mean a lot.

#AIArt #StyleTransfer #StableDiffusion #ComputerVision #MachineLearning #DeepLearning
reacted to luigi12345's post with 👍 10 days ago
🚀 Meta's Llama 4 Models Now on Hugging Face!

Meta has released Llama 4 Scout and Llama 4 Maverick, now available on Hugging Face:
• Llama 4 Scout: 17B active parameters, 16-expert Mixture of Experts (MoE) architecture, 10M token context window, fits on a single H100 GPU.
• Llama 4 Maverick: 17B active parameters, 128-expert MoE architecture, 1M token context window, optimized for DGX H100 systems.

🔥 Key Features:
• Native Multimodality: Seamlessly processes text and images.
• Extended Context Window: Up to 10 million tokens for handling extensive inputs.
• Multilingual Support: Trained on 200 languages, with fine-tuning support for 12, including Arabic, Spanish, and German.

🛠️ Access and Integration:
• Model Checkpoints: Available under the meta-llama organization on the Hugging Face Hub.
• Transformers Compatibility: Fully supported in transformers v4.51.0 for easy loading and fine-tuning.
• Efficient Deployment: Supports tensor-parallelism and automatic device mapping.

These models offer developers enhanced capabilities for building sophisticated, multimodal AI applications.
reacted to jsulz's post with 🔥 12 days ago
Huge week for xet-team as Llama 4 is the first major model on Hugging Face uploaded with Xet providing the backing! Every byte downloaded comes through our infrastructure.

Using Xet on Hugging Face is the fastest way to download and iterate on open source models and we've proved it with Llama 4 giving a boost of ~25% across all models.

We expect builders on the Hub to see even more improvements, helping power innovation across the community.

With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average, we're seeing ~25% dedupe, providing huge savings to the community who iterate on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet.
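
To make the dedupe figure concrete, here is a small sketch of how a chunk-level deduplication ratio can be computed. Fixed-size chunking, the tiny chunk size, and the two sample byte strings are simplifications chosen for brevity; this is not a description of how Xet chunks or stores data.

```python
import hashlib
from typing import Iterable, Iterator


def chunks(data: bytes, size: int) -> Iterator[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def dedupe_ratio(files: Iterable[bytes], chunk_size: int = 4) -> float:
    """Fraction of bytes that need not be stored because the chunk was seen before."""
    seen = set()
    total = 0
    stored = 0
    for data in files:
        for chunk in chunks(data, chunk_size):
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)   # only new chunks cost storage
    return 1 - stored / total if total else 0.0


# Two hypothetical "checkpoints" that share three of their four chunks.
base = b"AAAABBBBCCCCDDDD"
finetune = b"AAAABBBBCCCCEEEE"
ratio = dedupe_ratio([base, finetune])
```

In this toy case 32 bytes are uploaded but only 20 are stored, a dedupe ratio of 37.5%. The same bookkeeping, applied to real model files, is what a figure like "~25% dedupe" summarizes.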

Thanks to the meta-llama team for launching on Xet!
reacted to merterbak's post with 🔥 12 days ago
Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and a mixture-of-experts architecture. Two models are available now:
Models 🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- Mixture-of-Experts - First Llama models using MoE for incredible efficiency
- Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)

🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- Fits on a single DGX H100 (8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- Elo score of 1417 on LMArena, currently the second-best model on the arena

🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks
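
The "active vs. total" parameter counts quoted above can be turned into a quick comparison. The sketch below only does arithmetic on the numbers from the post; the dictionary layout and function name are illustrative choices.

```python
from typing import Dict, Tuple

# (active parameters, total parameters) as quoted in the post.
MODELS: Dict[str, Tuple[float, float]] = {
    "Llama 4 Scout": (17e9, 109e9),
    "Llama 4 Maverick": (17e9, 400e9),
    "Llama 4 Behemoth": (288e9, 2e12),
}


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that are active in a forward pass."""
    return active / total


fractions = {name: active_fraction(a, t) for name, (a, t) in MODELS.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of parameters active")
```

Scout and Maverick have the same active parameter count, but Maverick spreads its total over many more experts, so a much smaller share of its weights is used at a time.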
reacted to clem's post with 🔥 13 days ago
Llama models (arguably the most successful open AI models of all time) accounted for just 3% of total model downloads on Hugging Face in March.

People and media like stories of winner takes all & one model/company to rule them all but the reality is much more nuanced than this!

Kudos to all the small AI builders out there!
reacted to Reality123b's post with 🧠 13 days ago
Does anyone know how to convert a Replit app into a Hugging Face Spaces app?
reacted to ritvik77's post with 🔥 13 days ago
Hi 🤗 HF Community,

I would be incredibly grateful for an opportunity to contribute, in any capacity, and learn alongside researchers here. Is there any possibility I could collaborate or assist with any of your research work?

I'm happy to support ongoing projects, contribute to data analysis, code, documentation, or anything that adds value.

Thank you for your time and consideration!

Warm regards,
Ritvik Gaur
reacted to AdinaY's post with 🔥 14 days ago
posted an update 14 days ago
✈️ Thanks for the interest shown in the FlightAware Photos dataset (nyuuzyou/flightaware). Seeing its potential, I'm working on expanding it to over 1 million images soon.

---

🎨 Introducing the PaintBerri Hand-Drawn Art Dataset - nyuuzyou/paintberri

A collection of 68,860 digital hand-drawn artworks featuring:

- Unique images sourced directly from the paintberri.com online art community.
- Rich metadata including creator-provided titles, descriptions, and timestamps.
- Image dimensions, thumbnail URLs, and NSFW content flags.
- Creator IDs (where available) and unique short identifiers for each piece.

This dataset offers a distinct visual archive capturing diverse styles and subjects from an active online drawing community, suitable for image classification and image-to-text tasks. Opt-out is available for creators wishing to remove their work.