FineWeb2 Edu Japanese: A high-quality, filtered Japanese dataset (120M texts, 89.3B tokens) for educational AI training.
Yuichi Tateno PRO
hotchpotch
AI & ML interests
Information Retrieval with LLMs
Recent Activity
upvoted an article about 20 hours ago
Seq vs Seq: the Ettin Suite of Paired Encoders and Decoders
upvoted an article 2 days ago
Migrating the Hub from Git LFS to Xet