Jiuxiang Gu

JoshuaGu

AI & ML interests

None yet

Recent Activity


upvoted a paper 12 days ago

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published 13 days ago • 114

authored a paper 3 months ago

Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation

Paper • 2509.19244 • Published Sep 23 • 11

updated a dataset 6 months ago

adopd/adopd2024

Viewer • Updated Jul 13 • 120k • 59 • 3

authored a paper 9 months ago

Towards Visual Text Grounding of Multimodal Large Language Model

Paper • 2504.04974 • Published Apr 7 • 17

upvoted a paper about 1 year ago

Personalization of Large Language Models: A Survey

Paper • 2411.00027 • Published Oct 29, 2024 • 33

authored a paper about 1 year ago

Personalization of Large Language Models: A Survey

Paper • 2411.00027 • Published Oct 29, 2024 • 33

updated a dataset about 1 year ago

opioidarchive/oida-qa

Viewer • Updated Nov 17 • 400k • 803 • 1

liked a dataset over 1 year ago

adopd/adopd2024

Viewer • Updated Jul 13 • 120k • 59 • 3

reacted to Molbap's post with 🔥 over 1 year ago

Post


🚀🚀 Exciting times for the document AI community!

We're thrilled to announce the release of some of the largest OCR datasets available to the public.
🔥 With over 26 million pages, 18 billion text tokens, and 6 TB of data, these resources are a significant leap forward for document AI research.
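As a rough sanity check on those headline figures, the per-page averages can be worked out directly. This sketch assumes decimal units (1 TB = 10^12 bytes) and treats "over 26 million" as exactly 26 million, so the results are approximate averages, not measured statistics:

```python
# Back-of-envelope averages from the headline figures in the post.
# Assumption: decimal units (1 TB = 10**12 bytes) and exactly 26 million pages.
PAGES = 26_000_000
TEXT_TOKENS = 18_000_000_000
TOTAL_BYTES = 6 * 10**12

tokens_per_page = TEXT_TOKENS / PAGES          # average text tokens per page
kb_per_page = TOTAL_BYTES / PAGES / 1_000      # average kilobytes of data per page

print(f"~{tokens_per_page:.0f} text tokens per page")
print(f"~{kb_per_page:.0f} KB of data per page")
```

That works out to roughly 700 text tokens and about 230 KB of data per page on average; individual documents will vary widely.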

Here's how to access these datasets quickly:

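```python
from datasets import load_dataset

# streaming=True returns iterable datasets that fetch samples lazily instead of downloading everything up front
pdfa_dataset = load_dataset('pixparse/pdfa-eng-wds', streaming=True)
idl_dataset = load_dataset('pixparse/idl-wds', streaming=True)
```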

This lets you stream the data directly into your projects with the Hugging Face `datasets` library, without downloading it in full first. On the Hub, you can find the datasets here:

pixparse/pdfa-eng-wds
pixparse/idl-wds
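Streaming means samples are fetched lazily as you iterate, so reading the first few pages does not require fetching the whole dataset. The sketch below illustrates that idea in plain Python with a hypothetical `stream_pages` generator over simulated shards; it is not how the `datasets` library is implemented. With the real library, `IterableDataset.take(n)` serves the same purpose on a streamed split.

```python
from itertools import islice
from typing import Iterator


# Hypothetical stand-in for one remote shard; a real loader would download and decode it here.
def fetch_shard(shard_id: int, pages_per_shard: int) -> list[dict]:
    return [{"shard": shard_id, "page": i} for i in range(pages_per_shard)]


# Hypothetical lazy stream: a shard is fetched only when the consumer reaches it.
def stream_pages(num_shards: int, pages_per_shard: int, fetched: list[int]) -> Iterator[dict]:
    for shard_id in range(num_shards):
        fetched.append(shard_id)  # record which shards were actually fetched
        yield from fetch_shard(shard_id, pages_per_shard)


fetched: list[int] = []
# Take only the first 5 pages; islice stops pulling from the generator after that.
first_five = list(islice(stream_pages(num_shards=1000, pages_per_shard=2, fetched=fetched), 5))
print(len(first_five), "pages read;", len(fetched), "of 1000 shards fetched")
```

Only the three shards needed to produce five pages are ever touched; the other 997 are never requested.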

For lean data loading, the new [chug](https://github.com/huggingface/chug) library offers a loader with PDF decoding:

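```python
import chug

# Task config: document-reading task; page_sampling='all' requests every page
task_cfg = chug.DataTaskDocReadCfg(
    page_sampling='all',
)
# Data config: which dataset and split to read, and how to load it
data_cfg = chug.DataCfg(
    source='pixparse/pdfa-eng-wds',
    split='train',
    batch_size=None,
    format='hfids',
    num_workers=0,
)
# Build the loader and pull the first sample
data_loader = chug.create_loader(
    data_cfg,
    task_cfg,
)
sample = next(iter(data_loader))
```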

We owe a huge thank you to Peter Wyatt, Kate Tasker, Rachel Taketa, Ali Furkan Biten, Ruben Tito, and their colleagues for their contributions. Their work putting these datasets together has been invaluable. 🤗

Looking Ahead:

We're on a mission to enhance document AI capabilities, and these datasets are just the beginning. With your engagement and innovation, we're confident in the community's ability to develop robust OCR solutions. We encourage you to explore these datasets, experiment with the code, and contribute to the collective progress in document AI.

For detailed information on usage and licensing, please refer to the dataset cards on the Hugging Face Hub.


reacted to Jaward's post with 🔥 over 1 year ago

Post


This is the closest I’ve seen to a scalable AI/LLM operating system. It has all the major ingredients of a feasible AI OS architecture:

- Extends classical OS functionality with an LLM Kernel.
- A multi-agent-centric approach.
- An optimized resource-allocation system that lets LLM-based tasks and classical OS tasks coexist.
- An Agent Scheduler that applies classical OS scheduling algorithms (FIFO, round-robin).
- A Context Manager to improve alignment.
- A Lazy Memory Manager for agents (data is stored and accessible only while the agent is active).
- An enhanced security module for the AI-driven environment.
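The two classical policies named for the Agent Scheduler differ in when an agent gives up its turn. Below is a minimal sketch, not AIOS code: the `AgentTask` type, the agent names, and the step counts are hypothetical, and work is measured in abstract "steps". FIFO runs each agent to completion in arrival order; round-robin gives each agent at most `quantum` steps before moving it to the back of the queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AgentTask:
    name: str
    remaining: int  # hypothetical units of work, e.g. LLM decoding steps


def fifo(tasks: list[AgentTask]) -> list[tuple[str, int]]:
    """Run each task to completion in arrival order; returns (name, steps run) slices."""
    return [(t.name, t.remaining) for t in tasks]


def round_robin(tasks: list[AgentTask], quantum: int) -> list[tuple[str, int]]:
    """Give each task at most `quantum` steps per turn; unfinished tasks rejoin the queue."""
    queue = deque(AgentTask(t.name, t.remaining) for t in tasks)  # copy so inputs stay untouched
    timeline: list[tuple[str, int]] = []
    while queue:
        task = queue.popleft()
        run = min(quantum, task.remaining)
        timeline.append((task.name, run))
        task.remaining -= run
        if task.remaining > 0:
            # Preempted: a real scheduler would need to save the agent's context here.
            queue.append(task)
    return timeline


tasks = [AgentTask("travel_agent", 5), AgentTask("math_agent", 2), AgentTask("rec_agent", 3)]
print(fifo(tasks))
print(round_robin(tasks, quantum=2))
```

With a quantum of 2, the short `math_agent` finishes after its first turn instead of waiting for all 5 of `travel_agent`'s steps, which is the usual reason to prefer round-robin when many agents share one LLM.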

It does hit all the checkpoints, doesn’t it? A scaled-up version of @karpathy’s.

Code: https://github.com/agiresearch/AIOS