dhruv3006 (Dhruv)

posted an update 3 days ago

Cua: Best State-of-the-Art Computer-Use Agent

Build a SOTA Computer-Use Agent using Cua (https://github.com/trycua/cua), the open-source infrastructure and agent framework for controlling real desktop and browser environments.
Submissions are evaluated in HUD’s OSWorld-Verified benchmarking environment. The top-scoring team earns a secured interview with a Y Combinator partner for the next batch.

Prizes:
Guaranteed YC partner interview
Feature on the Cua blog + social channels
Swag pack for each team member

Eligibility: To be considered for judging and prizes, sign up at https://www.trycua.com/hackathon

posted an update 8 days ago

Cua is hiring a Founding Engineer, UX & Design in SF

Cua is hiring a Founding Engineer, UX & Design in our brand new SF office.

Cua is building the infrastructure for general AI agents; your work will define how humans and computers interact at scale.

Location: SF

Referral bonus: $5,000

Apply here: https://www.ycombinator.com/companies/cua/jobs/a6UbTvG-founding-engineer-ux-design

Discord: https://discord.gg/vJ2uCgybsC

GitHub: https://github.com/trycua

reacted to ariG23498's post with 🧠 10 days ago

I have always advocated for writing technical stories without using LLMs.

The following one-page editorial really drives the point home.
https://www.nature.com/articles/s44222-025-00323-4

reacted to their post with 👍🔥 10 days ago

Human in the Loop for computer use agents (instant handoff from AI to you)

Sometimes the best “agent” is you.

We’re introducing Human in the Loop: instantly hand off from automation to human control when a task needs judgment.

Yesterday we shared our HUD evals for measuring agents at scale. Today you can become the agent when it matters: take over the same session, see what the agent sees, and keep the workflow moving.

It lets you create clean training demos, establish ground truth for tricky cases, intervene on edge cases (CAPTCHAs, ambiguous UIs), or step through a debugging session without context switching.

You have full human control whenever you want. There is even a fallback mode that starts automated and escalates to a human only when needed.
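The fallback mode (start automated, escalate only when needed) can be pictured as a simple control loop. This is a minimal sketch, not Cua's API: `Proposal`, `agent_step`, `ask_human`, and `threshold` are hypothetical names, and gating on a self-reported confidence score is just one assumed escalation trigger.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Proposal:
    """A hypothetical action proposed by the agent for one step."""
    action: str
    confidence: float  # assumed self-reported confidence in [0, 1]


def run_with_fallback(
    steps: Iterable[str],
    agent_step: Callable[[str], Proposal],
    ask_human: Callable[[str, Proposal], str],
    threshold: float = 0.7,
) -> List[Tuple[str, str, str]]:
    """Run each step automatically; hand control to a human only when confidence is low.

    Returns (step, action_taken, who_acted) tuples so every handoff can be audited.
    """
    history = []
    for step in steps:
        proposal = agent_step(step)
        if proposal.confidence >= threshold:
            history.append((step, proposal.action, "agent"))
        else:
            # Escalate: the human sees the same step plus the agent's proposal and decides.
            history.append((step, ask_human(step, proposal), "human"))
    return history
```

In practice the stubs would be replaced by a real agent call and a real handoff UI; the point is that the human only appears on the low-confidence steps.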

It works across common stacks (OpenAI, Anthropic, Hugging Face) and with our Composite Agents: same tools, same environment, and you take control when needed.

Feedback welcome; curious how you’d use this in your workflows.

Blog: https://www.trycua.com/blog/human-in-the-loop.md

GitHub: https://github.com/trycua/cua

reacted to their post with 🚀🔥 12 days ago

Pair a vision grounding model with a reasoning LLM with Cua

Cua just shipped v0.4 of the Cua Agent framework with Composite Agents: you can now pair a vision/grounding model with a reasoning LLM using a simple modelA+modelB syntax. Best clicks + best plans.

The problem: every GUI model speaks a different dialect.
• some want pixel coordinates
• others want percentages
• a few spit out cursed tokens like <|loc095|>
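To make the "dialect" problem concrete, here is a minimal sketch of mapping all three output styles onto one pixel coordinate system. It is an illustration, not Cua's adapter code: percentages are assumed to run from 0 to 100, and the `<|locNNN|>` format is hypothetically treated as one bin index per axis (x first, then y) over a 1000-bin grid. Real models differ, so check each model card.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Screen:
    width: int
    height: int


# Hypothetical token format: one "<|locNNN|>" token per axis, x first, then y.
LOC_TOKEN = re.compile(r"<\|loc(\d+)\|>")


def _clamp(value: float, upper: int) -> int:
    """Round to the nearest pixel and keep it inside [0, upper - 1]."""
    return max(0, min(upper - 1, round(value)))


def from_pixels(x: float, y: float, screen: Screen) -> Tuple[int, int]:
    # Already in screen pixels: just round and clamp.
    return _clamp(x, screen.width), _clamp(y, screen.height)


def from_percent(px: float, py: float, screen: Screen) -> Tuple[int, int]:
    # Assumed: percentages in [0, 100] of screen width/height.
    return (_clamp(px / 100 * screen.width, screen.width),
            _clamp(py / 100 * screen.height, screen.height))


def from_loc_tokens(text: str, screen: Screen, num_bins: int = 1000) -> Tuple[int, int]:
    # Assumed: each token is a bin index in [0, num_bins - 1]; map it to the bin centre.
    bins = [int(b) for b in LOC_TOKEN.findall(text)]
    if len(bins) != 2:
        raise ValueError(f"expected 2 loc tokens, got {len(bins)}: {text!r}")
    bx, by = bins
    x = (bx + 0.5) / num_bins * screen.width
    y = (by + 0.5) / num_bins * screen.height
    return _clamp(x, screen.width), _clamp(y, screen.height)
```

Once every model's output passes through one of these, downstream click/type tools only ever see plain screen pixels.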

We built a universal interface that works the same across Anthropic, OpenAI, Hugging Face, etc.:

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer]
)

But here’s the fun part: you can combine models by specialization.
Grounding model (sees + clicks) + Planning model (reasons + decides) →

agent = ComputerAgent(
    model="huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-4o",
    tools=[computer]
)
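The `modelA+modelB` string reads as "grounding model + planning model". To illustrate how such a string decomposes, here is a hypothetical parser; it is not Cua's implementation, and `ModelRef`, `CompositeSpec`, `parse_model`, and `parse_composite` are names invented for this sketch. It assumes the first `+` separates the two models and the first `/` separates the provider from the model id.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str  # e.g. "openai", "huggingface-local"
    name: str      # model id within that provider


class CompositeSpec(NamedTuple):
    grounding: ModelRef  # sees + clicks
    planner: ModelRef    # reasons + decides


def parse_model(ref: str) -> ModelRef:
    # Split on the first "/" only: Hugging Face ids such as "HelloKKMe/GTA1-7B"
    # contain their own slash, which must stay part of the model name.
    provider, sep, name = ref.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {ref!r}")
    return ModelRef(provider, name)


def parse_composite(spec: str) -> CompositeSpec:
    grounding, sep, planner = spec.partition("+")
    if not sep:
        raise ValueError(f"expected 'grounding+planner', got {spec!r}")
    return CompositeSpec(parse_model(grounding.strip()), parse_model(planner.strip()))
```

For the example above, the grounding side resolves to provider `huggingface-local` with model `HelloKKMe/GTA1-7B`, and the planner to `openai` / `gpt-4o`.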

This gives GUI skills to models that were never built for computer use. One handles the eyes/hands, the other the brain. Think driver + navigator working together.

Two specialists beat one generalist. We’ve got a ready-to-run notebook demo; curious what combos you all will try.

GitHub: https://github.com/trycua/cua

Blog: https://www.trycua.com/blog/composite-agents

reacted to openfree's post with 🔥👍 12 days ago

🔒 Ansim Blur: Privacy-First Face Blurring for the AI Era

🚨 The Privacy Crisis is Now
Smart CCTVs 📹, delivery robots 🤖, and autonomous vehicles 🚗 are everywhere. Your face is being captured, transmitted, and stored without your knowledge or consent.

openfree/Face-blurring

The privacy threat is real:
24/7 surveillance cameras recording your every move
Companies harvesting facial biometric data at scale
Your face becoming a commodity without your permission

💡 The Solution: Ansim Blur
Real-time face anonymization powered by YOLOv8 🎯
✅ Process images, videos, and live streams
✅ Automatic GPU/CPU detection for universal deployment
✅ Choose between Gaussian blur or mosaic pixelation
✅ Fine-tune detection sensitivity for your needs
✅ Preserve audio tracks in video processing
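Mosaic pixelation, one of the two anonymization modes listed above, replaces each small tile of a detected face region with that tile's average color. A minimal NumPy sketch, assuming the face detector (YOLOv8-face in the post) has already produced a pixel bounding box `(x0, y0, x1, y1)`; `pixelate_region` and `block` are hypothetical names, not the Ansim Blur source.

```python
import numpy as np


def pixelate_region(image: np.ndarray, box: tuple, block: int = 16) -> np.ndarray:
    """Return a copy of `image` with the box (x0, y0, x1, y1) mosaic-pixelated.

    Works for grayscale (H, W) and color (H, W, C) arrays; the input is not modified.
    """
    out = image.copy()
    x0, y0, x1, y1 = box
    roi = out[y0:y1, x0:x1]  # a view: writing into it edits `out` in place
    h, w = roi.shape[:2]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = roi[by:by + block, bx:bx + block]
            # Fill the tile with its mean color, cast back (truncated) to the image dtype.
            tile[...] = tile.mean(axis=(0, 1), keepdims=True).astype(image.dtype)
    return out
```

A larger `block` gives coarser, harder-to-reverse mosaics; Gaussian blur is the smoother alternative the post mentions.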
🛡️ Real-World Applications
Enterprise Use Cases

Privacy compliance for robotics and drone footage
CCTV feed anonymization for regulatory requirements
Customer data protection in retail analytics

Personal Protection

Anonymize bystanders before sharing content online
Protect family members' privacy in shared videos
Avoid portrait rights issues in content creation

📊 Technical Specifications

Model: YOLOv8-face (optimized variant)
Performance: 30fps real-time processing on RTX 3060
Accuracy: 95%+ face detection rate
Formats: JPG, PNG, MP4, AVI, MOV

🌍 Why This Matters
"Face blurring will become mandatory for all public-facing cameras"
With GDPR in Europe, CCPA in California, and similar regulations worldwide, biometric data protection is becoming non-negotiable. Soon, every camera-equipped system will require built-in face anonymization capabilities.
🤝 Join the Movement
Why open source?
Because privacy isn't a premium feature—it's a fundamental right.

As technology advances, so must our commitment to privacy protection 🛡️

posted an update 14 days ago

Computer-Use Agents SOTA Challenge @ Hack the North (YC interview for top team) + Global Online ($2000 prize)

We’re bringing something new to Hack the North, Canada’s largest hackathon, this year: a head-to-head competition for Computer-Use Agents, held on-site at Waterloo and as a Global Online challenge. From September 12–14, 2025, teams build on the Cua Agent Framework and are scored in HUD’s OSWorld-Verified environment to push past today’s SOTA on OSWorld.

On-site (Track A)
Build during the weekend and submit a repo with a one-line start command. HUD executes your command in a clean environment and runs OSWorld-Verified. Scores come from official benchmark results; ties break by median, then wall-clock time, then earliest submission. Any model setup is allowed (cloud or local). Provide temporary credentials if needed.
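The Track A ordering (official score, then median, then wall-clock time, then earliest submission) maps naturally onto a tuple sort key. A minimal sketch with hypothetical field names; it assumes higher score and median are better, that "median" means the median score across repeated runs, and that shorter wall-clock time wins. HUD's actual scoring code is not part of this post.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Submission:
    team: str
    score: float            # official OSWorld-Verified result (higher is better)
    median: float           # assumed: median score across runs (higher is better)
    wall_clock_s: float     # evaluation wall-clock time in seconds (lower is better)
    submitted_at: datetime  # earlier is better


def rank(submissions: List[Submission]) -> List[Submission]:
    # Negate the "higher is better" fields so one ascending sort applies every tie-break in order.
    return sorted(
        submissions,
        key=lambda s: (-s.score, -s.median, s.wall_clock_s, s.submitted_at),
    )
```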

HUD runs official evaluations immediately after submission. Winners are announced at the closing ceremony.

Deadline: Sept 15, 8:00 AM EDT

Global Online (Track B)
Open to anyone, anywhere. Build on your own timeline and submit a repo using Cua + Ollama/Ollama Cloud with a short write-up (what's local or hybrid about your design). Judged by Cua and Ollama teams on: Creativity (30%), Technical depth (30%), Use of Ollama/Cloud (30%), Polish (10%). A ≤2-min demo video helps but isn't required.
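The Track B weights combine into a single weighted average. A small sketch, assuming (hypothetically) that judges score each criterion on a 0 to 10 scale; the criterion keys are illustrative names, not an official rubric format. For example, scores of 8, 7, 9 and 6 give 0.3 * 8 + 0.3 * 7 + 0.3 * 9 + 0.1 * 6 = 7.8.

```python
# Weights from the Track B rules; they sum to 1.0.
WEIGHTS = {
    "creativity": 0.30,
    "technical_depth": 0.30,
    "ollama_usage": 0.30,
    "polish": 0.10,
}


def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (assumed 0-10) into one weighted score on the same scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```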

Deadline: Sept 22, 8:00 AM EDT (1 week after Hack the North)

Submission & rules (both tracks)
Deadlines: Sept 15, 8:00 AM EDT (Track A) / Sept 22, 8:00 AM EDT (Track B)
Deliverables: repo + README start command; optional short demo video; brief model/tool notes
Where to submit: links shared in the Hack the North portal and Discord
Commit freeze: we evaluate the submitted SHA
Rules: no human-in-the-loop after the start command; internet/model access allowed if declared; use temporary/test credentials; you keep your IP; by submitting, you allow benchmarking and publication of scores/short summaries.

GitHub: https://github.com/trycua

reacted to ProCreations's post with 👍😎 15 days ago

If @clem comments on this post within the week with a text-classification task for me and a parameter size for the model, I will create a new dataset, train the model, and post it all within 48 hours. The parameter size must be below 50 million params, and the task must be text-only and genuinely possible.

If I complete it, Clem must tell everyone on Twitter/X to follow me on Hugging Face and link it.

Let's see if he comments.

replied to their post 15 days ago

We tried GLM-4.5, here: https://x.com/trycua/status/1955319138005512596

Come to Discord: https://discord.gg/9CxB5dpN

reacted to their post with 😎❤️👀 29 days ago

GPT 5 for Computer Use agents.

Same tasks, same grounding model: we just swapped GPT-4o for GPT-5 as the thinking model.

Left = GPT-4o, right = GPT-5.

Watch GPT-5 pull away.

Reasoning model: OpenAI GPT-5

Grounding model: Salesforce GTA1-7B

Action space: CUA Cloud Instances (macOS/Linux/Windows)

The task is: "Navigate to {random_url} and play the game until you reach a score of 5/5." Each task is set up by having Claude generate a random app from a predefined list of prompts (multiple-choice trivia, form filling, or color matching).

Try it yourself here: https://github.com/trycua/cua

Docs: https://docs.trycua.com/docs/agent-sdk/supported-agents/composed-agents

posted an update about 1 month ago

So OpenAI is releasing a model on Hugging Face today???
