
ikki

ikkitr

AI & ML interests

None yet

Recent Activity

liked a Space 19 minutes ago
cha69/README
liked a Space 1 day ago
SkalskiP/RF-DETR
liked a Space 1 day ago
SkalskiP/YOLO-ARENA

Organizations

None yet

ikkitr's activity

liked a Space 19 minutes ago
cha69/README
reacted to ProCreations's post with 🚀 1 day ago
Come check out my new dataset, Mistake to Meaning, an attempt to help smaller models understand user typos better! Hope you guys enjoy it.

ProCreations/Mistake-To-Meaning
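
If you want to try it, here is a minimal sketch for loading it with the datasets library; the split name is an assumption, so inspect a record before relying on any field names:

from datasets import load_dataset

# Pull the dataset from the Hub; the "train" split is an assumption
ds = load_dataset("ProCreations/Mistake-To-Meaning", split="train")

# The column names are not documented in the post, so inspect one record first
print(ds)
print(ds[0])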
replied to ZennyKenny's post 1 day ago
reacted to daavoo's post with 👍 1 day ago
We have just released a new version of ⭐ https://github.com/mozilla-ai/any-agent ⭐ exposing an API to be used in async contexts:

import asyncio
from any_agent import AgentConfig, AnyAgent, TracingConfig
from any_agent.tools import search_web

async def main():
    # Create the agent with the new async entry point
    agent = await AnyAgent.create_async(
        "openai",  # the underlying agent framework to run on
        AgentConfig(
            model_id="gpt-4.1-mini",
            instructions="You are the main agent. Use the other available agents to find an answer",
        ),
        # Managed agents the main agent can delegate to
        managed_agents=[
            AgentConfig(
                name="search_web_agent",
                description="An agent that can search the web",
                model_id="gpt-4.1-nano",
                tools=[search_web],
            )
        ],
        tracing=TracingConfig(),
    )

    # Run the agent without blocking the event loop
    await agent.run_async("Which Agent Framework is the best??")

if __name__ == "__main__":
    asyncio.run(main())
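
For comparison, a sketch of the same thing in a blocking context; this assumes the synchronous AnyAgent.create / agent.run entry points from earlier releases are still available:

from any_agent import AgentConfig, AnyAgent
from any_agent.tools import search_web

# Assumed synchronous counterparts of create_async / run_async;
# check the any-agent docs for the exact signatures
agent = AnyAgent.create(
    "openai",
    AgentConfig(
        model_id="gpt-4.1-mini",
        instructions="Answer using the web search tool when needed",
        tools=[search_web],
    ),
)
agent.run("Which Agent Framework is the best??")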
reacted to merve's post with 🔥 1 day ago
Don't sleep on the new AI at Meta vision-language release! 🔥

facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
facebook/perception-lm-67f9783f171948c383ee7498
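
Both of those links are Hub collections, so here is a quick sketch using huggingface_hub to list everything they contain:

from huggingface_hub import get_collection

# The two collection slugs from the post above
slugs = [
    "facebook/perception-encoder-67f977c9a65ca5895a7f6ba1",
    "facebook/perception-lm-67f9783f171948c383ee7498",
]

for slug in slugs:
    collection = get_collection(slug)
    print(collection.title)
    for item in collection.items:
        # item_type is e.g. "model" or "dataset"
        print(f"  {item.item_type}: {item.item_id}")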

Meta dropped Swiss Army knives for vision with an Apache 2.0 license 👍
> image/video encoders for vision-language modelling and spatial understanding (object detection, etc.) 👍
> The vision LM outperforms InternVL3 and Qwen2.5VL 👍
> They also release gigantic video and image datasets

The authors attempt to come up with a single versatile vision encoder that aligns across a diverse set of tasks.

They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. For zero-shot image tasks, it outperforms the latest SOTA, SigLIP2 👍



> Among the fine-tuned ones, the first is PE-Spatial: a model for bounding-box detection, segmentation, and depth estimation that outperforms all other models 😮



> The second is PLM, the Perception Language Model, which combines PE-Core with the Qwen2.5 7B LM. It outperforms all other models (including InternVL3, which was also trained with a Qwen2.5 LM!)

The authors release the following checkpoints in base, large, and giant sizes:

> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial (G, 448)
> 3 PLM (1B, 3B, 8B)



The authors release the following datasets 📑
> PE Video: a gigantic video dataset of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: human- and auto-annotated image and video datasets for region-based tasks
> PLM-VideoBench: a new video benchmark for multiple-choice QA (MCQA)
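
A hedged sketch for pulling one of these down with the datasets library; the repo id and split below are guesses, so check the perception-lm collection above for the exact names:

from datasets import load_dataset

# The repo id and split are assumptions for illustration only;
# look up the real dataset ids in the facebook/perception-lm collection
ds = load_dataset("facebook/PLM-VideoBench", streaming=True, split="test")

# Peek at one record without downloading the whole benchmark
print(next(iter(ds)))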