45 5 177

sometimesanotion

https://ko-fi.com/sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

reacted to sequelbox's post with 🚀 19 minutes ago

The full Celestia 3 science-reasoning dataset is here! - 91k high-quality synthetic science prompts answered by DeepSeek-R1-0528 - subjects include physics, biology, chemistry, computer science, Earth science, astronomy, and information theory - one of the reasoning datasets powering the upcoming Shining Valiant 3 :) coming soon! GET IT NOW, FOR EVERYONE: https://huggingface.co/datasets/sequelbox/Celestia3-DeepSeek-R1-0528 SUPPORT OUR RELEASES: https://huggingface.co/spaces/sequelbox/SupportOpenSource with love, allegra

reacted to sequelbox's post with 🔥 19 minutes ago

replied to CultriX's post 10 days ago

Hi all! I was hoping somebody would be willing to check out this thought experiment of mine with the aim to reduce tokens in inter-agent communications. How It Works: 1. You provide a task in natural language (NL) 2. NL-to-CCL Agent: Converts your request into a structured Compressed Communication Language (CCL) task. 3. Inter-agent communication occurs in CCL 4. CCL is translated back to NL before being presented to the user. I have a notebook with an example that claims to achieve these results: --- Token Usage Summary --- Total NL Tokens (User Input & Final Output): 364 Total CCL Tokens (for NL/CCL Conversions): 159 Total CCL Tokens (Internal Agent Communication): 194 Overall token savings on NL-to-CCL conversion portions: 56.32% ------------------------ When asking Gemini it concludes: "Yes, the methods used in this notebook are sensible. The multi-agent architecture is logical, and the introduction of a Compressed Communication Language (CCL) is a clever and practical solution to the real-world problems of token cost and ambiguity in LLM-based systems. While it's a proof-of-concept that would need more robust error handling and potentially more complex feedback loops for a production environment, it successfully demonstrates a viable and efficient strategy for automating a software development lifecycle." However, I have no idea if it's actually working or if I'm just crazy. Would really like it if someone would be willing to provide thoughts on this! The notebook: https://gist.github.com/CultriX-Github/7f9895bc5e4d99d2d4a3eb17d079f08b#file-token-reduction-ipynb Thank you for taking the time! :)

View all activity

Organizations

reacted to sequelbox's post with 🚀🔥 19 minutes ago

Post

1681

The full Celestia 3 science-reasoning dataset is here!

- 91k high-quality synthetic science prompts answered by DeepSeek-R1-0528
- subjects include physics, biology, chemistry, computer science, Earth science, astronomy, and information theory
- one of the reasoning datasets powering the upcoming Shining Valiant 3 :) coming soon!

GET IT NOW, FOR EVERYONE: sequelbox/Celestia3-DeepSeek-R1-0528
SUPPORT OUR RELEASES: sequelbox/SupportOpenSource

with love,
allegra

1 reply

replied to CultriX's post 10 days ago

The premise of getting things structured as soon as possible is quite sound. My reaction to CCL is - can you be agnostic to the many different ways things are processed? What is the most essential way that data and processes relate to each other for what you're doing?

reacted to sequelbox's post with 🔥 18 days ago

Post

1035

a list of what's coming up soon from us:

- Shining Valiant 3 for Valiant Labs, powered by the full size Celestia 3 and other soon to be released high-difficulty reasoning datasets
- a new type of reasoning model and dataset we're very excited about - would love to bring out an alpha release here as soon as possible
- more model releases for Esper 3 (weigh in if there are any models you'd like us to prioritize!)
- other New Things

not sure of the exact release order yet, but we'll look to get everything out as quick as we can :)

with excitement,
allegra

liked a model 23 days ago

Danielbrdz/Barcenas-Qwen3-14B

Text Generation • 15B • Updated 24 days ago • 82 • 3

liked 2 models 27 days ago

qihoo360/Light-R1-14B-DS

Text Generation • 15B • Updated Mar 17 • 1.85k • 37

agentica-org/DeepCoder-14B-Preview

Text Generation • 15B • Updated May 11 • 12.2k • • 664

reacted to merve's post with 🔥 28 days ago

Post

2894

Qwen2.5-Omni is soooo good that people build multimodal reasoning models off of it 🥹
> KE-Team/Ke-Omni-R-3B is open-source audio reasoning model sota on average of benchmarks, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel level grounding (see below) and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B

reacted to sequelbox's post with 🔥 29 days ago

Post

1098

EARLY SNEAK PREVIEW: get a first look at the Celestia 3 science-reasoning dataset, built with DeepSeek's newest R1-0528 reasoning model! Subjects include physics, chemistry, biology, computer science, Earth science, astronomy, and information theory.

This early look contains the first 14k rows, all synthetic responses using deepseek-ai/DeepSeek-R1-0528

SEE IT HERE: sequelbox/Celestia3-DeepSeek-R1-0528-PREVIEW

Support our releases: sequelbox/SupportOpenSource

Coming up we'll have more dataset releases, including some novel reasoning and analysis methods - we think an important role for open source researchers is experimenting with new response styles on top of the increasingly excellent base models available to finetune.

more to come soon!
allegra

liked a dataset 29 days ago

sequelbox/Celestia3-DeepSeek-R1-0528-PREVIEW

Viewer • Updated 13 days ago • 13.4k • 323 • 8

liked a model 29 days ago

Blancy/Qwen3-1.7B-Open-R1-GRPO

Text Generation • 2B • Updated 5 days ago • 165 • 2

replied to CultriX's post about 1 month ago

Now imagine this as a hashtag generator and so a RAG search can find great context. :)

replied to CultriX's post about 1 month ago

Neat! I've transitioned from wanting more from a model's one-shot answers to breaking things down and walking through the problem with cached context. This effectively means simulating most of the thinking block, but by tool usage and RAG.

I'm happily using our models from months ago to do it. If anything - even Lamarck 0.7's use of thinking blocks are a bit much. I'm using Lamarck 0.7 Fusion (my best GPQA model, though it didn't break your record and is best used where modest IFEVAL isn't a blocker) and /nothink with ValiantLab's Qwen3 models in concert.

I suspect I'll try some merges soon to give this toolchain better models, leaderboard or no leaderboard!

liked a model about 1 month ago

MaatAI/Seshat-Qwen3-8B

Text Generation • 8B • Updated May 29 • 3 • 3

replied to sequelbox's post about 1 month ago

I've been using Esper3 8B and 14B for first-pass code review. I am quite pleased.

Have you considered fine-tuning a 1.7B or smaller model for autocomplete?

replied to CultriX's post about 1 month ago

I've been thinking a lot about using small caches of embeddings for local RAG lately. Have you considered an HTTP caching proxy like Squid as a low-impact source? It would retrieve what a user is reading anyway, and what's in their field of interest. A browser extension to signal some limited ingestion when a page is bookmarked might fit a lot of use cases.

For many reasons, smart management of context windows is my top priority with AI now!

liked 2 models about 1 month ago