Jaward (Jaward Sesay)

posted an update 21 days ago

Post

4517

This is huge!
The open-source community is all in on open access to RL environments. PrimeIntellect, you’re not alone.
Code: https://github.com/WooooDyy/AgentGym-RL

posted an update about 1 month ago

Post

6985

It’s absolutely mind blowing - the work Dynamics Lab is doing!!
With just a single input image and in a few seconds, their new world engine model (Mirage 2) can generate a whole new interactive world that’s physics-informed and fully explorable in real time🤯
Try it yourself: https://demo.dynamicslab.ai/chaos

1 reply

·

replied to their post about 2 months ago

you're welcome, nice work.

posted an update about 2 months ago

Post

4200

fascinating read!
Staying bullish on search with RL: it might just help us get rid of hallucination entirely. I really like their approach:
1) <think>on prompt/context && what u know </think>
2) self <search>when u don’t know</search> (iteratively) with no external tool
3) <information>cite sources to support claim(s)</information>
4) <answer>final answer</answer>
Their RL training was cost-efficient too, see code: https://github.com/TsinghuaC3I/SSRL
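To make the four-step format above concrete, here is a minimal sketch of how such a tagged response could be split into steps. Only the tag names come from the post; the helper names (`parse_steps`, `final_answer`) and the sample text are hypothetical, and this is not the SSRL repo's own parsing code.

```python
import re

# Tag names are taken from the post; everything else is an illustrative assumption.
TAGS = ("think", "search", "information", "answer")
_TAG_RE = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(TAGS), re.DOTALL)

def parse_steps(text):
    """Return (tag, content) pairs in the order they appear in the response."""
    return [(m.group(1), m.group(2).strip()) for m in _TAG_RE.finditer(text)]

def final_answer(text):
    """Return the last <answer> block, or None if the response never answered."""
    answers = [content for tag, content in parse_steps(text) if tag == "answer"]
    return answers[-1] if answers else None

# Made-up response that follows the think -> search -> information -> answer order.
sample = (
    "<think>I am not sure who wrote it.</think>"
    "<search>author of the example book</search>"
    "<information>The example book was written by A. Writer.</information>"
    "<answer>A. Writer</answer>"
)
steps = parse_steps(sample)
```

A reward or eval script could use something like `final_answer` to score only the last answer while still checking that the earlier tags are present.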

2 replies

·

posted an update 3 months ago

Post

3269

Towards batch sizes too small to meter🎉 beautiful work! And my personal favorite so far - I adore peak performance at small/nano scale. Everyone deserves to run/train AGI locally:) our data, our god model!
They showed that:
- you can train LLMs (up to 1B params) with a batch size as low as batch_size=1. This is unconventional, given that small batch sizes can lead to unstable/spiky training runs.
- you can have a stable training run with just vanilla SGD (stochastic gradient descent), no momentum required🤯
- small batch sizes are more robust to hyperparameters (i.e., no worries with initialization)
- smaller batch sizes outperform (“better per-Flops performance”) larger batch sizes.

“We recommend that practitioners training large models in memory-constrained settings exploit the benefits of small batch sizes rather than trying to emulate the large batch size setting (e.g., through gradient accumulation) typically used in industry.”
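For intuition on what "vanilla SGD with batch_size=1" means mechanically, here is a toy sketch on a two-parameter linear regression. It is not the paper's setup (no LLM, no tuned hyperparameters); the function name, learning rate, and data are all assumptions made up for illustration.

```python
import random

def sgd_batch_size_one(data, lr=0.1, epochs=50, seed=0):
    """Fit y = w*x + b with plain SGD: one sample per update, no momentum."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:               # batch_size = 1: update after every sample
            err = (w * x + b) - y       # gradient of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x           # vanilla SGD step, no momentum buffer
            b -= lr * err
    return w, b

# Noise-free toy data from y = 2x + 1 (hypothetical, just to have a known answer).
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(64)]
data = [(x, 2.0 * x + 1.0) for x in xs]
w, b = sgd_batch_size_one(data)
```

Note there is no gradient accumulation here: each sample triggers its own parameter update, which is the setting the quote above recommends over emulating large batches.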

I’ve been doing this for ages - my mantra: all my experiments must scale on my 8GB RAM M2 before moving to GPU. IOW I love being GPU poor. Check out my nanoAI algo repo: https://github.com/Jaykef/ai-algorithms, all notebooks run on memory as low as 8GB RAM

posted an update 3 months ago

Post

2071

I played around with the new RXTX paper (XX^T) and was able to train nanoGPT with 4x4 RXTX matmuls in both the attention layer and the optimizer🤕
It just works (well, I had to add some guardrails) and still saves 5% of memory usage:
The Patch:
- Computes attention scores with 4x4 blockwise RXTX matmuls (no PyTorch dot product)
- Handles arbitrary sequence lengths by padding to the nearest multiple of 4.
- An RXTX variant of Shampoo with params reshaped into 4x4 blocks during each optimizer step.
- Uses 5% fewer ops
Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb
Paper: https://arxiv.org/pdf/2505.09814
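A rough sketch of the two mechanical pieces mentioned in the patch: padding a sequence up to a multiple of 4, and computing XX^T over a 4x4 grid of blocks. Big caveat: this uses ordinary dot products and only exploits the symmetry of XX^T, it is not the RXTX recursion from the paper and not the notebook's code. It also assumes "nearest multiple of 4" means rounding up, and all function names are hypothetical.

```python
def pad_to_multiple(n, block=4):
    """Smallest multiple of `block` that is >= n (assumes padding rounds up)."""
    return -(-n // block) * block

def pad_rows(x, block=4):
    """Zero-pad a list-of-rows matrix so its row count is a multiple of `block`."""
    cols = len(x[0])
    extra = pad_to_multiple(len(x), block) - len(x)
    return [row[:] for row in x] + [[0.0] * cols for _ in range(extra)]

def gram_blockwise(x, grid=4):
    """Compute X @ X^T over a grid x grid layout of blocks, using symmetry.

    Only blocks on or above the diagonal are computed; the rest are mirrored.
    Plain dot products are used here, not the RXTX scheme.
    """
    n = len(x)
    assert n % grid == 0, "pad the rows first"
    size = n // grid
    out = [[0.0] * n for _ in range(n)]
    for bi in range(grid):
        for bj in range(bi, grid):
            for i in range(bi * size, (bi + 1) * size):
                for j in range(bj * size, (bj + 1) * size):
                    v = sum(a * b for a, b in zip(x[i], x[j]))
                    out[i][j] = v
                    out[j][i] = v       # mirror into the lower triangle
    return out

x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy "sequence" of length 3
padded = pad_rows(x)                        # padded to length 4 with a zero row
g = gram_blockwise(padded)
```

The zero padding only adds zero rows and columns to the result, so the top-left 3x3 part of `g` is the Gram matrix of the original input.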

posted an update 3 months ago

Post

2340

Mind2Web 2 is out - this time featuring eval and benchmark for deep research🔥
Paper: Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge (2506.21506)
Project: https://osu-nlp-group.github.io/Mind2Web-2/

posted an update 3 months ago

Post

3484

Awesome intro-to-LLM course "Language Modeling from Scratch" by Stanford. Love the aesthetics behind the lecture notes, notes-in-code is a genius idea👍
Course site: https://stanford-cs336.github.io/spring2025/
Repo: https://github.com/stanford-cs336/spring2025-lectures
Videos: https://www.youtube.com/playlist?list=PLoROMvodv4rOY23Y0BoGoBGgQ1zmU_MT_

2 replies

·

posted an update 4 months ago

Post

1463

Not sure what to make of this, but solving autonomous/selective reflection seems like a big deal in current agent frameworks. We did touch on this with iterative self-refinement in our AutoAgents framework (https://ijcai.org/proceedings/2024/0003.pdf). Nice read, looking forward to the code.
Paper: Scaling Test-time Compute for LLM Agents (2506.12928)

replied to their post 4 months ago

will cook a deep-dive tutorial on DFMs sometime next week, the math is no longer scary after taking this course:)
https://diffusion.csail.mit.edu/

posted an update 4 months ago

Post

1410

You can now perform edit operations with a discrete flow model, super cool👍! It's amazing to see the progress on DFM within one year of its introduction - literally my litmus test for how fast the field is progressing:
1st Introduced (2024): https://arxiv.org/abs/2402.04997
Discrete Flow Matching (2024): https://arxiv.org/abs/2407.15595
Edit Discrete Flow (2025): https://arxiv.org/pdf/2506.09018
Looking forward to SaaS-level reach like that of dLLMs, e.g., Mercury by Inception Labs 🚀

1 reply

·

posted an update 4 months ago

Post

1189

bumped into one of the OG reads today!! handwriting generation & synthesis is still my favorite application of RNNs - super amazed at how such a small model (3.6M params), trained overnight on CPU, could reach such peak performance. Huge credit to the data (IAM-OnDB🔥), which was meticulously curated using an infra-red device to track pen position.
Try demo here: https://www.calligrapher.ai/
Code: https://github.com/sjvasquez/handwriting-synthesis

posted an update 5 months ago

Post

1908

I gave rectified flow a try, so here is nanoRF - a lightweight implementation of a Rectified Flow Transformer model: ~618k parameters, 6 layers deep, dim 64, patch size 4, learning rate 5e-4, trained on my 8GB RAM M2 MacBook Air for 2k epochs.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanoRF.ipynb
See demo: https://x.com/Jaykef_/status/1923718725578129838
Reference Paper: https://arxiv.org/abs/2403.03206
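For anyone new to rectified flow, the core idea fits in a few lines: interpolate on a straight line between a noise sample and a data sample, regress a model onto the constant velocity along that line, then sample by integrating the velocity. The sketch below is a toy under stated assumptions, not the nanoRF notebook: it takes t=0 as noise and t=1 as data (papers differ on the direction), and a lambda returning the exact velocity stands in for a trained transformer.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity model: constant along the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

noise = [0.5, -1.0]   # hypothetical noise sample
data = [2.0, 3.0]     # hypothetical data sample
mid = interpolate(noise, data, 0.5)
v = target_velocity(noise, data)
# Stand-in for a trained network: returns the exact velocity for this one pair.
sample = euler_sample(noise, lambda x, t: v)
```

In real training the loss would be the squared error between the model's predicted velocity at `interpolate(x0, x1, t)` and `target_velocity(x0, x1)`, averaged over random pairs and random t.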

posted an update 5 months ago

Post

1797

Huge Win Today 🎉🎉
Our team “Afri-Aya” just won this year’s CohereAI Aya Expedition Challenge. Our work focused on 1) curating and evaluating a culturally relevant African vision dataset, then 2) fine-tuning the Aya vision model to support underrepresented languages in Africa. I represented my beloved Sierra Leone with the Krio language. Krio is a beautiful first language spoken by a majority of our population. It was a humbling and inspiring experience to have it recognized, thanks to the relentless effort of everyone on the team. Special thanks to BK for offering me this opportunity 🫡 and to Cohere AI for such an amazing global research expedition🙏

posted an update 5 months ago

Post

440

Officially kicking off my startup today🎉
Join me in building the future of learning: Lectūra - an advanced multi-agent software for adaptive, personalized learning experiences. Research will focus on building tools that empower individual learners to master the self-taught skills they need with the help of AI.
Read more: https://lecturalabs.com/
Feel free to reach out via the mentioned email and follow the official account for updates: https://x.com/lectura_ai

Curiosity has a voice, let it teach you. Generate Lectures. Customize Instructors. Get Real-time Personalized Learning.

posted an update 5 months ago

Post

3316

finally, a course that makes diffusion math much easier to grasp, well done 👍 https://diffusion.csail.mit.edu/

1 reply

·

replied to their post 5 months ago

if you like this work, kindly upvote the paper, thanks: https://huggingface.co/papers/2505.02707

posted an update 5 months ago

Post

694

Thrilled to share our latest work: Voila - a family of fully open-source voice models for real-time autonomous convos and role-play. Some of our major contributions include 🧵:
1) An End-to-End Full-Duplex Arch that directly processes & handles simultaneous audio token streams from user to model and vice versa.
2) Voila-Tokenizer: A 100K-hour trained tokenizer with interleaved alignment (audio & text) that distills semantic/acoustic tokens via RVQ.
3) Text-Audio Interleaved Alignment: We leveraged a fine-grained alignment of text and audio tokens that allows synchronization and expressiveness for tasks like ASR (WER 2.7%) and TTS (WER 2.8%).
4) Voice Customization: Supports 1M+ pre-built voices and one-shot voice cloning from 10s audio clips using Wespeaker embeddings.
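Since RVQ (residual vector quantization) carries a lot of weight in point 2, here is a tiny sketch of the general technique: each stage quantizes whatever the previous stage left over. This is generic RVQ with made-up two-stage codebooks, not the Voila-Tokenizer implementation; all names and numbers are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out

# Toy codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],
]
codes = rvq_encode([1.2, 1.0], codebooks)
approx = rvq_decode(codes, codebooks)
```

The list of per-stage indices is what ends up as discrete tokens; adding more stages shrinks the reconstruction error at the cost of more tokens per frame.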

Models: maitrix-org/voila-67e0d96962c19f221fc73fa5
Code: https://github.com/maitrix-org/Voila
Demo: maitrix-org/Voila-demo
Project page: maitrix-org/Voila-demo

2 replies

·

posted an update 5 months ago

Post

1300

late submission but managed to cook up a nascent Feynman-inspired agent app for Microsoft’s AI Agent hackathon, wish me luck lol. @clem ps I need this on gpu, thank you:)
Try Demo: Jaward/Professor-AI-Feynman
Code: https://github.com/Jaykef/professor-ai-feynman

3 replies

·

posted an update 5 months ago

Post

3134

Finally my first solo preprint is here:) a love letter to the field. Nothing much lol, this is just me trying to finetune my understanding of research behind the recent breakthroughs in reasoning models. It’s a preprint targeting beginners in the field - will eventually make necessary changes later. In the meantime have fun with it:)
Download: https://github.com/Jaykef/Jaykef/blob/main/papers/The-Dawn-of-Thinking-Machines.pdf
