Jaward Sesay

Jaward

AI & ML interests

Building Lectūra AI | AI/ML Researcher | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Organizations

MLX Community

Jaward's activity

posted an update 6 days ago
Bumped into one of the OG reads today!! Handwriting generation & synthesis is still my favorite application of RNNs - super amazed at how such a small model (3.6M params), trained overnight on CPU, could reach such peak performance. Huge credit to the data (IAM-OnDB🔥), which was meticulously curated using an infrared device to track pen position.
Try demo here: https://www.calligrapher.ai/
Code: https://github.com/sjvasquez/handwriting-synthesis
posted an update 25 days ago
Huge Win Today 🎉🎉
Our team “Afri-Aya” just won this year’s Cohere AI Aya Expedition. Our work focused on 1) curating and evaluating a culturally relevant African vision dataset, then 2) fine-tuning the Aya vision model to support underrepresented languages in Africa. I represented my beloved Sierra Leone with the Krio language. Krio is a beautiful first language spoken by a majority of our population. It was a humbling and inspiring experience to have it recognized, thanks to the relentless effort of everyone on the team. Special thanks to BK for offering me this opportunity 🫡 and to Cohere AI for such an amazing global research expedition🙏
posted an update 29 days ago
Officially kicking off my startup today🎉
Join me in building the future of learning: Lectūra, advanced multi-agent software for adaptive, personalized learning experiences. Research will focus on building tools that empower individual learners to master the self-taught skills they need with the help of AI.
Read more: https://lecturalabs.com/
Feel free to reach out via the mentioned email and follow the official account for updates: https://x.com/lectura_ai

Curiosity has a voice, let it teach you. Generate Lectures. Customize Instructors. Get Real-time Personalized Learning.
posted an update about 1 month ago
Thrilled to share our latest work: Voila - a family of fully open-source voice models for real-time autonomous convos and role-play. Some of our major contributions include 🧵:
1) An End-to-End Full-Duplex Architecture that directly processes and handles simultaneous audio token streams from user to model and vice versa.
2) Voila-Tokenizer: A tokenizer trained on 100K hours of audio, with interleaved alignment (audio & text), that distills semantic/acoustic tokens via RVQ.
3) Text-Audio Interleaved Alignment: We leveraged a fine-grained alignment of text and audio tokens that allows synchronization and expressiveness for tasks like ASR (WER 2.7%) and TTS (WER 2.8%).
4) Voice Customization: Supports 1M+ pre-built voices and one-shot voice cloning from 10s audio clips using Wespeaker embeddings.

Models: maitrix-org/voila-67e0d96962c19f221fc73fa5
Code: https://github.com/maitrix-org/Voila
Demo: maitrix-org/Voila-demo
Project page: maitrix-org/Voila-demo
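
For readers new to the RVQ (residual vector quantization) mentioned in point 2, here is a minimal sketch of the general idea in plain Python. It is not the Voila tokenizer: the two tiny codebooks and the function names `nearest`, `rvq_encode`, and `rvq_decode` are hypothetical, and real codebooks are learned from data rather than written by hand. Each stage quantizes whatever the previous stage left over, so a vector becomes one token per codebook.

```python
def nearest(codebook, vector):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def rvq_encode(vector, codebooks):
    """Quantize `vector` stage by stage; each stage encodes the remaining residual."""
    residual = list(vector)
    tokens = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        tokens.append(index)
        # Subtract the chosen entry so the next stage only sees what is left.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct an approximation by summing the chosen entry from every stage."""
    output = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(tokens, codebooks):
        output = [o + c for o, c in zip(output, codebook[index])]
    return output


# Hypothetical hand-written codebooks: a coarse stage, then a fine stage.
coarse = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
fine = [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]]
codebooks = [coarse, fine]

tokens = rvq_encode([1.2, 0.8], codebooks)
print(tokens)                         # one token per stage
print(rvq_decode(tokens, codebooks))  # approximation of [1.2, 0.8]
```

Adding more stages shrinks the reconstruction error, which is why a stack of small codebooks can stand in for one enormous codebook.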
posted an update about 1 month ago
Finally, my first solo preprint is here :) A love letter to the field. Nothing much lol, this is just me trying to fine-tune my understanding of the research behind the recent breakthroughs in reasoning models. It’s a preprint targeting beginners in the field; I will make any necessary changes later. In the meantime, have fun with it :)
Download: https://github.com/Jaykef/Jaykef/blob/main/papers/The-Dawn-of-Thinking-Machines.pdf
posted an update about 2 months ago
New reasoning algo just dropped: Adaptive Parallel Reasoning
“we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations.”
Paper: https://arxiv.org/pdf/2504.15466
Code: https://github.com/Parallel-Reasoning/APR
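
To make the spawn()/join() idea concrete, here is a toy sketch using ordinary Python threads. It is only an analogy: in APR the language model itself decides when to spawn and join, and the child threads are model decodes, not Python functions. The helper names (`explore_branch`, `parent_thread`) and the small arithmetic puzzle are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def explore_branch(first_index, numbers, target):
    """Hypothetical child thread: explore one branch and return a short summary or None."""
    first = numbers[first_index]
    for second_index, second in enumerate(numbers):
        if second_index == first_index:
            continue
        if first + second == target:
            return f"{first} + {second} = {target}"
        if first * second == target:
            return f"{first} * {second} = {target}"
    return None


def spawn(executor, numbers, target):
    """Start one child per candidate first operand; the children run in parallel."""
    return [
        executor.submit(explore_branch, index, numbers, target)
        for index in range(len(numbers))
    ]


def join(child_futures):
    """Wait for every child and collect the summaries in spawn order."""
    return [future.result() for future in child_futures]


def parent_thread(numbers, target):
    """Serial parent: spawn children, join them, then continue with the first success."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        children = spawn(executor, numbers, target)
        summaries = join(children)
    return next((summary for summary in summaries if summary is not None), None)


print(parent_thread([2, 3, 7, 10], 21))
```

The parent only sees the short summaries returned at join time, not the full work each child did, which is the part of the pattern that keeps the parent's context small.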
posted an update about 2 months ago
Fun time with SpatialLM - eventually it will serve embodied AI well.
replied to their post 2 months ago

I noticed the models and code are not out yet, but they said they will release them shortly.

posted an update 2 months ago
Implemented from first principles the recently proposed Dynamic Tanh (DyT) as an alternative to LayerNorm. Specifically, we trained a nanoGPT (0.8M params) on Tiny Shakespeare with conventional LayerNorm, RMSNorm, and Dynamic Tanh, then compared performance. Observed performance seems to match the baselines and stays stable for α between 0.5 and 1.5; it might outperform them if trained longer.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/Dynamic_Tanh.ipynb
Background music by 周子珺
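
For context, Dynamic Tanh replaces the normalization step with an elementwise DyT(x) = γ · tanh(αx) + β. Below is a minimal plain-Python sketch, not the notebook's code: it uses scalar `gamma` and `beta` for brevity (the proposal uses learnable per-channel vectors and a learnable scalar α), and the input values are made up for illustration.

```python
import math


def layer_norm(x, eps=1e-5):
    """Standard LayerNorm over one feature vector (no learnable scale or shift)."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(variance + eps) for v in x]


def dynamic_tanh(x, alpha=0.5, gamma=1.0, beta=0.0):
    """DyT: elementwise gamma * tanh(alpha * x) + beta, with no mean or variance statistics."""
    return [gamma * math.tanh(alpha * v) + beta for v in x]


# Made-up activations, including two large-magnitude outliers.
activations = [-4.0, -1.0, 0.0, 1.0, 4.0]

print(layer_norm(activations))
print(dynamic_tanh(activations, alpha=0.5))
print(dynamic_tanh(activations, alpha=1.5))
```

Unlike LayerNorm, DyT never looks at the other features in the vector; α simply controls how quickly large activations get squashed toward ±γ.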
reacted to clem's post with 🚀 2 months ago
Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google.

Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.

With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world.

This is incredibly exciting. Let’s go, open science and open-source AI!
posted an update 3 months ago
Finally, the ground truth / AlexNet’s original source code is available to all.
Context: AlexNet had a historic win in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), cutting the top-5 error rate from 26% (the next-best entry) to 15.3%. It’s a deep CNN with 8 layers (5 convolutional + 3 fully connected) that pioneered the use of ReLU activations for faster training, dropout for regularization, and GPU acceleration for large-scale learning. This moment marked the beginning of the deep learning revolution, inspiring architectures like VGG, ResNet, and modern transformers.
Code: https://github.com/computerhistory/AlexNet-Source-Code
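
As a quick sanity check on the "5 convolutional + 3 fully connected" layout, this small sketch walks the spatial size through the convolution and pooling stack. It is not taken from the released source: the kernel, stride, and padding values are the commonly cited ones, and it assumes a 227x227 input (the value that makes the arithmetic work out, even though the paper states 224).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels or None for pooling)
FEATURE_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(input_size=227, input_channels=3):
    """Return (name, spatial size, channels) after each feature layer."""
    size, channels = input_size, input_channels
    shapes = []
    for name, kernel, stride, padding, out_channels in FEATURE_LAYERS:
        size = conv_out(size, kernel, stride, padding)
        if out_channels is not None:  # pooling keeps the channel count
            channels = out_channels
        shapes.append((name, size, channels))
    return shapes


shapes = trace_shapes()
for name, size, channels in shapes:
    print(f"{name}: {size}x{size}x{channels}")

_, final_size, final_channels = shapes[-1]
flattened = final_size * final_size * final_channels
print("input to the first fully connected layer:", flattened)
```

The flattened features then pass through the three fully connected layers (4096, 4096, and 1000 units for the ImageNet classes).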
posted an update 3 months ago
Nvidia brings Blue (from the Star Wars droids) to life 🤯, super cute with flawless dexterity and a droid voice. It's the result of their collaborative research with Google DeepMind and Disney, revealed as part of their new open-source physics engine for robotics simulation: NEWTON, which enables robots to learn how to complete complex tasks with greater precision.

Read more: https://developer.nvidia.com/blog/announcing-newton-an-open-source-physics-engine-for-robotics-simulation?ncid=so-twit-820797-vt48