John Johnson

jjokah

AI & ML interests

Natural Language Processing

Organizations

Blog-explorers, Magnisale, Cohere Labs Community, Hugging Face Discord Community

jjokah's activity

reacted to their post with 🔥👍 12 days ago
# Video Tokenization for efficient AI video processing

Meet VidTok, a new open-source video tokenization technique developed by Microsoft Research to address the computational challenges of processing large volumes of video data. The core problem VidTok tackles is the inefficiency caused by redundant information in raw video pixels.

VidTok converts complex video footage into compact, structured units called tokens, making it easier and more efficient for AI systems to analyze, understand, and generate video content.

Research Paper: https://arxiv.org/abs/2412.13061
VidTok Code: https://github.com/microsoft/VidTok
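To make the idea of turning pixels into far fewer tokens concrete, here is a toy sketch in Python. It is not VidTok's architecture: VidTok learns its encoder, whereas this sketch swaps in plain block averaging followed by scalar quantization. The function name `tokenize_video` and its parameters (`pt`, `ph`, `pw` for the block size along time, height, and width, and `levels` for the number of token ids) are hypothetical. The sketch only illustrates how grouping a video into spatiotemporal blocks shrinks the number of units a model has to process.

```python
from typing import List

# Hypothetical representation: grayscale video as [frames][height][width], values in [0, 1].
Video = List[List[List[float]]]


def tokenize_video(video: Video, pt: int, ph: int, pw: int, levels: int) -> List[int]:
    """Toy tokenizer: average each pt x ph x pw block, then quantize it to one of `levels` ids."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the block size")
    tokens: List[int] = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                # Average every pixel in the spatiotemporal block
                # (a crude stand-in for a learned encoder, NOT what VidTok does).
                total = 0.0
                for t in range(t0, t0 + pt):
                    for y in range(y0, y0 + ph):
                        for x in range(x0, x0 + pw):
                            total += video[t][y][x]
                mean = total / (pt * ph * pw)
                # Scalar quantization: map the mean in [0, 1] to an integer id in [0, levels - 1].
                tokens.append(min(levels - 1, int(mean * levels)))
    return tokens


# Usage: 8 frames of 16x16 pixels; the first 4 frames are black (0.0), the last 4 white (1.0).
video = [[[0.0 if t < 4 else 1.0 for _ in range(16)] for _ in range(16)] for t in range(8)]
tokens = tokenize_video(video, pt=4, ph=8, pw=8, levels=16)
print(tokens)  # [0, 0, 0, 0, 15, 15, 15, 15]
```

With 2,048 pixel values and 4×8×8 blocks, the sketch emits 8 tokens, one per block. A real learned tokenizer keeps far more information per token than a single averaged intensity; the point here is only the reduction in sequence length.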
upvoted an article about 2 months ago

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

replied to their post about 2 months ago

💯 SLMs have an exciting future.

reacted to their post with 👍 about 2 months ago
The past few years have been a blast for artificial intelligence, with large language models (LLMs) stunning everyone with their capabilities and powering everything from chatbots to code assistants. However, not all applications demand the massive size and complexity of LLMs, and the computational power LLMs require makes them impractical for many use cases. This is where Small Language Models (SLMs) come in: by shrinking in size, they make powerful AI models more accessible.

In this article, we cover what SLMs are, how they are made small, their benefits and limitations, real-world use cases, and how they can be used on mobile and desktop devices.
https://huggingface.co/blog/jjokah/small-language-model
upvoted an article about 2 months ago

Small Language Models (SLM): A Comprehensive Overview

By jjokah
reacted to burtenshaw's post with ❤️ 2 months ago
SmolLM2 paper is out! 😊

😍 Why do I love it? Because it facilitates teaching and learning!

Over the past few months I've engaged with (no joke) thousands of students based on SmolLM.

- People have inferred, fine-tuned, aligned, and evaluated this smol model.
- People used their own machines and free tools like Colab, Kaggle, and Spaces.
- People tackled use cases in their job, for fun, in their own language, and with their friends.

Upvote the paper "SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model" (2502.02737)
reacted to burtenshaw's post with 👍 3 months ago
📣 Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TL;DR: it's a quiz app that builds questions from a dataset and saves the answers.

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the Space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset
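The workflow above boils down to a table of multiple-choice questions plus a step that records each answer as a new row. A minimal Python sketch of that data flow follows. The `Question` fields (`question`, `choices`, `answer`) and the `grade` helper are hypothetical and are not taken from burtenshaw/dataset_quiz; the sketch also keeps everything in memory instead of reading from or pushing to a dataset repo.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    # Hypothetical schema for one multiple-choice row; the real app's columns may differ.
    question: str
    choices: List[str]
    answer: str  # text of the correct choice


def grade(questions: List[Question], responses: Dict[int, str]) -> List[dict]:
    """Score responses (question index -> chosen text) and return rows for a results dataset."""
    rows = []
    for i, q in enumerate(questions):
        chosen: Optional[str] = responses.get(i)  # None if the question was skipped
        rows.append({
            "question": q.question,
            "chosen": chosen,
            "correct": chosen == q.answer,
        })
    return rows


# Usage: two questions, one answered correctly and one incorrectly.
questions = [
    Question("What does SLM stand for?", ["Small Language Model", "Simple Learning Machine"], "Small Language Model"),
    Question("2 + 2 = ?", ["3", "4"], "4"),
]
results = grade(questions, {0: "Small Language Model", 1: "3"})
print(sum(r["correct"] for r in results))  # 1
```

Returning plain dictionaries keeps each result row in a shape that is easy to turn into a table, which mirrors the "submit to create a new dataset" step.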

I made this to get ready for the agents course, but I hope it's useful for your projects too!

Quiz app: burtenshaw/dataset_quiz

Dataset with questions: burtenshaw/exam_questions

Agents course we're working on: agents-course