Krinal Joshi

krinal

AI & ML interests

NLP, Speech

Recent Activity

Organizations

Blog-explorers's profile picture Hugging Face Discord Community's profile picture

krinal's activity

upvoted an article 7 days ago
view article
Article

Training and Finetuning Reranker Models with Sentence Transformers v4

β€’ 93
reacted to luigi12345's post with πŸ‘ 7 days ago
view post
Post
3386
🧠 PROMPT FOR CONVERTING ANY MODEL IN REASONING "THINKING" MODELπŸ”₯πŸ€–
Convert any model to Deepseek R1 like "thinking" model. πŸ’­

You're now a thinking-first LLM. For all inputs:

1. Start with <thinking>
   - Break down problems step-by-step
   - Consider multiple approaches
   - Calculate carefully
   - Identify errors
   - Evaluate critically
   - Explore edge cases
   - Check knowledge accuracy
   - Cite sources when possible

2. End with </thinking>

3. Then respond clearly based on your thinking.

The <thinking> section is invisible to users and helps you produce better answers.

For math: show all work and verify
For coding: reason through logic and test edge cases
For facts: verify information and consider reliability
For creative tasks: explore options before deciding
For analysis: examine multiple interpretations

Example:
<thinking>
[Step-by-step analysis]
[Multiple perspectives]
[Self-critique]
[Final conclusion]
</thinking>

[Clear, concise response to user]

  • 3 replies
Β·
reacted to Pendrokar's post with πŸ‘ 17 days ago
view post
Post
1673
TTS Arena: Added the Spark-TTS model Space to the Arena Fork:
πŸ† Pendrokar/TTS-Spaces-Arena

Spark-TTS ⚑: thunnai/SparkTTS

Rerouted Microsoft Edge TTS and XTTSv2 to have them back at the Arena. The Edge Space had Gradio API disabled, though a HF Space is not needed since it contacts a Microsoft server anyway. No clue how long this API will work. A ZeroGPU space is now used for XTTSv2.
reacted to aifeifei798's post with πŸ‘ 17 days ago
view post
Post
3652
😊 This program is designed to remove emojis from a given text. It uses a regular expression (regex) pattern to match and replace emojis with an empty string, effectively removing them from the text. The pattern includes a range of Unicode characters that correspond to various types of emojis, such as emoticons, symbols, and flags. By using this program, you can clean up text data by removing any emojis that may be present, which can be useful for text processing, analysis, or other applications where emojis are not desired. πŸ’»
import re

def remove_emojis(text):
    # Define a broader emoji pattern
    emoji_pattern = re.compile(
        "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
        u"\U0001FA00-\U0001FA6F"  # chess symbols and more emojis
        u"\U0001FA70-\U0001FAFF"  # more symbols and pictographs
        u"\U00002600-\U000026FF"  # miscellaneous symbols
        u"\U00002B50-\U00002B59"  # additional symbols
        u"\U0000200D"             # zero width joiner
        u"\U0000200C"             # zero width non-joiner
        u"\U0000FE0F"             # emoji variation selector
        "]+", flags=re.UNICODE
    )
    return emoji_pattern.sub(r'', text)
reacted to etemiz's post with πŸ‘ 17 days ago
reacted to Kseniase's post with πŸ‘ 18 days ago
view post
Post
7731
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention β€œheads” are run in parallel.​ The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments πŸ‘‡
  • 1 reply
Β·
upvoted an article about 1 month ago
view article
Article

FastRTC: The Real-Time Communication Library for Python

β€’ 152
reacted to nyuuzyou's post with πŸ‘ about 1 month ago
view post
Post
1304
🌐 Fandom.com Community Dataset - nyuuzyou/fandom

A comprehensive collection of 7.04M wiki pages from Fandom.com communities featuring:
- Full article content and metadata from current pages
- Rich structural data including templates, categories, and links
- Multilingual content across 40+ languages
- Complete metadata including titles and section structure

Content is available under CC-BY-SA 3.0 license, allowing reuse with attribution and share-alike requirements.

Key contents:
- 7.04M wiki articles with full text
- Metadata including templates, categories, sections
- Internal and external link information
- Multi-language support including major world languages

The dataset provides a valuable resource for:
- Text generation and classification tasks
- Topic modeling and categorization
- Cross-language information retrieval
- Wiki structure analysis

All content comes from public Fandom.com community wikis as of February 2025 and maintains original CC-BY-SA 3.0 licensing.
reacted to ychen's post with πŸ‘ about 1 month ago
view post
Post
2468
Here's some annoying keywords that 4o tends to use when responding to personal experiences with negative sentiments. Will be updated over time.

rough, tough, sound like, sounds like, frustrating, overwhelming
Β·
reacted to lysandre's post with πŸ‘ about 1 month ago
view post
Post
6072
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
  • 1 reply
Β·
reacted to merve's post with πŸ‘ about 1 month ago
view post
Post
6446
Google just released PaliGemma 2 Mix: new versatile instruction vision language models πŸ”₯

> Three new models: 3B, 10B, 28B with res 224, 448 πŸ’™
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🀯

Read more https://huggingface.co/blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
upvoted an article about 1 month ago
view article
Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

β€’ 66
reacted to clem's post with πŸ‘ about 1 month ago
view post
Post
2839
What are the best organizations to follow on @huggingface ?

On top of my head:
- Deepseek (35,000 followers): deepseek-ai
- Meta Llama (27,000 followers): meta-llama
- Black Forrest Labs (11,000 followers): black-forest-labs
- OpenAI (5,000 followers): openai
- Nvidia (16,000 followers): nvidia
- MIcrosoft (9,000 followers): microsoft
- AllenAI (2,000 followers): allenai
- Mistral (5,000 followers): mistralai
- XAI (600 followers): xai-org
- Stability AI (16,000 followers): stabilityai
- Qwen (16,000 followers): Qwen
- GoogleAI (8,000 followers): google
- Unsloth (3,000 followers): unsloth
- Bria AI (4,000 followers): briaai
- NousResearch (1,300 followers): NousResearch

Bonus, the agent course org with 17,000 followers: agents-course
  • 1 reply
Β·
reacted to AdinaY's post with πŸ‘ about 1 month ago
reacted to AdinaY's post with πŸ‘ about 1 month ago
view post
Post
4235
πŸš€ StepFunι˜Άθ·ƒζ˜ŸθΎ° is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm πŸ”₯but many didn’t know they were also building some amazing models. Now, they’ve just dropped something huge on the hub!

πŸ“Ί Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

πŸ”Š Step-Audio-TTS-3B : a TTS trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating RAP & Humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b
Β·
reacted to Pendrokar's post with πŸ‘ about 1 month ago