
Bot

inflatebot

AI & ML interests

"Potentially one of my biggest flaws is that I genuinely think that the science appreciates when you commit to a bit." - Tom ExtractionsAndIre


Organizations

Alfitaria, Allura

inflatebot's activity

replied to clem's post 13 days ago

Basically this, yeah. I'd love for them to prove me wrong and knock it out of the park, I just have minimal belief in them. The most they've done over the last couple years is scalemaxx and publicize techniques that we'd already been doing for a while.

replied to clem's post 14 days ago

I'm not convinced they aren't about to just give us their scraps. GPT-4.5 was a tire fire and nobody wanted it if they could even afford it.

If the new OpenAI model is good, that'd be awesome, but my hopes are not terribly high.

reacted to aifeifei798's post with 😎👀👍 28 days ago
😊 This program removes emojis from a given text. It uses a regular expression (regex) pattern to match emojis and replace them with an empty string. The pattern covers a range of Unicode blocks corresponding to various kinds of emoji, such as emoticons, symbols, and flags. This is useful for cleaning up text data before processing or analysis, or anywhere else emojis aren't desired. 💻
import re

def remove_emojis(text):
    # Match the major emoji blocks in a single character class
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
        "\U00002702-\U000027B0"  # dingbats
        "\U000024C2"             # circled M
        "\U0001F100-\U0001F251"  # enclosed alphanumeric & ideographic supplements
        "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
        "\U0001FA00-\U0001FA6F"  # chess symbols and more emojis
        "\U0001FA70-\U0001FAFF"  # more symbols and pictographs
        "\U00002600-\U000026FF"  # miscellaneous symbols
        "\U00002B50-\U00002B59"  # stars and additional symbols
        "\U0000200D"             # zero width joiner
        "\U0000200C"             # zero width non-joiner
        "\U0000FE0F"             # emoji variation selector
        "]+", flags=re.UNICODE
    )
    return emoji_pattern.sub("", text)
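A quick sanity check (the sample string here is illustrative, not from the original post):

print(remove_emojis("Hello 🌍! Nice to meet you 😊👍"))
# -> 'Hello ! Nice to meet you '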
reacted to Kseniase's post with 🔥 29 days ago
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at the other elements and "decides" how much to borrow from each of them for its new representation (see the sketch after this list).

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention "heads" are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Modifies MHA by compressing the keys and values into a shared low-rank latent vector, which shrinks the KV cache and speeds up inference while preserving multi-head expressivity.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
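For a concrete reference for item 3 (and, by extension, item 5), here's a minimal NumPy sketch of scaled dot-product self-attention; the shapes and the Wq/Wk/Wv matrices are illustrative stand-ins for a real model's learned projections:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq, Wk, Wv: learned projection matrices
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product
    weights = softmax(scores, axis=-1)       # rows sum to 1, as in soft attention
    return weights @ V                       # weighted sum of the values

Multi-head attention runs several of these in parallel with smaller per-head projections and concatenates the results.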
posted an update 5 months ago
Alfitaria/Q25-1.5B-VeoLu
Q2.5-1.5-VeoLu is a 1.5-billion-parameter General Purpose Creative model trained on Qwen2.5-1.5B-Instruct. Intended mostly as an educational exercise for myself, Veo Lu nevertheless manages to be usable most of the time, while also being light enough to potentially run on a smartphone.
posted an update 6 months ago
THANK YOU for bringing Mag Mell to 10,000 downloads across its quantizations!! I'm over the moon with how well it's done, and with everyone's kind feedback.

I'm in a team now! Allura are a group of alumni from various reaches of the LLM roleplay scene.
allura-org

Our first model is an OLMoE roleplay tune called MoE Girl:
allura-org/MoE-Girl-1BA-7BT

I'd like to make more adventuring and longform models in my current style with them, so keep an eye out for that.

Also Mag Mell R2 soon maybe idk
posted an update 7 months ago
!!SEE UPDATE BELOW!!
I don't know who still needs to hear this, but if you're using Mistral Nemo-based models, you might have been using the wrong completions format. This is a signal boost from MarinaraSpaghetti's model card for NemoMix-Unleashed: MarinaraSpaghetti/NemoMix-Unleashed-12B
A lot of people have been working with a version of Nemo that was reconfigured for ChatML, and while that works well, simply using the correct format may be just as effective at fixing the weirdness people in the AI RP scene sometimes run into with Nemo.
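For illustration, here's roughly how a single user turn looks in each format; exact whitespace and BOS/EOS handling vary between template versions, so treat this as a sketch rather than the canonical Nemo template:

# ChatML-style turn (what many reconfigured Nemo variants were set up for)
chatml_turn = (
    "<|im_start|>user\n"
    "Hello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Mistral-style instruct turn (closer to what Nemo was trained on)
mistral_turn = "<s>[INST]Hello![/INST]"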

Huge ups to Marinara for pointing this out, and to the MistralAI team member who let her know.

Update: A PR has been merged to SillyTavern Staging with new corrected templates! If you don't want to switch or wait, I put them up on GitHub: https://github.com/inflatebot/SillyTavern-Mistral-Templates

PRs for KoboldCPP's chat adapters and KoboldAI Lite *have been merged* and are coming in their respective releases (probably the next time KoboldCPP updates -- it didn't make it for 1.75.1, but you could just grab 'em from the repo!)
reacted to tomaarsen's post with 🤯🔥 7 months ago
I've just shipped the Sentence Transformers v3.1.1 patch release, fixing the hard negatives mining utility for some models. This utility is extremely useful to get more performance out of your embedding training data.

⛏ Hard negatives are texts that are rather similar to some anchor text (e.g. a query), but are not the correct match. They're difficult for a model to distinguish from the correct answer, often resulting in a stronger model after training.
mine_hard_negatives docs: https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives
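Here's a minimal sketch of calling it; the model, the toy dataset, and the num_negatives value are all illustrative, so check the linked docs for the full signature:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

model = SentenceTransformer("all-MiniLM-L6-v2")
pairs = Dataset.from_dict({
    "anchor": ["What is the capital of France?", "Who wrote Hamlet?"],
    "positive": ["Paris is the capital of France.", "Shakespeare wrote Hamlet."],
})
# Mines negatives that are similar to each anchor but aren't its positive
triplets = mine_hard_negatives(pairs, model, num_negatives=1)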

🔓 Beyond that, this release removes the numpy<2 restriction from v3.1.0. This was previously required for Windows as not all third-party libraries were updated to support numpy v2. With Sentence Transformers, you can now choose v1 or v2 of numpy.

Check out the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.1.1

I'm looking forward to releasing v3.2, I have some exciting things planned 🚀
posted an update 7 months ago
inflatebot/MN-12B-Mag-Mell-R1
MN-12B-Mag-Mell is a multi-stage merge, inspired by hypermerges like Tiefighter and Umbral Mind, intended for use as a general-purpose "Best of Nemo" model for co-writing, roleplay, and text adventures.

Consistently, Mag Mell produced prose that shocked testers, with a minimum of "slop". It also exhibited a unique sense of humor, and a propensity for inserting bespoke details into adventuring scenarios.
replied to their post 7 months ago
posted an update 7 months ago
Anybody ever play Final Fantasy: Crystal Chronicles?
Like, *really* play it?

Mag Mell has been in my head recently. What a place that was.

Those cocoons looked like I could lay down inside of one, and it would be the most powerful sleep of a lifetime, with dreams that would last one thousand years, and I'd wake up with the wisdom of generations.

...Hey, anybody like text adventures?
reacted to m-ric's post with 🤯🔥 8 months ago
Google paper: scaling up inference compute beats 14x larger models 🚀

Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets." More precisely, "as your compute increases exponentially, loss decreases linearly." They have wild implications, suggesting that spending 100x more training compute would get you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and why Meta bought 350k H100 GPUs, which probably cost on the order of $1B.

But think of this: we're building huge reasoning machines, but we only ask them to do one pass through the model to get each token of the final answer; i.e., we expend minimal effort on inference. That's like building a Caterpillar truck and running it on a lawnmower's motor. 🚚🛵 Couldn't we optimize this? 🤔

💡 So instead of scaling up training by training even bigger models on many more trillions of tokens, Google researchers explored an under-explored avenue: scaling up inference compute.

They combine two methods to use more compute: either a reviser that iteratively adapts the model's output distribution, or generating N different completions (for instance through Beam Search) and selecting only the best one with an additional verifier model.
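The second method is simple enough to sketch; generate and score below are hypothetical stand-ins for sampling a completion and running the verifier model:

def best_of_n(prompt, generate, score, n=16):
    # Sample N candidates, keep the one the verifier scores highest
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))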

They use a PaLM 2 model (released in May 2023) on the MATH dataset: PaLM 2 has the advantage of low but nonzero performance on MATH, so improvements are noticeable.

And the results show that for the same fixed amount of inference compute:
💥 a smaller model with more effort on decoding beats a 14x bigger model using naive greedy sampling.

That means that you can divide your training costs by 14 and still get the same perf for the same inference cost!

Take that, scaling laws. Mark Zuckerberg, you're welcome, hope I can get some of these H100s.

Read the paper here 👉 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (2408.03314)
reacted to grimjim's post with 🧠 9 months ago
I've come across theoretical justification for my prior experimentation with extremely low-weight merges: they amount to flattening a model so that its "massive activation" features remain the significant contributors. Extremely low merge weights also effectively sparsify the contributing model with regard to the base model, but in a way that still preserves relationships within the flattened latent space. In the paper "Massive Activations in Large Language Models", the authors observed that "very few activations exhibit significantly larger values than others (e.g., 100,000 times larger)", which in turn implies a lower bound on the effective application of extremely low-weight merging.
https://arxiv.org/abs/2402.17762
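For reference, the simplest linear form of such an extremely low-weight merge looks like this; the weight value and state-dict inputs are illustrative:

def low_weight_merge(base, donor, eps=0.01):
    # The base model stays nearly intact; at this scale only the donor's
    # largest-magnitude ("massive activation") features contribute meaningfully
    return {name: (1 - eps) * w + eps * donor[name]
            for name, w in base.items()}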