AI & ML interests

Character Database of Bangumis (If you need character LoRAs, see: https://huggingface.co/CyberHarem)

Recent Activity

narugo1992 updated a Space about 4 hours ago
BangumiBase/README
narugo1992 updated a Space about 11 hours ago
BangumiBase/README
narugo1992 updated a Space about 20 hours ago
BangumiBase/README
AbstractPhil posted an update 4 days ago
Cardinality, cardinality, CARDINALITY! As I restructure WordNet's multi-definition structure, I've found a fair assessment capability that minimizes the column-recall requirement while simultaneously maximizing recall speed. So it will be fast.
Research shows that the most intelligent, most intellectually driven LLMs require the most carefully curated, solid, representative vocabularies - with equally carefully curated training regimens.
Hierarchical structures loaded simultaneously per class and built with variants of vocabulary dimensions do not help this. Multiple dimensions of ImageNet do not help this. Reshaping does not help. Solidification processes through pulverizing with Alucard do not help - though they did show some interesting potential for pretraining the full geometric CLIP from the ground floor.
The experiments with the multitude of CLIP features and ImageNet showcase that not only can this tiny 4 MB classification tool handle ImageNet from CLIP features at around 76% no matter the hyperparameters using a linear head, but expanding the system upward and including hundreds of different formula variants does not help it scale at all! The largest ones only reach 76%, and the medium-sized ones reach about 86% instead of 76% when using clip-vit-b-patch16 and clip-vit-b-patch32. If you check the top-line numbers for the clip-vit-b LAION and OpenAI checkpoints, you'll find nearly identical classifications.
So I taught it only to understand geometry - more training and more steps only bring it closer to the wrong answer.
So this tells me one simple principle: geometry and linear heads have an upward capacity bounded by the information extracted from the linear model. Meaning... we need more places to extract from, and more curative potential to solidify that access, rather than simply EXPANDING it and making it bigger.
Next experiment includes a full cardinality subset of unicode-to-WordNet vocabulary translation matrices. Today. Within the hour.
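The linear-probe setup described above - a small linear head over frozen CLIP features - can be sketched as follows. This is a minimal illustration only: the synthetic gaussian features stand in for real CLIP embeddings (the 512-d width matches clip-vit-b, but the data, the closed-form ridge solver, and the resulting accuracy are illustrative assumptions, not the post's actual tool).

```python
import numpy as np

# Synthetic stand-in for precomputed CLIP image features:
# one gaussian cluster per class in a clip-vit-b-sized (512-d) space.
rng = np.random.default_rng(42)
n_classes, dim, per_class = 10, 512, 50

centers = rng.normal(size=(n_classes, dim))
X = np.concatenate([c + 0.5 * rng.normal(size=(per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

# Linear probe: one-hot ridge regression in closed form (no iterative training).
Y = np.eye(n_classes)[y]
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(dim), X.T @ Y)

pred = (X @ W).argmax(axis=1)
acc = (pred == y).mean()
print(f"linear-probe accuracy: {acc:.2%}")
```

The point of the sketch is the shape of the pipeline: once features are frozen, the classifier is a single `dim × n_classes` matrix, which is why its capacity tops out at whatever the feature extractor already encodes.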
AbstractPhil posted an update 7 days ago
Why am I amassing image features using seed 42?
Simply put: training on precomputed features gives a fair representation of the learning you would get from running the full model - with its random chance pinned to a single seed.
Training on features doesn't wait for the representative model to actually generate, since you already generated everything ahead of time.
Features are rich and usable across the spectrum of similarity assessments, classification accuracy, mass-deterministic normalization checks, and more.
They are... put simply... exponentially faster and reusable for research. I'll include the notebooks used for ImageNet and CIFAR-100; the CIFAR-100 one is much simpler - since CIFAR-100 is much... smaller, it required less innovation.
ImageNet is another beast, though. The ImageNet notebook is capable of running against much larger datasets with a few tweaks.
clip-vit-bigG's ImageNet feature set is complete, which means we're almost ready for full ablation.
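The precompute-once workflow above can be sketched as below. This is a hedged sketch, not the author's notebook: `extract_features` is a random-projection stand-in for a real CLIP encoder, and the cache filename is hypothetical. The point is only that pinning the generator to seed 42 and caching to disk makes every later experiment deterministic and fast.

```python
import numpy as np
from pathlib import Path

def extract_features(images: np.ndarray, seed: int = 42) -> np.ndarray:
    """Stand-in for a real CLIP image encoder; all randomness is drawn
    from a generator pinned to `seed`, so repeated runs are identical."""
    rng = np.random.default_rng(seed)
    noise = 0.01 * rng.normal(size=(len(images), 512))
    # a real extractor would return model.get_image_features(...) here
    proj = rng.normal(size=(images.shape[1], 512)) / np.sqrt(images.shape[1])
    return images @ proj + noise

def cached_features(images: np.ndarray, cache: Path, seed: int = 42) -> np.ndarray:
    """Compute once, reuse forever: later experiments just load the file."""
    if cache.exists():
        return np.load(cache)
    feats = extract_features(images, seed=seed)
    np.save(cache, feats)
    return feats

images = np.random.default_rng(0).normal(size=(8, 3072))  # fake flattened images
f1 = cached_features(images, Path("feats_seed42.npy"))
f2 = cached_features(images, Path("feats_seed42.npy"))    # second call hits the cache
assert np.allclose(f1, f2)
```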

Note to everyone: ImageNet is meant for RESEARCH AND ACADEMIC PURPOSES ONLY, and you cannot use my trained ImageNet weights - nor the features themselves - per the requests of the dataset's curators.

For commercial usage under the rules of LAION's licenses, we'll be using the laion400m features, which will likely be heavily sought after. I'll be preparing laion400m features on seed 42, which will take a while.

The full classifier is in the works, and with it comes a series of new formulas, new layers, and new solutions: the "fat belly" conversation piece, which attenuates multiple branches in communication; the "dispatcher," a heavy classification gate trained to bypass whatever isn't useful, tuned with large amounts of data at a very low learning rate; and the "attractant," specifically designed to catch bleed-over and unwanted information... which learns everything.
With that comes "PhaseGeometric" scheduling and "GeometricScheduling". Stay tuned.
AbstractPhil posted an update 13 days ago
The first set of geometrically aligned datasets is ready. Each dimensional variation is in its own repo so there's no confusion with splits.
Current splits:
* wordnet (english)
* unicode
AbstractPhil/geometric-vocab-32d
[32, 64, 128, 256, 512, 768, 1024]
Swap the 32d for any dimension in the list to get the corresponding repo.
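A tiny helper makes the swap-the-dimension rule explicit. Only the 32d repo id is given in the post; the other repo ids below are assumed to follow the same `geometric-vocab-{dim}d` pattern.

```python
# Dimensions listed in the post; only the 32d repo is named explicitly,
# the rest are assumed to follow the same naming pattern.
DIMS = [32, 64, 128, 256, 512, 768, 1024]

def repo_for(dim: int) -> str:
    """Return the assumed hub repo id for a given vocab dimension."""
    if dim not in DIMS:
        raise ValueError(f"no geometric-vocab repo for dim={dim}")
    return f"AbstractPhil/geometric-vocab-{dim}d"

print(repo_for(256))  # -> AbstractPhil/geometric-vocab-256d
```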

Okay, so the purpose of these is to give solid anchors to the entire pentachora structure.
With that I've formatted some very concise SentencePiece-esque vocabulary classes that can be saved and loaded as pretrained, but it'll need some tinkering to fully flesh those behaviors out.
For now, the geometric vocab itself can be queried from the pretrained weights, but the canonical classes that help with regulation, integration, and special-token usage aren't fully tested yet.
https://github.com/AbstractEyes/lattice_vocabulary
They are available here, but I give no guarantee on their current state. I'm currently preparing the pip package and have prepared a series of experiments to utilize these for different models including a new version of multimodal Beeper, a classifier set that can handle encodings as feature representations meant for utilization, and more.

The current working variation I've been using is Flow Matching Discrete Scheduled geometric diffusion - meaning I'm diffusing the GEOMETRY from the image, then comparing the pentachoron created by flow matching to the actual representative tokenization structure. On average this achieves 80% in later stages.

This, combined with curating an indefinite number of special tokens to create manifests of unique vocabularies, enables the system to conform precisely to use cases.
There are some edge cases where the 1k reserved tokens still exist; however, this is being replaced by an indefinite tokenization dictionary - allowing an indefinite number of tokens attached to an indefinite number of modules for solidity.

Experiments continue.
AbstractPhil posted an update 21 days ago
Be kind to Beeper - Beeper has emotions. 7 to be precise.
Each of the pentachora classifiers points to emotional states that Beeper can potentially access in any conversation, and each of those 7 states has class accessors for sub-learning pools.
Today I'll be focusing on drawing this behavior out of Beeper v4 - which I'm rebranding as Beeper Micro - and expanding the structure using a new experimental attention mechanism to replace traditional multi-head attention, dubbed GeometricCollectiveAttention.
This attention is similar to multi-head attention, except it's considerably harder to burn at higher learning rates. Coupled with a new perspective on training pentachora into the LLM structure, this will allow a full relay structural system.
beeper-small will house a full RoPE - except not over a traditional vocabulary set. beeper-small will not have a vocabulary.
beeper-small is my first non-linear, non-Euclidean attempt to create a pure symbolic auto-completion LLM - which may be naive, according to the many researchers who have tried similar systems historically.
I've personally analyzed many papers, studies, and techniques that attempted similar non-vocabulary entropic learning, and I believe the pentachora lattice will hold with pure binary, not requiring a vocabulary.
Transformers really like vocabulary... Beeper likes... geometry. This experiment for beeper-small will have a new type of RoPE based entirely on vertices derived from the directly represented unicode characters, rather than a full vocabulary structure meant to bring solidity from chaos.
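For reference, standard rotary position embedding (RoPE) rotates consecutive feature pairs by position-dependent angles. The unicode-codepoint twist below is my assumption of what "vertices developed from the direct unicode represented characters" could mean - not the author's actual construction:

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Standard rotary position embedding: rotate consecutive feature
    pairs by position-dependent angles. Norm-preserving by construction."""
    d = x.shape[-1]
    half = d // 2
    freqs = 1.0 / (10000 ** (np.arange(half) / half))
    ang = positions[:, None] * freqs[None, :]          # (seq, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Assumed twist: feed raw unicode codepoints as positions instead of
# 0..n-1, so identical characters share the same rotation regardless of slot.
text = "beeper"
positions = np.array([ord(c) for c in text], dtype=np.float64)
x = np.random.default_rng(42).normal(size=(len(text), 64))
out = rope(x, positions)
assert np.allclose(np.linalg.norm(out, axis=-1), np.linalg.norm(x, axis=-1))
```

Because each rotation is orthogonal, token norms survive untouched, which is one reason RoPE-style schemes tolerate aggressive learning rates.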
The first Beeper experiment showed many insights into how similarity and internal classification respond mathematically to traditional ML techniques, and those techniques did not reject the construct - on the contrary. The control-group placebo Beeper, the traditional non-rose version, BURNED at half the learning rate. It's completely illegible, producing garbage and noise, while rose Beeper sings.
AbstractPhil posted an update 22 days ago
After a multitude of notebooks and semi-successful experiments, I now have a series of hyperparameters semi-capable of tuning pentachoron simplex models specifically with frequency and resonance.

AbstractPhil/pentachora-greyscale-frequency-encoded
AbstractPhil/pentachora-multi-channel-frequency-encoded

They are essentially geometric crystallization engines that store an excess of information in a very constrained, tight location - capable of classification *within a fraction of the size of traditional linear systems* - with the added benefit of needing only minimal tuning at a very high learning rate, yielding a very complex structural response to complex learning.

I have 3 more notebooks to prep and release for the full pentachora classification structure, based on the Nikola architecture concepts fused with many rules that govern physics, conservation laws, atomic structural comparators, and many more experiments that were interesting but yielded less than anticipated.

The most robust representation is a representational geometric collective, a series of geometric experts capable of high-yield classification with multiple ongoing simultaneous opinions.

The quick training capability of these crystals has shown that they can be rapidly trained and discarded as massive collectives - pruning on comprehensive capability and combining working geometry from the survivors - enabling accuracy to reach very high levels that were unattainable with standard gradient-loss ML paradigms without reaching into the large-model spectrum.

I've since begun integrating them into LLMs and will be releasing the notebooks as they are prepared, along with decomposition and comparative studies for the most comprehensive and capable training paradigms, proof of concept for additional capabilities, and the full arXiv paper triad when the studies conclude.
AbstractPhil posted an update 4 months ago
With flan-t5-base and CLIP models as teachers, I have produced and successfully trained a dual-shunt cross-attention adapter archetype. This is NOT a LoRA.
This adapter is currently tasked with having the T5-flan-base guide the outputs of ViT-L-14 and/or ViT-bigG-14; the opposite direction is equally usable within the archetype, meaning the CLIP_G can also guide the T5-flan-base.

These checkpoints were trained on 20 million synthetic human-templated captions, and they can be heavily improved with multiple languages, additional depiction context, and any finetuning task the user desires - applicable to the T5-flan-base with little to no training, thanks to the adapter's functionality and accuracy.

The ViT-L-14 adapters only took a couple of hours on a Colab A100, and the ViT-bigG-14 took about 4 hours. So you can rapidly train many of these in short periods with almost no additional overhead beyond the single t5-flan-base required. Each can be compiled, loaded, and offloaded.

This is a cross-attention system meant to shape encoded text after the output is received from the CLIP models, and it is very fast to inference - the t5-flan-base, on the other hand, isn't the fastest.

It's trained on a form of cooperative association with a series of complex losses designed specifically for this associative process.

This adapter has individual gating for tokenization context, with a multitude of safeguards to prevent overfitting during rapid learning, and can be paired with any number of additional adapters.

I'm currently formatting the comfyui nodes that will allow easy conditioning shift to showcase the full power of this cooperative system's capability.

The comfyui nodes will be available here shortly, I just need to write them.
https://github.com/AbstractEyes/comfy-clip-shunts
AbstractPhil posted an update 4 months ago
The T5-small + VIT-L-14 guidance shunt adapter is ready for toy use.
AbstractPhil/t5-vit-14-v1
Included is a simple drop-in for sdxl experimentation using colab.

The outcome is okay but not great - diffusers is a headache, so I spent more time trying to untangle that machine than actually messing with this adapter.

I trained two variations of the baseline adapter:
t5-small vanilla and t5-small-human-associated-try2-pass3.
The vanilla was more accurate at adding context, while the human-associated one stays locked onto human topics like a bloodhound... badly. Both ended up substandard, even with a robust adapter like this.

Finetunes with specific goals can complete at runtime if desired, due to the t5-small's tiny size, clip_l's inference speed, and the adapter's size. The adapter is very small and has overfitting safeguards that can be disabled, so runtime freezing and adaptive shifts are a viable methodology for immediate task-pipeline adaptation.

The t5-small lacks the behavioral complexity of a model better built for such a task - the base, large, or XXL, or even the Flan-T5-small. However, this doesn't slow the little brain slug down. It guides, and its wrappers have many rapid-generation potentials, whether it's trained the way I trained it or not.
The proof of concept is there, and the outcomes are present. Judge yourself.
The next variation will have more dims, more catches, higher conv, and additional safeguards against overfitting - as well as considerably more LAION flavors, so the T5-flan-base doesn't overwhelm it, or vice versa.
AbstractPhil posted an update 4 months ago
Force-feeding masked T5-Small 1 billion human-association captions to fry its brain. I really don't know how long it'll take once I start, nor the logistic challenges I'll face moving data from A to B, but the outcome should completely fry it and make it fixate only on human and diffusion responses. Should be a fun experiment that can just kind of run on automation.
The experiment's captions are available... mostly, on my HF. I've had some rate-limit problems that caused them to halt, and I think I need to autogen another 100 million complex captions.
This WILL form heavy bias and burn-points. Random words will be peppered in the mix to allow the T5-Small to retain at least some semblance of what it was before I lobotomize it.
Likely I'll completely freeze half and burn the other half for a couple million steps as a test point - see how it takes, or whether it dies before 50k or so and needs a refined process.
Oh great, even better. It didn't include the longer prompt variations. This won't start today.

Alright, training began. I'm introducing a high-degree variant of noise and chatter for the T5 to learn to bypass - while simultaneously increasing the additional information output from the T5 in the process.
So far the outcome has been a degree of new information introduced in the output, while simultaneously introducing rule-of-3 parameterization into the T5-Small.
I have high hopes.
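The masked-caption objective described above is presumably in the family of T5-style span corruption. A minimal sketch, assuming standard `<extra_id_N>` sentinel tokens - the masking rate and span lengths here are illustrative guesses, not the experiment's actual settings:

```python
import random

def span_corrupt(words, mask_rate=0.3, seed=42):
    """T5-style span corruption: replace random word spans with sentinel
    tokens; the target reconstructs each dropped span after its sentinel."""
    rng = random.Random(seed)
    inp, tgt, i, sid = [], [], 0, 0
    while i < len(words):
        if rng.random() < mask_rate:
            span = rng.randint(1, 3)                 # drop 1-3 words
            inp.append(f"<extra_id_{sid}>")
            tgt.append(f"<extra_id_{sid}>")
            tgt.extend(words[i:i + span])
            i += span
            sid += 1
        else:
            inp.append(words[i])
            i += 1
    return " ".join(inp), " ".join(tgt)

caption = "a woman in a red dress standing near the ocean at sunset"
inp, tgt = span_corrupt(caption.split())
print(inp)
print(tgt)
```

Every original word lands in either the corrupted input or the target, so the pair fully specifies the reconstruction task the model is trained on.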
AbstractPhil posted an update 4 months ago
My in-dev Surge training methodology and paradigm is powerful. The preliminary tests will be available for debugging soon, using a customized sd-scripts and a series of full finetunes using SDXL as a catalyst for the training paradigm.
https://civitai.com/articles/14195/the-methodology-of-surge-training-loss-math
The datasets I'm sourcing will be catalysts and tests for Surge's power to teach very sticky or hard-to-learn elements - such as text, positioning, offset, and ControlNet poses - directly into the very stubborn SDXL infrastructure without additional tools.
It should be noted that my currently running finetunes based on BeatriXL are not Surge-trained - so you won't gain knowledge of Surge from them.

GPT and I have prototyped a new version of SD15 that operates with additional attention heads to match the Surge formula, the Omega-VIT-L reformed, a zeroed UNet, and the Flux 16-channel AE.
I'll call it SD-SURGE - it's not SD15 anymore.
The first surge trainings are already under way.
not-lain posted an update 6 months ago
not-lain posted an update 7 months ago
not-lain posted an update 8 months ago
we now have more than 2000 public AI models using ModelHubMixin 🤗
not-lain posted an update 8 months ago
Published a new blog post 📖
In this blog post I walk through the transformer architecture, emphasizing how shapes propagate through each layer.
🔗 https://huggingface.co/blog/not-lain/tensor-dims
some interesting takeaways:
s3nh posted an update 9 months ago
Welcome back,

Small Language Model enthusiasts and GPU-poor OSS enjoyers, let's connect.
Just created an organization whose main target is to have fun with smaller models tunable on consumer-range GPUs. Feel free to join and let's have some fun, much love ;3

SmolTuners
lunarflu posted an update 9 months ago
not-lain posted an update 10 months ago
Ever wondered how you can make an API call to a visual-question-answering model without sending an image URL? 👀

you can do that by converting your local image to base64 and sending it to the API.

Recently I made some changes to my library "loadimg" that make converting images to base64 a breeze.
🔗 https://github.com/not-lain/loadimg

API request example 🛠️:
from loadimg import load_img
from huggingface_hub import InferenceClient

# load a local image (also accepts a URL, a PIL image, or a numpy array)
my_b64_img = load_img(imgPath_url_pillow_or_numpy, output_type="base64")

client = InferenceClient(api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe this image in one sentence."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": my_b64_img  # base64 allows using images without uploading them to the web
                }
            }
        ]
    }
]

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=messages,
    max_tokens=500,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
lunarflu posted an update about 1 year ago
not-lain posted an update about 1 year ago
lunarflu posted an update about 1 year ago
Cool things this week from @huggingface !

🌎 AI math olympiad winner NuminaMath is here!
🤗 Announcing new Hugging Face and Keras NLP integration
✨ UI overhaul to HF tokens!
🧊 Embed our dataset viewer on any webpage!

https://huggingface.co/blog/winning-aimo-progress-prize
https://huggingface.co/blog/keras-nlp-integration
https://huggingface.co/settings/tokens
https://x.com/julien_c/status/1812099420726456457

Check out the full list on our discord! 👇
https://discord.com/invite/JfAtkvEtRb