LLMs (community)
AI & ML interests: None defined yet.
Recent activity:
prithivMLmods posted an update · about 14 hours ago
Simple summary of DeepSeek AI's Janus-Pro: A fresh take on multimodal AI!
It builds on its predecessor, Janus, by tweaking the training methodology rather than the model architecture. The result? Improved performance in understanding and generating multimodal data.
Janus-Pro uses a three-stage training strategy, similar to Janus, but with key modifications (sketched in code below):
✦ Stage 1 & 2: Focus on separate training for specific objectives, rather than mixing data.
✦ Stage 3: Fine-tuning with a careful balance of multimodal data.
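A hedged pseudocode sketch of that schedule. The freeze/train helpers, datasets, and freezing choices here are illustrative assumptions to make the staging concrete, not DeepSeek's released training code:

```python
def train_janus_pro(model):
    # Stage 1: a longer warm-up that trains only the new adapters and the
    # image head on their own objectives, with the LLM kept frozen.
    freeze(model.llm)
    train(model, data=adapter_warmup_data)

    # Stage 2: unified pretraining, now on dedicated text-to-image data
    # instead of the mixed data used by the original Janus.
    unfreeze(model.llm)
    train(model, data=dense_text_to_image_data)

    # Stage 3: supervised fine-tuning with a rebalanced mix of
    # understanding, text-only, and generation data.
    train(model, data=mix(understanding_data, text_only_data, generation_data))
    return model
```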
Benchmarks show Janus-Pro holds its own against understanding-focused models like TokenFlow XL and MetaMorph, and against image-generation models like SD3 Medium and DALL-E 3.
The main limitation? Low image resolution (384x384). However, this seems like a strategic choice to focus on establishing a solid "recipe" for multimodal models. Future work will likely leverage this recipe and increased computing power to achieve higher resolutions.
KnutJaegersberg posted an update · 2 days ago
Evolution and The Knightian Blindspot of Machine Learning
The paper discusses machine learning's limitations in addressing Knightian Uncertainty (KU), highlighting the fragility of approaches like reinforcement learning (RL) in unpredictable, open-world environments. KU refers to uncertainty that can't be quantified or predicted, a challenge RL fails to handle due to its reliance on fixed data distributions and limited formalisms.
### Key Approaches:
1. **Artificial Life (ALife):** Simulating diverse, evolving systems to generate adaptability, mimicking biological evolution's robustness to unpredictable environments.
2. **Open-Endedness:** Creating AI systems capable of continuous innovation and adaptation, drawing inspiration from human creativity and scientific discovery.
3. **Revising RL Formalisms:** Modifying reinforcement learning (RL) models to handle dynamic, open-world environments by integrating more flexible assumptions and evolutionary strategies.
These approaches aim to address ML’s limitations in real-world uncertainty and move toward more adaptive, general intelligence.
https://arxiv.org/abs/2501.13075
Delta-Vector posted an update · 3 days ago
For anyone who enjoys Magnum models, I just dropped a 12B that is the first (or second?) stepping stone toward Magnum V5
Delta-Vector/rei-12b-6795505005c4a94ebdfdeb39
KnutJaegersberg posted an update · 5 days ago
Artificial Kuramoto Oscillatory Neurons
Artificial Kuramoto Oscillatory Neurons (AKOrN) differ from traditional artificial neurons by oscillating, rather than just turning on or off. Each neuron is represented by a rotating vector on a sphere, influenced by its connections to other neurons. This behavior is based on the Kuramoto model, which describes how oscillators (like neurons) tend to synchronize, similar to pendulums swinging in unison.
Key points:
Oscillating Neurons: Each AKOrN’s rotation is influenced by its connections, and they try to synchronize or oppose each other.
Synchronization: When neurons synchronize, they "bind," allowing the network to represent complex concepts (e.g., "a blue square toy") by compressing information.
Updating Mechanism: Neurons update their rotations based on connected neurons, input stimuli, and their natural frequency, using a Kuramoto update formula (see the sketch after this list).
Network Structure: AKOrNs can be used in various network layers, with iterative blocks combining Kuramoto layers and feature extraction modules.
Reasoning: This model can perform reasoning tasks, like solving Sudoku puzzles, by adjusting neuron interactions.
Advantages: AKOrNs offer robust feature binding, reasoning capabilities, resistance to adversarial data, and well-calibrated uncertainty estimation.
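A minimal sketch of that update rule, using the classic scalar-phase Kuramoto model as a 1-D stand-in for AKOrN's rotating unit vectors; the coupling matrix, natural frequencies, and stimulus term below are illustrative assumptions:

```python
import numpy as np

def kuramoto_step(theta, omega, J, stimulus, dt=0.1):
    """One update: natural rotation + pull from neighbors + pull from input."""
    pairwise = np.sin(theta[None, :] - theta[:, None])  # sin(theta_j - theta_i)
    coupling = (J * pairwise).sum(axis=1)               # attraction to neighbors
    drive = np.sin(stimulus - theta)                    # attraction to the input
    return theta + dt * (omega + coupling + drive)

rng = np.random.default_rng(0)
N = 8
theta = rng.uniform(0, 2 * np.pi, N)   # initial phases
omega = rng.normal(0.0, 0.1, N)        # natural frequencies
J = np.ones((N, N)) / N                # positive coupling -> synchronization
for _ in range(300):
    theta = kuramoto_step(theta, omega, J, stimulus=np.zeros(N))
# The phase spread shrinks toward 0 as the oscillators "bind":
print(np.std(np.angle(np.exp(1j * theta))))
```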
In summary, AKOrN's oscillatory neurons and synchronization mechanisms enable the network to learn, reason, and handle complex tasks like image classification and object discovery with enhanced robustness and flexibility.
YouTube: https://www.youtube.com/watch?v=i3fRf6fb9ZM
Paper: https://arxiv.org/html/2410.13821v1
KnutJaegersberg posted an update · 6 days ago
DeepSeek R1 on how to build conscious AGI
https://huggingface.co/blog/KnutJaegersberg/deepseek-r1-on-conscious-agi
Post
A new look for AI-powered paper reviews of the Hugging Face Daily Papers list (managed by @akhaliq).
Bookmark the webpage, check out the comprehensive reviews by Google DeepMind's Gemini 1.5, and listen to audio podcasts made with the same tech used in NotebookLM.
Link: https://deep-diver.github.io/ai-paper-reviewer/
This is not an official service by Hugging Face. It is just a service developed by an individual developer using his own money :)
Post
Simple summarization of Evolving Deeper LLM Thinking (Google DeepMind)
The process starts by posing a question.
1) The LLM generates initial responses.
2) These generated responses are evaluated according to specific criteria (program-based checker).
3) The LLM critiques the evaluated results.
4) The LLM refines the responses based on the evaluation, critique, and original responses.
The refined response is then fed back into step 2). If it meets the criteria, the process ends. Otherwise, the algorithm generates more responses based on the refined ones (with some being discarded, some remaining, and some responses potentially being merged).
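In rough pseudocode (llm_generate, llm_critique, llm_refine, and check are hypothetical stand-ins for the paper's LLM calls and program-based checker; the survive/discard/merge selection is simplified):

```python
def evolve(question, population=4, max_rounds=10):
    responses = [llm_generate(question) for _ in range(population)]
    best = responses[0]
    for _ in range(max_rounds):
        scored = [(r, check(question, r)) for r in responses]      # step 2
        best, best_report = max(scored, key=lambda s: s[1].score)
        if best_report.passed:                                     # criteria met
            return best
        refined = [
            llm_refine(question, r, report,                        # step 4
                       llm_critique(question, r, report))          # step 3
            for r, report in scored
        ]
        # some refinements survive, some are discarded, and new candidates
        # are generated from the survivors
        refined.sort(key=lambda r: check(question, r).score, reverse=True)
        survivors = refined[: population // 2]
        responses = survivors + [llm_generate(question, seed=s) for s in survivors]
    return best
```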
Through this process, it demonstrated excellent performance in complex scheduling problems (travel planning, meeting scheduling, etc.). It's a viable method for finding highly effective solutions in specific scenarios.
However, there are two major drawbacks:
🤔 An excessive number of API calls are required. (While the cost might not be very high, it leads to significant latency.)
🤔 The evaluator is program-based. (This limits its use as a general method. It could potentially be modified/implemented using LLM as Judge, but that would introduce additional API costs for evaluation.)
https://arxiv.org/abs/2501.09891
Post
Simple summarization of DeepSeek-R1 from DeepSeek AI
The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, DeepSeek applied a learning pipeline consisting of four stages (sketched below): providing a good starting point, reasoning RL, SFT, and safety RL, and achieved performance comparable to o1.
↳ Simply fine-tuning other open models on the data generated by R1 (distillation) resulted in performance comparable to o1-mini.
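A hedged sketch of that four-stage pipeline; the stage functions and datasets are hypothetical placeholders for the training phases the paper describes, not DeepSeek's actual code:

```python
def train_r1(base_model):
    m = sft(base_model, cold_start_cot_data)    # 1. good starting point
    m = reasoning_rl(m, reasoning_prompts)      # 2. large-scale reasoning RL
    m = sft(m, curated_sft_data)                # 3. SFT on curated samples
    m = preference_rl(m, all_scenario_prompts)  # 4. RL for helpfulness & safety
    return m
```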
Of course, this is just a brief overview and may not be of much help. All models are accessible on Hugging Face, and the paper can be read through the GitHub repository.
Model: https://huggingface.co/deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1
KnutJaegersberg posted an update · 9 days ago
Yet another blog post about general intelligence
https://huggingface.co/blog/KnutJaegersberg/general-intelligence
prithivMLmods posted an update · 9 days ago
Q'n' Sketches ❤️🔥
🖼️ Adapters:
- Qs : strangerzonehf/Qs-Sketch
- Qd : strangerzonehf/Qd-Sketch
- Qx : strangerzonehf/Qx-Art
- Qc : strangerzonehf/Qc-Sketch
- Bb : strangerzonehf/Bg-Bag
🐍 Collection : strangerzonehf/q-series-sketch-678e3503bf3a661758429717
🔗Page : https://huggingface.co/strangerzonehf
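A hedged sketch of loading one of these adapters with diffusers' FluxPipeline; the base model and the "Qs Sketch" trigger phrase are assumptions, so check the adapter card:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("strangerzonehf/Qs-Sketch")

image = pipe("Qs Sketch, a lighthouse on a cliff at dusk",
             num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("qs_sketch.png")
```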
@prithivMLmods 🤗
KnutJaegersberg posted an update · 10 days ago
Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
It's an interesting paper that argues "new approaches are required that can reliably solve a wide variety of problems without existing skills."
"It is therefore hoped that the benchmark outlined in this article contributes to further exploration of this direction of research and incentivises the development of new AGI approaches that focus on intelligence rather than skills."
https://arxiv.org/abs/2501.07458
prithivMLmods posted an update · 13 days ago
ChemQwen-vL [ Qwen for Chem Vision ] 🧑🏻🔬
🧪Model : prithivMLmods/ChemQwen-vL
📝ChemQwen-vL is a vision-language model fine-tuned from the Qwen2VL-2B Instruct model. It has been trained using the International Chemical Identifier (InChI) format for chemical compounds and is optimized for chemical compound identification. The model excels at generating the InChI and providing descriptions of chemical compounds based on their images. Its architecture operates within a multimodal framework, combining image and text inputs to produce text outputs. It has been fine-tuned using datasets from: https://iupac.org/projects/
📒Colab Demo: https://tinyurl.com/2pn8x6u7, Collection : https://tinyurl.com/2mt5bjju
Inference with documented (PDF) output is possible with the help of the ReportLab library: https://pypi.org/project/reportlab/
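Since ChemQwen-vL is a Qwen2VL-2B fine-tune, the standard transformers Qwen2-VL loading path should apply. A minimal, hedged sketch (the prompt wording and image path are assumptions):

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from PIL import Image

model_id = "prithivMLmods/ChemQwen-vL"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Generate the InChI for this compound."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[Image.open("compound.png")],
                   return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```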
🤗: @prithivMLmods
KnutJaegersberg posted an update · 16 days ago
prithivMLmods posted an update · 20 days ago
200+ followers 🤗 on Stranger Zone! [ https://huggingface.co/strangerzonehf ]
❤️🔥Stranger Zone's MidJourney Mix model adapter is trending on the models page with over 45,000 downloads, and the Super Realism model adapter has over 52,000 downloads; they remain the top two adapters on Stranger Zone!
strangerzonehf/Flux-Midjourney-Mix2-LoRA, strangerzonehf/Flux-Super-Realism-LoRA
👽Try Demo: prithivMLmods/FLUX-LoRA-DLC
📦Most Recent Adapters to Check Out :
+ Ctoon : strangerzonehf/Ctoon-Plus-Plus
+ Cardboard : strangerzonehf/Flux-Cardboard-Art-LoRA
+ Claude Art : strangerzonehf/Flux-Claude-Art
+ Flat Lay : strangerzonehf/Flux-FlatLay-LoRA
+ Smiley Portrait : strangerzonehf/Flux-Smiley-Portrait-LoRA
🤗Thanks to the community & OPEN SOURCE !!
prithivMLmods posted an update · 24 days ago
Reasoning SmolLM2 🚀
🎯Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.
🔥Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft
🔼 Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF
🤠 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tune notebook : prithivMLmods/SmolLM2-CoT-360M
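A minimal sketch of running the CoT model locally, assuming the standard transformers causal-LM path and a chat template inherited from SmolLM2-Instruct (the example question is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/SmolLM2-CoT-360M"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user",
             "content": "A train covers 60 km in 45 minutes. What is its average speed in km/h?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt")
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```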
prithivMLmods posted an update · 29 days ago
Triangulum Catalogued 🔥💫
🎯Triangulum is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.
+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF
+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF
+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF
prithivMLmods posted an update · about 1 month ago
🔥 BIG ANNOUNCEMENT: THE HELPINGAI API IS LIVE! 🔥
Yo, the moment you’ve all been waiting for is here! 🚀 The HelpingAI API is now LIVE and ready to level up your projects! 🔥 We’re bringing that next-level AI goodness straight to your fingertips. 💯
No more waiting— it’s time to build something epic! 🙌
From now on, you can integrate our cutting-edge AI models into your own applications, workflows, and everything in between. Whether you’re a developer, a creator, or just someone looking to make some serious moves, this is your chance to unlock the full potential of emotional intelligence and adaptive AI.
Check out the docs 🔥 and let’s get to work! 🚀
👉 Check out the docs and start building (https://helpingai.co/docs)
👉 Visit the HelpingAI website (https://helpingai.co/)
KnutJaegersberg posted an update · about 1 month ago
Intelligence Potentiation: An Evolutionary Perspective on AI Agent Designs
I found it useful to think of AI agent design as progressing up a ladder, through evolutionary selection.
https://huggingface.co/blog/KnutJaegersberg/intelligence-potentiation