AI & ML interests

AGI and ML Pipelines, Ambient IoT AI, Behavior Cognitive and Memory AI, Clinical Medical and Nursing AI, Genomics AI, GAN Gaming GAIL AR VR XR and Simulation AI, Graph Ontology KR KE AI, Languages and NLP AI, Quantum Compute GPU TPU NPU AI, Vision Image Document AI

AIZeroToHero's activity

hesamation
posted an update about 8 hours ago
hesamation
posted an update 6 days ago
Google published a 69-page whitepaper on Prompt Engineering and its best practices, a must-read if you are using LLMs in production:
> zero-shot, one-shot, few-shot
> system prompting
> chain-of-thought (CoT)
> ReAct
> code prompting
> best practices

LINK: https://www.kaggle.com/whitepaper-prompt-engineering
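
As a quick taste of the first bullet, here is a minimal zero-shot vs. few-shot sketch; the prompts and the sentiment task are made up for illustration and are not from the whitepaper:

# Minimal illustration of zero-shot vs. few-shot prompting (hypothetical task).

# Zero-shot: the task description alone.
zero_shot = (
    "Classify the sentiment of this review as POSITIVE or NEGATIVE.\n"
    "Review: The battery died after a week.\nSentiment:"
)

# Few-shot: a couple of worked examples steer the label set and output format.
few_shot = (
    "Review: I loved every minute of it.\nSentiment: POSITIVE\n"
    "Review: Total waste of money.\nSentiment: NEGATIVE\n"
    "Review: The battery died after a week.\nSentiment:"
)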
hesamation
posted an update 10 days ago
The best researchers from Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about agents in a 264-page paper (practically a book).

Here are some of their key findings:

They map different agent components, such as perception, memory, and world modelling, to regions of the human brain and compare the two:

- the brain is much more energy-efficient
- agents have no genuine subjective experience
- the brain learns continuously, while agents are static

An agent is broken down into:
- Perception: the agent's input mechanism. It can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: the agent's output and tool use.

Agentic memory is represented as:
- Sensory memory: short-term holding of raw inputs, not emphasized much in agents.
- Short-term memory: the LLM context window.
- Long-term memory: external storage such as RAG or knowledge graphs.

The memory in agents can be improved and researched in terms of:
- increasing the amount of stored information
- how to retrieve the most relevant info
- combining context-window memory with external memory
- deciding what to forget or update in memory
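
To make the layering concrete, here is a minimal sketch of an agent memory that combines context-window memory with an external store, as the list above suggests; the class and the toy relevance scoring are hypothetical, not from the paper:

# Hypothetical sketch of layered agent memory (not from the paper).
from collections import deque

class AgentMemory:
    def __init__(self, context_limit: int = 8):
        self.short_term = deque(maxlen=context_limit)  # bounded, like a context window
        self.long_term: list[str] = []  # external store; a vector DB in practice

    def observe(self, event: str) -> None:
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])  # evicted items persist long-term
        self.short_term.append(event)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Toy relevance: count shared words; a real system would use embeddings.
        overlap = lambda m: len(set(m.lower().split()) & set(query.lower().split()))
        return sorted(self.long_term, key=overlap, reverse=True)[:k]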

The agent must simulate or predict future states of its environment for planning and decision-making.

AI world models are much simpler than humans', which come with causal reasoning (cause and effect) and physical intuition.

LLM world models are mostly implicit and embedded in the model's weights.

EMOTIONS are a deep aspect of humans, helping us with social interaction, decision-making, and learning.

Agents must understand emotions to better interact with us.

But rather than encoding the feeling of emotions, agents only model them at a surface level.

Perception is the process by which an agent receives and interprets raw data from its surroundings.

READ PAPER: Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems (2504.01990)
hesamationΒ 
posted an update 14 days ago
view post
Post
2681
What, How, Where, and How Well? This paper reviews test-time scaling methods and all you need to know about them:
> parallel, sequential, hybrid, internal scaling
> how to scale (SFT, RL, search, verification)
> metrics and evals of test-time scaling

πŸ”—paper: What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models (2503.24235)

If you want to learn what inference-time compute scaling is, @rasbt has a great blog post on that:
https://magazine.sebastianraschka.com/p/state-of-llm-reasoning-and-inference-scaling
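
As a minimal illustration of the "parallel" family from the list above, here is a best-of-N sketch: sample several candidates and keep the one a verifier scores highest. The generate and verify helpers are hypothetical placeholders, not from the paper:

# Parallel test-time scaling: best-of-N sampling with a verifier (sketch).

def generate(prompt: str) -> str:
    raise NotImplementedError("sample one candidate answer from your LLM")

def verify(prompt: str, answer: str) -> float:
    raise NotImplementedError("score the candidate (e.g., a reward model); higher is better")

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]  # extra compute spent in parallel
    return max(candidates, key=lambda a: verify(prompt, a))  # keep the top-scoring answer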
hesamation
posted an update 15 days ago
awacke1
posted an update 16 days ago
AI Vision & SFT Titans 🌟 Turns PDFs into text, snaps pics, and births AI art.

https://huggingface.co/spaces/awacke1/TorchTransformers-Diffusion-CV-SFT

1. OCR a grocery list or train a titan while sipping coffee? β˜•
2. Camera Snap πŸ“·: Capture life’s chaosβ€”your cat’s face or that weird receipt. Proof you’re a spy!
3. OCR πŸ”: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt β€œneon superhero me”
5. PDF 📄: Double-page OCR or single-page sniping

Build Titans 🌱: Train tiny AI models. 💪
Characters 🧑‍🎨: Craft quirky heroes. 🎥

not-lain
posted an update about 1 month ago
awacke1
posted an update about 1 month ago
Introducing my MIT-licensed ML model fine-tuning app, "SFT Tiny Titans" 🚀

Demo video with source.

Download, train, SFT, and test your models, easy as 1-2-3!
URL: awacke1/TorchTransformers-NLP-CV-SFT
awacke1
posted an update about 2 months ago
πŸš€ Blast into the future with ZaxxonGalaxian – a thrilling 3D action game where you navigate epic battles through towering 3D cityscapes! Face off against relentless swarm bots, climb the leaderboard, and dominate the skies. awacke1/ZaxxoGalaxian
not-lain
posted an update 3 months ago
not-lain
posted an update 3 months ago
we now have more than 2000 public AI models using ModelHubMixinπŸ€—
not-lain
posted an update 3 months ago
Published a new blog post 📖
In this blog post I walk through the transformer architecture, emphasizing how tensor shapes propagate through each layer.
🔗 https://huggingface.co/blog/not-lain/tensor-dims
some interesting takeaways:
awacke1
posted an update 3 months ago
Deep Research Evaluator was asked:
"design a coral defense mechanism that upon sensing, say, an acid that's causing coral reefs to have a carbon dioxide issue it develops... please create a plan and a design for this"
It picked these three papers as the best combined solution.

1. [Reef-insight: A framework for reef habitat mapping with clustering methods via remote sensing]...
2. Phone a friend: [Learning to Communicate and Collaborate in a Competitive Multi-Agent Setup to Clean the Ocean from Macroplastics]...
3. World Solve: [Dependence of Physiochemical Features on Marine Chlorophyll Analysis with Learning Techniques]


To design a system that allows coral reefs to respond to increased acidity levels in their environment, we can create a network of pH sensors and dispersal units that detect changes in pH levels and release a base solution to neutralize the acid.

1. pH Sensors: The first component of the system would be a network of pH sensors placed strategically throughout the coral reef. These sensors would be small, durable, and able to withstand the harsh conditions of the ocean. They would be placed at various depths and locations within the reef to ensure accurate and comprehensive monitoring of pH levels.
2. Base Dispersal Units: Once the pH sensors detect a decrease in pH levels, they would trigger the base dispersal units to release a base solution into the water. These units would be strategically placed around the reef and would be able to release a controlled amount of base solution to neutralize the acidity in the water.
3. Water Dispersal Mechanism: The base dispersal units would be connected to a water dispersal mechanism that would allow the base solution to be distributed evenly around the reef. This could be achieved through a series of pipes or channels that would distribute the base solution in a controlled and targeted manner.
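
A toy sketch of that sense-and-respond loop is below; the threshold, dosing rule, and hardware interfaces are hypothetical placeholders, not part of the generated design:

# Toy pH sense-and-neutralize loop (hypothetical values and interfaces).
PH_THRESHOLD = 8.0  # surface seawater averages roughly pH 8.1; below this we treat water as acidifying

def read_ph(sensor_id: int) -> float:
    raise NotImplementedError("read from a deployed pH sensor")

def release_base(unit_id: int, liters: float) -> None:
    raise NotImplementedError("command the paired base dispersal unit")

def control_step(sensor_ids: list[int]) -> None:
    for sid in sensor_ids:
        ph = read_ph(sid)
        if ph < PH_THRESHOLD:
            # Dose in proportion to the deficit; a real controller would also
            # rate-limit releases and cross-check neighboring sensors.
            release_base(unit_id=sid, liters=min(10.0, (PH_THRESHOLD - ph) * 5.0))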
Sri-Vigneshwar-DJ
posted an update 3 months ago
Check out phi-4 from Microsoft, dropped a day ago... If you ❤️ the Phi series, here is the GGUF: Sri-Vigneshwar-DJ/phi-4-GGUF. phi-4 is a highly efficient 14B open LLM that beats much larger models at math and reasoning - check out the evaluations on the Open LLM Leaderboard.

Technical paper: https://arxiv.org/pdf/2412.08905; the data synthesis approach is interesting.
Sri-Vigneshwar-DJ
posted an update 3 months ago
Just sharing a thought: I started using DeepSeek V3 a lot, and an idea struck me about agents "orchestrating during inference" on a test-time compute model like DeepSeek V3 or the o1 series.

Agents (instruction + function calls + memory) execute during inference, and based on the output, a decision is made to scale the reasoning time or move on to other tasks.
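
A minimal sketch of what that orchestration loop could look like; the reason and confidence helpers are hypothetical placeholders, not DeepSeek or OpenAI APIs:

# Sketch: scale reasoning time during inference based on the output so far.

def reason(task: str, budget_tokens: int) -> str:
    raise NotImplementedError("run the test-time-compute model with this budget")

def confidence(answer: str) -> float:
    raise NotImplementedError("self-estimate or verifier score in [0, 1]")

def orchestrate(task: str, max_rounds: int = 3) -> str:
    budget = 1_000
    answer = reason(task, budget)
    for _ in range(max_rounds - 1):
        if confidence(answer) >= 0.9:  # good enough, stop scaling
            break
        budget *= 2  # otherwise, double the reasoning budget and retry
        answer = reason(task, budget)
    return answer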
Sri-Vigneshwar-DJ
posted an update 3 months ago
Combining smolagents with Anthropic’s best practices simplifies building powerful AI agents:

1. Code-Based Agents: Write actions as Python code, reducing steps by 30%.
2. Prompt Chaining: Break tasks into sequential subtasks with validation gates.
3. Routing: Classify inputs and direct them to specialized handlers.
4. Fallback: Handle tasks even if classification fails.

https://huggingface.co/blog/Sri-Vigneshwar-DJ/building-effective-agents-with-anthropics-best-pra
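
For patterns 3 and 4 above, here is a minimal routing-with-fallback sketch in plain Python; the toy classifier and handlers are hypothetical, not from smolagents or the blog post:

# Routing (pattern 3) with fallback (pattern 4), sketched without any framework.

def classify(query: str) -> str:
    # In practice this would be an LLM call; here, a toy keyword rule.
    if "refund" in query.lower():
        return "billing"
    if "error" in query.lower():
        return "technical"
    return "unknown"

HANDLERS = {
    "billing": lambda q: f"[billing agent] handling: {q}",
    "technical": lambda q: f"[technical agent] handling: {q}",
}

def route(query: str) -> str:
    handler = HANDLERS.get(classify(query))
    if handler is None:  # fallback when classification fails
        return f"[generalist agent] handling: {query}"
    return handler(query)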
awacke1
posted an update 3 months ago
awacke1
posted an update 5 months ago
πŸ•ŠοΈHopeπŸ•ŠοΈ and βš–οΈJusticeβš–οΈ AI
🚲 Stolen bike in Denver FOUND - Sometimes hope & justice DO prevail.

🎬 So I Created an AI+Art+Music tribute:
-🧠 AI App that Evaluates GPT-4o vs Claude:
awacke1/RescuerOfStolenBikes
https://x.com/Aaron_Wacker/status/1857640877986033980

#GPT #Claude #Huggingface
@OpenAI
@AnthropicAI
not-lain
posted an update 5 months ago
ever wondered how you can make an API call to a visual-question-answering model without sending an image url πŸ‘€

you can do that by converting your local image to base64 and sending it to the API.

recently I made some changes to my library "loadimg" that make converting images to base64 a breeze.
πŸ”— https://github.com/not-lain/loadimg

API request example πŸ› οΈ:
from loadimg import load_img
from huggingface_hub import InferenceClient

# load_img accepts a local path, URL, PIL image, or numpy array
my_b64_img = load_img(imgPath_url_pillow_or_numpy, output_type="base64")

client = InferenceClient(api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe this image in one sentence."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": my_b64_img  # base64 lets you use images without uploading them to the web
                }
            }
        ]
    }
]

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=messages,
    max_tokens=500,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
awacke1
posted an update 6 months ago
Since 2022 I have been trying to understand how to support the advancement of the two best Python patterns for AI development:
1. Streamlit
2. Gradio

The reason I chose them in this order is timing: Streamlit got the drop on Gradio by being available in near-perfect form a year or two before GPT's training-data cutoff.

Nowadays, if you want generated code to be right on the first try, the library's method names need to be consistent across versions, so no manual intervention is required with each attempt.

With GPT and Claude as my top two AI pair-programming models, I gravitate toward Streamlit: aside from common repeated errors around its cache and experimental functions, which were not yet solidified circa 2022, its consistency means generated code rarely needs human correction, and errors from stale training data are minimal.

Now I seek the same consistency on the Gradio side. Why? Gradio lapped Streamlit with its Blocks paradigm and the API you get for free, features I feel change software engineering forever.

For a few months I thought BigCode would become the new best model because of its training corpus, yet I never felt it reached the market as the next best AI coder model.

I am curious about Gradio's future. If the two main models (GPT and Claude) pick up the last few years of changes, I could code with AI without manual intervention. As it stands today, Gradio would be the better choice if the best coding models would stop confusing old syntax with current syntax, but we live in an imperfect world!

Is anyone using an AI pair-programming model that rocks with Gradio's latest syntax? I would like to code with a model that doesn't miss the advancements and syntax changes Gradio has had in the past few years. Trying Grok 2 as well.

My IDE coding love is HF. It's hands down faster (100x) than other cloud paradigms. Any tips on the best models for Gradio coding?

--Aaron