ℏεsam
hesamation
AI & ML interests
post-training / reasoning models / RAG
Recent Activity
updated a model 2 days ago: hesamation/Qwen3-4B-Base-FOL-GRPO-LoRA
published a model 3 days ago: hesamation/Qwen3-4B-Base-FOL-GRPO-LoRA
updated a model 4 days ago: hesamation/Qwen3-8B-Base-FOL
Organizations

replied to their post 27 days ago
it could probably be classified somewhere in 3
Post
3254
longer context doesn't mean better responses; it can even hurt your llm/agent. a 1M context window doesn't automatically make models smarter, because it's not about the size; it's how you use it.
here are 4 types of context failure and why each one happens:
1. context poisoning: if a hallucination finds its way into your context, the agent will rely on that false information to make its future moves. for example, if the agent hallucinates about the "task description", all of its planning to solve the task will also be corrupted.
2. context distraction: when the context becomes too bloated, the model focuses too much on it rather than coming up with novel ideas or following what it learned during training. as the Gemini 2.5 Pro technical report points out, as context grows well beyond 100K tokens, "the agent showed a tendency toward favoring repeating actions from its vast history rather than synthesizing novel plans".
3. context confusion: everyone lost it when MCPs became popular; it seemed like AGI had been achieved. I suspected something was wrong, and there was: it's not just about providing tools. bloating the context with tool definitions derails the model from selecting the right one! even if you can fit all your tool metadata in the context, the model gets confused over which one to pick as the number of tools grows.
4. context clash: if you exchange messages with a model step by step and provide information as you go, chances are you'll get worse performance than if you had provided all the useful information at once. once the model's context fills with wrong information, it's harder to guide it toward the right info. agents pull information from tools, documents, user queries, etc., and some of that information can contradict the rest, which is not good news for agentic applications.
check this article by Drew Breunig for a deeper read: https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html?ref=blog.langchain.com
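a quick illustration of the kind of fixes the article points to: trim old turns to a token budget (against context distraction) and expose only the few tools relevant to the current query (against context confusion). this is a minimal sketch, not code from the article; count_tokens() and the message/tool dict shapes are made-up stand-ins:

```python
# minimal sketch, not from the linked article: count_tokens() is a crude
# stand-in for a real tokenizer, and the message/tool dict shapes are made up.

def count_tokens(text: str) -> int:
    return len(text.split())  # rough proxy; swap in your tokenizer

def trim_history(messages: list[dict], budget: int = 4000) -> list[dict]:
    """Keep the system prompt plus the most recent turns that fit the budget
    (one way to fight context distraction from a bloated history)."""
    system, rest = messages[:1], messages[1:]
    kept, used = [], 0
    for msg in reversed(rest):               # walk newest turns first
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))

def pick_relevant_tools(query: str, tools: list[dict], k: int = 3) -> list[dict]:
    """Expose only the k most relevant tools instead of all of them
    (one way to fight context confusion). Naive keyword overlap here;
    in practice you'd embed the tool descriptions."""
    def overlap(tool: dict) -> int:
        return len(set(query.lower().split()) & set(tool["description"].lower().split()))
    return sorted(tools, key=overlap, reverse=True)[:k]
```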

posted an update 28 days ago
Post
3254

posted an update about 1 month ago
Post
5023
in case you didn’t know, Claude now has a developer training course with certificates.
this is better than anything you can find on Coursera.
it covers Claude Code, MCP and its advanced topics, and even more:
https://www.anthropic.com/learn/build-with-claude

posted an update 2 months ago
Post
2884
this repo is gold! a collection of LLM apps with multi-agents, MCP, RAG and so much more.
the best way to learn is by building, and this repo provides the blueprint.
Repo: https://github.com/Shubhamsaboo/awesome-llm-apps

posted an update 3 months ago
Post
2755
I really like how this seven-stage pipeline was laid out in the Ultimate Guide to Fine-Tuning book.
It gives an overview, then goes into detail for each stage, even providing best practices.
It’s 115 pages on arXiv and definitely worth a read.
Check it out: https://arxiv.org/abs/2408.13296

posted an update 3 months ago
Post
3447
60+ Generative AI projects for your resume. grind this GitHub repo if you want to level up:
> LLM fine-tuning and applications
> advanced RAG apps
> Agentic AI projects
> MCP and A2A (new)
GitHub: https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/resources/60_ai_projects.md

posted an update 3 months ago
Post
3103
this book actually exists for free: “the little book of deep learning”. it’s great for refreshing your mind on DL basics:
> foundations of machine learning
> how models train
> common layers (dropout, pooling…)
> basic intro to LLMs
actually optimized for mobile.
Book: https://fleuret.org/public/lbdl.pdf

posted an update 4 months ago
Post
2992
The best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance have explored RL and reasoning in LLMs.
Here are some of their key findings:
1/ RL can further improve distilled models. These models are essentially SFT fine-tuned with the data generated by larger models, and the SFT+RL combo does not disappoint.
This is verified in the DeepSeek-R1 paper.
2/ both GRPO and PPO algorithms suffer from length bias; they encourage longer responses. This can be tackled by introducing explicit rewards based on the length of the answer.
3/ Most reasoning research is focused on code and math, but training models on logic puzzles improves them on mathematical tasks too.
This shows that RL-induced reasoning generalizes beyond the specific domain knowledge.
Previous research also shows RL can be a great generalizer.
4/ Reasoning might not be induced only by RL; it might already be hidden in the base models due to the pre-training and CoT data they were trained on.
So while RL does wake up the reasoning beast, maybe it's not the only solution (other methods, such as distillation, can too).
5/ back to the length bias; reasoning models tend to generate longer responses for wrong answers. RL might be the culprit.
RL favours longer answers when the reward is negative, to dilute the penalty per individual token and lower the loss.
This might explain the "aha" moments!
6/ OpenAI's competitive programming paper showed an interesting finding:
o3 can learn its own test-time strategies (like writing an inefficient but correct solution to verify the answer of an optimized solution)
RL helps LLMs develop their own reasoning & verification methods.
The recent article by @rasbt helped me a lot in getting a broad view of the recent research on reasoning models.
He also lists more influential papers on this topic; it's a must-read if you're interested.
check it out 👇
https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training
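points 2 and 5 are two sides of the same mechanism: when a sequence-level reward is spread over the tokens of a response, a negative reward hurts each token less the longer the answer gets. a toy sketch of that dilution effect, plus an explicit length penalty as a counter-measure; alpha and target_len are invented values, not numbers from any of the papers:

```python
# toy sketch, not from the papers: shows why averaging a sequence-level
# reward over tokens favours long wrong answers, and one explicit length
# penalty that pushes back. alpha and target_len are invented values.

def per_token_penalty(reward: float, num_tokens: int) -> float:
    """Stand-in for the per-token scale of a policy-gradient update when a
    sequence-level reward is averaged over the response length."""
    return -reward / num_tokens

print(per_token_penalty(-1.0, 50))    # 0.02  -> short wrong answer, bigger hit per token
print(per_token_penalty(-1.0, 500))   # 0.002 -> long wrong answer, the penalty is diluted

def length_penalized_reward(reward: float, num_tokens: int,
                            target_len: int = 256, alpha: float = 0.001) -> float:
    """Explicit length-based reward shaping: subtract a small amount for
    every token beyond a target length."""
    return reward - alpha * max(0, num_tokens - target_len)
```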

replied to their post 4 months ago
Linkedin, I included the link.
Post
2247
OpenAI just released a 34-page practical guide to building agents.
Here are 10 things it teaches us:
1➜ agents are different from workflows: they are fully autonomous systems that perform tasks on your behalf. many applications use LLMs in workflows, but that does not make them agents.
2➜ use them for tricky stuff: complex decision making, dynamic rules, unstructured data
3➜ core recipe: each agent has three main components: Model (the brain), Tools, Instructions on how to behave
4➜ choose the right brain: set up evals to get a baseline performance, use a smart model to see what's possible, gradually downgrade the model for cost and speed
5➜ tools are key: choose well-defined and tested tools. an agent needs tools to retrieve data and context, and take actions.
6➜ instruction matters A LOT: be super clear telling the agent its goals, steps, and rules. Vague instructions = unpredictable agent. Be explicit.
7➜ start simple, then scale: often a single agent with several tools is ok. don't jump to complex multi-agent systems immediately.
8➜ if you use multi-agents: you can have a "manager" agent directing traffic to specialist agents, or have agents hand off tasks to each other.
9➜ guardrails are a MUST: check user input for weird stuff, make sure the agent isn't about to do something risky, filter out private info, block harmful content. Don't let it run wild.
10➜ build and plan for humans: start small, test, improve. always have a plan for when the agent gets stuck or is about to do something high-risk.
Download: https://t.co/fJaCkgf7ph
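the "core recipe" from point 3 is easy to picture in code. below is a minimal sketch of an agent loop with a model, a tool registry, instructions, and a crude input guardrail; call_model() is a placeholder for whichever LLM API you use, and the JSON tool-call format is made up for illustration, not the guide's actual SDK:

```python
# minimal sketch of model + tools + instructions (points 3, 5, 6, 9, 10).
# call_model() is a placeholder for your LLM client; the tool registry,
# the JSON tool-call format and the guardrail are deliberately simplistic.
import json

TOOLS = {
    "get_weather": lambda city: f"always sunny in {city}",  # toy tool
}

INSTRUCTIONS = (
    "You are a helpful agent. If you need a tool, reply ONLY with JSON like "
    '{"tool": "get_weather", "args": {"city": "Paris"}}; otherwise answer directly.'
)

def violates_guardrails(user_input: str) -> bool:
    # real guardrails would cover PII, prompt injection, risky actions, etc.
    return any(word in user_input.lower() for word in ("password", "credit card"))

def call_model(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your LLM API here")

def run_agent(user_input: str, max_steps: int = 5) -> str:
    if violates_guardrails(user_input):
        return "sorry, I can't help with that."
    messages = [{"role": "system", "content": INSTRUCTIONS},
                {"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = call_model(messages)
        try:
            call = json.loads(reply)                      # the model asked for a tool
            result = TOOLS[call["tool"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
        except (json.JSONDecodeError, KeyError, TypeError):
            return reply                                  # plain answer, we're done
    return "stopped after too many steps; hand off to a human."
```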

posted an update 4 months ago
Post
2247

posted an update 4 months ago
Post
2934
this paper has been blowing up
they train an open-source multimodal LLM (InternVL3) that can compete with GPT-4o and Claude 3.5 Sonnet by:
> training text and vision in a single stage
> a novel V2PE positional encoding
> SFT & mixed preference optimization
> test-time scaling
Paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (2504.10479)