Alxy Savin
xsa-dev
AI & ML interests
npl, news, sentiments, rl, io
Recent Activity
liked
a model
8 days ago
microsoft/phi-4
reacted
to
macadeliccc's
post
with 👍
5 months ago
Automated web scraping with playwright is becoming easier by the day. Now, using ollama tool calling, its possible to perform very high accuracy web scraping (in some cases 100% accurate) through just asking an LLM to scrape the content for you.
This can be completed in a multistep process similar to cohere's platform. If you have tried the cohere playground with web scraping, this will feel very similar. In my experience, the Llama 3.1 version is much better due to the larger context window. Both tools are great, but the difference is the ollama + playwright version is completely controlled by you.
All you need to do is wrap your scraper in a function:
```
async def query_web_scraper(url: str) -> dict:
scraper = WebScraper(headless=False)
return await scraper.query_page_content(url)
```
and then make your request:
```
# First API call: Send the query and function description to the model
response = ollama.chat(
model=model,
messages=messages,
tools=[
{
'type': 'function',
'function': {
'name': 'query_web_scraper',
'description': 'Scrapes the content of a web page and returns the structured JSON object with titles, articles, and associated links.',
'parameters': {
'type': 'object',
'properties': {
'url': {
'type': 'string',
'description': 'The URL of the web page to scrape.',
},
},
'required': ['url'],
},
},
},
]
)
```
To learn more:
Github w/ Playground: https://github.com/tdolan21/tool-calling-playground/blob/main/notebooks/ollama-playwright-web-scraping.ipynb
Complete Guide: https://medium.com/@tdolan21/building-an-llm-powered-web-scraper-with-ollama-and-playwright-6274d5d938b5
liked
a Space
6 months ago
MERaLiON/AudioBench-Leaderboard
Organizations
spaces
7
models
8
xsa-dev/hugs_llama3_technique_ft_16bit_GGUF_1
Updated
xsa-dev/hugs_llama3_technique_ft_8bit_Q8_0
Updated
•
10
xsa-dev/hugs_llama3_technique_ft_lora
Updated
xsa-dev/hugs_llama3_technique_ft_16bit_lora
Updated
xsa-dev/hugs_llama3_technique_ft_16bit
Updated
xsa-dev/ppo-Huggy
Reinforcement Learning
•
Updated
•
8
xsa-dev/ppo-LunarLander-v2
Reinforcement Learning
•
Updated
xsa-dev/llama-2-7b-miniguanaco
Text Generation
•
Updated
•
13
datasets
None public yet