
SEOcrate-4B_grpo_new_01: Proof-of-Concept

Enhancing SEO Reasoning in Gemma 3 via Ontology-Guided Reinforcement Learning

SEOcrate is pronounced like "Sòcrate" (Socrates, the Greek philosopher), blending SEO expertise with deep reasoning.


Model Description

This repository contains SEOcrate-4B_grpo_new_01, an open-source Proof-of-Concept (PoC) demonstrating a novel methodology for enhancing specialized reasoning in Large Language Models (LLMs). This model is a fine-tuned version of unsloth/gemma-3-4b-it-bnb-4bit (a 4-bit quantized version of google/gemma-3-4b-it).

The core innovation tested here is the use of Reinforcement Learning (specifically GRPO) guided by a domain-specific ontology (SEOntology/seovoc) to teach the model expert-like reasoning pathways for Search Engine Optimization (SEO) tasks.

Project Goal

The primary goal of this PoC was to test the hypothesis that combining Reinforcement Learning (RL) with structured knowledge from a domain ontology (seovoc, extending schema.org) can significantly improve an LLM's ability to perform complex reasoning tasks within that domain, compared to standard fine-tuning approaches. We aimed to create a model capable of understanding SEO prompts, applying relevant ontological concepts, and generating structured, step-by-step explanations alongside concise answers.

Model Capabilities (Demonstrated in PoC)

  • Structured Output: Generates responses in a specific XML-like format for clarity (a parsing sketch follows this list):
    <reasoning>
    [Step-by-step explanation potentially referencing SEO concepts/ontology terms]
    </reasoning>
    <answer>
    [Concise answer to the prompt]
    </answer>
    
  • SEO Task Focus: Trained on prompts covering various SEO tasks (see Dataset section).
  • Ontology-Guided Reasoning: The fine-tuning process explicitly rewarded the model for generating reasoning steps aligned with concepts and relationships defined in schema.org and the SEOntology (seovoc).
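
Because downstream code is expected to consume this format, here is a minimal parsing sketch for splitting a response into its two blocks. The helper name parse_seocrate_output is ours for illustration and not part of the model or any library; it returns None for a missing block so that format failures (which the evaluation section notes can occur) are easy to detect.

import re

def parse_seocrate_output(text: str) -> dict:
    """Extract the <reasoning> and <answer> blocks from a model response.
    Returns None for a block that is missing or malformed."""
    def extract(tag: str):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return match.group(1).strip() if match else None
    return {"reasoning": extract("reasoning"), "answer": extract("answer")}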

Methodology: Ontology-Guided Reinforcement Learning

This novel methodology, which leverages structured knowledge from a domain-specific ontology to guide Reinforcement Learning, was first presented at the Knowledge Graph Conference (KGC). Unlike standard Supervised Fine-Tuning (SFT), which primarily teaches mimicry, we employed Reinforcement Learning (RL) to explicitly teach the model how to reason effectively within the SEO domain.

  • Base Model: unsloth/gemma-3-4b-it-bnb-4bit (providing foundational language capabilities).
  • Structured Knowledge: The SEOntology (seovoc), an ontology defining key SEO entities, properties, and relationships (https://w3id.org/seovoc/), served as the structured knowledge base.
  • Learning Method: Group Relative Policy Optimization (GRPO) via the trl library, accelerated with Unsloth. GRPO was chosen to optimize the policy (the model's generation strategy) directly based on reward signals.
  • Ontology-Guided Reward Signal: This is the core of the methodology. A custom reward function was designed, using an LLM-as-a-Judge (Gemini 1.5 Pro). This judge evaluated the model's generated <reasoning> and <answer> on several criteria, crucially including alignment with SEO best practices and the explicit use or implication of relevant concepts from the seovoc ontology. The model was rewarded for outputs demonstrating logical steps consistent with the knowledge structured in the ontology (a sketch of such a reward signal follows this list).
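
Since the exact reward function is not released with this model card, the following is only a minimal sketch of how such an ontology-guided reward signal can be composed: a hard format gate multiplied by a judge score, squashed with tanh as mentioned in the Fine-tuning Details below. The judge call itself (Gemini 1.5 Pro in this PoC) is left abstract because its prompt is not published.

import math
import re

FORMAT_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)

def combined_reward(completion: str, judge_score: float) -> float:
    """Combine a format check with an LLM-as-a-Judge score.
    judge_score is assumed to lie in [0, 1], produced by a judge that sees
    the relevant seovoc concepts as context and rates SEO/ontology
    alignment; the tanh squashing mirrors the scaling described in the
    Fine-tuning Details."""
    format_ok = 1.0 if FORMAT_RE.search(completion) else 0.0
    return math.tanh(format_ok * judge_score)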

Fine-tuning Details

  • Dataset: A custom synthetic dataset, cyberandy/seo-grpo-reasoning-dataset-1000 (~960 cleaned examples), built as follows:
    • Generation: Examples were programmatically generated with Gemini 1.5 Pro from detailed task templates that explicitly referenced and incorporated concepts from the SEOntology (seovoc). Each example pairs input data with step-by-step reasoning (<reasoning>...</reasoning>) and a concise answer (<answer>...</answer>).
    • Task coverage: Meta Description Optimization, Internal Link Suggestion, Query Trend Analysis, Schema.org Suggestion, NER, Title Optimization, Intent Classification, Robots.txt Rules, Canonicalization, E-E-A-T Assessment, GMB Optimization, Product Schema Enhancement, and Content Revision based on QA.
    • Scoring: Generated examples were evaluated by an LLM-as-a-Judge (also Gemini 1.5 Pro), which assigned a reward score between 0.0 and 1.0 based on the accuracy, relevance, format correctness, and alignment of the reasoning and answer with the seovoc ontology concepts presented as context to the judge.
    • Format: The scored data was then formatted into {'prompt': '...', 'reward': float} pairs for the GRPO training.
    • You can read more about the dataset generation and evaluation methodology in our blog post (linking to the KGC material): An Ontology-Driven Approach to Train Your Next SEO Agent.
  • Training Steps: 500
  • Key Hyperparameters:
    • Learning Rate: 5e-6 (cosine decay)
    • Batch Size (Effective): 8 (per_device_train_batch_size=8, gradient_accumulation_steps=1)
    • Optimizer: adamw_8bit
    • Sequence Length: 2048
    • Reward Function: Custom Python function evaluating format, keyword usage (including the influence of seovoc terms), length, etc., with outputs judged by Gemini 1.5 Pro for alignment with SEO/ontology principles; the resulting score was scaled with tanh (see the training sketch after this list).
  • Hardware: NVIDIA A100 40GB
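
For orientation, below is a minimal sketch of how a GRPO run with these hyperparameters could be wired using the trl library (the training sketch referenced above). This is not the project's actual training script: GRPOConfig/GRPOTrainer field names can vary across trl versions, the reward function here keeps only the format gate from the Methodology sketch (the LLM-as-a-Judge call is stubbed out), and the Unsloth-specific acceleration is not shown.

import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

FORMAT_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)

def reward_fn(prompts, completions, **kwargs):
    # trl expects one float per sampled completion; here only the format
    # gate is scored (the judge component is omitted for brevity).
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]

dataset = load_dataset("cyberandy/seo-grpo-reasoning-dataset-1000", split="train")

config = GRPOConfig(
    output_dir="seocrate-grpo",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    optim="adamw_8bit",
    max_steps=500,
)

trainer = GRPOTrainer(
    model="unsloth/gemma-3-4b-it-bnb-4bit",
    reward_funcs=reward_fn,
    args=config,
    train_dataset=dataset,
)
trainer.train()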

How to Use

This model expects prompts formatted for chat, including the crucial system prompt defining the task and expected output structure.

# Make sure you have the necessary libraries installed:
# pip install torch transformers accelerate bitsandbytes sentencepiece

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer
import torch

# Use the Hub ID of this repository
model_id = "cyberandy/SEOcrate-4B_grpo_new_01"
device = "cuda" if torch.cuda.is_available() else "cpu"
# For 4-bit models, float16 is commonly used.
# If your GPU supports bfloat16 (e.g., Ampere series or newer), you can use that too.
model_dtype = torch.float16

print(f"Loading model and tokenizer for '{model_id}' on device '{device}' with dtype '{model_dtype}'...")

# Load tokenizer
# Using the tokenizer from the model_id is standard for merged/fine-tuned models.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Configure tokenizer padding (CRITICAL for Gemma models)
if tokenizer.pad_token_id is None:
    print("Tokenizer `pad_token_id` is None. Setting to `eos_token_id`.")
    tokenizer.pad_token_id = tokenizer.eos_token_id
if tokenizer.pad_token is None: # Also set the pad_token string if it's None
    print("Tokenizer `pad_token` is None. Setting to `eos_token`.")
    tokenizer.pad_token = tokenizer.eos_token
# Use left padding for generation
tokenizer.padding_side = 'left'
print(f"Tokenizer configured: pad_token_id={tokenizer.pad_token_id}, padding_side='{tokenizer.padding_side}'")
print(f"Special tokens: BOS='{tokenizer.bos_token}' (ID: {tokenizer.bos_token_id}), EOS='{tokenizer.eos_token}' (ID: {tokenizer.eos_token_id}), PAD='{tokenizer.pad_token}' (ID: {tokenizer.pad_token_id})")


# Load model
# device_map="auto" will place the model on GPU if available, otherwise CPU.
# For better performance on supported hardware, you can try:
# attn_implementation="flash_attention_2"
# However, "eager" is a safe default if FA2 is not available/installed.
try:
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=model_dtype,
        device_map="auto",
        attn_implementation="eager", # Safe default, can be changed to "flash_attention_2" if supported
    )
    model.eval() # Set model to evaluation mode
    print(f"Model loaded successfully to device: {model.device}")
except Exception as e:
    print(f"Error loading model: {e}")
    print("Ensure you have enough VRAM/RAM and required libraries (like bitsandbytes for 4-bit).")
    model = None # Prevent further execution if model load fails

if model:
    # System Prompt: Essential for guiding the model's output format and persona
    system_prompt = """
Act as an expert SEO analyst familiar with the seovoc ontology (https://w3id.org/seovoc/) which extends schema.org.
Based on the provided input, perform the specified SEO task.
Output your analysis and suggestion in the specified XML format:
<reasoning>
Explain your reasoning step-by-step. Use seovoc/schema.org concepts where relevant to justify your steps.
</reasoning>
<answer>
Provide only the final, concise answer in the requested format.
</answer>
"""

    # Example User Prompt
    user_prompt = "Suggest an appropriate schema.org type for a webpage that lists local business hours, address, and phone number."

    # Format messages using the chat template
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user",   "content": user_prompt},
    ]

    # Apply chat template and tokenize
    # Ensure add_generation_prompt=True for models like Gemma expecting it
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True, # Important for Gemma instruct models
        tokenize=True,
        return_tensors="pt"
    ).to(model.device) # Ensure inputs are on the same device as the model

    # Generation Configuration
    # For deterministic output (greedy decoding), set do_sample=False.
    # Temperature and top_p are ignored when do_sample=False.
    gen_config = GenerationConfig(
        max_new_tokens=512,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        # Ensure eos_token_id is correctly set.
        # Gemma's primary EOS is 1 (<eos>). Instruct models might also use 106 (<end_of_turn>).
        # Using tokenizer.eos_token_id if available, otherwise defaulting to 1, is a robust approach.
        eos_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 1,
        # Optional: You can also explicitly set bos_token_id if needed, though usually handled by template.
        # bos_token_id=tokenizer.bos_token_id if tokenizer.bos_token_id is not None else 2,
    )
    # If you wanted to enable sampling for more creative/varied output:
    # gen_config = GenerationConfig(
    #     max_new_tokens=512,
    #     do_sample=True,
    #     temperature=0.6, # Adjust for more/less randomness
    #     top_p=0.9,       # Nucleus sampling
    #     pad_token_id=tokenizer.pad_token_id,
    #     eos_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 1,
    # )

    # Generate Output using a streamer for progressive output
    text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    print(f"\n--- SEOcrate 01 Output for prompt: '{user_prompt}' ---")
    with torch.no_grad(): # Disable gradient calculations for inference
        _ = model.generate(input_ids=inputs, generation_config=gen_config, streamer=text_streamer)
    print("\n--- End of Output ---")
else:
    print("Skipping generation as model failed to load.")

Try SEOcrate 01 Live!

Want to test the model yourself without any setup? We've prepared an easy-to-use Google Colab notebook:

Open In Colab

Click the badge above to open the notebook, where you can input your own SEO prompts and see SEOcrate 01 in action.

Example Inferences

Here are a few examples of how SEOcrate responds to different SEO-related prompts, showcasing its reasoning process and structured output.

Prompt 1: Internal Linking Suggestion

User Prompt: I have a new blog post titled "The Ultimate Guide to On-Page SEO". I also have existing articles titled "Keyword Research for Beginners", "Writing SEO-Friendly Content", and "Understanding Technical SEO Audits". Suggest two relevant internal links to include in my new "On-Page SEO" guide from these existing articles.

SEOcrate Output:

<reasoning>
The goal here is to strategically link to existing content to improve user experience, boost engagement, and signal relevance to search engines. We need to choose links that will benefit the "The Ultimate Guide to On-Page SEO" post.

"Keyword Research for Beginners": This article likely contains foundational information about how to identify relevant keywords. Linking to this from the "On-Page SEO" guide makes sense because on-page SEO heavily relies on keyword research to target the right terms. This is a direct and valuable connection.
"Writing SEO-Friendly Content": This article probably covers best practices for incorporating keywords naturally within content, optimizing headings, and using other elements that improve readability and search engine visibility. Linking to this from the "On-Page SEO" guide reinforces that the content itself is crucial for on-page SEO.
I'm prioritizing links that demonstrate why on-page SEO is important and how the existing articles contribute to achieving on-page SEO success. This aligns with the principle of providing valuable, helpful content that addresses user intent.
</reasoning>
<answer>
Here are two relevant internal links to include in "The Ultimate Guide to On-Page SEO":

Link to: "Keyword Research for Beginners" Text to use: "To ensure you're targeting the right keywords for your on-page SEO efforts, start with a solid keyword research strategy. Learn how to find relevant terms with our guide: [Link to "Keyword Research for Beginners"]"
Link to: "Writing SEO-Friendly Content" Text to use: "On-page SEO isn't just about keywords – it's about creating content that's both engaging and optimized for search engines. Discover how to write content that performs with our guide: [Link to "Writing SEO-Friendly Content"]"
</answer>

Prompt 2: E-E-A-T Assessment

User Prompt: A new blog publishes articles about cryptocurrency investments. What are three key actions they can take to improve their E-E-A-T signals for Google?

SEOcrate Output:

<reasoning>
Okay, let's break down how a new cryptocurrency investment blog can improve its E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) signals in Google. E-E-A-T is increasingly important for ranking in Google, especially for topics like cryptocurrency where misinformation and scams are prevalent. Here’s a step-by-step approach:

1. Focus on Demonstrating Experience (Specifically, Practical Experience with Investing): Since it’s a new blog, the author's personal experience is crucial. Google prioritizes content where the creator has done something related to the topic. This is about showing how they've invested, not just what they think.
2. Building Expertise Through Thorough Research & Clear Explanations: Cryptocurrency is complex. The blog needs to demonstrate a deep understanding of the technology, markets, and risks involved. This means extensive research, citing reliable sources (not just other crypto blogs!), and explaining concepts clearly, even for beginners.
3. Establishing Authoritativeness by Becoming a Reliable Source: This is about the blog itself becoming known as a trustworthy source within the crypto investment community. This comes through consistent, high-quality content, positive engagement, and potentially, mentions in other reputable sources.
4. Highlighting Trustworthiness Through Transparency & Risk Disclosure: Crucially, the blog must be transparent about its potential conflicts of interest (e.g., affiliate links) and clearly disclose the risks involved in cryptocurrency investing. This is paramount for building trust.
Considering this, I'll focus on actions that directly address these E-E-A-T factors.
</reasoning>
<answer>
Here are three key actions the new cryptocurrency investment blog can take to improve its E-E-A-T signals:

1. Share Personal Investment Results (with caveats): The blog should publish articles detailing specific cryptocurrency investments they've made, including the rationale behind the investment, the amount invested, and the results (both positive and negative). Crucially, this needs to be framed with a disclaimer: "These are my personal investments and results may not be representative of your own. Cryptocurrency investing is highly risky." This demonstrates experience – they've actually put capital at risk. This is a powerful signal, but must be handled carefully to avoid misleading readers.
2. Cite Reputable Sources & Provide In-Depth Research: Each article should be backed by research from reliable sources – not just other crypto blogs. This could include whitepapers from blockchain projects, reports from financial institutions, regulatory filings, or academic research. The blog should explain why these sources are relevant and trustworthy. For example, "According to the SEC's filing for [Project Name],..." This demonstrates expertise.
3. Implement a Clear and Comprehensive Risk Disclosure Policy: Create a dedicated page on the website outlining the risks associated with cryptocurrency investing, including potential losses, scams, and regulatory uncertainty. This page should be easily accessible and regularly updated. This is essential for establishing trustworthiness and demonstrates a commitment to responsible information sharing. This should be prominently displayed on every page discussing investments.
</answer>

Evaluation (Preliminary PoC Results)

Initial qualitative evaluation showed this PoC model demonstrating:

  • Good adherence to the required XML output format.
  • Promising application of basic SEO concepts for tasks like schema suggestion and meta description optimization.
  • Instances where reasoning steps implicitly or explicitly referenced relevant SEO/ontology terms.

However, performance gaps compared to state-of-the-art models (like GPT-4o) were evident, particularly on:

  • Complex entity identification within SEO contexts.
  • Nuanced schema selection requiring deeper understanding.
  • Multi-step numerical reasoning (e.g., Query Trend Analysis), where format adherence sometimes failed.

LLM-as-a-Judge (Gemini 1.5 Pro) scores reflected this, indicating stronger performance on simpler, more structured tasks and lower scores on complex reasoning and strict format adherence under stress.

Further details on the methodology and evaluation were presented at the Knowledge Graph Conference (KGC) 2025.

Intended Use & Purpose

This model is released as a Proof-of-Concept for research and experimentation purposes only.

  • Primary Use: To demonstrate the potential of combining ontology-guided Reinforcement Learning with LLMs for specialized domain reasoning.
  • Secondary Use: To serve as a baseline or starting point for researchers and developers interested in exploring this methodology further.
  • Not Intended For: Direct use in production SEO workflows without significant further development, testing, and validation.

Limitations and Bias

  • Dataset: Trained on a small synthetic dataset (~1k examples); the model may not generalize well to diverse, real-world SEO complexities. Quality depends heavily on the synthetic data generation and reward-scoring process.
  • Reasoning Quality: May generate plausible but incorrect or superficial SEO advice. The ontology guidance improves alignment but doesn't guarantee factual correctness or deep strategic insight in all cases.
  • Ontology Grounding: The model's understanding of seovoc is operational (learned via reward signals) rather than deeply intrinsic or symbolic.
  • Bias: Potential biases from the base Gemma model or the synthetic data generation process may persist.
  • Verification Needed: Always verify outputs with expert SEO knowledge and official documentation before implementation.

Future Directions & Collaboration

This PoC represents the first step. We believe this "recipe" – combining foundational models, rich domain ontologies, and guided RL – holds significant promise. We plan to iterate on this methodology, refine the SEOntology, improve the RL process, and explore larger datasets and models. We welcome feedback, suggestions, and contributions! Please use the GitHub issues/discussions associated with related projects or contact the authors.

Disclaimer

Use this model responsibly. The authors are not liable for any decisions made based on its outputs. It's a research artifact demonstrating a methodology.

Acknowledgements

  • This work was made possible thanks to GPU credits from Google Cloud.
  • Developed by the WordLift team, pushing the boundaries of Agentic SEO and Marketing Automation.
  • Built upon Google's Gemma 3 model and the Unsloth library for efficient fine-tuning.
  • Leverages concepts from schema.org and the SEOntology (seovoc).
  • Methodology presented at the Knowledge Graph Conference 2025 (KGC).