Mermaid Syntax Dataset

Dataset Summary

The Mermaid Syntax Dataset provides training and evaluation data for syntax understanding, validation, repair, and semantic titling of Mermaid.js diagrams.

It supports three primary tasks:

  1. Repair – Generate minimal diffs or patched diagrams that compile successfully.
  2. Titling – Propose a short, human-friendly title, optionally with a one-sentence summary, based on content and context (instead of “Untitled Diagram”).
  3. Generation – Create a new valid Mermaid diagram from a user instruction and optional diagram type.

Note: Validation is performed by the Mermaid parser before any model call. Parser diagnostics are exposed in the dataset as compiler_errors (array of strings) so the model can understand what failed and propose targeted repairs.
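The validate-first flow in the note can be sketched as below. This is only an illustration: `validate` is a hypothetical callable standing in for whatever wrapper invokes the Mermaid parser and returns its diagnostics as strings, and `make_repair_input` is a name introduced here, not part of the dataset tooling.

```python
from typing import Callable, Optional


def make_repair_input(
    diagram: str,
    validate: Callable[[str], list[str]],  # hypothetical parser wrapper
) -> Optional[dict]:
    """Build a REPAIR record only when the parser reports errors."""
    errors = validate(diagram)
    if not errors:
        # The diagram already parses, so there is nothing to repair.
        return None
    return {
        "task": "REPAIR",
        "input": {"diagram": diagram, "compiler_errors": errors},
    }
```

A diagram that passes validation yields `None`, so no model call is needed for it.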


Supported Tasks and Benchmarks

  • Text Generation
    • REPAIR: Given an invalid diagram and parser diagnostics (compiler_errors), generate a corrected diagram (or a minimal patch).
    • TITLE: Given a valid diagram, generate a short, human-friendly title (optionally with a one-sentence summary).
    • GENERATE: Given a natural language instruction and optional diagram type, generate a new valid diagram (diagram_content) plus optional title and summary.

Task Categories

  • text-generation

Languages

  • English (en)
    All error messages, titles, and instructions are in English. Future multilingual expansions may include localized error messages.

Dataset Structure

Input Schema

{
  "task": "REPAIR|TITLE|GENERATE",
  "input": {
    "diagram": "string (for REPAIR|TITLE)",
    "instruction": "string (for GENERATE)",
    "context": "optional string",
    "diagram_type": "optional string",
    "compiler_errors": ["string (for REPAIR)"]
  }
}

compiler_errors is an optional array of strings produced by the Mermaid parser (e.g., "MISSING_ARROW at line 7", "UNTERMINATED_BLOCK: 'gantt' missing 'end'"). Include it for REPAIR samples; omit it for TITLE and GENERATE samples.
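The per-task field rules can be checked mechanically before training. The sketch below is one possible check, not an official validator: the function name `check_input_record` is hypothetical, and treating compiler_errors as required for REPAIR is an assumption drawn from the sentence above.

```python
from typing import Any

# Fields each task is assumed to require inside "input".
REQUIRED_INPUT_FIELDS = {
    "REPAIR": ("diagram", "compiler_errors"),
    "TITLE": ("diagram",),
    "GENERATE": ("instruction",),
}


def check_input_record(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one record; an empty list means none."""
    task = record.get("task")
    if task not in REQUIRED_INPUT_FIELDS:
        return [f"unknown task: {task!r}"]

    payload = record.get("input")
    if not isinstance(payload, dict):
        return ["'input' must be an object"]

    problems: list[str] = []
    for field in REQUIRED_INPUT_FIELDS[task]:
        if field not in payload:
            problems.append(f"{task} record is missing input.{field}")

    errors = payload.get("compiler_errors")
    if errors is not None:
        if task != "REPAIR":
            problems.append(f"{task} record should omit input.compiler_errors")
        elif not (isinstance(errors, list) and all(isinstance(e, str) for e in errors)):
            problems.append("input.compiler_errors must be an array of strings")
    return problems
```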

Output Schema

{
  "result": {
    "compiler_errors": ["string"],   // optional echo of parser diagnostics
    "patch": [                       // optional for REPAIR tasks
      {
        "op": "replace|insert|delete",
        "range": {"startLine": 1, "startCol": 5, "endLine": 1, "endCol": 10},
        "text": "new content"
      }
    ],
    "repaired_diagram": "string or null",   // for REPAIR
    "diagram_content": "string or null",    // for GENERATE
    "title": "string or null",              // for TITLE and GENERATE
    "summary": "string or null"             // optional one-sentence description
  }
}

  • compiler_errors: optional echo of parser diagnostics to provide context for the model.
  • patch: optional list of minimal edit operations for REPAIR tasks.
  • repaired_diagram: the corrected diagram (full text), used in REPAIR tasks.
  • diagram_content: the newly generated diagram, used in GENERATE tasks.
  • title: a short, human-friendly title, used in TITLE and GENERATE tasks.
  • summary: an optional one-sentence description or summary, used in TITLE and GENERATE tasks.
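To make the patch field concrete, here is a minimal sketch of applying patch operations to a diagram string. The schema does not state the range convention, so this assumes 1-indexed lines and columns with an exclusive end position, which is the reading under which the REPAIR record in the Sample Data section reproduces its repaired_diagram. `apply_patch` is a hypothetical helper name.

```python
def apply_patch(diagram: str, patch: list[dict]) -> str:
    """Apply patch operations and return the patched diagram text.

    Assumption: lines and columns are 1-indexed and the end position
    is exclusive. All ranges refer to the original, unpatched text.
    """
    lines = diagram.split("\n")

    def offset(line: int, col: int) -> int:
        # Character offset of (line, col) in the original text;
        # the +1 accounts for the newline ending each earlier line.
        return sum(len(ln) + 1 for ln in lines[: line - 1]) + (col - 1)

    edits = []
    for op in patch:
        rng = op["range"]
        start = offset(rng["startLine"], rng["startCol"])
        end = offset(rng["endLine"], rng["endCol"])
        if op["op"] == "insert":
            end = start  # an insert replaces nothing
        text = "" if op["op"] == "delete" else op.get("text", "")
        edits.append((start, end, text))

    # Apply from the end of the text backwards so earlier offsets stay valid.
    result = diagram
    for start, end, text in sorted(edits, key=lambda e: e[0], reverse=True):
        result = result[:start] + text + result[end:]
    return result
```

A consistency check for REPAIR records could compare `apply_patch(input.diagram, result.patch)` against `result.repaired_diagram` and flag mismatches.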

Examples

Example REPAIR

{
  "task": "REPAIR",
  "input": {
    "diagram": "flowchart TD\nA -> B",
    "compiler_errors": ["MISSING_ARROW at line 2"]
  },
  "result": {
    "compiler_errors": ["MISSING_ARROW at line 2"],
    "patch": [
      {
        "op": "replace",
        "range": {"startLine": 2, "startCol": 3, "endLine": 2, "endCol": 4},
        "text": "--"
      }
    ],
    "repaired_diagram": "flowchart TD\nA --> B",
    "title": null,
    "summary": null
  }
}

Example TITLE

{
  "task": "TITLE",
  "input": {
    "diagram": "sequenceDiagram\nAlice->>Bob: Hello Bob!"
  },
  "result": {
    "compiler_errors": [],
    "patch": [],
    "repaired_diagram": null,
    "title": "Alice greets Bob",
    "summary": "A simple sequence diagram showing Alice sending a greeting message to Bob."
  }
}

Example GENERATE

{
  "task": "GENERATE",
  "input": {
    "instruction": "Create a flowchart for the checkout process",
    "diagram_type": "flowchart"
  },
  "result": {
    "compiler_errors": [],
    "patch": [],
    "diagram_content": "flowchart TD\nStart --> Cart\nCart --> Payment\nPayment --> Confirmation",
    "title": "Checkout Flow",
    "summary": "A flowchart showing the steps from start to order confirmation in an e-commerce checkout process."
  }
}

Sample Data

A sample.jsonl file provides one example record for each task type. Each line is a single JSON object that follows the input and output schemas above.

REPAIR Sample

{"task": "REPAIR", "input": {"diagram": "flowchart TD\nA -> B", "diagram_type": "flowchart", "compiler_errors": ["MISSING_ARROW at line 2: use '-->' instead of '->'"]}, "result": {"compiler_errors": ["MISSING_ARROW at line 2: use '-->' instead of '->'"], "patch": [{"op": "replace", "range": {"startLine": 2, "startCol": 3, "endLine": 2, "endCol": 4}, "text": "--"}], "repaired_diagram": "flowchart TD\nA --> B", "diagram_content": null, "title": null, "summary": null}}

TITLE Sample

{"task": "TITLE", "input": {"diagram": "sequenceDiagram\nAlice->>Bob: Hello Bob!", "diagram_type": "sequence"}, "result": {"compiler_errors": [], "patch": [], "repaired_diagram": null, "diagram_content": null, "title": "Alice greets Bob", "summary": "A simple sequence diagram showing Alice sending a greeting message to Bob."}}

GENERATE Sample

{"task": "GENERATE", "input": {"instruction": "Create a flowchart for the checkout process", "diagram_type": "flowchart"}, "result": {"compiler_errors": [], "patch": [], "repaired_diagram": null, "diagram_content": "flowchart TD\nStart --> Cart\nCart --> Payment\nPayment --> Confirmation", "title": "Checkout Flow", "summary": "A flowchart showing the steps from start to order confirmation in an e-commerce checkout process."}}

Additional syntax-focused training samples have been generated from the Mermaid documentation and are available as JSONL files:

  • data/syntax_repair_samples.jsonl – contains REPAIR task samples with broken diagrams and their fixes.
  • data/syntax_title_samples.jsonl – contains TITLE task samples with valid diagrams, titles, and summaries.
  • data/syntax_generate_samples.jsonl – contains GENERATE task samples with instructions and generated diagrams.
  • data/syntax_all_samples.jsonl – combined file with all tasks.

These files can be used to train models specifically on Mermaid syntax understanding, repair, and generation.
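Because each file is JSONL, records can be streamed one line at a time. The sketch below shows one way to read such a file and count records per task; `iter_samples` and `count_tasks` are hypothetical helper names, and the code assumes every record carries a top-level "task" key as in the schema.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_samples(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_tasks(lines: Iterable[str]) -> Counter:
    """Count how many records belong to each task."""
    return Counter(record["task"] for record in iter_samples(lines))


# Usage sketch (path taken from the file list above):
# with open("data/syntax_all_samples.jsonl", encoding="utf-8") as fh:
#     print(count_tasks(fh))
```

Both helpers accept any iterable of lines, so an open file handle or an in-memory list works equally well.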
