LLM-Essay Detector (RoBERTa-base Finetuned)

This model is a fine-tuned version of FacebookAI/roberta-base for detecting LLM-generated student essays.
It was developed as part of the research paper:

Lukas Gehring & Benjamin Paaßen (2025)
Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?
arXiv:2508.08096


Model Details

  • Model type: Transformer-based text classifier (binary classification)
  • Language: English
  • Base model: FacebookAI/roberta-base
  • Labels: LABEL_0 = human-written, LABEL_1 = LLM-generated

Training Data

Human-written Essays

The human-written essays are from the Argument Annotated Essays (version 2) dataset (Stab et al. [1]),
a corpus of argumentative student essays originally designed for persuasive writing research.

LLM-generated Essays

Synthetic essays were generated following the methodology described in the cited paper (Gehring & Paaßen, 2025)
to simulate LLM-produced student writing.
These synthetic essays were paired with the human-written essays to create a balanced binary classification dataset.


Related Models

We also provide two additional fine-tuned versions of this model:

  • https://huggingface.co/lgehring/roberta-base-gede-bawe
  • https://huggingface.co/lgehring/roberta-base-gede-persuade

Intended Use

  • Intended for: Research on LLM text detection, model interpretability, and fairness in educational contexts.
  • Out-of-scope use: Automated essay grading, plagiarism detection, or disciplinary decisions in education.

Limitations

  • May produce false positives (classifying human text as LLM-generated) or false negatives.
  • Performance depends on domain similarity between training data and input text.
  • Should not be used to make high-stakes decisions about students or authorship.
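Because false positives are especially costly in educational settings, downstream users may prefer a conservative decision threshold over simply taking the top predicted label. A minimal post-processing sketch (the `flag_llm_generated` helper, the 0.9 threshold, and the example scores are illustrative assumptions, not validated settings from the paper):

```python
def flag_llm_generated(scores: dict[str, float], threshold: float = 0.9) -> bool:
    """Flag a text as LLM-generated only when the detector is highly confident.

    `scores` maps each label to its probability, e.g. as assembled from the
    text-classification pipeline's per-label scores.
    """
    return scores.get("LABEL_1", 0.0) >= threshold

# Hypothetical detector output: 0.72 confidence for LABEL_1 is below the
# 0.9 bar, so this essay would not be flagged.
print(flag_llm_generated({"LABEL_0": 0.28, "LABEL_1": 0.72}))
```

The appropriate threshold depends on the acceptable false-positive rate for the use case and should be calibrated on a held-out sample from the target domain.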

Example Usage

```python
from transformers import pipeline

# Load the fine-tuned detector from the Hugging Face Hub
detector = pipeline("text-classification", model="lgehring/roberta-base-gede-aae")

text = "This essay argues that artificial intelligence will reshape education."
# Prints the predicted label (LABEL_0 = human-written, LABEL_1 = LLM-generated)
# together with its confidence score
print(detector(text))
```

Cite

If you use this model, please cite:

@misc{gehring2025assessingllmtextdetection,
      title={Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?}, 
      author={Lukas Gehring and Benjamin Paaßen},
      year={2025},
      eprint={2508.08096},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.08096}, 
}

References

[1] Stab, C., & Gurevych, I. (2017). Parsing Argumentation Structures in Persuasive Essays. Computational Linguistics, 43(3), 619–659. https://doi.org/10.1162/COLI_a_00295
