# LLM_Detector_Preview_model
Preview release of an LLM-generated text detector.
## Model Description

This model classifies text as Human, Mixed, or AI-generated. It uses a sequence-classification architecture and was trained on a mixture of human-written and AI-generated texts. It can be applied at the document, sentence, and token level.
- Architecture: ModernBERT (or compatible Transformer)
- Labels (see the config sketch after this list):
  - 0: Human
  - 1: Mixed
  - 2: AI
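The label mapping can also be read from the checkpoint rather than hard-coded. This is a minimal sketch, assuming the uploaded config.json defines the standard `id2label` field for sequence-classification checkpoints:

```python
from transformers import AutoConfig

# Assumption: config.json defines id2label, as is standard for
# AutoModelForSequenceClassification checkpoints.
config = AutoConfig.from_pretrained("Donnyed/LLM_Detector_Preview_model")
print(config.id2label)  # expected: {0: "Human", 1: "Mixed", 2: "AI"}
```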
## Intended Use
- For research and curiosity only.
- Not for academic, legal, medical, or high-stakes use.
- Results are easy to bypass and may be unreliable.
## Limitations & Warnings

- This model is experimental, and its accuracy has not been rigorously validated.
- It can produce false positives and false negatives.
- Simple paraphrasing or editing can fool the detector.
- Do not use for academic integrity, hiring, or legal decisions.
## How It Works

The model analyzes text and estimates the likelihood that it is human-written, mixed, or AI-generated. It relies on statistical patterns learned from its training data; these patterns are not foolproof and can be circumvented.
## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("Donnyed/LLM_Detector_Preview_model")
model = AutoModelForSequenceClassification.from_pretrained("Donnyed/LLM_Detector_Preview_model")

text = "Paste your text here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities (0: Human, 1: Mixed, 2: AI)
probs = torch.softmax(outputs.logits, dim=1)
pred = torch.argmax(probs, dim=1).item()

labels = {0: "Human", 1: "Mixed", 2: "AI"}
print("Prediction:", labels[pred])
print("Probabilities:", probs.squeeze().tolist())
```
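The card also mentions sentence-level analysis. Below is a minimal sketch that reuses the `tokenizer` and `model` loaded above and simply applies the classifier to each sentence independently; the regex splitter and the per-sentence loop are illustrative assumptions, since the card does not document a dedicated sentence-level API.

```python
import re

def classify_sentences(text: str):
    labels = {0: "Human", 1: "Mixed", 2: "AI"}
    # Naive split on terminal punctuation; a real sentence tokenizer
    # (e.g. nltk.sent_tokenize) would be more robust.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    results = []
    for sent in sentences:
        inputs = tokenizer(sent, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = torch.softmax(logits, dim=1).squeeze(0)
        results.append((sent, labels[int(probs.argmax())], probs.tolist()))
    return results

for sent, label, probs in classify_sentences("First sentence. Second sentence!"):
    print(f"{label}: {sent}")
```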
## Files Included

- `model.safetensors`: model weights
- `config.json`: model configuration
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`: tokenizer files
## Base Model

- answerdotai/ModernBERT-base