TOSRobertaV2: Terms of Service Fairness Classifier

Model Description

TOSRobertaV2 is a fine-tuned RoBERTa-large model designed to classify clauses in Terms of Service (ToS) documents based on their fairness level. The model categorizes clauses into three classes: clearly fair, potentially unfair, and clearly unfair.

Intended Use

This model is intended for:

  • Analyzing Terms of Service documents for potential unfair clauses
  • Assisting legal professionals in reviewing contracts
  • Helping consumers understand the fairness of agreements they're entering into
  • Supporting researchers studying fairness in legal documents

Training Data

The model was trained on CodeHima/TOS_DatasetV3, a dataset containing labeled clauses from various Terms of Service documents.

Training Procedure

  • Base model: RoBERTa-large
  • Training type: Fine-tuning
  • Number of epochs: 5
  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Batch size: 8
  • Weight decay: 0.01
  • Training loss: 0.3852
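The hyperparameters above can be expressed as a Hugging Face `TrainingArguments` configuration. This is only a sketch of a plausible setup, not the exact training script: the output directory and the evaluation/checkpointing strategy are assumptions.

```python
from transformers import TrainingArguments

# Sketch of the fine-tuning configuration listed above.
# output_dir and the per-epoch eval/save strategy are assumptions;
# AdamW is the Trainer's default optimizer.
training_args = TrainingArguments(
    output_dir="tos-roberta-v2",      # hypothetical path
    num_train_epochs=5,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    eval_strategy="epoch",            # `evaluation_strategy` on older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,
)
```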

Evaluation Results

Validation Set Performance

  • Accuracy: 0.86
  • F1 Score: 0.8588
  • Precision: 0.8598
  • Recall: 0.8600

Test Set Performance

  • Accuracy: 0.8651

Training Progress

Epoch | Training Loss | Validation Loss | Accuracy | F1     | Precision | Recall
------|---------------|-----------------|----------|--------|-----------|-------
1     | 0.5391        | 0.4940          | 0.7981   | 0.7997 | 0.8056    | 0.7981
2     | 0.4621        | 0.4900          | 0.8314   | 0.8320 | 0.8330    | 0.8314
3     | 0.3954        | 0.6748          | 0.8219   | 0.8250 | 0.8349    | 0.8219
4     | 0.3783        | 0.7175          | 0.8600   | 0.8588 | 0.8598    | 0.8600
5     | 0.1542        | 0.8811          | 0.8476   | 0.8490 | 0.8514    | 0.8476
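As a sanity check on the table above: validation loss rises after epoch 2 while accuracy peaks at epoch 4, which matches the validation metrics reported earlier and suggests the best checkpoint was taken at epoch 4. A minimal stdlib sketch of that selection:

```python
# Per-epoch validation metrics copied from the training-progress table.
history = [
    {"epoch": 1, "val_loss": 0.4940, "accuracy": 0.7981},
    {"epoch": 2, "val_loss": 0.4900, "accuracy": 0.8314},
    {"epoch": 3, "val_loss": 0.6748, "accuracy": 0.8219},
    {"epoch": 4, "val_loss": 0.7175, "accuracy": 0.8600},
    {"epoch": 5, "val_loss": 0.8811, "accuracy": 0.8476},
]

# Pick the checkpoint with the highest validation accuracy.
best = max(history, key=lambda row: row["accuracy"])
print(best["epoch"])  # -> 4, matching the reported 0.86 validation accuracy
```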

Limitations

  • The model's performance may vary on ToS documents from domains or industries not well-represented in the training data.
  • It may struggle with highly complex or ambiguous clauses.
  • The model's understanding of "fairness" is based on the training data and may not capture all nuances of legal fairness.

Ethical Considerations

  • This model should not be used as a substitute for professional legal advice.
  • There may be biases present in the training data that could influence the model's judgments.
  • Users should be aware that the concept of "fairness" in legal documents can be subjective and context-dependent.

How to Use

You can use this model directly with the Hugging Face transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("CodeHima/TOSRobertaV2")
model = AutoModelForSequenceClassification.from_pretrained("CodeHima/TOSRobertaV2")
model.eval()

text = "Your clause here"
# Tokenize the clause, truncating to RoBERTa's 512-token limit
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and pick the most likely class
probabilities = torch.softmax(logits, dim=1)
predicted_class = torch.argmax(probabilities, dim=1).item()

classes = ['clearly fair', 'potentially unfair', 'clearly unfair']
print(f"Predicted class: {classes[predicted_class]}")
print(f"Probabilities: {probabilities[0].tolist()}")
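The softmax step in the snippet above can be reproduced without torch, which may help clarify how the three class probabilities are derived from the raw logits. The logit values here are made up purely for illustration:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the three fairness classes
logits = [0.2, 2.1, -0.5]
probs = softmax(logits)

classes = ['clearly fair', 'potentially unfair', 'clearly unfair']
print(classes[probs.index(max(probs))])  # -> potentially unfair
```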

Citation

If you use this model in your research, please cite:

@misc{TOSRobertaV2,
  author = {CodeHima},
  title = {TOSRobertaV2: Terms of Service Fairness Classifier},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/CodeHima/TOSRobertaV2}}
}

License

This model is released under the MIT license.
