πŸš€ NER-RoBERTa: Fine-Tuned Named Entity Recognition Model

A robust Named Entity Recognition (NER) model fine-tuned on custom-annotated resume and career-related data using the RoBERTa architecture. The model extracts structured information such as personal details, education, work experience, and skills from unstructured text, making it well suited for resume parsing, HR automation, and document-understanding tasks.


🧠 Model Details

  • Model architecture: RoBERTa base (roberta-base)

  • Task: Token Classification (NER)

  • Fine-tuned on: Annotated resume dataset (custom labels)

  • Entity types (a quick usage sketch follows this list):

    • NAME
    • CONTACT, EMAIL, LOCATION
    • LINKEDIN, GITHUB
    • ORG_NAME, JOB_TITLE, START_DATE, END_DATE
    • DEGREE, FIELD_OF_STUDY, GRADUATION_YEAR, GPA
    • SKILLS, PROJECT_TITLE, LANGUAGES, OTHER
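
For a quick look at how these labels appear in practice, the model can also be loaded through the transformers token-classification pipeline. A minimal sketch, assuming aggregation_strategy="simple" merges subword pieces into spans cleanly for this label scheme:

from transformers import pipeline

# Token-classification pipeline; "simple" aggregation merges subword pieces
# into word-level spans reported under entity_group (e.g. NAME, ORG_NAME).
ner = pipeline(
    "token-classification",
    model="venkatasagar/NER-roBERTa-finetuned",
    aggregation_strategy="simple",
)

for entity in ner("John Doe is a software engineer at Google."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))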

πŸ“¦ Files Included

  • config.json
  • pytorch_model.bin or model.safetensors
  • tokenizer_config.json, vocab.json, tokenizer.json
  • special_tokens_map.json
  • merges.txt

πŸ“Š Example Usage

from transformers import RobertaTokenizerFast, RobertaForTokenClassification
import torch

# Load model and tokenizer
model = RobertaForTokenClassification.from_pretrained("venkatasagar/NER-roBERTa-finetuned")
tokenizer = RobertaTokenizerFast.from_pretrained("venkatasagar/NER-roBERTa-finetuned")

# Sample text
text = "John Doe is a software engineer at Google. He graduated with a B.Tech in Computer Science from MIT in 2022."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode results (convert tensor ids to plain ints before the id2label lookup)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]

for token, label in zip(tokens, predicted_labels):
    print(f"{token}: {label}")

πŸ“ˆ Intended Use Cases

  • Resume parsing
  • HR and recruitment platforms
  • Talent analytics
  • Job-matching engines
  • NLP-based document processors

🏷️ Tags

NER, transformers, huggingface, token-classification, roberta, resume-parser, nlp, named-entity-recognition, custom-dataset, career-data, information-extraction


πŸ“ Datasets & Training

This model was trained on a custom-labeled resume dataset covering sections such as education, experience, projects, and skills. The source documents came in .txt, .pdf, and .docx formats and were processed with spaCy and the PyMuPDF/Docx libraries.
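
As a rough illustration of that preprocessing step (not the exact training pipeline), plain text can be pulled from the three formats with PyMuPDF and python-docx, assuming both are installed:

import fitz                # PyMuPDF, for PDF text extraction
from docx import Document  # python-docx, for .docx files

def extract_text(path: str) -> str:
    """Return plain text from a .txt, .pdf, or .docx file."""
    if path.endswith(".pdf"):
        with fitz.open(path) as pdf:
            return "\n".join(page.get_text() for page in pdf)
    if path.endswith(".docx"):
        return "\n".join(p.text for p in Document(path).paragraphs)
    with open(path, encoding="utf-8") as f:  # fall back to plain .txt
        return f.read()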

If you'd like to access the dataset or contribute, please contact the maintainer.


πŸ“€ Model Hosted On

Model Hub: https://huggingface.co/venkatasagar/NER-roBERTa-finetuned


🀝 Contributing

Feel free to fork the repository and open issues or PRs to enhance the model or pipeline!


πŸ§‘β€πŸ’» Maintainer

Name: Venkata Sagar
Contact: [[email protected]]
