NER-RoBERTa: Fine-Tuned Named Entity Recognition Model
A robust Named Entity Recognition (NER) model fine-tuned from the RoBERTa architecture on custom-annotated resume and career-related data. It extracts structured information such as personal details, education, work experience, and skills from unstructured text, which makes it well suited to resume parsing, HR automation, and document-understanding tasks.
Model Details
- Model architecture: RoBERTa base (`roberta-base`)
- Task: Token Classification (NER)
- Fine-tuned on: Annotated resume dataset (custom labels)
- Entity types:
  - `NAME`, `CONTACT`, `EMAIL`, `LOCATION`
  - `LINKEDIN`, `GITHUB`
  - `ORG_NAME`, `JOB_TITLE`, `START_DATE`, `END_DATE`
  - `DEGREE`, `FIELD_OF_STUDY`, `GRADUATION_YEAR`, `GPA`
  - `SKILLS`, `PROJECT_TITLE`, `LANGUAGES`, `OTHER`
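The entity types line up with the resume sections named in the introduction (personal details, work experience, education, skills). A minimal sketch of that grouping as a lookup table follows; the section names, and the optional `B-`/`I-` prefixes it tolerates, are assumptions (this card does not state the label scheme), and `ENTITY_SECTIONS` and `section_for` are hypothetical names.

```python
# Hypothetical grouping of the model's entity types into resume sections.
# The section names are illustrative assumptions, not part of the model's config.
ENTITY_SECTIONS = {
    "personal": ["NAME", "CONTACT", "EMAIL", "LOCATION", "LINKEDIN", "GITHUB"],
    "experience": ["ORG_NAME", "JOB_TITLE", "START_DATE", "END_DATE"],
    "education": ["DEGREE", "FIELD_OF_STUDY", "GRADUATION_YEAR", "GPA"],
    "skills_and_projects": ["SKILLS", "PROJECT_TITLE", "LANGUAGES"],
    "other": ["OTHER"],
}


def section_for(entity_type: str) -> str:
    """Return the section an entity type belongs to, ignoring any B-/I- prefix."""
    # Many NER label sets use BIO prefixes (e.g. "B-NAME"); whether this model
    # does is not documented here, so both forms are accepted.
    if entity_type[:2] in ("B-", "I-"):
        entity_type = entity_type[2:]
    for section, types in ENTITY_SECTIONS.items():
        if entity_type in types:
            return section
    raise KeyError(f"unknown entity type: {entity_type}")
```

Compare the keys against `model.config.id2label` before relying on this table, since the real label names may differ.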
Files Included
- `config.json`
- `pytorch_model.bin` or `model.safetensors`
- `tokenizer_config.json`, `vocab.json`, `tokenizer.json`
- `special_tokens_map.json`
- `merges.txt`
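When loading from a local copy, it can help to confirm the directory holds every file listed above. A minimal sketch, assuming a plain local directory (for example, one downloaded from the Hub); `missing_model_files` is a hypothetical helper, not part of `transformers`.

```python
from pathlib import Path

# Files listed in this card; the weights may ship in either of two formats.
REQUIRED_FILES = [
    "config.json",
    "tokenizer_config.json",
    "vocab.json",
    "tokenizer.json",
    "special_tokens_map.json",
    "merges.txt",
]
WEIGHT_FILES = ["pytorch_model.bin", "model.safetensors"]


def missing_model_files(model_dir: str) -> list[str]:
    """Return the names of expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Either weight format is acceptable, so report only when neither exists.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

For a complete directory, `missing_model_files("./NER-roBERTa-finetuned")` (a hypothetical path) returns an empty list.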
Example Usage
```python
from transformers import RobertaTokenizerFast, RobertaForTokenClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model = RobertaForTokenClassification.from_pretrained("venkatasagar/NER-roBERTa-finetuned")
tokenizer = RobertaTokenizerFast.from_pretrained("venkatasagar/NER-roBERTa-finetuned")

# Sample text
text = "John Doe is a software engineer at Google. He graduated with a B.Tech in Computer Science from MIT in 2022."

# Tokenize and predict (gradients are not needed for inference)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode results: convert tensor IDs to Python ints before the id2label lookup
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]
for token, label in zip(tokens, predicted_labels):
    print(f"{token}: {label}")
```
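The example prints one label per subword token, including RoBERTa's special tokens `<s>` and `</s>` and the `Ġ` marker its byte-level BPE puts on tokens that follow a space. A minimal sketch for merging those pairs into entity spans follows. It assumes that non-entity tokens are labelled `O` or `OTHER` and that labels may carry optional `B-`/`I-` prefixes; check both against `model.config.id2label`. `group_entities` is a hypothetical helper.

```python
# Labels treated as "no entity"; whether this model uses "O", "OTHER", or both
# is an assumption to verify against model.config.id2label.
NON_ENTITY = {"O", "OTHER"}
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def group_entities(pairs, non_entity=NON_ENTITY):
    """Merge (token, label) pairs from a RoBERTa tokenizer into (text, type) spans."""
    spans = []  # each item is [text, entity_type]
    prev_type = None
    for token, label in pairs:
        if token in SPECIAL_TOKENS:
            prev_type = None
            continue
        # Strip an optional BIO prefix so "B-NAME" and "I-NAME" both become "NAME".
        is_begin = label.startswith("B-")
        etype = label[2:] if label[:2] in ("B-", "I-") else label
        if etype in non_entity:
            prev_type = None
            continue
        # "Ġ" marks a token that starts a new word (it was preceded by a space).
        starts_word = token.startswith("Ġ")
        piece = token[1:] if starts_word else token
        if etype == prev_type and not is_begin:
            # Continue the current span, re-inserting the space between words.
            spans[-1][0] += (" " + piece) if starts_word else piece
        else:
            spans.append([piece, etype])
        prev_type = etype
    return [(text, etype) for text, etype in spans]
```

With the variables from the example above, `group_entities(zip(tokens, predicted_labels))` yields spans such as `("John Doe", "NAME")`. Without BIO prefixes, two adjacent entities of the same type are merged unless a non-entity token separates them. For production use, the tokenizer's `return_offsets_mapping=True` option gives exact character positions instead.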
Intended Use Cases
- Resume parsing
- HR and recruitment platforms
- Talent analytics
- Job-matching engines
- NLP-based document processors
Tags
`NER`, `transformers`, `huggingface`, `token-classification`, `roberta`, `resume-parser`, `nlp`, `named-entity-recognition`, `custom-dataset`, `career-data`, `information-extraction`
Datasets & Training
This model was trained on a custom-labeled resume dataset covering sections such as education, experience, projects, and skills. The source documents came in `.txt`, `.pdf`, and `.docx` formats and were processed with spaCy and the PyMuPDF/Docx libraries.
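The card does not describe the annotation format. Suppose the annotations were stored as character-offset `(start, end, label)` spans, the style used in spaCy's classic training data. A token-classification setup would then need to turn each annotated text into per-word labels. A minimal sketch under those assumptions follows; the whitespace tokenization, the `outside` label, and the helper name `spans_to_word_labels` are all illustrative choices, not the maintainer's pipeline.

```python
import re


def spans_to_word_labels(text, spans, outside="O"):
    """Give each whitespace-delimited word the label of the span that covers it.

    `spans` holds (start, end, label) character offsets with `end` exclusive,
    in the style of spaCy entity annotations (an assumption about this dataset).
    """
    words, labels = [], []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        label = outside
        for s_start, s_end, s_label in spans:
            # A word belongs to a span when their character ranges overlap.
            if start < s_end and s_start < end:
                label = s_label
                break
        words.append(match.group())
        labels.append(label)
    return words, labels
```

Because it splits on whitespace only, punctuation stays attached to words (for example, `Google.`). A real pipeline would usually align labels to the tokenizer's subword tokens instead.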
If you'd like to access the dataset or contribute, please contact the maintainer.
Model Hosted On
Model Hub: https://huggingface.co/venkatasagar/NER-roBERTa-finetuned
Contributing
Feel free to fork the repository and open issues or PRs to enhance the model or pipeline!
Maintainer
- Name: Venkata Sagar
- Contact: [email protected]
Model tree for venkatasagar/NER-roBERTa-finetuned: base model `FacebookAI/xlm-roberta-base`