+---
+language: fa
+license: mit
+tags:
+  - keyword-extraction
+  - persian
+  - farsi
+  - token-classification
+  - distilbert
+  - nlp
+datasets:
+  - custom
+metrics:
+  - precision
+  - recall
+  - f1
+widget:
+  - text: "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
+---
+# Model Card: Persian Keyword Extraction Model
+## Model Details
+- **Model Name**: keyword_distilbert_base_per
+- **Base Model**: DistilBERT
+- **Task**: Keyword Extraction
+- **Language**: Persian (Farsi)
+- **Developer**: PakdamanAli
+- **Model Version**: 1.0.0
+## Intended Use
+This model is designed to extract keywords from Persian text. It can be used for:
+- Automatic tagging of content
+- Search engine optimization
+- Content categorization
+- Topic modeling
+- Information retrieval enhancement
+### Primary Intended Uses
+- Content analysis for Persian websites
+- Academic research on Persian text
+- Information extraction systems
+### Out-of-Scope Use Cases
+- Translation services
+- Text summarization
+- Persian named entity recognition (unless the model is further fine-tuned for that task)
+- Other NLP tasks beyond keyword extraction
+## Training Data
+- **Dataset Size**: 40,000 Persian text samples
+- **Data Preparation**: Fine-tuned on xlm-roberta-large
+## Performance Evaluation
+Metrics and evaluation results will be published in a future update.
+## Limitations
+- The model may not perform well on domain-specific content that was not represented in the training data
+- Performance may vary for very short or extremely long texts
+- The model may occasionally extract words that are not truly "key" to the content
+- Dialect variations in Persian might affect extraction quality
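Very long inputs can exceed the maximum sequence length of the underlying encoder (commonly 512 tokens for DistilBERT- and XLM-RoBERTa-based checkpoints; check this model's tokenizer configuration to confirm). One workaround, sketched below under that assumption, is to split a document into sentence-sized chunks and run the extractor on each chunk separately. The helper `chunk_text` and its character-based `max_chars` budget are hypothetical and not part of this model's repository. Characters are only a rough proxy for tokens, so a conservative value is safer.

```python
import re

# Hypothetical helper, not part of the model repository.
# Sentence boundaries: whitespace that follows ".", "!", "?" or the Persian question mark "؟".
SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Try appending the next sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the pipeline on its own and the resulting keyword lists concatenated.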
+## Ethical Considerations
+- The model is trained on Persian text and may reflect biases present in that content
+- Users should review keywords extracted from sensitive content before deploying the model in automated systems
+- The model should not be used to extract or analyze personally identifiable information without proper consent
+## Technical Specifications
+- **Input**: Persian text (UTF-8 encoded)
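+- **Output**: Token-classification predictions: a list of dictionaries, one per predicted keyword token or span, each with a `word` field holding the extracted text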
+- **Framework**: Transformers (Hugging Face)
+- **Requirements**: PyTorch, Transformers
+## Pipeline Usage
+To use this model with the Hugging Face pipeline:
+```python
+from transformers import pipeline
+# Initialize the pipeline
+keyword_extractor = pipeline(
+    task="token-classification",
+    model="PakdamanAli/keyword_distilbert_base_per",
+    tokenizer="PakdamanAli/keyword_distilbert_base_per",
+    aggregation_strategy="simple",  # group adjacent tokens that share a predicted label
+)
+# Example usage
+text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
+results = keyword_extractor(text)
+# Each result is a dict; its "word" field holds the predicted keyword text
+extracted_keywords = [item["word"] for item in results]
+```
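The pipeline returns a list of dictionaries rather than plain keyword strings, and the same keyword can appear more than once. The sketch below is one hypothetical way to turn that output into a clean keyword list. It assumes each item carries the `word` and `score` keys produced by the Transformers token-classification pipeline. The `to_keywords` name and the 0.5 score threshold are illustrative choices, not values recommended by the model author. Without an `aggregation_strategy`, items may be individual sub-word tokens rather than whole words.

```python
from typing import Any, Iterable


# Hypothetical post-processing helper, not part of the model repository.
# Assumes each item is a dict with "word" and "score" keys, as returned by
# the Transformers token-classification pipeline.
def to_keywords(results: Iterable[dict[str, Any]], min_score: float = 0.5) -> list[str]:
    keywords: list[str] = []
    seen: set[str] = set()
    for item in results:
        word = item["word"].strip()
        # Skip empty strings and low-confidence predictions.
        if not word or item["score"] < min_score:
            continue
        # Keep only the first occurrence of each keyword, in reading order.
        if word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords
```

For example, `to_keywords(keyword_extractor(text), min_score=0.8)` would keep only high-confidence predictions.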
+## Example
+```python
+from transformers import pipeline
+extractor = pipeline(
+    task="token-classification",
+    model="PakdamanAli/keyword_distilbert_base_per",
+    tokenizer="PakdamanAli/keyword_distilbert_base_per",
+    aggregation_strategy="simple",  # group adjacent tokens that share a predicted label
+)
+text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
+results = extractor(text)
+# Extract just the words from the results
+keywords = [item["word"] for item in results]
+print(keywords)
+```