mohannad-tazi/NER_Darija_MAR_FSBM

@@ -1,54 +1,125 @@
-# Model Name: Your Model's Name
 ## Model Description
-This model is a **Named Entity Recognition (NER)** model fine-tuned on the **CoNLL-03** dataset. It is designed to recognize **person**, **organization**, and **location** entities in English text. The model is based on the **BERT architecture** and is useful for information extraction tasks, such as named entity recognition in documents, web scraping, or chatbots.
 ### Model Architecture
 - **Architecture**: BERT-based model for token classification
-- **Pre-trained Model**: BERT
-- **Fine-tuning Dataset**: CoNLL-03
-- **Languages**: English
 ## Intended Use
-This model is designed for Named Entity Recognition tasks. It can identify and classify entities such as:
-- **Person**: People’s names (e.g., "Elon Musk")
-- **Organization**: Company or organization names (e.g., "Tesla", "Bank of America")
-- **Location**: Geographical locations (e.g., "New York", "Paris")
 ### Use Cases
-- **Document classification**: Classifying text into named entity categories.
-- **Information extraction**: Extracting entities from a large corpus of text.
-- **Chatbots**: Enhance chatbots by identifying named entities within user queries.
-- **Named entity linking**: Link entities to a knowledge base.
 ## How to Use
-To use the model, you need to load the tokenizer and model with the `transformers` library. Here's an example of how to do that:
 ```python
-from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
-# Load the tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")
-model = AutoModelForTokenClassification.from_pretrained("your-username/your-model-name")
-# Initialize the NER pipeline
-ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)
-# Use the model to predict named entities in a text
-result = ner_pipeline("Elon Musk is the CEO of Tesla and lives in California.")
 print(result)
-# Model Training Data
-This model was trained on the CoNLL-03 dataset, which contains English text annotated with named entity labels. The dataset consists of:
-Training set: 14,041 sentences
-Validation set: 3,466 sentences
-Test set: 3,684 sentences
-The entities are labeled into three categories: Person, Organization, and Location.
 # Preprocessing Steps
-Tokenization using the BERT tokenizer.
-Alignment of labels with tokenized inputs (considering word-piece tokens).
-Padding and truncating sentences to a fixed length for uniformity.

+# NER Model for Moroccan Dialect (Darija)
 ## Model Description
+This is a **Named Entity Recognition (NER)** model fine-tuned on the **DarNERcorp** dataset. It recognizes **person names**, **locations**, **organizations**, and **miscellaneous entities** in Moroccan Arabic (Darija) text. It is based on the **BERT architecture** and is suited to tasks such as information extraction from social media posts or news articles.
 ### Model Architecture
 - **Architecture**: BERT-based model for token classification
+- **Pre-trained Model**: aubmindlab/bert-base-arabertv02
+- **Fine-tuning Dataset**: DarNERcorp
+- **Languages**: Moroccan Arabic (Darija)
 ## Intended Use
+This model is designed for Named Entity Recognition tasks in Moroccan Arabic. It can identify and classify entities such as:
+- **PER**: Person names (e.g., "محمد", "فاطمة")
+- **LOC**: Locations (e.g., "الرباط", "طنجة")
+- **ORG**: Organizations (e.g., "البنك المغربي", "جامعة الحسن الثاني")
+- **MISC**: Miscellaneous entities
 ### Use Cases
+- **Social media analysis**: Extracting entities from Moroccan Arabic posts and tweets.
+- **News summarization**: Identifying important entities in news articles.
+- **Information extraction**: Extracting named entities from informal or formal texts.
+## Evaluation Results
+The model achieves the following results on the evaluation dataset:
+- **Precision**: 74.04%
+- **Recall**: 85.16%
+- **F1 Score**: 78.61%
 ## How to Use
+Load the model with the Hugging Face Transformers library, for example:
 ```python
+from transformers import pipeline
+# Load the model
+nlp = pipeline("ner", model="your-username/ner-darija-darner")
+# Use the model
+text = "محمد كان في الرباط."
+result = nlp(text)
 print(result)
+```
+# Dataset
+The model is trained on the DarNERcorp dataset, a corpus designed specifically for Named Entity Recognition in the Moroccan Arabic dialect. The dataset includes sentences labeled with named entity tags such as PER, LOC, ORG, and MISC.
 # Preprocessing Steps
+- Tokenization using the BERT tokenizer.
+- Alignment of labels with tokenized inputs (considering word-piece tokens).
+- Padding and truncating sentences to a fixed length for uniformity.
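Review note on the label-alignment step listed above: a minimal sketch of how word-level tags are commonly mapped onto word-piece tokens. It assumes the tokenizer exposes a per-token word index (as the `word_ids()` method of Hugging Face fast tokenizers does) and that only the first word piece of each word keeps its label. The function name `align_labels`, the label ids, and the example `word_ids` list are hypothetical; the card does not say which convention was actually used.

```python
IGNORE_INDEX = -100  # value ignored by PyTorch's cross-entropy loss by default


def align_labels(word_ids, word_labels, label_all_pieces=False):
    """Map word-level label ids onto word-piece tokens.

    word_ids: one entry per token; None for special or padding tokens,
              otherwise the index of the word the token came from.
    word_labels: one label id per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens and padding never contribute to the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First word piece of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later word pieces are either labeled like the word or ignored.
            aligned.append(word_labels[word_id] if label_all_pieces else IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical example: three words, the second split into two word pieces,
# wrapped in [CLS]/[SEP] and followed by two padding positions.
word_ids = [None, 0, 1, 1, 2, None, None, None]
word_labels = [1, 0, 3]  # hypothetical ids, e.g. 1 = B-PER, 0 = O, 3 = B-LOC
aligned = align_labels(word_ids, word_labels)
print(aligned)
```

Ignoring the later word pieces keeps each word counted once in the loss; passing `label_all_pieces=True` is the other common choice.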
+# Limitations
+- The model is trained on a specific corpus and may not generalize well to all Moroccan Arabic texts.
+- Performance may vary depending on text quality and tagging consistency in the dataset.
+---
+library_name: transformers
+base_model: aubmindlab/bert-base-arabertv02
+datasets:
+  - DarNERcorp
+tags:
+  - ner
+  - named-entity-recognition
+  - arabic
+  - darija
+language: ar
+pipeline_tag: token-classification
+license: apache-2.0
+---
+# NER Model for Moroccan Dialect (Darija)
+This model is fine-tuned for Named Entity Recognition (NER) in Moroccan Arabic (Darija). It recognizes entities such as locations, organizations, and person names in text written in Darija.
+## Base Model
+This model is fine-tuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02), a BERT-base AraBERT checkpoint pre-trained on Arabic text.
+## Dataset
+The model is trained on the **DarNERcorp** dataset, a corpus designed for Named Entity Recognition in the Moroccan Arabic dialect.
+## Task
+The model is designed for the **token-classification** task, specifically Named Entity Recognition (NER).
+### NER Tags
+The model recognizes the following tags:
+- **PER**: Person names
+- **LOC**: Locations
+- **ORG**: Organizations
+- **MISC**: Miscellaneous entities
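Review note on the tag set above: token-classification models usually emit one tag per token, so downstream code has to merge tokens into entity spans. The sketch below assumes the labels follow the BIO scheme (`B-PER`, `I-PER`, `O`, and so on), which the card does not confirm. The helper `group_entities` and the tags in the example are hypothetical and hand-written, not output from this model.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        # An I- tag whose type differs from the open span starts a new span.
        starts_new = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if prefix == "O" or starts_new:
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            current_type = entity_type
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-written tags for the card's example sentence (assumed, not model output).
tokens = ["محمد", "كان", "في", "الرباط", "."]
tags = ["B-PER", "O", "O", "B-LOC", "O"]
spans = group_entities(tokens, tags)
print(spans)

# A multi-token organization from the card's examples.
org_spans = group_entities(["جامعة", "الحسن", "الثاني"], ["B-ORG", "I-ORG", "I-ORG"])
print(org_spans)
```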
+## Evaluation Results
+The model achieves the following results on the evaluation dataset:
+- **Precision**: 74.04%
+- **Recall**: 85.16%
+- **F1 Score**: 78.61%
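Review note on the scores above: the card does not say how they were computed or averaged. The sketch below shows one common convention, micro-averaged scoring over exact-match entity spans, with F1 as the harmonic mean of precision and recall. The function `entity_scores` and the example spans are hypothetical and not taken from DarNERcorp. Under this convention, the reported precision and recall would give an F1 of about 79.21% rather than 78.61%, so the reported figures may have been averaged per class or computed some other way.

```python
def entity_scores(gold, predicted):
    """Micro-averaged precision, recall and F1 over exact-match entity spans.

    gold, predicted: sets of (sentence_id, start, end, entity_type) tuples.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical spans: two exact matches, one boundary error, one spurious entity.
gold = {(0, 0, 1, "PER"), (0, 3, 4, "LOC"), (1, 2, 5, "ORG")}
predicted = {(0, 0, 1, "PER"), (0, 3, 4, "LOC"), (1, 2, 4, "ORG"), (1, 6, 7, "MISC")}
precision, recall, f1 = entity_scores(gold, predicted)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
```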
+## Intended Use
+This model is intended for extracting named entities from Moroccan Arabic (Darija) text. It can be applied to:
+- Social media content
+- News articles
+- Other informal or formal texts in Darija
+## How to Use
+You can use this model with the Hugging Face Transformers library:
+```python
+from transformers import pipeline
+# Load the model
+nlp = pipeline("ner", model="ymohannad-tazi/ner-darija-darner")
+# Use the model
+text = "محمد كان في الرباط."
+result = nlp(text)
+print(result)
+```