mohannad-tazi committed on
Commit 5899665 · verified · 1 parent: 974a64e

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +104 -33
README.md CHANGED
@@ -1,54 +1,125 @@
 
- # Model Name: Your Model's Name
 
  ## Model Description
- This model is a **Named Entity Recognition (NER)** model fine-tuned on the **CoNLL-03** dataset. It is designed to recognize **person**, **organization**, and **location** entities in English text. The model is based on the **BERT architecture** and is useful for information extraction tasks, such as named entity recognition in documents, web scraping, or chatbots.
 
  ### Model Architecture
  - **Architecture**: BERT-based model for token classification
- - **Pre-trained Model**: BERT
- - **Fine-tuning Dataset**: CoNLL-03
- - **Languages**: English
 
  ## Intended Use
- This model is designed for Named Entity Recognition tasks. It can identify and classify entities such as:
- - **Person**: People’s names (e.g., "Elon Musk")
- - **Organization**: Company or organization names (e.g., "Tesla", "Bank of America")
- - **Location**: Geographical locations (e.g., "New York", "Paris")
 
  ### Use Cases
- - **Document classification**: Classifying text into named entity categories.
- - **Information extraction**: Extracting entities from a large corpus of text.
- - **Chatbots**: Enhance chatbots by identifying named entities within user queries.
- - **Named entity linking**: Link entities to a knowledge base.
 
  ## How to Use
- To use the model, you need to load the tokenizer and model with the `transformers` library. Here's an example of how to do that:
 
  ```python
- from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
 
- # Load the tokenizer and model
- tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")
- model = AutoModelForTokenClassification.from_pretrained("your-username/your-model-name")
 
- # Initialize the NER pipeline
- ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)
-
- # Use the model to predict named entities in a text
- result = ner_pipeline("Elon Musk is the CEO of Tesla and lives in California.")
  print(result)
 
- # Model Training Data
- This model was trained on the CoNLL-03 dataset, which contains English text annotated with named entity labels. The dataset consists of:
-
- Training set: 14,041 sentences
- Validation set: 3,466 sentences
- Test set: 3,684 sentences
- The entities are labeled into three categories: Person, Organization, and Location.
 
  # Preprocessing Steps
- Tokenization using the BERT tokenizer.
- Alignment of labels with tokenized inputs (considering word-piece tokens).
- Padding and truncating sentences to a fixed length for uniformity.
 
 
 
+ # NER Model for Moroccan Dialect (Darija)
 
  ## Model Description
+ This model is a **Named Entity Recognition (NER)** model fine-tuned on the **DarNERcorp** dataset. It is designed to recognize entities such as **person names**, **locations**, **organizations**, and **miscellaneous entities** in Moroccan Arabic (Darija) text. The model is based on the **BERT architecture** and is useful for tasks such as information extraction from social media or news articles.
 
  ### Model Architecture
  - **Architecture**: BERT-based model for token classification
+ - **Pre-trained Model**: aubmindlab/bert-base-arabertv02
+ - **Fine-tuning Dataset**: DarNERcorp
+ - **Languages**: Moroccan Arabic (Darija)
 
  ## Intended Use
+ This model is designed for Named Entity Recognition tasks in Moroccan Arabic. It can identify and classify entities such as:
+ - **PER**: Person names (e.g., "محمد" (Mohammed), "فاطمة" (Fatima))
+ - **LOC**: Locations (e.g., "الرباط" (Rabat), "طنجة" (Tangier))
+ - **ORG**: Organizations (e.g., "البنك المغربي" (the Moroccan Bank), "جامعة الحسن الثاني" (Hassan II University))
+ - **MISC**: Miscellaneous entities
 
  ### Use Cases
+ - **Social media analysis**: Extracting entities from Moroccan Arabic posts and tweets.
+ - **News summarization**: Identifying important entities in news articles.
+ - **Information extraction**: Extracting named entities from informal or formal texts.
+
+ ## Evaluation Results
+
+ The model achieves the following results on the evaluation dataset:
+ - **Precision**: 74.04%
+ - **Recall**: 85.16%
+ - **F1 Score**: 78.61%
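F1 is the harmonic mean of precision $P$ and recall $R$. As a worked check, plugging in the precision and recall listed above:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.7404 \times 0.8516}{0.7404 + 0.8516} = \frac{1.2610}{1.5920} \approx 0.7921
$$

That is about 79.21%, roughly 0.6 points above the listed 78.61%. The identity holds exactly only when all three scores come from the same pooled (micro-averaged) entity counts. If the scores were instead averaged per entity type (macro-averaging), the averaged F1 can be no higher than the harmonic mean of the averaged precision and recall, which would be consistent with the listed figures. The card does not state which averaging was used, so this reading is an assumption.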
 
  ## How to Use
+ To use the model, load it with the Hugging Face Transformers library. For example:
 
  ```python
+ from transformers import pipeline
 
+ # Load the model into a named-entity-recognition pipeline
+ nlp = pipeline("ner", model="your-username/ner-darija-darner")
 
+ # Use the model
+ text = "محمد كان في الرباط."  # "Mohammed was in Rabat."
+ result = nlp(text)
  print(result)
+ ```
 
+ # Dataset
+ The model is trained on the DarNERcorp dataset, a corpus designed specifically for Named Entity Recognition in the Moroccan Arabic dialect. The dataset includes sentences labeled with named entity tags such as PER, LOC, ORG, and MISC.
 
  # Preprocessing Steps
+ - Tokenization using the BERT tokenizer.
+ - Alignment of labels with tokenized inputs (considering word-piece tokens).
+ - Padding and truncating sentences to a fixed length for uniformity.
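The label-alignment step above can be sketched as follows. This is a minimal illustration, not the author's training code. It assumes word-level integer label ids and a `word_ids` list of the kind returned by `BatchEncoding.word_ids()` on a fast tokenizer called with `is_split_into_words=True`. Special tokens and padding (word id `None`), and every sub-token after the first piece of a word, receive the label `-100`, which PyTorch's cross-entropy loss ignores by default. The helper name `align_labels` and the label ids in the usage example are hypothetical.

```python
from typing import List, Optional

# Label id skipped by torch.nn.CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level NER labels onto sub-word tokens (hypothetical helper).

    word_ids: for each token, the index of the word it came from, or None
              for special tokens ([CLS], [SEP]) and padding.
    word_labels: one integer label id per original word.
    """
    aligned: List[int] = []
    previous_word: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            # Special token or padding: excluded from the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First sub-token of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later sub-tokens of the same word are excluded from the loss.
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


if __name__ == "__main__":
    # Hypothetical ids: 0 = O, 1 = B-PER, 3 = B-LOC.
    # Words: ["محمد", "كان", "في", "الرباط"]; assume the tokenizer splits the
    # last word into two pieces and the sequence is padded by one token.
    word_ids = [None, 0, 1, 2, 3, 3, None, None]
    print(align_labels(word_ids, [1, 0, 0, 3]))  # → [-100, 1, 0, 0, 3, -100, -100, -100]
```

Labeling only the first sub-token is one common convention; another is to copy the word's label (or its `I-` variant) onto every piece. Padding and truncation need no special handling here, because padded positions already map to `None`.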
+
+ # Limitations
+ The model is trained on a single corpus (DarNERcorp) and may not generalize well to all Moroccan Arabic texts.
+ Performance may vary depending on text quality and on tagging consistency in the dataset.
+
+ ---
+ library_name: transformers
+ base_model: aubmindlab/bert-base-arabertv02
+ datasets:
+ - DarNERcorp
+ tags:
+ - ner
+ - named-entity-recognition
+ - arabic
+ - darija
+ language: ar
+ pipeline_tag: token-classification
+ license: apache-2.0
+ ---
+
+ # NER Model for Moroccan Dialect (Darija)
+
+ This model is fine-tuned for Named Entity Recognition (NER) in Moroccan Arabic (Darija). It recognizes entities such as locations, organizations, and person names in text written in Darija.
+
+ ## Base Model
+
+ This model is fine-tuned from the [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) model, which is pre-trained on Arabic text.
+
+ ## Dataset
+
+ The model is trained on the **DarNERcorp** dataset, a corpus designed for Named Entity Recognition in the Moroccan Arabic dialect.
+
+ ## Task
+
+ The model is designed for the **token-classification** task, specifically Named Entity Recognition (NER).
+
+ ### NER Tags
+ The model recognizes the following tags:
+ - **PER**: Person names
+ - **LOC**: Locations
+ - **ORG**: Organizations
+ - **MISC**: Miscellaneous entities
+
+ ## Evaluation Results
+
+ The model achieves the following results on the evaluation dataset:
+ - **Precision**: 74.04%
+ - **Recall**: 85.16%
+ - **F1 Score**: 78.61%
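The card does not say how precision, recall, and F1 were computed. NER scores are commonly reported at the entity level, where a prediction counts as correct only if both its type and its exact token span match the gold annotation. The sketch below shows that scheme with micro-averaging, assuming BIO tags such as `B-PER` / `I-PER`. It loosely follows conlleval-style span extraction (an `I-` tag that does not continue the open entity starts a new one) and is not necessarily the procedure behind the figures above; the function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (entity type, first token index, last token index + 1)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract typed entity spans from one sentence of BIO tags."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.add((etype, start, i))  # close the open entity
            etype = None
        if prefix in ("B", "I") and not continues:
            etype, start = label, i  # open a new entity
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    tp = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(gold_tags), bio_spans(pred_tags)
        tp += len(gold_spans & pred_spans)  # exact type + boundary matches
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "O", "O", "B-LOC"]]
    pred = [["B-PER", "O", "B-LOC", "I-LOC"]]  # LOC boundary is wrong
    print(entity_prf(gold, pred))  # → (0.5, 0.5, 0.5)
```

Because a span with the wrong boundary counts as both a false positive and a false negative, entity-level scores are usually lower than token-level accuracy on the same predictions.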
+
+ ## Intended Use
+
+ This model is intended for extracting named entities from Moroccan Arabic (Darija) text. It can be applied to:
+ - Social media content
+ - News articles
+ - Other informal or formal texts in Darija
+
+ ## How to Use
+
+ You can use this model with the Hugging Face Transformers library:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model into a named-entity-recognition pipeline
+ nlp = pipeline("ner", model="ymohannad-tazi/ner-darija-darner")
+
+ # Use the model
+ text = "محمد كان في الرباط."  # "Mohammed was in Rabat."
+ result = nlp(text)
+ print(result)
+ ```
+
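Called without extra options, the `"ner"` pipeline returns one prediction per sub-word token (dicts with keys such as `entity`, `score`, `index`, `word`, `start`, and `end`) rather than whole entities. Transformers can merge these itself through the pipeline's `aggregation_strategy` argument (for example `aggregation_strategy="simple"`). To show what that merging involves, here is a minimal hand-rolled sketch. It assumes BIO labels such as `B-LOC` / `I-LOC`, character offsets in `start`/`end` (present when a fast tokenizer is used), and that `O` tokens are dropped, as the pipeline does by default. The helper `group_entities` and the sample predictions are made up for illustration.

```python
from typing import Dict, List, Optional


def group_entities(tokens: List[Dict], text: str) -> List[Dict]:
    """Merge token-level BIO predictions into entity spans (hypothetical helper).

    Only the "entity", "index", "start" and "end" keys of each token are used.
    """
    spans: List[Dict] = []
    prev_index: Optional[int] = None
    for tok in tokens:
        label = tok["entity"]
        if label == "O":
            prev_index = None  # an explicit "O" token breaks any open entity
            continue
        prefix, _, etype = label.partition("-")
        # Tokens must be adjacent; a gap in "index" means filtered-out "O" tokens.
        adjacent = prev_index is not None and tok["index"] == prev_index + 1
        if prefix == "I" and adjacent and spans and spans[-1]["type"] == etype:
            spans[-1]["end"] = tok["end"]  # extend the current entity
        else:
            spans.append({"type": etype, "start": tok["start"], "end": tok["end"]})
        prev_index = tok["index"]
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


if __name__ == "__main__":
    text = "محمد كان في الرباط."  # "Mohammed was in Rabat."
    # Made-up predictions shaped like the pipeline's raw output, assuming
    # "الرباط" is split into two sub-word pieces.
    tokens = [
        {"entity": "B-PER", "index": 1, "start": 0, "end": 4},
        {"entity": "B-LOC", "index": 4, "start": 12, "end": 14},
        {"entity": "I-LOC", "index": 5, "start": 14, "end": 18},
    ]
    for span in group_entities(tokens, text):
        print(span["type"], span["text"])
```

Slicing the original text with the merged offsets, rather than joining the `word` fields, avoids having to strip WordPiece `##` markers.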