sbompolas committed on
Commit
e1a4a8d
·
verified ·
1 Parent(s): 1f6688b

Update README.md

Files changed (1)
  1. README.md +54 -3
README.md CHANGED
@@ -1,3 +1,54 @@
1
- ---
2
- license: cc-by-sa-4.0
3
- ---
1
+ ---
2
+ license: cc-by-sa-4.0
3
+ ---
4
+
5
+ # Greek-Lesbian Morphosyntactic Model (Stanza + Greek BERT)
6
+
7
+ This repository hosts a morphosyntactic model trained using [Stanza](https://stanfordnlp.github.io/stanza/) and fine-tuned with [Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1) for the **Lesbian dialect of Greek** (spoken on the island of Lesbos). The model was trained and evaluated on a small, curated treebank of 540 sentences: 500 for training, 10 for development, and 30 for testing.
8
+
9
+ The model aims to support part-of-speech tagging, morphological analysis, and dependency parsing for dialectal Greek and is part of a broader effort to document and process regional language varieties.
10
+
11
+ ## 📚 Dataset
12
+
13
+ The treebank is a manually annotated resource compiled from both **oral** and **written** sources. Oral data were collected between 2023 and 2024 from speakers in various villages of Lesbos:
14
+
15
+ * **Agra** (Male speaker)
16
+ * **Chidira** (Female speaker)
17
+ * **Eressos** (Male speaker)
18
+ * **Pterounta** (Female speaker)
19
+ * **Mesotopos** (Male speaker)
20
+ * **Parakoila** (Female speaker)
21
+
22
+ Written sources include:
23
+
24
+ * Papanis, D. & Papanis, G. D. (2004). *Lexiko tou Agiasotikou Glosikou Idiomatos*
25
+ * Tsokarou-Mitsioni, E. (1998). *Palies Istories ap' tn Agiasiou*
26
+ * Tsokarou-Mitsioni, E. (2019). *Prosfygiá*
27
+ * Anagnostopoulou, M. A. (2021). *Thematiko Lexiko tis Lesviakis Dialektou*
28
+ * Anagnostou, V. T. (2014). *Tsi sta th'ka mas: Komodia sta k'stariot'ka*
29
+
30
+ The full treebank is openly available here:
31
+ 🔗 [UD\_Greek-Lesbian on GitHub](https://github.com/UniversalDependencies/UD_Greek-Lesbian)
32
+
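Since the treebank is distributed in the Universal Dependencies CoNLL-U format, a quick sanity check after downloading is to count sentences per split and compare against the 500/10/30 figures stated above. The following is a minimal sketch of such a counter; the function name `count_conllu` and the inline sample are illustrative assumptions, not part of the treebank or its tooling.

```python
def count_conllu(text: str) -> tuple[int, int]:
    """Count sentences and syntactic words in a CoNLL-U string."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            in_sentence = False  # a blank line terminates the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        token_id = line.split("\t", 1)[0]
        if not in_sentence:
            sentences += 1
            in_sentence = True
        # Count only plain integer IDs; multiword ranges ("1-2") and
        # empty nodes ("1.1") are not syntactic words.
        if token_id.isdigit():
            words += 1
    return sentences, words


# Placeholder sample (not real treebank data): two sentences, one multiword token.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "1\ttoken\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = demo-2",
    "1-2\tab\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\ta\t_\tADP\t_\t_\t2\tcase\t_\t_",
    "2\tb\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
])

if __name__ == "__main__":
    print(count_conllu(SAMPLE))
```

To apply this to the real data, read each `.conllu` file from a local clone of the treebank repository with `open(path, encoding="utf-8").read()` and pass the contents to `count_conllu`; the exact file names in the repository are not listed here, so check the clone for them.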
33
+ ## 🧠 Model Architecture
34
+
35
+ * **Base pipeline**: Stanza (v1.7.0+)
36
+ * **Pretrained LM**: [Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
37
+ * **Tasks**: Tokenization, Lemmatization, POS tagging, Morphological features, Dependency parsing
38
+ * **Fine-tuning**: Conducted on the UD\_Greek-Lesbian treebank
39
+
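The components listed above map onto Stanza's `tokenize`, `pos`, `lemma`, and `depparse` processors. A minimal sketch of loading them from local model files follows. The directory `models/el_lesbian` and the four `.pt` file names are hypothetical placeholders, so substitute the actual files shipped in this repository. Depending on how the models were trained, additional arguments (for example pretrained word-vector paths) may also be required.

```python
from pathlib import Path

# Hypothetical local directory containing the downloaded model files.
MODEL_DIR = Path("models/el_lesbian")


def build_pipeline_config(model_dir: Path) -> dict:
    """Build keyword arguments for stanza.Pipeline that point at local model files."""
    return {
        "lang": "el",
        "processors": "tokenize,pos,lemma,depparse",
        # Hypothetical file names: replace with the real ones from the repository.
        "tokenize_model_path": str(model_dir / "tokenizer.pt"),
        "pos_model_path": str(model_dir / "tagger.pt"),
        "lemma_model_path": str(model_dir / "lemmatizer.pt"),
        "depparse_model_path": str(model_dir / "parser.pt"),
        # Skip Stanza's resource download and rely on the paths given above.
        "download_method": None,
    }


def annotate(text: str, model_dir: Path = MODEL_DIR) -> list[tuple]:
    """Run the pipeline and return one tuple per syntactic word."""
    import stanza  # imported lazily so the config helper works without Stanza installed

    nlp = stanza.Pipeline(**build_pipeline_config(model_dir))
    doc = nlp(text)
    return [
        (word.text, word.lemma, word.upos, word.feats, word.head, word.deprel)
        for sentence in doc.sentences
        for word in sentence.words
    ]
```

Calling `annotate("...")` with a sentence in the dialect returns the form, lemma, UPOS tag, morphological features, head index, and dependency relation for each word. Keeping the configuration in a separate function makes it easy to inspect or adjust the paths before any model is loaded.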
40
+ ## 📈 Performance
41
+
42
+ Because the training set is small (500 sentences), the model should be considered **experimental**. It is intended for research use and performs best on dialectal text similar to the training sources. Production use would require a larger dataset and further fine-tuning.
43
+
44
+ ## 📄 Citation
45
+
46
+ If you use this model or the accompanying treebank, please cite:
47
+
48
+ > Bompolas, S., Markantonatou, S., Ralli, A., & Anastasopoulos, A. (2025). *Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek*. In Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025). Association for Computational Linguistics.
49
+
50
+ ## 🔗 Related Resources
51
+
52
+ * [Universal Dependencies (UD)](https://universaldependencies.org/)
53
+ * [Stanza Documentation](https://stanfordnlp.github.io/stanza/)
54
+ * [Greek BERT on Hugging Face](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)