Greek-Lesbian Morphosyntactic Model (Stanza + Greek BERT)

This repository hosts a morphosyntactic model trained using Stanza and fine-tuned with Greek BERT for the Lesbian dialect of Greek (spoken on the island of Lesbos). The model has been trained and evaluated on a small, curated treebank of 540 sentences (500 for training, 30 for testing, 10 for development).

The model aims to support part-of-speech tagging, morphological analysis, and dependency parsing for dialectal Greek and is part of a broader effort to document and process regional language varieties.

📚 Dataset

The treebank is a manually annotated resource compiled from both oral and written sources. Oral data were collected between 2023 and 2024 from speakers in various villages of Lesbos:

  • Agra (Male speaker)
  • Chidira (Female speaker)
  • Eressos (Male speaker)
  • Pterounta (Female speaker)
  • Mesotopos (Male speaker)
  • Parakoila (Female speaker)

Written sources include:

  • Papanis, D. & Papanis, G. D. (2004). Lexiko tou Agiasotikou Glosikou Idiomatos
  • Tsokarou-Mitsioni, E. (1998). Palies Istories ap' tn Agiasiou
  • Tsokarou-Mitsioni, E. (2019). Prosfygiá
  • Anagnostopoulou, M. A. (2021). Thematiko Lexiko tis Lesviakis Dialektou
  • Anagnostou, V. T. (2014). Tsi sta th'ka mas: Komodia sta k'stariot'ka

The full treebank is openly available here: 🔗 UD_Greek-Lesbian on GitHub
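Universal Dependencies treebanks are distributed as CoNLL-U files, so the split sizes quoted above can be checked with a few lines of standard-library Python. The following is a minimal sketch, not part of the released tooling: the helper names `read_conllu` and `count_sentences` are illustrative, and the file names in the usage note are assumptions based on the usual UD naming scheme.

```python
# Column names of the ten tab-separated CoNLL-U fields.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines):
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Comment lines carry metadata such as sent_id and text.
            continue
        else:
            sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:
        # Handle a final sentence with no trailing blank line.
        yield sentence


def count_sentences(path):
    """Count the sentences in one CoNLL-U file (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in read_conllu(handle))
```

If the treebank files follow the usual UD layout (assumed here to be something like `el_lesbian-ud-train.conllu`, `el_lesbian-ud-dev.conllu`, and `el_lesbian-ud-test.conllu`), calling `count_sentences` on each should give 500, 10, and 30 sentences respectively, matching the split described above.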

🧠 Model Architecture

  • Base pipeline: Stanza (v1.7.0+)
  • Pretrained LM: Greek BERT
  • Tasks: Tokenization, Lemmatization, POS tagging, Morphological features, Dependency parsing
  • Fine-tuning: Conducted on the UD_Greek-Lesbian treebank
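The components listed above can, in principle, be loaded as a Stanza pipeline that points each processor at a locally downloaded model file. The sketch below assumes the model files have been saved to a local directory; the file names in `MODEL_FILES` and the helpers `build_pipeline_config` and `load_pipeline` are hypothetical and will likely need adjusting to match the files actually shipped with this repository. Depending on how the models were trained, further arguments (for example pretrained embedding paths or a multi-word token processor) may also be required.

```python
from pathlib import Path

# Hypothetical file names: the actual files in this repository may differ.
MODEL_FILES = {
    "tokenize": "tokenizer.pt",
    "pos": "tagger.pt",
    "lemma": "lemmatizer.pt",
    "depparse": "parser.pt",
}


def build_pipeline_config(model_dir):
    """Build keyword arguments for stanza.Pipeline from a local model directory."""
    model_dir = Path(model_dir)
    config = {
        "lang": "el",                          # Greek language code
        "processors": ",".join(MODEL_FILES),   # tokenize,pos,lemma,depparse
        "download_method": None,               # use local files, do not download
    }
    for processor, filename in MODEL_FILES.items():
        # Stanza reads per-processor model paths from "<processor>_model_path".
        config[f"{processor}_model_path"] = str(model_dir / filename)
    return config


def load_pipeline(model_dir):
    """Create the Stanza pipeline; requires `pip install stanza`."""
    import stanza  # imported lazily so the config helper works without Stanza

    return stanza.Pipeline(**build_pipeline_config(model_dir))
```

With a pipeline `nlp = load_pipeline("models")`, calling `doc = nlp(text)` on a dialectal sentence returns a document whose `doc.sentences` can be iterated, and each word in `sentence.words` exposes `text`, `lemma`, `upos`, `feats`, `head`, and `deprel`.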

📈 Performance

Due to the limited size of the training data, the model should be considered experimental. It is optimized for research purposes and performs best on dialectal content similar to the training sources. Further fine-tuning and larger datasets will be necessary for production use.

📄 Citation

If you use this model or the accompanying treebank, please cite:

Bompolas, S., Markantonatou, S., Ralli, A., & Anastasopoulos, A. (2025). Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek. In Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025). Association for Computational Linguistics.

🔗 Related Resources
