	---
	license: cc-by-sa-4.0
	language:
	- hr
	tags:
	- word spelling error annotator
	---

	# BERTic-Incorrect-Spelling-Annotator

	This BERTic model annotates incorrectly spelled words in text. It assigns each word one of the following labels:

	- 0: Word is written correctly,
	- 1: Word is written incorrectly.

	## Model Output Example

	Imagine we have the following Croatian text:

	_Model u tekstu prepoznije riječi u kojima se nalazaju pogreške ._

	Converting the input into the format accepted by the BERTic model gives:

	_[CLS] model [MASK] u [MASK] tekstu [MASK] prepo ##znije [MASK] riječi [MASK] u [MASK] kojima [MASK] se [MASK] nalaza ##ju [MASK] pogreške [MASK] . [MASK] [SEP]_
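The input format above can be sketched as a small helper that appends one `[MASK]` after the subword pieces of each word and wraps the sequence in `[CLS]` and `[SEP]`. This is only an illustration of the layout shown in the example: the function name `build_annotator_input` is hypothetical, and `toy_tokenize` is a stand-in that hard-codes the two subword splits from the example instead of calling the real BERTic tokenizer.

```python
from typing import Callable, Dict, List


def build_annotator_input(
    words: List[str], tokenize: Callable[[str], List[str]]
) -> List[str]:
    """Interleave one [MASK] slot after the subword pieces of every word."""
    tokens = ["[CLS]"]
    for word in words:
        tokens.extend(tokenize(word))  # one or more subword pieces
        tokens.append("[MASK]")        # slot that follows each word
    tokens.append("[SEP]")
    return tokens


# Assumption: a toy replacement for the real subword tokenizer. It lowercases
# the word (as in the example) and only knows the two splits shown there.
TOY_SPLITS: Dict[str, List[str]] = {
    "prepoznije": ["prepo", "##znije"],
    "nalazaju": ["nalaza", "##ju"],
}


def toy_tokenize(word: str) -> List[str]:
    lowered = word.lower()
    return TOY_SPLITS.get(lowered, [lowered])


sentence = "Model u tekstu prepoznije riječi u kojima se nalazaju pogreške ."
tokens = build_annotator_input(sentence.split(), toy_tokenize)
print(" ".join(tokens))
```

In practice the `tokenize` argument would be replaced by the tokenizer that ships with the model; the interleaving logic stays the same.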

	The model might return the following predictions (note: these predictions were chosen for illustration and are not meant to be reproducible):

	_Model 0 u 0 tekstu 0 prepoznije 1 riječi 0 u 0 kojima 0 se 0 nalazaju 1 pogreške 0 . 0_

	In the input sentence, the words `prepoznije` and `nalazaju` are spelled incorrectly, so the model marks them with the label 1.
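As a minimal sketch of how such output could be consumed, the snippet below parses the alternating word/label string from the example and collects the words labelled 1. The helper names `parse_annotations` and `misspelled_words` are hypothetical, and the code assumes the output is available as a whitespace-separated string in exactly the form shown above.

```python
from typing import List, Tuple


def parse_annotations(output: str) -> List[Tuple[str, int]]:
    """Split 'word label word label ...' into (word, label) pairs."""
    parts = output.split()
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating word/label items")
    return [(parts[i], int(parts[i + 1])) for i in range(0, len(parts), 2)]


def misspelled_words(pairs: List[Tuple[str, int]]) -> List[str]:
    """Keep only the words labelled 1 (written incorrectly)."""
    return [word for word, label in pairs if label == 1]


output = (
    "Model 0 u 0 tekstu 0 prepoznije 1 riječi 0 u 0 "
    "kojima 0 se 0 nalazaju 1 pogreške 0 . 0"
)
pairs = parse_annotations(output)
print(misspelled_words(pairs))  # ['prepoznije', 'nalazaju']
```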

	## More details

	Testing the model on generated test sets gives the following results:

	- Precision: 0.9954
	- Recall: 0.8764
	- F1 Score: 0.9321
	- F0.5 Score: 0.9691

	Testing the model on test sets constructed from RAPUT 1.0, the Croatian corpus of non-professional written language by typical speakers and speakers with language disorders, gives the following results:

	- Precision: 0.8213
	- Recall: 0.3921
	- F1 Score: 0.5308
	- F0.5 Score: 0.6738
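The F1 and F0.5 scores follow from the reported precision and recall through the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below recomputes them from the rounded values above; the function name `f_beta` is illustrative and is not taken from the authors' evaluation code. F0.5 weights precision more heavily than recall, which is why it sits well above F1 on both test sets.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Precision and recall as reported above (rounded to four decimals).
results = {
    "generated": (0.9954, 0.8764),
    "RAPUT 1.0": (0.8213, 0.3921),
}

for name, (p, r) in results.items():
    print(name, round(f_beta(p, r, 1.0), 4), round(f_beta(p, r, 0.5), 4))
# generated 0.9321 0.9691
# RAPUT 1.0 0.5308 0.6738
```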

	## Acknowledgement

	The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.

	## Authors

	Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing this model.