Towards Kyrgyz stop words
Articles
Ruslan Isaev
Ala-Too International University, Kyrgyz Republic
https://orcid.org/0000-0003-4426-8837
Gulzada Esenalieva
Ala-Too International University, Kyrgyz Republic
https://orcid.org/0009-0000-9135-1671
Ermek Doszhanov
Ala-Too International University, Kyrgyz Republic
https://orcid.org/0009-0002-4939-5683
Published 2023-12-28
https://doi.org/10.15388/Kalbotyra.2023.76.4

How to cite

Isaev, R., Esenalieva, G. and Doszhanov, E. (2023) “Towards Kyrgyz stop words”, Kalbotyra, 76, pp. 54–65. doi:10.15388/Kalbotyra.2023.76.4.

Abstract

The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a major role in today’s NLP practice. Stop-word lists are used to reduce noise in text data, remove uninformative words, speed up text processing, and minimize the amount of memory required to store data.
Kyrgyz is an agglutinative Turkic language for which no scientific study of stop words has previously been published in English. In our study, we combined frequency analysis with rule-based linguistic analysis. First, we counted word frequencies, set a threshold, and removed the words below it, which left a list of the most frequently used words. We then reduced this list by excluding all words that do not belong to the category of function words in Kyrgyz. The result is a list of 50 words that can be considered Kyrgyz stop words. Our analysis used a single corpus of sentences collected and published as an open-source project by a local broadcaster.
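
The two-step procedure described in the abstract (a frequency threshold followed by a function-word filter) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name candidate_stop_words, the regex tokenizer, the min_count value, the toy sentences, and the small function_words set are all assumptions made for the example. The abstract does not state how the text was tokenized or how the threshold was chosen.

```python
import re
from collections import Counter


def candidate_stop_words(sentences, min_count, function_words):
    """Return frequent tokens that also appear in a function-word list.

    Hypothetical helper: tokenization and thresholding are assumptions,
    not the procedure documented in the paper.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase, then take runs of Unicode letters as tokens.
        # No stemming or morphological analysis is applied here.
        counts.update(re.findall(r"[^\W\d_]+", sentence.lower()))

    # Step 1: keep only tokens at or above the frequency threshold,
    # ordered from most to least frequent.
    frequent = [word for word, n in counts.most_common() if n >= min_count]

    # Step 2: drop frequent tokens that are not in the function-word list.
    return [word for word in frequent if word in function_words]


# Toy input for illustration only; not taken from the corpus used in the paper.
sentences = [
    "ата жана эне",
    "китеп жана калем",
    "ата китеп окуду",
    "эне китеп менен келди",
    "ата менен эне",
]
# Assumed mini list of function words (conjunctions and postpositions).
function_words = {"жана", "менен", "үчүн"}

print(candidate_stop_words(sentences, min_count=2, function_words=function_words))
```

In this toy input the nouns "ата", "эне", and "китеп" pass the frequency threshold but are dropped in the second step because they are content words, which is the reason the abstract gives for pairing frequency analysis with a linguistic filter.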

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
