Towards Kyrgyz stop words

Ruslan Isaev; Gulzada Esenalieva; Ermek Doszhanov

doi:10.15388/Kalbotyra.2023.76.4

Articles

Ruslan Isaev

Ala-Too International University, Kyrgyz Republic

https://orcid.org/0000-0003-4426-8837

Gulzada Esenalieva

Ala-Too International University, Kyrgyz Republic

https://orcid.org/0009-0000-9135-1671

Ermek Doszhanov

Ala-Too International University, Kyrgyz Republic

https://orcid.org/0009-0002-4939-5683

Published 2023-12-28

https://doi.org/10.15388/Kalbotyra.2023.76.4

How to cite

Isaev, R., Esenalieva, G. and Doszhanov, E. (2023) “Towards Kyrgyz stop words”, Kalbotyra, 76, pp. 54–65. doi:10.15388/Kalbotyra.2023.76.4.

Abstract

The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a major role in today’s NLP practice. Stop-word removal is used to reduce noise in text data, remove uninformative words, speed up text processing, and minimize the memory required to store data.
Kyrgyz is an agglutinative Turkic language for which no scientific study of stop words has previously been published in English. In our study, we combined frequency analysis with rule-based linguistic analysis. First, we counted word frequencies, set a threshold, and removed the words that fell below it, which gave us a list of the most frequently used words. We then reduced that list by excluding all words that do not belong to the category of function words in Kyrgyz. The result is a list of 50 words that can be considered stop words in the Kyrgyz language. Our analysis used a single corpus of sentences collected and published as an open-source project by one of the local broadcasters.
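The two-stage procedure described above (a frequency threshold followed by a function-word filter) might be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function name `candidate_stop_words`, the tokenization rule, the threshold value, the three toy sentences, and the tiny function-word set are all assumptions introduced here, and the study's actual corpus, threshold, and linguistic rules are not reproduced.

```python
import re
from collections import Counter


def candidate_stop_words(sentences, min_count, function_words):
    """Return words that are frequent AND listed as function words.

    Hypothetical helper: `min_count` stands in for the frequency
    threshold and `function_words` for the outcome of the rule-based
    linguistic analysis; both must be supplied by the caller.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase, then take runs of Unicode letters as tokens
        # (this also matches Cyrillic letters such as "ү").
        counts.update(re.findall(r"[^\W\d_]+", sentence.lower()))

    # Stage 1: drop every word whose count is below the threshold.
    frequent = [word for word, n in counts.most_common() if n >= min_count]

    # Stage 2: keep only words that belong to the function-word set.
    return [word for word in frequent if word in function_words]


# Toy input, invented for illustration only (not from the study's corpus).
toy_sentences = ["алма жана нан", "сүт жана нан", "нан менен сүт"]
toy_function_words = {"жана", "менен"}

result = candidate_stop_words(toy_sentences, 2, toy_function_words)
print(result)  # ['жана']
```

In this toy run "нан" is the most frequent word but is dropped in stage 2 because it is a content word, while "менен" is a function word but is dropped in stage 1 because it falls below the threshold. Only words that pass both stages remain, which is why the final list can be much shorter than the raw frequency list.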

This work is licensed under a Creative Commons Attribution 4.0 International License.
