Frequency of lowercase and uppercase letters, bigrams, and trigrams in the Serbian language
Abstract
This study presents a comprehensive analysis of letter, bigram, and trigram frequencies in the Serbian language written in the Cyrillic script. Using a corpus of approximately 4 million characters drawn from literary works, newspapers, and an online encyclopedia, we calculated the frequencies of uppercase and lowercase letters, as well as of bigrams and trigrams. The results show which letters and letter combinations predominate in Serbian. They largely agree with previous studies of the Serbian and Croatian languages, with some variations due to dialectal differences. The resulting frequency data can be applied in cryptography, natural language processing, and linguistic studies of Serbian.
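The counting step summarised above can be illustrated with a minimal sketch. It assumes plain-text input and counts only the 30 letters of the Serbian Cyrillic alphabet. Single letters keep their case, so uppercase and lowercase are counted separately. Bigrams and trigrams are counted on case-folded text and only within word boundaries. The abstract does not state the study's actual tokenisation or normalisation rules, so these are assumptions. The function names `letter_frequencies` and `ngram_frequencies` are hypothetical.

```python
from collections import Counter

# The 30 letters of the Serbian Cyrillic alphabet, lowercase and uppercase.
SERBIAN_LOWER = "абвгдђежзијклљмнњопрстћуфхцчџш"
SERBIAN_UPPER = SERBIAN_LOWER.upper()
ALPHABET = set(SERBIAN_LOWER + SERBIAN_UPPER)


def letter_frequencies(text):
    """Relative frequency of each letter, with case preserved.

    Assumption: frequencies are normalised by the total number of letters,
    uppercase and lowercase together. Non-letters are ignored.
    """
    counts = Counter(ch for ch in text if ch in ALPHABET)
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()} if total else {}


def ngram_frequencies(text, n):
    """Relative frequency of letter n-grams (e.g. n=2 bigrams, n=3 trigrams).

    Assumption: text is case-folded and n-grams never cross a word boundary;
    any non-letter character ends the current word.
    """
    counts = Counter()
    word = []
    for ch in text.lower() + " ":  # trailing space flushes the last word
        if ch in ALPHABET:
            word.append(ch)
        else:
            # Slide a window of length n over the finished word.
            for i in range(len(word) - n + 1):
                counts["".join(word[i:i + n])] += 1
            word = []
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


if __name__ == "__main__":
    sample = "Добар дан, Србијо!"
    print(letter_frequencies(sample))
    print(ngram_frequencies(sample, 2))
    print(ngram_frequencies(sample, 3))
```

Because bigrams and trigrams are restricted to words here, a two-letter word contributes one bigram and no trigrams. Counting across spaces would produce different totals, so results from such a sketch are comparable with published tables only if the boundary rule matches.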