Sentiment Dictionary for Ukrainian
The dictionary contains 3442 Ukrainian words that have non-neutral sentiment (-2, -1, 1, 2).
Sources of data:
- file tone-dict-uk-manual.tsv, created with data smoothing performed by two experts
- file tone-dict-uk-auto.tsv, generated automatically by expanding the tone-dict-uk-manual dictionary with the help of word2vec and lex2vecML models, followed by some post-processing
Data format (each column is tab-separated):
- word
- sentiment (one of: -2, -1, 0, 1, 2)
All words in the dictionary are reduced to their base grammatical form; all adverbs are replaced with the corresponding stem adjectives.
The expert assessment was provided by Oleksandr Marikovskyi and Viacheslav Tychonov.
The extended dictionary was compiled by Serhiy Shehovtsov, Oles Petriv, Dmytro Chaplynskyi, and Vsevolod Diomkin.
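The two-column format described above can be read with a few lines of Python. This is a minimal sketch, not the authors' own loader: the function name `load_sentiment` is hypothetical, the sample rows are made up for illustration and are not taken from the dictionary, and since the text does not say whether the files carry a header row, the sketch simply skips any row whose score is not an integer.

```python
import csv
import io

def load_sentiment(lines):
    """Parse tab-separated (word, sentiment) rows into a dict."""
    scores = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # blank or malformed line
        word, score = row
        try:
            scores[word] = int(score)
        except ValueError:
            continue  # assumption: a non-numeric score means a header row
    return scores

# Hypothetical sample rows, invented for this example only.
sample = io.StringIO("добрий\t2\nпоганий\t-2\n")
scores = load_sentiment(sample)
print(scores)
```

To read a real file, pass an open handle instead of the sample, for example `open("tone-dict-uk-manual.tsv", encoding="utf-8", newline="")`.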
Dictionary of word stresses in the Ukrainian language
This dictionary lists word stresses for 2,770,680 word forms in the Ukrainian language.
We use the COMBINING ACUTE ACCENT (U+0301) sign to denote the stress. This sign comes after the stressed vowel. For example, the following characters form the word ма́ма:
>>> chars = ['м', 'а', '\u0301', 'м', 'а']
>>> print("".join(chars))
ма́ма
Words with more than one valid stress position carry multiple accent signs (по́ми́лка).
This dictionary was generated by Oleksiy Syvokon, based on "Dictionaries of Ukraine" by ULIF.
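Because the accent is a separate code point placed after the stressed vowel, both the plain spelling and the stress positions can be recovered with simple string handling. The sketch below is one possible approach; the helper names `strip_stress` and `stressed_positions` are hypothetical, and it assumes the input uses U+0301 exactly as described above.

```python
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def strip_stress(word):
    """Return the word as it is normally spelled, without accent signs."""
    return word.replace(ACUTE, "")

def stressed_positions(word):
    """Return 0-based indices of stressed vowels in the unaccented spelling."""
    positions = []
    plain_len = 0  # number of non-accent characters seen so far
    for ch in word:
        if ch == ACUTE:
            if plain_len > 0:
                positions.append(plain_len - 1)  # accent follows its vowel
        else:
            plain_len += 1
    return positions

print(strip_stress("ма\u0301ма"))
print(stressed_positions("ма\u0301ма"))
print(stressed_positions("по\u0301ми\u0301лка"))
```

The indices refer to the unaccented word, so they can be used directly against the spelling returned by `strip_stress`.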
Dictionary of heteronyms in the Ukrainian language
This dictionary contains heteronyms: words that share the same spelling but differ in pronunciation and meaning.
Sometimes this happens when words have completely different meanings:
- а́тлас - a collection of maps (atlas)
- атла́с - a fabric (satin)
But the majority of heteronyms are words that exhibit stress alternation when used in different forms (singular/plural, case). For example:
- блохи́ - genitive singular ("немає ані блохи́", "there is not a single flea")
- бло́хи - nominative plural ("повсюди були бло́хи", "fleas were everywhere")
File format
Each heteronym group takes one line. The format of one line is as follows:
headword [TAB] heteronym1,heteronym2
headword is a word without the accent sign, as it is usually spelled in writing. heteronym1, heteronym2 showcase the different pronunciations. There might be more than just two. The stressed vowel in these words is followed by the Unicode symbol COMBINING ACUTE ACCENT.
Here's Python code that parses the dictionary:
dictionary = {}
with open("heteronyms.tsv", encoding="utf-8") as f:
    for line in f:
        line = line.rstrip("\n")
        # Each line is: headword [TAB] comma-separated stressed variants
        headword, heteronyms = line.split("\t")
        dictionary[headword] = heteronyms.split(",")

print(dictionary["пташки"])
# Out: ['пташки́', 'пта́шки']
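Since the headword is the spelling without accent signs, each line can be sanity-checked by stripping U+0301 from every variant and comparing the result with the headword. This is a small sketch of such a check; `check_line` is a hypothetical helper, not part of the dictionary's tooling.

```python
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def check_line(line):
    """Return True if every variant equals the headword once accents are removed."""
    headword, heteronyms = line.rstrip("\n").split("\t")
    return all(h.replace(ACUTE, "") == headword for h in heteronyms.split(","))

print(check_line("пташки\tпташки\u0301,пта\u0301шки"))
```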
Source
This dictionary was generated by Oleksiy Syvokon, based on "Dictionaries of Ukraine" by ULIF.