Libraries: lang-uk

Vulyk (eng. Hive) - framework for crowdsourcing data processing

The Vulyk application was originally created to help thousands of volunteers from the NGO Kantseliarska Sotnia decipher hundreds of thousands of officials’ declarations. Thanks to its open license and simple architecture, it can be used for practically any task that requires crowdsourcing. We have successfully used Vulyk for NER annotation of the BrUK corpus. The founders of Vulyk are Dmytro Chaplinskyi, Dmytro Hambal, and Volodymyr Hotsyk.

NER-annotation add-on for Vulyk

The Vulyk NER add-on was created to help develop the first Ukrainian NER-annotated corpus. It is based on BRAT, a popular open-source annotation tool. Thanks to the Vulyk integration, new volunteers can sign up through social networks.

coherence-ua — estimation of the coherence of a text

coherence-ua is a Python package for estimating the coherence of Ukrainian texts with a neural network model (Transformer architecture). The model was trained on a corpus of Ukrainian news and takes plain text as input. It implements the following methods:

get_prediction_series - estimates the coherence of each group of sentences in a text. A "group" is a set of three consecutive sentences, and each group is offset from the previous one by one sentence: <s1, s2, s3>, <s2, s3, s4>, <s3, s4, s5>, where <si> is a single sentence of the text.
evaluate_coherence_as_product - estimates the coherence of a text as the product of the output values of all groups.
evaluate_coherence_using_threshold - estimates the coherence of a text as the ratio of the number of coherent groups to the total number of groups, given a set threshold value.
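
The grouping and the two aggregation schemes described above can be illustrated with a short Python sketch. This is not the coherence-ua implementation or its API: the function names, the per-group scores, the default threshold of 0.5, and the choice of ">=" for the comparison are all assumptions made for illustration. In the real package, the per-group scores come from the neural network model.

```python
from math import prod


def sentence_groups(sentences, size=3):
    """Split a text into overlapping groups of `size` consecutive sentences.

    Each group starts one sentence after the previous one.
    A text shorter than `size` sentences yields no groups.
    """
    return [tuple(sentences[i:i + size])
            for i in range(len(sentences) - size + 1)]


def coherence_as_product(scores):
    """Aggregate per-group scores by multiplying them together."""
    return prod(scores)


def coherence_using_threshold(scores, threshold=0.5):
    """Share of groups whose score reaches the threshold (assumed: >=)."""
    if not scores:
        return 0.0
    coherent = sum(1 for score in scores if score >= threshold)
    return coherent / len(scores)


sentences = ["s1", "s2", "s3", "s4", "s5"]
groups = sentence_groups(sentences)

# Hypothetical per-group scores, one per group, standing in for model output.
scores = [0.9, 0.8, 0.4]

print(groups)
print(coherence_as_product(scores))
print(coherence_using_threshold(scores, threshold=0.5))
```

Note the difference between the two schemes: the product is pulled down sharply by a single low-scoring group, while the threshold ratio only counts how many groups pass.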

The Python package was created by Artem Kramov.

noun-phrase-ua — noun phrase detection

noun-phrase-ua is a Python package for extracting noun phrases from Ukrainian texts. The model is based on an analysis of the dependency tree of a text. The extract_entities method takes a text as input and returns a dictionary with the following keys:

tokens — the list of text’s tokens with their properties.
entities — the list of the indices of tokens that form noun phrases.
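
A result of this shape could be turned back into readable phrases roughly as follows. The dictionary below is written by hand and is not actual output of noun-phrase-ua. It assumes that each token is a dictionary with a "word" property (a hypothetical key name), that each entity is a list of token indices, and that indices are zero-based. The helper noun_phrases is likewise hypothetical.

```python
def noun_phrases(result):
    """Join the tokens of each entity into a phrase string.

    Assumes result["entities"] is a list of lists of zero-based
    indices into result["tokens"], and that each token stores its
    surface form under the hypothetical key "word".
    """
    tokens = result["tokens"]
    return [" ".join(tokens[i]["word"] for i in indices)
            for indices in result["entities"]]


# Hand-written example for the sentence "Стара хата стоїть біля річки".
result = {
    "tokens": [
        {"word": "Стара"},
        {"word": "хата"},
        {"word": "стоїть"},
        {"word": "біля"},
        {"word": "річки"},
    ],
    "entities": [[0, 1], [4]],
}

phrases = noun_phrases(result)
print(phrases)
```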

The Python package was created by Artem Kramov.