I have a solid background in Computational and Fundamental Linguistics. My research is inspired by the possibilities that programming and mathematics open up for linguistic analysis.
Apart from NLP research, I have experience in data science, software & web development, and data visualization.
Profile page, University of Zurich
PhD Project: Geometry of Linguistic Morphology
My PhD project aims to develop new methods for studying linguistic morphological diversity. In particular, I explore tools from information theory (entropy), fractal geometry (fractal dimension) and graph theory (tree structures) in order to establish a rigorous scientific approach to comparing morphological structures cross-linguistically. The expected outcomes of the project are 1) a novel method for studying subword structures language-independently and 2) a potential application of this method to multilingual NLP models and downstream tasks.
Supervisor: Tanja Samardžić
Co-supervisor, professor in charge: Martin Volk
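As a rough illustration of the information-theoretic side of the project, the sketch below computes the Shannon entropy of a distribution of subword units. It is not the method developed in the thesis: the function name `shannon_entropy` and the toy segmentation are assumptions made purely for this example.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Return the Shannon entropy (in bits) of the unigram distribution of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed token types
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical subword segmentation of a tiny text sample (illustration only).
subwords = ["un", "break", "able", "un", "read", "able"]
print(f"Entropy: {shannon_entropy(subwords):.3f} bits")
```

Comparing such values across languages only makes sense when the texts and the segmentation procedure are comparable, which is one reason a language-independent method is needed.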
31 May 2021
29 March 2021
26 January 2021
Video presentation about my work in progress at the URPP Language and Space, University of Zurich
Project management, Flask web development
Project management for the scientific fair Scientifica 2021. Web development of a text generation robot in four languages. Backend debugging and full frontend development.
2021 University of Zurich
Python package development
Contributed to the development of a Python package: designed the class architecture, cleaned up the code, and designed the logo.
2020 University of Zurich
Flask website, web development
Full stack development of a multimedia corpus.
Information about the project (Completed projects, 2019)
Link to the corpus (available only from the UZH VPN)
2018–2019 University of Zurich
Django website + Neo4j database, web development
The Russian Rhyme Database is the first web resource for finding Russian rhymes with references to the actual verse lines in Russian poetry (from the 18th century to the first third of the 20th century). Full stack development.
2016 HSE University, Moscow
R Shiny application, web development
A game in which the player guesses the Bayes factor (a metric from Bayesian statistics) from a scatter plot with regression lines.
2016 University of Tübingen
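For readers unfamiliar with the metric, here is a minimal sketch of one common way to approximate a Bayes factor for a simple regression, using the BIC approximation BF10 ≈ exp((BIC0 − BIC1) / 2). The original application is written in R Shiny and may compute the value differently; the function name `bic_bayes_factor` and the sample data are assumptions for illustration.

```python
import math


def bic_bayes_factor(x, y):
    """Approximate BF10 for 'y depends linearly on x' versus 'y is constant'."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Ordinary least-squares fit of y = intercept + slope * x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sums of squares for the null (mean only) and the regression model
    rss_null = sum((yi - mean_y) ** 2 for yi in y)
    rss_alt = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

    # BIC for Gaussian errors: n * ln(RSS / n) + k * ln(n).
    # Only mean parameters are counted; the shared variance cancels in the difference.
    bic_null = n * math.log(rss_null / n) + 1 * math.log(n)
    bic_alt = n * math.log(rss_alt / n) + 2 * math.log(n)
    return math.exp((bic_null - bic_alt) / 2)


# Made-up noisy data with a clear upward trend (illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print(f"Approximate BF10: {bic_bayes_factor(xs, ys):.1f}")
```

Values above 1 favour the regression model and values below 1 favour the constant model. The sketch assumes noisy data: a perfect fit gives a zero residual and the logarithm is undefined.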
Web crawling for a corpus of modern texts written in Thai.
2015–2016 HSE University, Moscow
2015 HSE University, Moscow
Research group “Corpus instruments for Yiddish studies”
Link to the paper
2014 HSE University, Moscow
Research on frequency of Russian verb forms
Freaky Frequency is an information system based on a collection of Russian word forms and their frequencies.
2013 HSE University, Moscow
Data visualization for the paper
Gutierrez-Vasques, X., C. Bentz, O. Sozinova and T. Samardzic (2021). From characters to words: the turning point of BPE merges.
In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 3454–3468.
2021 University of Zurich
Dynamic visualization of the query to the Russian Rhyme Database.
2016 HSE University, Moscow, Russia
Dynamic visualizations of Russian rhyme clusters. Links to the visualizations for different time periods:
19th century, 1st third
19th century, 2nd third
19th century, last third
20th century, 1st third
The project was inspired by the book "Draw every day" by Natali Ratkovski.
Over 88 days in spring and summer 2015, I made sketches of events happening around me, my thoughts and dreams, and the music I was listening to. The result was 86 pictures, each accompanied by some comments.