Olga Sozinova

Computational Linguist, Ph.D.
Fine Artist

Curriculum Vitae

I have been working on my PhD project at the URPP Language and Space, University of Zurich, since October 2018. It is part of Tanja Samardžić's SNSF project Non-randomness in Morphological Diversity.
I have a solid background in Computational and Fundamental Linguistics. My research is inspired by the possibilities that programming and mathematics offer for linguistic analysis.
Apart from NLP research, I have experience in data science, software & web development, and data visualization.
Download CV
Profile page, University of Zurich

PhD Project: Geometry of Linguistic Morphology

My PhD project aims to develop new methods for studying linguistic morphological diversity. In particular, I explore tools from information theory (entropy), fractal geometry (fractal dimension) and graph theory (tree structures) in order to establish a rigorous scientific approach for comparing morphological structures cross-linguistically. The expected outcomes of the project are 1) a novel method for studying subword structures language-independently; 2) a potential application of this method to multilingual NLP models and downstream tasks.
Supervisor: Tanja Samardžić
Co-supervisor, professor in charge: Martin Volk


Andersen | GPT-2 Stories

Project management, Flask web development
Project management for the scientific fair Scientifica 2021. Web development of a text generation robot in four languages. Backend debugging and full frontend development.
2021 University of Zurich


Python package development
Participated in developing a Python package: designed the class architecture and cleaned up the code. Designed the logo.
2020 University of Zurich

Zurich Tangram Corpus

Flask website, web development
Full stack development of a multimedia corpus.
Information about the project (Completed projects, 2019)
Link to the corpus (available only from the UZH VPN)

2018—2019 University of Zurich

Russian Rhyme Database (demo)

Django website + Neo4j database, web development
The Russian Rhyme Database is the first web resource for finding Russian rhymes with references to actual verse lines from Russian poetry (from the 18th century to the first third of the 20th century). Full stack development.

2016 HSE University, Moscow

Guess Bayes Factor

R Shiny application, web development
A game of guessing the Bayes factor (a metric from Bayesian statistics) given a scatter plot with regression lines.
2016 University of Tübingen

HSE Thai Corpus

Crawling texts
Web crawling for a corpus of modern texts written in Thai.
2015—2016 HSE University, Moscow

Beserman Dictionary

Frontend development
2015 HSE University, Moscow

Freaky Frequency

Research on the frequency of Russian verb forms
Freaky Frequency is an information system based on the collection of Russian word forms and their frequency.
2013 HSE University, Moscow

Data Visualization

Data visualization for my talk Subword Geometry: Picturing Word Shapes at the workshop SIGTYP 2021, co-located with NAACL 2021.

2021 University of Zurich

Entropy at different BPE merges

R, ggplot & gganimate

Data visualization for the paper
Gutierrez-Vasques, X., C. Bentz, O. Sozinova and T. Samardzic (2021). From characters to words: the turning point of BPE merges. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 3454—3468.

2021 University of Zurich

100LC corpus, overview

JavaScript based on amCharts
Dynamic overview of the number of tokens gathered for different languages and genres in the 100LC corpus.

2020 University of Zurich

Russian Rhyme Database, results example

Dynamic visualization of a query to the Russian Rhyme Database.

2016 HSE University, Moscow, Russia

Clusters of Russian rhymes

Dynamic visualizations of clusters of Russian rhymes. Links to the visualizations for different time periods:
18th century
19th century, 1st third
19th century, 2nd third
19th century, last third
20th century, 1st third

Related links:
Abstract for DH2016
Project description (in Russian)

2016 HSE University, Moscow, Russia





88 days

The project was inspired by the book "Draw every day" by Natali Ratkovski.
Over 88 days in spring and summer 2015, I made sketches of events, thoughts, dreams, and music I was listening to. In total, this resulted in 86 pictures accompanied by comments.

B/w and sepia graphics

Coloured graphics