Publications
The ongoing scientific work related to UralicNLP is described in various publications. Because applying natural language processing to real language data is complex and often involves several connected pipelines, our studies also address loosely related problems in different parts of these workflows.
This page lists only the publications that have resulted in publicly available data or code.
Non-Standard Data
Finnish Dialect Normalization
Partanen, N., Hämäläinen, M., & Alnajjar, K. (2019). Dialect Text Normalization to Normative Standard Finnish. In The Fifth Workshop on Noisy User-generated Text (W-NUT 2019): Proceedings of the Workshop (pp. 141–146).
Finnish Dialect Adaptation
Hämäläinen, M., Partanen, N., Alnajjar, K., Rueter, J., & Poibeau, T. (2020). Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity. In Proceedings of the 11th International Conference on Computational Creativity (pp. 204–211).
Historical English Normalization
Hämäläinen, M., Säily, T., Rueter, J., Tiedemann, J., & Mäkelä, E. (2019). Revisiting NMT for normalization of early English letters. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (pp. 71–75).
Unsupervised OCR Post-Correction
Hämäläinen, M., & Hengchen, S. (2019). From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In Proceedings of Recent Advances in Natural Language Processing (pp. 432-437).
[code] [English model]
Swedish Normalization
Hämäläinen, M., Partanen, N., & Alnajjar, K. (2020). Normalization of Different Swedish Dialects Spoken in Finland. In GeoHumanities’20: Proceedings of the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities (pp. 24–27). ACM.
Endangered Languages
Skolt Sami
Rueter, J., & Hämäläinen, M. (2020). FST Morphology for the Endangered Skolt Sami Language. In Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) (pp. 250-257).
Hämäläinen, M., & Rueter, J. (2019). Finding Sami Cognates with a Character-Based NMT Approach. In Proceedings of the 3rd Workshop on Computational Methods in the Study of Endangered Languages: (Volume 1) Papers (pp. 39-45).
[data]
Online Dictionaries
Hämäläinen, M., & Rueter, J. (2018). Advances in synchronized XML-MediaWiki dictionary development in the context of endangered Uralic languages. In Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts (pp. 967-978).
UralicNLP
Hämäläinen, M. (2019). UralicNLP: An NLP Library for Uralic Languages. Journal of Open Source Software, 4(37), 1345.
[code]
Semantics
Hämäläinen, M. (2018). Extracting a Semantic Database with Syntactic Relations for Finnish to Boost Resources for Endangered Uralic Languages. In The Proceedings of Logic and Engineering of Natural Language Semantics 15 (LENLS15) [9]
Treebanks
Partanen, N., Blokland, R., Lim, K., Poibeau, T., & Rießler, M. (2018). The First Komi-Zyrian Universal Dependencies Treebanks. In Second Workshop on Universal Dependencies (UDW 2018) (pp. 126-132).
[data – written] [data – spoken]
Rueter, J., Partanen, N., & Ponomareva, L. (2020). On the questions in developing computational infrastructure for Komi-Permyak. In Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages (pp. 15-25).
[data]
Rueter, J., & Tyers, F. (2018). Towards an open-source universal-dependency treebank for Erzya. In International Workshop for Computational Linguistics of Uralic Languages.
[data]
Speech Recognition for Samoyedic Languages
Partanen, N., Hämäläinen, M., & Klooster, T. (2020). Speech Recognition for Endangered and Extinct Samoyedic languages. In Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
[data]
Computational Creativity and Figurative Language
Humor Generation
Hämäläinen, M., & Alnajjar, K. (2019). Modelling the Socialization of Creative Agents in a Master-Apprentice Setting: The Case of Movie Title Puns. In Proceedings of the 10th International Conference on Computational Creativity (pp. 266-273).
[data]
Poem Generation
Hämäläinen, M., & Alnajjar, K. (2019). Let’s FACE it: Finnish Poetry Generation with Aesthetics and Framing. In 12th International Conference on Natural Language Generation: Proceedings of the Conference (pp. 290-300)
Dialog
Alnajjar, K., & Hämäläinen, M. (2019). A Creative Dialog Generator for Fallout 4. In Proceedings of the 14th International Conference on the Foundations of Digital Games [48] New York: ACM.
[code]
Natural Language Generation
Hämäläinen, M., & Rueter, J. (2018). Development of an Open Source Natural Language Generation Tool for Finnish. In Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages (pp. 51-58).
[code] [data – verb complements] [data – locative]
Sarcasm
Hämäläinen, M. K. (2016). Reconocimiento automático del sarcasmo: ¡Esto va a funcionar bien!. University of Helsinki (Master’s thesis)
[data]
Knowledge Bases
Alnajjar, K., Hämäläinen, M., Chen, H., & Toivonen, H. (2017). Expanding and Weighting Stereotypical Properties of Human Characters for Linguistic Creativity. In Proceedings of the 8th International Conference on Computational Creativity (ICCC’17) (pp. 25-32).
[data]
Prosody in Poetry
Hämäläinen, M., & Rueter, J. (2020). Runonlausunnan prosodia ja sen mallintaminen koneellisesti puhesynteesillä. In Материалы Международного образовательного салона (pp. 5-17). Ижевск: Институт компьютерных исследований.
[data]