Goethe University Frankfurt
Introduction to Linguistic Linked Data
The number of resources that provide lexical data keeps growing as projects in (computational) linguistics, digital humanities, and e-lexicography publish their outcomes. This vast landscape of heterogeneous and often isolated language resources makes it difficult to link them and to integrate them into pipelines in an interoperable manner. To address this, experts working in the domain of the Semantic Web have adopted approaches to linguistic data representation based on the Linked Data (LD) paradigm, giving rise to the Linguistic Linked Data (LLD) line of research. In this context, linked data emerges as a way to make linguistic data uniformly queryable, interoperable, easily discoverable, and reusable on the basis of web standards. This tutorial will provide attendees with a theoretical and practical overview of the foundations of LLD, covering, among other aspects, an introduction to the Semantic Web and linked data, and a walkthrough of the different steps of linguistic linked data generation. We will place special emphasis on knowledge representation with the OntoLex-Lemon model, the de facto standard for lexical data representation on the Web, and with other linguistic vocabularies. Participants are encouraged to select language resources of interest to them in advance, so that they can work on modelling them during the practical session.
Institute of Formal and Applied Linguistics
Charles University, Czech Republic
Faculty of Mathematics and Physics
Daniel Zeman is a senior researcher and lecturer at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University in Prague. His research interests range from natural language parsing and morphology to low-resource language processing and linguistic typology. He is one of the co-founders and main coordinators of the Universal Dependencies project.
Universal Dependencies: Principles and Tools
Universal Dependencies is an international community project and a collection of morphosyntactically annotated data sets (“treebanks”) for more than 100 languages. The collection is an invaluable resource for various linguistic studies, ranging from grammatical constructions within one language to language typology, documentation of endangered languages, and historical evolution of language. In the tutorial, I will first briefly outline the main principles of UD and then present the actual data and the various tools available for working with it: parsers, batch processors, search engines, and viewers.