
Resources and tools

Carniolan Provincial Assembly corpus Kranjska 1.0

The Kranjska 1.0 corpus contains the stenographic transcripts of the sessions of the Carniolan Provincial Assembly (Obravnave deželnega zbora kranjskega / Bericht über die Verhandlungen des krainischen Landtages) from 1861 to 1913, covering 11 terms in total. It was built from PDF documents, published in the Digital Library of Slovenia (dLib.si) and on the SIstory portal, that were produced by scanning the printed transcripts and applying optical character recognition (OCR). The corpus consists of facsimiles of the transcripts in PDF format and corresponding machine-readable XML documents in the Parla-CLARIN TEI format, enriched with metadata and linguistic annotation (morphosyntactic tagging and lemmatization). It comprises 694 session transcripts (15,353 pages in total) containing parliamentary speeches with over 10 million words. The transcripts are mostly bilingual, with the language of each speech depending on the speaker: approximately 58% of the sentences are in Slovene and 42% in German. The German text was initially typeset in Gothic script and later in Latin script. The corpus is available in the CLARIN.SI repository, with links to the noSketch Engine and KonText concordancers, and through the ParlaVis web application.
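As a rough illustration of working with the XML files, the following sketch reads a small, invented Parla-CLARIN-style fragment using only Python's standard library. It assumes, as in other Parla-CLARIN corpora such as ParlaMint, that speeches are `<u>` elements, sentences are `<s>` elements whose language is given directly or inherited through `xml:lang`, and tokens are `<w>` elements carrying `lemma` and `msd` attributes. The actual Kranjska 1.0 encoding may differ, so check a real file before relying on these element and attribute names. The sketch computes per-language sentence shares (the kind of figure behind the 58% / 42% split) and lists tokens with their lemmas and tags.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented miniature transcript for illustration only; the structure
# (<u>/<s>/<w>, lemma and msd attributes) is an assumption, not the
# confirmed Kranjska 1.0 schema.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <u who="#speaker1" xml:lang="sl">
      <s><w lemma="gospod" msd="UPosTag=NOUN">Gospoda</w></s>
      <s><w lemma="hvala" msd="UPosTag=NOUN">Hvala</w></s>
    </u>
    <u who="#speaker2">
      <s xml:lang="de"><w lemma="hoch" msd="UPosTag=ADJ">Hohes</w>
        <w lemma="Haus" msd="UPosTag=NOUN">Haus</w></s>
    </u>
  </div></body></text>
</TEI>"""


def sentence_languages(elem, inherited=None):
    """Yield the language of every <s>, honouring xml:lang inheritance."""
    lang = elem.get(XML_LANG, inherited)
    if elem.tag == f"{{{TEI_NS}}}s":
        yield lang
    for child in elem:
        yield from sentence_languages(child, lang)


def language_shares(root):
    """Return the fraction of sentences per language code."""
    counts = Counter(sentence_languages(root))
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def tokens(root):
    """Return (word form, lemma, msd) triples in document order."""
    return [(w.text, w.get("lemma"), w.get("msd"))
            for w in root.iter(f"{{{TEI_NS}}}w")]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(language_shares(root))
    print(tokens(root))
```

For full transcripts, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`; the recursive walk is used so that a language declared on an utterance applies to all of its sentences unless a sentence overrides it.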