List of all 21 CLARIN K-centres with expertise in specific language resource families
Try out our NEW K-Centres catalogue, which merges this alphabetical list with advanced search functionalities on a single page.
|
| |
Areas of competence | Atypical communication encompasses language and speech as encountered during (second) language acquisition and development and in language disorders, but also more broadly in bilingual language development and in sign language. ACE specialises in this type of research and in the related infrastructural issues of data acquisition, processing and sharing, which are typically shaped by the high sensitivity of the data. For data storage and access, the centre collaborates with MPI's TLA (The Language Archive), which is a CLARIN B Centre also based in Nijmegen. We publish in English and Dutch but have expertise in many European languages. |
Audiences served | - linguists; - psychologists; - neuroscientists; - computer scientists; - speech and language therapists; - education specialists |
Types of services | - How-to documents; - Access to document templates; - Access to data; - Depositing; - FAQ; - Helpdesk; - Technical support |
Is portal for language(s) | - |
Other languages covered | - |
Modalities covered | - Audio: speech; - Text; - Video: sign language |
Linguistic topics | - Language acquisition (L1 and L2); - language disorders; - Language learning |
Language processing | - Language technology; - speech technology; - automatic speech recognition; - LLMs for atypical language |
Data types | - |
Resource families | - Manually annotated corpora; - Multimodal corpora; - Sign language corpora; - Spoken corpora; - Lexica; - Glossaries |
Generic topics | - Critical Data Management; - Legal and ethical issues; - AI speech models for automatic speech recognition and evaluation |
Other keywords | - Language acquisition; - sign language; - language pathologies |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | The CLARIN Knowledge Centre for Computer-Mediated Communication and Social Media offers expertise on language resources and technologies for Computer-Mediated Communication and Social Media. Its basic activities are to (1) give researchers, students, and other interested parties information about the available resources, technologies, and community activities, (2) support interested parties in producing, modifying or publishing relevant resources and technologies and (3) organize training activities. |
Audiences served | - Computational linguists; - Linguists; - Language teachers; - Sociologists; - Citizen scientists |
Types of services | - Access to documentation; - FAQ; - Helpdesk; - User assistance; - User forum |
Is portal for language(s) | - |
Other languages covered | - English; - French; - German; - Italian; - Slovenian; - and their respective language families |
Modalities covered | - Text; - Multimodality |
Linguistic topics | - Morphology; - Syntax; - Semantics; - Stylistics |
Language processing | - Corpus data representation; - Basic natural language processing; - Proper data deposition |
Data types | - |
Resource families | - Computer-mediated communication corpora (social media) |
Generic topics | - Data management; - Legal issues; - Ethical issues; - Standards |
Other keywords | - Qualitative and quantitative linguistic analyses; - Processing of non-standard language (learner language and web corpora) |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | The CLARIN Knowledge Centre for Learner Corpora offers advice and training services on the collection and use of learner corpora (i.e. electronic collections of language data produced by second or foreign language learners) for theoretical and applied purposes. |
Audiences served | - Linguists; - Language teachers; - Computational linguists; - Language learners; - Citizen scientists |
Types of services | - Access to data; - Access to document templates; - Access to documentation; - Access to tools; - FAQ; - Helpdesk; - How-to documents; - Training; - Technical support; - User assistance |
Is portal for language(s) | - |
Other languages covered | - Dutch; - English; - French; - German; - Spanish |
Modalities covered | - Audio: speech; - Text |
Linguistic topics | - Applied linguistics; - Corpus linguistics; - Second language acquisition; - Computational linguistics; - Translation studies; - Language variation; - Learner corpus design; - Learner corpus compilation; - Spoken data transcription; - Manual annotation; - Learner corpus research; - Discourse; - Phraseology; - Linguistic complexity |
Language processing | - Part-of-speech tagging; - Parsing; - Measuring readability; - Automatic annotation; - Natural language processing |
Data types | - |
Resource families | - Computer-mediated communication corpora (social media); - Corpora of academic texts; - L2 learner corpora; - Manually annotated corpora; - Newspaper corpora; - Parallel corpora; - Reference corpora; - Spoken corpora |
Generic topics | - Ethical issues; - Metadata |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | The CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation offers expertise on data and data-related methods, technology, and background information on language resources and tools to researchers, including students and native speakers. CKLD provides information and assistance on fieldwork and data-related methodological aspects, in particular on equipment, digital tools and methods, where to find data and information, and whom to contact for specialist information on particular regions or language families. |
Audiences served | - Linguists; - field linguists; - typologists; - language communities of endangered languages |
Types of services | - Helpdesk; - How-to documents; - Training |
Is portal for language(s) | - |
Other languages covered | - Under-researched languages and language families (linguistic diversity); - Athabascan; - Austronesian; - Austro-Asiatic; - Dravidian; - Finno-Ugric; - Papuan |
Modalities covered | - Audio-visual; - Text |
Linguistic topics | - language documentation; - linguistic typology; - linguistic fieldwork |
Language processing | - |
Data types | - audio-visual collections; - typological databases |
Resource families | - Manually annotated corpora; - Spoken corpora; - Dictionaries; - Wordlists |
Generic topics | - linguistic fieldwork; - endangered languages; - language typology |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | CLARIN-MULTISENS provides advice on multimodal and sensor-based methods, including EEG (electroencephalography), eye-tracking, articulography, virtual reality and motion capture, and on language-related data such as audiovisual recordings and textual representations. |
Audiences served | - Linguists; - Behavioural and cultural scientists |
Types of services | - Depositing; - Helpdesk; - Technical support; - Training; - User assistance |
Is portal for language(s) | - |
Other languages covered | - Swedish; - English; - Austroasiatic languages |
Modalities covered | - Audio: speech; - Audio-visual; - Multimodality; - Sensor data; - Text; - Video: gestures |
Linguistic topics | - Comparative and phylogenetic linguistics; - Dialect studies; - Field linguistics; - Language diversity; - Language learning; - Language production; - Neurolinguistics; - Phonetics; - Psycholinguistics |
Language processing | - Information extraction; - Keystroke logging; - Named entity recognition; - Text mining; - Visual prosody |
Data types | - Dictionaries; - Language models; - Lexical and typological/morphosyntactic databases |
Resource families | - Historical corpora; - L2 learner corpora; - Manually annotated corpora; - Multimodal corpora; - Spoken corpora; - Oral history corpora |
Generic topics | - Multimodal and sensor-based methods; - EEG; - Eye-tracking; - Articulography; - Virtual reality; - Motion capture; - audio-visual-recording; - Working with GIS |
Other keywords | - Interdisciplinary research; - Methodological know-how; - E-science |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | Language technology and resources for Swedish, Swedish Sign Language, and multilingual settings. Expertise in the processing of parallel corpora including alignment and machine translation, pretrained language models, cross-linguistically consistent annotation within the framework of Universal Dependencies, and computation and evaluation of measures of text complexity. |
Audiences served | - Linguists; - Phoneticians; - Psycholinguists; - Historians; - Literary scientists; - Art historians; - Digital humanities researchers; - Economists; - General public |
Types of services | - Helpdesk; - Technical support; - Training; - User assistance |
Is portal for language(s) | - |
Other languages covered | - Swedish; - Swedish Sign Language; - English; - Expertise in linguistic diversity and multilingual applications |
Modalities covered | - Audio: speech; - Text; - Video: sign language |
Linguistic topics | - Computational linguistics; - Language diversity; - Field linguistics; - Language learning; - Neurolinguistics; - Phonetics; - Phonology; - Pragmatics; - Psycholinguistics; - Semantics |
Language processing | - Language processing pipelines; - Part-of-speech tagging; - Named entity recognition; - Syntactic parsing; - Machine translation; - Processing of discourse relations; - Pretrained models; - Text simplification; - Text complexity |
Data types | - Parallel corpora; - Treebanks; - Language models; - Typological databases; - Text; - Video; - Audio |
Resource families | - Newspaper corpora; - Parallel corpora; - Sign language resources; - Wordlists |
Generic topics | - Natural language processing; - Corpus annotation; - Machine learning; - Digital humanities |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | Technical advice on speech analysis relating to all aspects of speech technology, including speech science, speech applications, and speech in interaction. |
Audiences served | - Archivists; - Computational linguists; - Computer scientists; - Historians; - Language teachers; - Librarians; - Linguists; - Phoneticians; - Speech pathologists; - Sociologists; - Sociolinguists |
Types of services | - Access to tools; - Helpdesk; - User assistance; - Training |
Is portal for language(s) | - |
Other languages covered | - English; - Swedish |
Modalities covered | - Audio: speech; - Audio-visual data; - Multimodality; - Sensor data; - Sensor data: biosignals; - Video: gestures |
Linguistic topics | - Phonetics; - Speech pathology |
Language processing | - Speech analysis; - Speech modelling; - Speech processing; - Speech recognition; - Speech synthesis |
Data types | - Acoustic and language models; - Biosignals related to spoken interaction; - Dictionaries; - Pronunciation data; - Vocabularies |
Resource families | - Oral history corpora; - Parliamentary corpora |
Generic topics | - ASR; - Data management; - Deep learning; - Evaluation; - Legal issues; - Tools; - Visualisation |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | CLASSLA offers expertise on language resources and technologies for South Slavic languages. It provides information on freely available lexicons and corpora that can be used in research in the social sciences and humanities. The CLASSLA-Stanza pipeline allows researchers to process their texts linguistically and produce their own corpora, while the CLASSLA web corpora, the largest general corpora for all South Slavic languages, enable direct language research. The centre provides guidance on how to use the available resources and technologies in research. |
Audiences served | - Computational linguists; - Computer scientists; - Citizen scientists; - Historians; - Language teachers; - Linguists; - Sociolinguists; - Sociologists |
Types of services | - FAQ; - Helpdesk; - Technical support; - Training |
Is portal for language(s) | - Slovenian; - Slovene; - Croatian; - Bosnian; - Serbian; - Montenegrin; - Macedonian; - Bulgarian |
Other languages covered | - |
Modalities covered | - Audio: speech; - Text |
Linguistic topics | - Applied linguistics; - Dialect studies; - Sociolinguistics |
Language processing | - Basic language processing; - Information extraction; - Language understanding; - Named entity recognition; - Processing of morphologically rich languages; - Speech recognition |
Data types | - Manually annotated datasets; - Corpora; - Language models; - Treebanks |
Resource families | - Computer-mediated communication corpora (social media); - Historical corpora; - Literary corpora; - Newspaper corpora; - Parliamentary corpora; - Corpora of academic texts; - Manually annotated corpora; - Multimodal corpora; - Parallel corpora; - Reference corpora; - Spoken corpora; - Language models; - Lexica; - Normalization; - Named entity recognition; - Part-of-speech tagging and lemmatization; - Tools for sentiment analysis |
Generic topics | - Evaluation of tools and models; - Machine learning; - Deep learning |
Other keywords | - Processing of closely related languages; - Language variation; - Spatial language variation |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | Provides information, consulting and technical assistance on all topics related to corpus linguistics. This includes data formats, annotation, corpus querying, corpus linguistics methodology, statistical methods etc. Another specialisation of the centre is empirical research on the Czech language. |
Audiences served | - Computational linguists; - Computer scientists; - Language teachers; - Linguists; - Psycholinguists; - Sociolinguists |
Types of services | - Assistance with the use of data and tools; - Consultancy; - Data processing on demand; - Helpdesk; - Technical support; - Training courses |
Is portal for language(s) | - Czech |
Other languages covered | - |
Modalities covered | - Speech; - Written text |
Linguistic topics | - Applied linguistics; - Corpus linguistics; - Diachronic language studies; - Dialect studies; - Discourse; - Language learning; - Lexical studies; - Morphology; - Syntax; - Terminology |
Language processing | - Basic language processing; - Processing of morphologically rich languages |
Data types | - |
Resource families | - Computer-mediated communication corpora (social media); - Historical corpora; - L2 learner corpora; - Literary corpora; - Manually annotated corpora; - Newspaper corpora; - Parallel corpora; - Reference corpora; - Spoken corpora |
Generic topics | - |
Other keywords | - |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | The CLARIN Knowledge Centre for Croatian Language (CROATINA) provides relevant knowledge about the Croatian language and promotes the use of language technologies. The CROATINA Helpdesk offers information on a variety of topics related to the Croatian language, Croatian language learning and the use of language resources. CROATINA also offers help in depositing language resources in HR-CLARIN or other repositories. While the existing K-centre CLASSLA is oriented primarily towards language resources for all South Slavic languages, CROATINA will offer complementary language resources specific to Croatian. Additionally, through its FAQ and helpdesk, CROATINA will cover relevant linguistic information about the Croatian language: its history, its structure at all language levels, its typological features, its sociolinguistic environment, its level of language technology support, and other linguistically and technologically relevant information. Depending on their research interests and on the different language resources the two K-centres offer, users should be able to select the relevant K-centre themselves, or they can approach CROATINA for additional guidance. |
Audiences served | - Linguists; - Computational linguists; - Language teachers; - Digital humanists; - Language technology developers; - Students; - Researchers in information and communication sciences |
Types of services | - Access to data; - Access to tools; - Depositing; - User assistance; - FAQ; - Helpdesk; - Video lectures; - Training courses |
Is portal for language(s) | - Croatian language (ISO 639-3: hrv; glottolog: croa1245) in all its varieties, including e.g.:; - Kajkavian dialect (ISO 639-3: kjv; glottolog: kajk1237); - Chakavian dialect (glottolog: chak1265); - Burgenland Croatian dialect (glottolog: 1244); - Croatian Molisano/Slavomolisano dialect (glottolog: slav1254), etc. |
Other languages covered | - |
Modalities covered | - Written text primarily |
Linguistic topics | - Phonology; - Morphology; - Syntax; - Semantics; - Pragmatics; - Discourse; - Corpus linguistics; - Language resources; - Language technology; - Lexicography |
Language processing | - Tokenisation; - PoS/MSD tagging; - Lemmatisation; - Named entity recognition; - Corpora creation and management; - Lexica creation and management; - Morphological processing at inflectional and derivational level; - Multimodal annotations; - Parsing (syntactic and semantic); - Language models; - Croatian language processing |
Data types | - Monolingual corpora; - Parallel corpora; - Wordnets; - Treebanks; - Language models; - Morphology databases; - Lexica |
Resource families | - Computer-mediated communication corpora (social media); - Corpora of academic texts; - Historical corpora; - L2 learner corpora; - Legal corpora; - Literary corpora; - Manually annotated corpora; - Multimodal corpora; - Newspaper corpora; - Oral history corpora; - Parallel corpora; - Reference corpora; - Spoken corpora; - Conceptual resources; - Dictionaries; - Glossaries; - Language models; - Lexica; - Wordlists; - Normalisation; - Named entity recognition; - Part-of-speech tagging and lemmatisation; - Tools for sentiment analysis |
Generic topics | - Data management; - Metadata; - Language Technology platforms and frameworks; - Grammars |
Other keywords | - Expertise in CALL for Croatian as L2 (incl. L2 corpora); - Expertise in language processing chain for Croatian based on LLM(s) (incl. POS-tagging and lemmatisation) |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | - Danish language and Danish sign language; - Danish language resources; - Language technology tools for Danish; - Natural language processing methods |
Audiences served | - Linguists; - Computational linguists; - Sociolinguists; - Language and literature researchers; - Citizen scientists |
Types of services | - Helpdesk; - How-to documents |
Is portal for language(s) | - Danish |
Other languages covered | - English; - Danish sign language |
Modalities covered | - Text; - Video: gestures; - Video: sign language |
Linguistic topics | - Morphology; - Syntax; - Semantics; - Pragmatics; - Lexicography; - Multimodality |
Language processing | - Tokenisation; - PoS tagging; - Lemmatisation; - Named entity tagging; - Parsing; - TEI annotation; - Corpus tools; - Natural language processing tools |
Data types | - Text corpora; - Wordnets; - Lexica; - Multimodal annotations |
Resource families | - Historical corpora; - Literary corpora; - Parliamentary corpora; - Lexica |
Generic topics | - Natural language processing; - Basic processing/annotation of corpora; - Data management; - Standards |
Other keywords | - |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | Diachronic text collections, historical texts, and tools and resources for processing and analysing them |
Audiences served | - researchers in the humanities, with an interest in different aspects of historical texts; - historians; - social scientists; - researchers in literature and history of ideas; - historical linguists; - computational linguists; - researchers working in the field of digital humanities |
Types of services | - Access to tools; - Technical support; - Web-hosting |
Is portal for language(s) | - Swedish |
Other languages covered | - |
Modalities covered | - Text |
Linguistic topics | - Diachronic language studies |
Language processing | - Diachronic language processing |
Data types | - |
Resource families | - Normalisation |
Generic topics | - Diachronic corpora; - Natural language processing for historical text; - spelling normalisation; - digital humanities |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | Digital Resources for the Languages in Ireland and Britain provides advice and support to researchers and others who want to find and use software programmes and digital datasets in the native languages of Britain and Ireland, in all their varieties, in contemporary and historic forms, as well as other languages as they are used in this region. The knowledge centre is virtual and distributed, with a central online presence and contact point at https://www.clarin.ac.uk/dr-lib. Information to orient and help users is posted online, and queries are answered by a network of experts centred around the CLARIN-UK consortium, plus additional experts in key languages and domains, such as the Irish National Corpus project (see https://www.gaois.ie/en/about/info), and experts across Europe in the CLARIN network. The centre aims to be a source of authoritative answers to questions like "Is there a semantic tagger for Welsh?", "How do I do OCR for Scottish Gaelic?", "Is there a corpus of Irish-language social media posts?", and "Is there an online dictionary for Old English?" |
Audiences served | - Computational linguists; - Linguists; - Historical linguists; - Literary scholars; - Language teachers; - Historians; - Librarians; - Language activists; - Citizen scientists |
Types of services | - Advice; - Access to datasets (via OTA and VLO); - Access to tools; - Depositing; - FAQ; - Helpdesk; - Technical support; - User assistance |
Is portal for language(s) | - English; - Irish; - Scottish Gaelic; - Welsh; - Scots |
Other languages covered | - Non-native minority languages in Ireland and Britain; - Cornish |
Modalities covered | - Speech; - Text; - Computer-mediated communication |
Linguistic topics | - Lexis; - Syntax; - Usage; - Dialectology |
Language processing | - Corpus building; - Linguistic annotation; - Named entity recognition; - Treebanking; - Automatic Speech Recognition (speech to text) |
Data types | - Corpora; - Lexical resources; - Spoken datasets |
Resource families | - Computer-mediated communication corpora (social media); - Corpora of academic texts; - Historical corpora; - L2 learner corpora; - Legal corpora; - Literary corpora; - Manually annotated corpora; - Multimodal corpora; - Newspaper corpora; - Oral history corpora; - Parallel corpora; - Parliamentary corpora; - Reference corpora; - Spoken corpora; - Conceptual resources; - Dictionaries; - Glossaries; - Language models; - Lexica; - Wordlists; - Normalization; - Named entity recognition; - Part-of-speech tagging and lemmatization; - Tools for sentiment analysis; - Corpus query tools |
Generic topics | - |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | IMPACT-CKC (IMPACT Centre of Competence - CLARIN K-centre in digitisation) is a knowledge centre offering expertise and resources to institutions and researchers seeking advice on digitisation and related fields. The IMPACT-CKC resources include a collection of high-quality images with associated ground truth, historical lexica for 10 languages, training materials, and registries of tools, initiatives, datasets and competitions relevant to digitisation and related fields. |
Audiences served | - researchers; - librarians; - archivists; - digital humanists; - computer scientists in topics related to digitisation |
Types of services | - Access to data; - Access to tools; - Training; - User assistance |
Is portal for language(s) | - |
Other languages covered | - Spanish; - English; - Polish; - French; - Dutch; - German; - Slovene; - Czech; - Latin; - Bulgarian |
Modalities covered | - Images; - Multimodality; - Text |
Linguistic topics | - corpus linguistics; - diachronic language resources; - language learning |
Language processing | - basic language processing; - information extraction |
Data types | - lexical data; - language models; - linked open data; - ontologies |
Resource families | - Historical corpora; - Literary corpora; - Manually annotated corpora; - Multimodal corpora; - Newspaper corpora; - Lexica; - Glossaries; - Normalisation; - Named entity recognition; - Part-of-speech tagging and lemmatisation |
Generic topics | - OCR; - digitisation; - visualisation; - evaluation of tools |
Other keywords | - |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | K-Dutch is the place for researchers who want to find out more about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects and many other things. K-Dutch is hosted by the Instituut voor de Nederlandse Taal (Dutch Language Institute), which is also a CLARIN-B centre and host of many tools, lexica and corpora for Dutch, which are, in general, freely available. |
Audiences served | - Computational linguists; - Linguists; - Language teachers; - Historians; - Library staff; - Sociologists; - Students; - Citizen scientists |
Types of services | - Access to data; - Access to documentation; - Access to tools; - Depositing; - FAQ; - Helpdesk; - Training; - User assistance |
Is portal for language(s) | - Dutch |
Other languages covered | - Frisian; - Afrikaans; - Flemish sign language; - Dutch sign language (Sign language of the Netherlands) |
Modalities covered | - Audio: speech; - Text; - Video: sign language; - Audio-visual |
Linguistic topics | - Morphology; - Syntax; - Semantics; - Language learning; - Translation studies; - Diachronic language studies; - Phonology; - Terminology; - Dialectology; - Lexicography; - Natural Language Processing |
Language processing | - Basic language processing (spell checking, PoS tagging, lemmatisation); - Deep parsing; - Information extraction; - Machine translation; - Processing of historical variants of Dutch; - Speech recognition; - Speech synthesis; - Text mining; - Terminology extraction; - Corpus querying; - Treebank querying |
Data types | - Language models; - Dictionaries / Lexica; - Treebanks; - Wordnets; - Linked open data; - Ontologies; - Termbanks |
Resource families | - Computer-mediated communication corpora (social media); - Corpora of academic texts; - Historical corpora; - L2 learner corpora; - Literary corpora; - Manually annotated corpora; - Multimodal corpora; - Newspaper corpora; - Parallel corpora; - Parliamentary corpora; - Reference corpora; - Spoken corpora; - Lexica; - Dictionaries; - Conceptual resources; - Wordlists |
Generic topics | - Artificial intelligence; - Natural language processing; - Machine learning; - Data mining; - Lexicography; - Linked data |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | Information service offering information on a variety of topics related to the Icelandic language. We offer information and advice about the Icelandic language, the use of digital language resources and tools for Icelandic (both text and speech), and language policy and planning. |
Audiences served | - Computational linguists; - Linguists; - Language teachers; - Language learners; - Sociologists; - Citizen scientists |
Types of services | - Helpdesk; - Access to data; - Access to tools; - Depositing; - User assistance |
Is portal for language(s) | - Icelandic |
Other languages covered | - |
Modalities covered | - Text |
Linguistic topics | - Computational linguistics; - Corpus linguistics; - Dialect studies; - Discourse; - Language learning; - Language resources; - Language technology; - Lexicography; - Morphology; - Phonology; - Pragmatics; - Semantics; - Sociolinguistics; - Syntax |
Language processing | - Basic language processing (Pos-tagging, lemmatization, parsing …); - Named entity recognition; - Machine translation; - Processing of morphologically rich languages |
Data types | - Language models; - Dictionaries; - Treebanks |
Resource families | - Legal corpora; - Literary corpora; - Newspaper corpora; - Parallel corpora; - Parliamentary corpora; - Dictionaries; - Language models; - Lexica; - Wordlists; - Named entity recognition; - Part-of-speech tagging and lemmatization |
Generic topics | - Natural language processing; - Corpus annotation |
Other keywords | - |
Tour de CLARIN | - - |
|
|
| |
Areas of competence | CLARIN K-Centre NLP:EL is an information service offering expertise and advice on (a) language technology for Greek, (b) the digital readiness of Greek (i.e. how ready Greek is for the digital age with respect to digital resources and language processing tools), and (c) Greek Sign Language. It provides guidance on the development and annotation of language resources, on the findability, accessibility and use of existing language processing tools and web services, and on the development of new language processing technologies, as well as useful information and support regarding dynamic sign language synthesis and special communication and interaction interfaces. NLP:EL provides its services through two channels. First, it operates a helpdesk where users can address questions concerning the above issues. Second, besides responding to questions on these topics, it provides informative material and documentation relevant to these issues; this material includes (but is not limited to) (i) scientific publications and presentations on Natural Language Processing (NLP) research and applications for Greek and Greek Sign Language, (ii) guides and tutorials on language processing tools and services for Greek and Greek Sign Language, (iii) a direct connection to the specialised Sign Language Technologies website, and (iv) a direct connection to the CLARIN:EL infrastructure, where users can find a catalogue of digital language resources and language processing tools, more detailed information, and further training and dissemination material. |
Audiences served | - Computational linguists; - Language Technology developers; - Language teachers; - Linguists; - Philologists; - Sign language community; - Political scientists; - Social Scientists; - Digital humanists |
Types of services | - Access to documentation; - Helpdesk; - How-to documents; - Training |
Is portal for language(s) | - Greek; - Greek sign language |
Other languages covered | - |
Modalities covered | - Text; - Video: gestures; - Video: sign language |
Linguistic topics | - Lexicography; - Morphology; - Syntax; - Terminology; - Corpus linguistics |
Language processing | - Language processing; - Processing of morphologically rich languages; - Information extraction; - Named entity recognition; - Text mining; - Machine translation; - Sign language technologies; - Speech recognition; - Speech synthesis |
Data types | - Translation memories; - Term banks; - Treebanks; - Lexical conceptual resources; - Dictionaries; - Ontologies; - Thesauri; - Language models |
Resource families | - Historical corpora; - Computer-mediated communication corpora (social media); - Newspaper corpora; - Parallel corpora; - Parliamentary corpora |
Generic topics | - |
Other keywords | - |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | As an audio and audio-visual archive with numerous collections of unique research recordings from all across the world, covering a time-span of 125 years, the Phonogrammarchiv offers various services: Besides providing access to its rich data and metadata resources (online, remote & onsite), it advises scholars on audio-visual research methodology in the social sciences & humanities and on technologies of audio and audio-visual documentation, including the loan of recording equipment. In addition, it widely shares its broad expertise on topics such as restoration, digitisation, format obsolescence, cataloguing, metadata, long-term preservation and storage as well as legal and ethical issues. |
Audiences served | - scholars; - source communities; - linguists; - ethnomusicologists; - social / cultural anthropologists; - historians; - archivists; - audio-visual conservators; - museums; - media; - artists; - teachers |
Types of services | - Access to data; - Access to documentation; - Depositing; - Technical support; - Training; - User assistance |
Is portal for language(s) | - |
Other languages covered | - languages / dialects worldwide |
Modalities covered | - Audio: speech; - Audio-visual |
Linguistic topics | - field linguistics; - dialect studies; - corpus linguistics; - language documentation; - oral history |
Language processing | - |
Data types | - audio data; - audio-visual data |
Resource families | - Spoken corpora |
Generic topics | - physical restoration of audio-visual media; - digitisation of audio-visual media; - format migration of audio-visual media; - metadata of audio-visual media; - long-term preservation & storage of audio-visual data; - audio-visual fieldwork & documentation; - legal issues; - ethical issues; - data management |
Other keywords | - ethnomusicology; - musicology; - linguistics; - field linguistics; - social / cultural anthropology; - history; - African studies; - conservation; - postcolonial studies; - sound studies; - critical archive studies |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | Provides broad expertise in methods of natural language analysis, with special emphasis on the analysis of Polish. Offers support for all types of Language Technology applications for Polish, both monolingual and multilingual. |
Audiences served | - linguists; - computational linguists; - economists; - sociologists; - psychologists; - media researchers; - researchers of communication; - literature researchers |
Types of services | - Helpdesk; - Technical support; - Training |
Is portal for language(s) | - Polish |
Other languages covered | - English; - German; - Russian; - Ukrainian; - Bulgarian; - Lithuanian; - French; - Spanish; - Hungarian; - Hebrew |
Modalities covered | - Audio: speech; - Text |
Linguistic topics | - semantics; - morphology; - syntax; - phonetics; - discourse analysis; - stylistics; - phraseology; - lexicography; - terminology; - translation studies |
Language processing | - Polish language processing; - topic modeling; - stylometry; - speech recognition; - named entity recognition; - corpora creation and management; - parallel corpora; - wordnets; - text mining; - information extraction; - word sense disambiguation |
Data types | - corpora; - dictionaries; - records of speech; - language models; - treebanks; - wordnets |
Resource families | - Literary corpora; - Newspaper corpora; - Parliamentary corpora; - Parallel corpora; - Spoken corpora |
Generic topics | - Data management; - Legal issues; - Machine learning; - Metadata; - OCR; - Standards; - Language normalisation; - Data acquisition; - Support in preparing grant proposals |
Other keywords | - |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | The Science and Technology of the Portuguese Language is the thematic area of this CLARIN Knowledge Centre. For the Portuguese language, it covers all topics, from Phonetics to Discourse and Dialogue; all language functions, from communicative performance to cultural expression; all disciplines, from Theoretical Linguistics to Language Technology; all language variants, from national standard varieties across the world to the dialects of professional groups; and all media of representation, from audio recordings to brain imaging data. |
Audiences served | - Researchers; - Innovators; - Citizen scientists; - Students; - Language professionals; - Users in general whose activities draw on research results from the Science and Technology of Language |
Types of services | - Access to data; - Access to tools; - Helpdesk; - Technical support |
Is portal for language(s) | - Portuguese |
Other languages covered | - Mirandese |
Modalities covered | - Audio: speech; - Audio-visual; - Multimodality; - Sensor data: biosignals; - Text; - Video: sign language |
Linguistic topics | - Anthropological Linguistics; - Applied Linguistics; - Clinical Linguistics; - Cognitive Science; - Computational Linguistics; - Discipline of Linguistics; - Discourse Analysis; - Forensic Linguistics; - General Linguistics; - Genetic Classification; - Historical Linguistics; - History of Linguistics; - Language Acquisition; - Language Documentation; - Lexicography; - Linguistic Theories; - Morphology; - Neurolinguistics; - Philosophy of Language; - Phonetics; - Phonology; - Pragmatics; - Psycholinguistics; - Semantics; - Sociolinguistics; - Syntax; - Text/Corpus Linguistics; - Translation; - Typology; - Writing Systems |
Language processing | - Language understanding; - Language generation; - Speech recognition and transcription; - Speech synthesis; - Multi-modal processing; - Information extraction; - Text mining; - Conversational interfaces and chatbots; - Machine translation; - Summarisation; - Question answering; - Subtitling; - Tokenisation; - POS tagging; - Named entity recognition; - Word sense disambiguation; - Syntactic analysis; - Semantic analysis; - Anaphora resolution; - Dialogue processing; - Speaker detection |
Data types | - Corpora, written, spoken and multi-modal; - Word embeddings; - Language models; - Dictionaries; - Ontologies; - Term banks; - Translation memories; - Treebanks; - Typological databases; - Wordnets; - Lexica; - Conceptual Resources; - Glossaries; - Wordlists; - Speech databases; - Multi-modal databases |
Resource families | - Computer-mediated communication corpora (social media); - Corpora of academic texts; - Historical corpora; - L2 learner corpora; - Literary corpora; - Manually annotated corpora; - Multimodal corpora; - Newspaper corpora; - Parallel corpora; - Parliamentary corpora; - Reference corpora; - Spoken corpora |
Generic topics | - Data management; - Ethical issues; - Evaluation of tools; - Language use in specific domains (e.g. legal or medical language); - Legal issues; - Metadata; - Standards; - Visualisation |
Other keywords | - |
Tour de CLARIN | Introduction Interview |
|
|
| |
Areas of competence | Information service offering advice on the use of digital language resources and tools for the Romanian language and Romanian dialects, as well as other parts of the intangible cultural heritage of Romanian in textual data (e.g., news articles, literary works, etc.). |
Audiences served | - Computational Linguists; - Natural Language Processing practitioners; - Natural Language Generation practitioners; - Natural Language Understanding practitioners; - Natural Language Translation practitioners |
Types of services | - Access to data; - Access to tools; - Data processing models; - FAQ; - Technical support / user assistance; - Hosting researchers for visits to the K-Centre |
Is portal for language(s) | - Romanian |
Other languages covered | - English |
Modalities covered | - Text |
Linguistic topics | - Morphology; - Syntax; - Semantics; - Diachronic language studies; - Romanian texts annotations |
Language processing | - Basic language processing; - Text preprocessing and cleaning; - Information extraction; - Text mining; - Natural Language Processing; - Natural Language Understanding; - Natural Language Generation; - Sentiment Analysis; - Aspect-Based Sentiment Analysis; - Topic Modeling; - Diachronic semantic change and semantic shift |
Data types | - Language models; - Large language models; - Wordnets; - Linked open data; - Ontologies |
Resource families | - Corpora of academic texts; - Historical corpora; - Legal corpora; - Literary corpora; - Manually annotated corpora; - Newspaper corpora; - Parliamentary corpora; - Language models; - Wordlists; - Normalization; - Named entity recognition; - Part-of-speech tagging and lemmatization; - Tools for sentiment analysis |
Generic topics | - Textual data management; - Machine Learning; - Deep Learning; - Big Data Analysis; - Visualization; - Natural Language Processing; - Sentiment Analysis |
Other keywords | - Parallel and Distributed Processing |
Tour de CLARIN | - - |
|
|
|
|
|
|