Marta Villegas

Barcelona Supercomputing Center
Verified email at bsc.es
Cited by 1750

MarIA: Spanish language models

…, C Rodriguez-Penagos, M Villegas - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents MarIA, a family of Spanish language models and associated resources
made available to the industry and the research community. Currently, MarIA includes …

SIMPLE: A general framework for the development of multilingual lexicons

…, I Peters, W Peters, N Ruimy, M Villegas… - International Journal …, 2000 - academic.oup.com
The project LE-SIMPLE is an innovative attempt at building harmonized syntactic-semantic
lexicons for twelve European languages, intended for use in different Human Language …

Cross-lingual text categorization

N Bel, CHA Koster, M Villegas - … 2003 Trondheim, Norway, August 17-22 …, 2003 - Springer
This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which
arises when documents in different languages must be classified according to the same …

PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track

…, A Intxaurrondo, O Rabal, M Villegas… - Proceedings of The …, 2019 - aclanthology.org
One of the biomedical entity types of relevance for medicine or biosciences are chemical
compounds and drugs. The correct detection of these entities is critical for other text mining …

Pretrained biomedical language models for clinical NLP in Spanish

…, A Gonzalez-Agirre, M Villegas - Proceedings of the …, 2022 - aclanthology.org
This work presents the first large-scale biomedical Spanish language models trained from
scratch, using large biomedical corpora consisting of a total of 1.1 B tokens and an EHR …

Automatic De-identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results.

…, A Intxaurrondo, H Rodriguez, JL Martin, M Villegas… - IberLEF …, 2019 - ceur-ws.org
There is an increasing interest in exploiting the content of electronic health records by
means of natural language processing and text-mining technologies, as they can result in …

Biomedical and clinical language models for Spanish: On the benefits of domain-specific pretraining in a mid-resource scenario

…, M Pàmies, A Gonzalez-Agirre, M Villegas - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents biomedical and clinical language models for Spanish by experimenting
with different pretraining choices, such as masking at word and subword level, varying the …

Medical word embeddings for Spanish: Development and evaluation

F Soares, M Villegas, A Gonzalez-Agirre… - Proceedings of the …, 2019 - aclanthology.org
Word embeddings are representations of words in a dense vector space. Although they are
not recent phenomena in Natural Language Processing (NLP), they have gained momentum …

Are multilingual models the best choice for moderately under-resourced languages? A comprehensive assessment for Catalan

…, A Gonzalez-Agirre, M Melero, M Villegas - arXiv preprint arXiv …, 2021 - arxiv.org
Multilingual language models have been a crucial breakthrough as they considerably reduce
the need of data for under-resourced languages. Nevertheless, the superiority of language-…

Spanish language models

…, C Rodriguez-Penagos, M Villegas - arXiv preprint arXiv …, 2021 - academia.edu
This paper presents the Spanish RoBERTa-base and RoBERTa-large models, as well as
the corresponding performance evaluations. Both models were pre-trained using the largest …