User profiles for Marta Villegas
Marta Villegas, Barcelona Supercomputing Center. Verified email at bsc.es. Cited by 1750
MarIA: Spanish language models
…, C Rodriguez-Penagos, M Villegas - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents MarIA, a family of Spanish language models and associated resources
made available to the industry and the research community. Currently, MarIA includes …
SIMPLE: A general framework for the development of multilingual lexicons
…, I Peters, W Peters, N Ruimy, M Villegas… - International Journal …, 2000 - academic.oup.com
The project LE-SIMPLE is an innovative attempt at building harmonized syntactic-semantic
lexicons for twelve European languages, intended for use in different Human Language …
Cross-lingual text categorization
N Bel, CHA Koster, M Villegas - … 2003 Trondheim, Norway, August 17-22 …, 2003 - Springer
This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which
arises when documents in different languages must be classified according to the same …
PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track
…, A Intxaurrondo, O Rabal, M Villegas… - Proceedings of The …, 2019 - aclanthology.org
One of the biomedical entity types of relevance for medicine or biosciences are chemical
compounds and drugs. The correct detection of these entities is critical for other text mining …
Pretrained biomedical language models for clinical NLP in Spanish
…, A Gonzalez-Agirre, M Villegas - Proceedings of the …, 2022 - aclanthology.org
This work presents the first large-scale biomedical Spanish language models trained from
scratch, using large biomedical corpora consisting of a total of 1.1 B tokens and an EHR …
Automatic De-identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results
There is an increasing interest in exploiting the content of electronic health records by
means of natural language processing and text-mining technologies, as they can result in …
Biomedical and clinical language models for Spanish: On the benefits of domain-specific pretraining in a mid-resource scenario
This work presents biomedical and clinical language models for Spanish by experimenting
with different pretraining choices, such as masking at word and subword level, varying the …
Medical word embeddings for Spanish: Development and evaluation
Word embeddings are representations of words in a dense vector space. Although they are
not recent phenomena in Natural Language Processing (NLP), they have gained momentum …
Are multilingual models the best choice for moderately under-resourced languages? A comprehensive assessment for Catalan
Multilingual language models have been a crucial breakthrough as they considerably reduce
the need of data for under-resourced languages. Nevertheless, the superiority of language-…
Spanish language models
…, C Rodriguez-Penagos, M Villegas - arXiv preprint arXiv …, 2021 - academia.edu
This paper presents the Spanish RoBERTa-base and RoBERTa-large models, as well as
the corresponding performance evaluations. Both models were pre-trained using the largest …