Google Scholar

User profiles for Marta Villegas

Marta Villegas

Barcelona Supercomputing Center

Verified email at bsc.es

Cited by 1750

MarIA: Spanish language models

…, C Rodriguez-Penagos, M Villegas - arXiv preprint arXiv …, 2021 - arxiv.org

This work presents MarIA, a family of Spanish language models and associated resources
made available to the industry and the research community. Currently, MarIA includes …

Cited by 180

SIMPLE: A general framework for the development of multilingual lexicons

…, I Peters, W Peters, N Ruimy, M Villegas… - International Journal …, 2000 - academic.oup.com

The project LE-SIMPLE is an innovative attempt at building harmonized syntactic-semantic
lexicons for twelve European languages, intended for use in different Human Language …

Cited by 265

Cross-lingual text categorization

N Bel, CHA Koster, M Villegas - … 2003 Trondheim, Norway, August 17-22 …, 2003 - Springer

This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which
arises when documents in different languages must be classified according to the same …

Cited by 198

PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track

…, A Intxaurrondo, O Rabal, M Villegas… - Proceedings of The …, 2019 - aclanthology.org

One of the biomedical entity types of relevance for medicine or biosciences is chemical
compounds and drugs. The correct detection of these entities is critical for other text mining …

Cited by 104

Pretrained biomedical language models for clinical NLP in Spanish

…, A Gonzalez-Agirre, M Villegas - Proceedings of the …, 2022 - aclanthology.org

This work presents the first large-scale biomedical Spanish language models trained from
scratch, using large biomedical corpora consisting of a total of 1.1B tokens and an EHR …

Cited by 44

Automatic De-identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results

…, A Intxaurrondo, H Rodriguez, JL Martin, M Villegas… - IberLEF …, 2019 - ceur-ws.org

There is an increasing interest in exploiting the content of electronic health records by
means of natural language processing and text-mining technologies, as they can result in …

Cited by 99

Biomedical and clinical language models for Spanish: On the benefits of domain-specific pretraining in a mid-resource scenario

…, M Pàmies, A Gonzalez-Agirre, M Villegas - arXiv preprint arXiv …, 2021 - arxiv.org

This work presents biomedical and clinical language models for Spanish by experimenting
with different pretraining choices, such as masking at word and subword level, varying the …

Cited by 40

Medical word embeddings for Spanish: Development and evaluation

F Soares, M Villegas, A Gonzalez-Agirre… - Proceedings of the …, 2019 - aclanthology.org

Word embeddings are representations of words in a dense vector space. Although they are
not recent phenomena in Natural Language Processing (NLP), they have gained momentum …

Cited by 64

Are multilingual models the best choice for moderately under-resourced languages? A comprehensive assessment for Catalan

…, A Gonzalez-Agirre, M Melero, M Villegas - arXiv preprint arXiv …, 2021 - arxiv.org

Multilingual language models have been a crucial breakthrough as they considerably reduce
the need for data for under-resourced languages. Nevertheless, the superiority of language-…

Cited by 33

Spanish language models

…, C Rodriguez-Penagos, M Villegas - arXiv preprint arXiv …, 2021 - academia.edu

This paper presents the Spanish RoBERTa-base and RoBERTa-large models, as well as
the corresponding performance evaluations. Both models were pre-trained using the largest …

Cited by 33
