Asier Gutiérrez-Fandiño

Walmart Global Tech
Verified email at walmart.com
Cited by 375

MarIA: Spanish language models

A Gutiérrez-Fandiño, J Armengol-Estapé… - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents MarIA, a family of Spanish language models and associated resources
made available to the industry and the research community. Currently, MarIA includes …

Pretrained biomedical language models for clinical NLP in Spanish

…, J Llop, M Pàmies, A Gutiérrez-Fandiño… - Proceedings of the …, 2022 - aclanthology.org
This work presents the first large-scale biomedical Spanish language models trained from
scratch, using large biomedical corpora consisting of a total of 1.1 B tokens and an EHR …

Biomedical and clinical language models for Spanish: On the benefits of domain-specific pretraining in a mid-resource scenario

…, J Armengol-Estapé, A Gutiérrez-Fandiño… - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents biomedical and clinical language models for Spanish by experimenting
with different pretraining choices, such as masking at word and subword level, varying the …

Spanish language models

A Gutiérrez-Fandiño, J Armengol-Estapé… - arXiv preprint arXiv …, 2021 - academia.edu
This paper presents the Spanish RoBERTa-base and RoBERTa-large models, as well as
the corresponding performance evaluations. Both models were pre-trained using the largest …

Anticipating the Debate: Predicting Controversy in News with Transformer-based NLP

BC Figueras, A Gutiérrez-Fandiño… - … del Lenguaje Natural, 2023 - journal.sepln.org
Controversy is a social phenomenon that emerges when a topic generates large disagreement
among people. In the public sphere, controversy is very often related to news. Whereas …

Predicting the evolution of COVID-19 mortality risk: A Recurrent Neural Network approach

…, A Gonzalez-Agirre, A Gutiérrez-Fandiño… - Computer Methods and …, 2023 - Elsevier
Background: In December 2020, the COVID-19 disease was confirmed in 1,665,775 patients
and caused 45,784 deaths in Spain. At that time, health decision support systems were …

Spanish biomedical crawled corpus: A large, diverse dataset for Spanish biomedical language models

…, OG Bonet, A Gutiérrez-Fandiño… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce CoWeSe (the Corpus Web Salud Español), the largest Spanish biomedical
corpus to date, consisting of 4.5GB (about 750M tokens) of clean plain text. CoWeSe is the …

Spanish legalese language model and corpora

A Gutiérrez-Fandiño, J Armengol-Estapé… - arXiv preprint arXiv …, 2021 - arxiv.org
There are many Language Models for the English language according to its worldwide
relevance. However, for the Spanish language, even if it is a widely spoken language, there are …

FinEAS: Financial embedding analysis of sentiment

A Gutiérrez-Fandiño, P Kolm… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce a new language representation model in finance called Financial Embedding
Analysis of Sentiment (FinEAS). In financial markets, news and investor sentiment are …

esCorpius: A Massive Spanish Crawling Corpus

A Gutiérrez-Fandiño, D Pérez-Fernández… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, transformer-based models have led to significant advances in language
modelling for natural language processing. However, they require a vast amount of data to …