Doina Precup

DeepMind and McGill University
Verified email at cs.mcgill.ca
Cited by 32831

The option-critic architecture

PL Bacon, J Harb, D Precup - Proceedings of the AAAI conference on …, 2017 - ojs.aaai.org
Temporal abstraction is key to scaling up learning and planning in reinforcement learning.
While planning with temporally extended actions is well understood, creating such …

Deep reinforcement learning that matters

…, R Islam, P Bachman, J Pineau, D Precup… - Proceedings of the …, 2018 - ojs.aaai.org
In recent years, significant progress has been made in solving challenging problems across
various domains using deep reinforcement learning (RL). Reproducing existing work and …

Reward is enough

D Silver, S Singh, D Precup, RS Sutton - Artificial Intelligence, 2021 - Elsevier
In this article we hypothesise that intelligence, and its associated abilities, can be understood
as subserving the maximisation of reward. Accordingly, reward is enough to drive …

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

RS Sutton, D Precup, S Singh - Artificial Intelligence, 1999 - Elsevier
Learning, planning, and representing knowledge at multiple levels of temporal abstraction
are key, longstanding challenges for AI. In this paper we consider how these challenges can …

Learning options in reinforcement learning

M Stolle, D Precup - … 5th International Symposium, SARA 2002 Kananaskis …, 2002 - Springer
Temporally extended actions (e.g., macro actions) have proven very useful for speeding up
learning, ensuring robustness and building prior knowledge into AI systems. The options …

Off-policy deep reinforcement learning without exploration

S Fujimoto, D Meger, D Precup - … conference on machine …, 2019 - proceedings.mlr.press
Many practical applications of reinforcement learning constrain agents to learn from a fixed
batch of data which has already been gathered, without offering further possibility for data …

The multimodal brain tumor image segmentation benchmark (BRATS)

…, JA Mariz, R Meier, S Pereira, D Precup… - IEEE transactions on …, 2014 - ieeexplore.ieee.org
In this paper we report the set-up and results of the Multimodal Brain Tumor Image
Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 …

Eligibility traces for off-policy policy evaluation

D Precup - Computer Science Department Faculty …, 2000 - scholarworks.umass.edu
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to
hidden states, and to provide a link between Monte Carlo and temporal-difference methods…

Fast gradient-descent methods for temporal-difference learning with linear function approximation

RS Sutton, HR Maei, D Precup, S Bhatnagar… - Proceedings of the 26th …, 2009 - dl.acm.org
Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning
algorithm compatible with both linear function approximation and off-policy training, and …

Learning with pseudo-ensembles

…, O Alsharif, D Precup - Advances in neural …, 2014 - proceedings.neurips.cc
We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child
models spawned from a parent model by perturbing it according to some noise process. E.g., …