Google Scholar

User profiles for Doina Precup

Doina Precup

DeepMind and McGill University

Verified email at cs.mcgill.ca

Cited by 32831

The option-critic architecture

PL Bacon, J Harb, D Precup - Proceedings of the AAAI conference on …, 2017 - ojs.aaai.org

Temporal abstraction is key to scaling up learning and planning in reinforcement learning.
While planning with temporally extended actions is well understood, creating such …

Cited by 1195

Deep reinforcement learning that matters

…, R Islam, P Bachman, J Pineau, D Precup… - Proceedings of the …, 2018 - ojs.aaai.org

In recent years, significant progress has been made in solving challenging problems across
various domains using deep reinforcement learning (RL). Reproducing existing work and …

Cited by 2257

Reward is enough

D Silver, S Singh, D Precup, RS Sutton - Artificial Intelligence, 2021 - Elsevier

In this article we hypothesise that intelligence, and its associated abilities, can be understood
as subserving the maximisation of reward. Accordingly, reward is enough to drive …

Cited by 516

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

RS Sutton, D Precup, S Singh - Artificial Intelligence, 1999 - Elsevier

Learning, planning, and representing knowledge at multiple levels of temporal abstraction
are key, longstanding challenges for AI. In this paper we consider how these challenges can …

Cited by 4330

Learning options in reinforcement learning

M Stolle, D Precup - … 5th International Symposium, SARA 2002 Kananaskis …, 2002 - Springer

Temporally extended actions (e.g., macro actions) have proven very useful for speeding up
learning, ensuring robustness and building prior knowledge into AI systems. The options …

Cited by 453

Off-policy deep reinforcement learning without exploration

S Fujimoto, D Meger, D Precup - … conference on machine …, 2019 - proceedings.mlr.press

Many practical applications of reinforcement learning constrain agents to learn from a fixed
batch of data which has already been gathered, without offering further possibility for data …

Cited by 1398

The multimodal brain tumor image segmentation benchmark (BRATS)

…, JA Mariz, R Meier, S Pereira, D Precup… - IEEE transactions on …, 2014 - ieeexplore.ieee.org

In this paper we report the set-up and results of the Multimodal Brain Tumor Image
Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 …

Cited by 5460

Eligibility traces for off-policy policy evaluation

D Precup - Computer Science Department Faculty …, 2000 - scholarworks.umass.edu

Eligibility traces have been shown to speed reinforcement learning, to make it more robust to
hidden states, and to provide a link between Monte Carlo and temporal-difference methods…

Cited by 924

Fast gradient-descent methods for temporal-difference learning with linear function approximation

RS Sutton, HR Maei, D Precup, S Bhatnagar… - Proceedings of the 26th …, 2009 - dl.acm.org

Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning
algorithm compatible with both linear function approximation and off-policy training, and …

Cited by 702

Learning with pseudo-ensembles

…, O Alsharif, D Precup - Advances in neural …, 2014 - proceedings.neurips.cc

We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child
models spawned from a parent model by perturbing it according to some noise process. E.g., …

Cited by 636
