User profiles for Doina Precup
Doina Precup, DeepMind and McGill University. Verified email at cs.mcgill.ca. Cited by 32831
The option-critic architecture
Temporal abstraction is key to scaling up learning and planning in reinforcement learning.
While planning with temporally extended actions is well understood, creating such …
Deep reinforcement learning that matters
In recent years, significant progress has been made in solving challenging problems across
various domains using deep reinforcement learning (RL). Reproducing existing work and …
[HTML] Reward is enough
In this article we hypothesise that intelligence, and its associated abilities, can be understood
as subserving the maximisation of reward. Accordingly, reward is enough to drive …
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
Learning, planning, and representing knowledge at multiple levels of temporal abstraction
are key, longstanding challenges for AI. In this paper we consider how these challenges can …
Learning options in reinforcement learning
M Stolle, D Precup - … 5th International Symposium, SARA 2002 Kananaskis …, 2002 - Springer
Temporally extended actions (e.g., macro actions) have proven very useful for speeding up
learning, ensuring robustness and building prior knowledge into AI systems. The options …
Off-policy deep reinforcement learning without exploration
Many practical applications of reinforcement learning constrain agents to learn from a fixed
batch of data which has already been gathered, without offering further possibility for data …
The multimodal brain tumor image segmentation benchmark (BRATS)
In this paper we report the set-up and results of the Multimodal Brain Tumor Image
Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 …
[PDF] Eligibility traces for off-policy policy evaluation
D Precup - Computer Science Department Faculty …, 2000 - scholarworks.umass.edu
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to
hidden states, and to provide a link between Monte Carlo and temporal-difference methods…
Fast gradient-descent methods for temporal-difference learning with linear function approximation
Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning
algorithm compatible with both linear function approximation and off-policy training, and …
Learning with pseudo-ensembles
…, O Alsharif, D Precup - Advances in neural …, 2014 - proceedings.neurips.cc
We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child
models spawned from a parent model by perturbing it according to some noise process. E.g., …