What will be the policy if the state space is continuous in Reinforcement learning
I have recently started with reinforcement learning, and I have a few questions about the policy of an agent when the state space is continuous. From my understanding, the policy tells the agent which action to perform in a particular state. This makes sense in the maze example, where the state space is discrete and limited. What if the state space is continuous? Will the agent have information about every possible state in the state space?
Also, will an RL agent be able to make a decision if it is in a new state that it has not encountered during training?
reinforcement-learning
asked Apr 18 at 4:52
Chinni
1 Answer
$begingroup$
You can still define state value functions $v(s)$, action value functions $q(s,a)$ and policy functions $\pi(s)$ or $\pi(a|s)$ when the state $s$ is from a very large or continuous space. Reinforcement Learning (RL) is still a well-defined problem in that space.
What becomes harder is iterating through the state space. That rules out two simple approaches:
- Tabular methods, which store a list of all states with the correct action or value for each.
- Any method that needs to iterate through all states, e.g. the dynamic programming methods Policy Iteration or Value Iteration.
These are important methods for RL. With a tabular representation, and assuming you can iterate through all possibilities, you can prove that they find the optimal policy.
However, RL methods can still work with large state spaces. The main way to do so is to use some form of function approximation, which generalises across the space so that knowledge learned about a single state is used to assess similar states.
Function approximation can be as simple as discretising the space to make the numbers more manageable. Or you can use a parametrisable machine learning approach, such as neural networks. The combination of neural networks with reinforcement learning methods is behind the "deep" reinforcement learning approaches that have been the subject of much recent research.
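As a rough illustration of the discretisation option, here is a minimal sketch in plain Python. The 2-D state (position and velocity), its ranges, the number of bins and the helper names `make_bins`, `discretise` and `q_value` are all assumptions made up for this example, not part of any particular library or environment.

```python
from bisect import bisect_right

def make_bins(low, high, n_bins):
    """Return the n_bins - 1 interior edges that split [low, high] evenly."""
    width = (high - low) / n_bins
    return [low + width * i for i in range(1, n_bins)]

def discretise(state, edges_per_dim):
    """Map a continuous state (sequence of floats) to a tuple of bin indices."""
    return tuple(bisect_right(edges, x) for x, edges in zip(state, edges_per_dim))

# Hypothetical 2-D state: position in [-1, 1] and velocity in [-2, 2].
edges_per_dim = [make_bins(-1.0, 1.0, 4), make_bins(-2.0, 2.0, 4)]

# The table only holds entries for (bin, action) pairs that were written to.
q_table = {}

def q_value(state, action):
    """Look up the value of the bin that the continuous state falls into."""
    return q_table.get((discretise(state, edges_per_dim), action), 0.0)

# Pretend the agent learned a value of 1.0 for action 0 in one visited state.
q_table[(discretise((0.30, -1.5), edges_per_dim), 0)] = 1.0

print(discretise((0.30, -1.5), edges_per_dim))  # bin indices of the visited state
print(q_value((0.35, -1.9), 0))                 # nearby state, same bin
print(q_value((0.60, -1.5), 0))                 # different bin, default value
```

Every state that lands in the same bin shares one table entry, which is the crudest form of generalisation. The bin width is a trade-off: coarse bins blur together states that may need different actions, while fine bins bring back the problem of too many entries to visit.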
If you use any function approximation with RL, then you are not guaranteed to find the optimal policy. Instead you will find an approximation of that policy. However, that is often good enough in practice.
To answer the questions more directly:
> What will be the policy if the state space is continuous in Reinforcement learning
There is no change at the theoretical level. You can express the policy as $\pi(s)$ for a deterministic policy, or $\pi(a|s)$ for a stochastic policy, regardless of the space of $s$.
At the implementation level, you will need to implement a parametric function that takes $s$ as one of its inputs. The function parameters $\theta$ are what is learned. For instance, if you use an action value based method such as Q-learning, then you will create an approximation to $Q(s,a)$. In the literature you may see this directly represented as $\hat{q}(s,a,\theta) \approx Q(s,a)$.
Using a neural network for $\hat{q}(s,a,\theta)$ is one common way to achieve this, where the neural network's weight and bias values are in $\theta$.
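To make $\hat{q}(s,a,\theta)$ concrete without the overhead of a neural network, the sketch below swaps in a linear approximator and applies one semi-gradient Q-learning update. The feature layout, the function names, the step size `alpha`, the discount `gamma` and the sample transition are all assumptions chosen for illustration; a neural network would replace `q_hat` and its gradient, but the role of $\theta$ is the same.

```python
def features(state, action, n_actions):
    """One block of [1, s_1, ..., s_n] per action; the other blocks stay zero."""
    block = [1.0] + list(state)
    x = [0.0] * (len(block) * n_actions)
    start = action * len(block)
    x[start:start + len(block)] = block
    return x

def q_hat(state, action, theta, n_actions):
    """Linear approximation: q_hat(s, a, theta) = theta . x(s, a)."""
    x = features(state, action, n_actions)
    return sum(t * xi for t, xi in zip(theta, x))

def q_learning_update(theta, s, a, r, s_next, done, n_actions,
                      alpha=0.1, gamma=0.99):
    """One semi-gradient Q-learning step; returns the updated parameters."""
    if done:
        best_next = 0.0
    else:
        best_next = max(q_hat(s_next, b, theta, n_actions)
                        for b in range(n_actions))
    td_error = r + gamma * best_next - q_hat(s, a, theta, n_actions)
    # For a linear model the gradient of q_hat w.r.t. theta is just x(s, a).
    x = features(s, a, n_actions)
    return [t + alpha * td_error * xi for t, xi in zip(theta, x)]

def greedy_policy(state, theta, n_actions):
    """Deterministic policy pi(s): pick the action with the largest q_hat."""
    return max(range(n_actions), key=lambda a: q_hat(state, a, theta, n_actions))

# Hypothetical problem: 2-D continuous state, 2 discrete actions.
n_actions = 2
theta = [0.0] * (3 * n_actions)

# A single made-up terminal transition with reward 1 for action 1.
s = (0.5, -0.2)
theta = q_learning_update(theta, s, 1, 1.0, s, True, n_actions)

print(q_hat(s, 1, theta, n_actions))
print(greedy_policy(s, theta, n_actions))
```

Note that the agent stores only the six numbers in `theta`, however many distinct states it visits, and that the policy is derived from those parameters rather than looked up per state.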
> What if the state space is continuous, will the agent have information of every possible state in the state space?
Depends what you mean by "have information". The agent cannot possibly store separate data about each state. However, it may have information about similar states, or store its knowledge about states in a more abstract fashion (such as in the parameters $\theta$).
> Also will an RL agent be able to take decision if its in a new state that it has not encountered during training ?
Yes. For this to work well with function approximation, it relies on successful generalisation between similar states, so it is important that the state space representation works towards this. For instance, if two states are close together in the state representation you use, you should expect their values and policy outputs to be similar most of the time. This need not always hold, because the function can have an arbitrary shape, but learning an effectively random mapping would be impossible.
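The toy sketch below contrasts a lookup table with a fitted function on a state that never appeared in training. It is not a full RL loop: the smooth target `true_value`, the training states and the step size are invented for the example, and the targets simply stand in for returns the agent would have observed.

```python
# States seen during "training" (hypothetical 1-D continuous state).
train_states = [0.0, 0.25, 0.5, 0.75, 1.0]

def true_value(s):
    # Made-up smooth target standing in for observed returns.
    return 2.0 * s + 1.0

def v_hat(s, theta):
    """Parametric value estimate: v_hat(s, theta) = theta[0] + theta[1] * s."""
    return theta[0] + theta[1] * s

# Fit theta by stochastic gradient descent on the squared error.
theta = [0.0, 0.0]
alpha = 0.1
for _ in range(2000):
    for s in train_states:
        error = true_value(s) - v_hat(s, theta)
        theta[0] += alpha * error
        theta[1] += alpha * error * s

# Tabular alternative: one stored value per visited state.
table = {s: true_value(s) for s in train_states}

unseen = 0.6
print(unseen in table)                 # False: the table has no entry for it
print(round(v_hat(unseen, theta), 3))  # the fitted function still gives an estimate
```

The estimate for the unseen state is only sensible because the target varies smoothly with the state representation, so nearby training states carry useful information. With a target that jumped around arbitrarily between neighbouring states, the same fit would produce an answer just as readily, but there would be no reason to trust it.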
$endgroup$
edited Apr 18 at 8:34
answered Apr 18 at 7:42
Neil Slater