

What will the policy be if the state space is continuous in reinforcement learning?



I have recently started with reinforcement learning, and I have a few doubts regarding the policy of an agent when it comes to a continuous space. From my understanding, the policy tells the agent which action to perform given a particular state. This makes sense for the maze example, where the state space is discrete and limited. If the state space is continuous, will the agent have information about every possible state in the state space?
Also, will an RL agent be able to take a decision if it is in a new state that it has not encountered during training?










      reinforcement-learning






asked Apr 18 at 4:52 by Chinni
1 Answer
You can still define state value functions $v(s)$, action value functions $q(s,a)$ and policy functions $\pi(s)$ or $\pi(a|s)$ when the state $s$ is from a very large or continuous space. Reinforcement Learning (RL) is still a well-defined problem in that space.



What becomes harder is iterating through the state space. That rules out two simple approaches:

• Tabular methods, which store a table of all states with their learned value or best action.

• Any method that needs to iterate through all states, e.g. the dynamic programming methods Policy Iteration and Value Iteration.

These are important methods in RL: with a tabular representation, and assuming you can iterate through all possibilities, you can prove that you will find the optimal policy (see the value-iteration sketch below).
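For illustration, here is a minimal value-iteration sketch over a tiny, randomly generated MDP (the transition and reward arrays are made up for the example). The key point is the sweep over every enumerated state in each backup, which is exactly what a continuous state space rules out:

    import numpy as np

    # A tiny MDP with an enumerable state set. Value iteration must sweep every
    # state on each backup, which is impossible when states cannot be enumerated.
    N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9
    P = np.random.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))  # P[s, a, s']
    R = np.random.rand(N_STATES, N_ACTIONS)                                 # R[s, a]

    V = np.zeros(N_STATES)
    for _ in range(100):                   # repeat until (approximately) converged
        Q = R + GAMMA * P @ V              # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V = Q.max(axis=1)                  # Bellman optimality backup over all states
    policy = Q.argmax(axis=1)              # greedy policy: one entry per enumerated state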



However, RL methods can still work with large state spaces. The main way to do so is to use some form of function approximation, which generalises over the state space so that knowledge learned about a single state can be used to assess similar states.



Function approximation can be as simple as discretising the space to make the number of states manageable (sketched below), or you can use a parametrisable machine learning approach such as a neural network. The combination of neural networks with reinforcement learning methods is behind the "deep" reinforcement learning approaches that have been the subject of much recent research.
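As a minimal sketch of the discretisation option (the state bounds, number of bins and action count here are assumptions for illustration, not part of any particular environment):

    import numpy as np

    # Hypothetical 2-D continuous state, e.g. (position, velocity), with assumed bounds.
    STATE_LOW  = np.array([-1.2, -0.07])
    STATE_HIGH = np.array([ 0.6,  0.07])
    N_BINS = 20              # bins per dimension (a design choice)
    N_ACTIONS = 3

    def discretise(state):
        """Map a continuous state to a tuple of bin indices usable as a Q-table key."""
        ratios = (np.asarray(state) - STATE_LOW) / (STATE_HIGH - STATE_LOW)
        bins = np.clip((ratios * N_BINS).astype(int), 0, N_BINS - 1)
        return tuple(bins)

    # After discretisation, an ordinary tabular Q function works again.
    Q = np.zeros((N_BINS, N_BINS, N_ACTIONS))

    def greedy_action(state):
        return int(np.argmax(Q[discretise(state)]))

The trade-off is that too few bins lose information the agent needs, while too many bins bring back the original problem of having more states than can be visited during training.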



If you use any function approximation with RL, then you are not guaranteed to find the optimal policy; instead you will find an approximation of it. However, that is often good enough for the purpose.



          To answer the questions more directly:




What will the policy be if the state space is continuous in reinforcement learning?




There is no change at the theoretical level. You can express the policy as $\pi(s)$ for a deterministic policy, or $\pi(a|s)$ for a stochastic policy, regardless of the space of $s$.



At the implementation level, you will need to implement a parametric function that takes $s$ as one of its inputs. The function parameters $\theta$ are what is learned. For instance, if you use an action-value-based method such as Q-learning, then you will create an approximation to $Q(s,a)$; in the literature you may see this represented directly as $\hat{q}(s,a,\theta) \approx Q(s,a)$.



Using a neural network for $\hat{q}(s,a,\theta)$ is one common way to achieve this, where the neural network's weight and bias values are in $\theta$.
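As a minimal sketch of this, using PyTorch (the layer sizes, learning rate and state/action dimensions are assumptions for illustration, and a full DQN-style agent would also add pieces such as experience replay and a target network):

    import torch
    import torch.nn as nn

    N_STATE_DIM, N_ACTIONS = 4, 2        # assumed sizes for illustration
    GAMMA = 0.99

    # q_net(s) outputs one estimated action value per action; its weights and
    # biases together play the role of the parameter vector theta.
    q_net = nn.Sequential(
        nn.Linear(N_STATE_DIM, 64),
        nn.ReLU(),
        nn.Linear(64, N_ACTIONS),
    )
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    def q_learning_step(state, action, reward, next_state, done):
        """One semi-gradient Q-learning update towards r + gamma * max_a' q_hat(s', a')."""
        s = torch.as_tensor(state, dtype=torch.float32)
        s_next = torch.as_tensor(next_state, dtype=torch.float32)
        with torch.no_grad():
            target = reward + (0.0 if done else GAMMA * q_net(s_next).max().item())
        loss = (q_net(s)[action] - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because q_net takes the (continuous) state as input, it can produce value estimates for states it has never seen, which is the generalisation discussed below.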




If the state space is continuous, will the agent have information about every possible state in the state space?




It depends on what you mean by "have information". The agent cannot possibly store separate data about each state. However, it may have information about similar states, or store its knowledge about states in a more abstract fashion (such as in the parameters $\theta$).




Also, will an RL agent be able to take a decision if it is in a new state that it has not encountered during training?




Yes. For this to work well with function approximation, it relies on successful generalisation between similar states, so it is important that the state representation supports this. For instance, if two states are close together in the representation you use, you should expect the value function and policy to often be similar as well; not always, since the true functions can have arbitrary shape, but trying to learn an effectively random mapping would be impossible.







answered Apr 18 at 7:42 by Neil Slater (edited Apr 18 at 8:34)


























