Chapter 20 This chapter covers reinforcement learning, in which the feedback an agent receives about its actions is called a reward, or reinforcement. In games like chess, the reinforcement is received only at the end of the game; we call that final state a terminal state in the state history sequence. The agent can be a passive learner or an active learner. A passive learner simply watches the world go by and tries to learn the utility of being in various states; an active learner must also act using the learned information, and can use its problem generator to suggest explorations of unknown portions of the environment. An agent may instead learn an action-value function giving the expected utility of taking a given action in a given state; this is called Q-learning. We define the reward-to-go of a state as the sum of the rewards from that state until a terminal state is reached. Given this definition, the expected utility of a state is simply the expected reward-to-go of that state. A simple method for updating uti...
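To make the Q-learning idea concrete, here is a minimal sketch of one episode of tabular Q-learning in Python. The `env` object and its `reset()`, `step(action)`, and `actions(state)` methods are assumed placeholders for some small discrete environment, not an interface from the book; the update rule itself is the standard one, moving Q(s, a) toward the observed reward plus the discounted value of the best next action.

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=1.0, epsilon=0.1):
    """Run one episode of tabular Q-learning.

    `env` is a hypothetical environment with reset(), step(action),
    and actions(state) methods (an assumed interface for this sketch).
    """
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy selection: explore occasionally, otherwise
        # take the action with the highest current Q-value estimate.
        if random.random() < epsilon:
            action = random.choice(env.actions(state))
        else:
            action = max(env.actions(state), key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        # Q-learning update: nudge Q(s, a) toward the reward plus the
        # discounted value of the best action in the next state.
        best_next = 0.0 if done else max(Q[(next_state, a)]
                                         for a in env.actions(next_state))
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = next_state
    return Q

Q = defaultdict(float)  # unseen (state, action) pairs default to 0
```

Because the value of the terminal transition is just the final reward, running many episodes makes each Q-value an estimate of the expected reward-to-go of taking that action in that state.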
Chapter 3 This chapter discusses the various problems an agent faces and ways to formulate them. It presents different problems based on different scenarios and explains effective ways to devise a search strategy for solving them, following the formulate-search-execute design. There are four essentially different types of problems: single-state problems, multiple-state problems, contingency problems, and exploration problems. A problem is defined by its initial state, operator set, goal test, and path cost function; the process of removing detail from a representation is called abstraction. The chapter also presents various toy problems and how to solve them. A state is arc-consistent if every variable has a value in its domain that is consistent with each of the constraints on that variable.

Chapter 4 This chapter takes the problems from the previous chapter and explains various ways of solving them. It explains various algorithms and ...
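As a rough sketch of the initial state / operator set / goal test / path cost formulation from Chapter 3, here is one way it might look in Python, with breadth-first search as one uninformed strategy for the search step. The class and method names are illustrative choices for this sketch, not code from the book.

```python
from collections import deque

class Problem:
    """A problem: initial state, operators (successor function),
    goal test, and path cost. Names here are illustrative."""

    def __init__(self, initial, goal):
        self.initial = initial
        self.goal = goal

    def successors(self, state):
        # Return (action, next_state, step_cost) triples; this is
        # the problem-specific operator set.
        raise NotImplementedError

    def goal_test(self, state):
        return state == self.goal

def breadth_first_search(problem):
    """Return a list of actions from the initial state to a goal,
    or None if no goal state is reachable."""
    frontier = deque([(problem.initial, [])])
    explored = {problem.initial}
    while frontier:
        state, path = frontier.popleft()
        if problem.goal_test(state):
            return path
        for action, next_state, _cost in problem.successors(state):
            if next_state not in explored:
                explored.add(next_state)
                frontier.append((next_state, path + [action]))
    return None
```

Abstraction shows up in the choice of `state`: the representation keeps only the details the operators and goal test actually need.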