Chapter 20

This kind of feedback is called a reward, or reinforcement. In games like chess, the reinforcement is received only at the end of the game; we call this a terminal state in the state-history sequence. The agent can be a passive learner or an active learner: a passive learner simply watches the world go by and tries to learn the utility of being in various states, while an active learner must also act on the learned information, and can use its problem generator to suggest explorations of unknown portions of the environment. An agent can instead learn an action-value function giving the expected utility of taking a given action in a given state; this is called Q-learning. We define the reward-to-go of a state as the sum of the rewards from that state until a terminal state is reached. Given this definition, it is easy to see that the expected utility of a state is the expected reward-to-go of that state. A simple method for updating utility estimates...
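The Q-learning idea above can be sketched in code. This is a minimal illustration on a made-up toy problem: the chain environment, `N_STATES`, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions for the sketch, not anything from the chapter. Reward arrives only on entering the terminal state, mirroring the chess-like case where reinforcement comes only at the end.

```python
import random

# Hypothetical toy environment: a 5-state chain. The agent starts at
# state 0; a reward of +1 is received only on entering terminal state 4.
N_STATES = 5
ACTIONS = [1, -1]  # move right, move left

def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = (nxt == N_STATES - 1)
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Exploration: like a problem generator, occasionally try a
            # random action instead of the greedy one.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            # Move Q(s, a) toward a one-step estimate of reward-to-go.
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

After training, the greedy policy moves right in every non-terminal state, since that is the shortest path to the only reward.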
Chapter 14

Agents almost never have access to the whole truth about their environment. The right thing to do, the rational decision, therefore depends on both the relative importance of various goals and the likelihood that, and degree to which, they will be achieved. Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance. Probability theory makes the same ontological commitment as logic, namely, that facts either do or do not hold in the world. Degree of truth, as opposed to degree of belief, is the subject of fuzzy logic. Before any evidence is obtained, we speak of prior or unconditional probability; after the evidence is obtained, we talk about posterior or conditional probability. An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. If Agent 1 expresses a set of degrees of belief that violate the axioms of probability theory, then there is a betting strategy for Agent 2 that guarantees Agent 1 will lose money (de Finetti's theorem).
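The expected-utility criterion above can be shown with a short sketch. The actions, probabilities, and utilities are made-up numbers for illustration, not from the text: each action is a lottery over outcomes, and the rational choice is the action whose probability-weighted utility is highest.

```python
# Hypothetical decision: each action maps to a list of
# (probability, utility) outcome pairs. Numbers are illustrative.
actions = {
    "take_umbrella": [(0.3, 60), (0.7, 80)],    # rain vs. no rain
    "leave_umbrella": [(0.3, 0), (0.7, 100)],
}

def expected_utility(outcomes):
    """EU(a) = sum over outcomes of P(outcome) * U(outcome)."""
    return sum(p * u for p, u in outcomes)

# The rational agent picks the action maximizing expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, {a: expected_utility(o) for a, o in actions.items()})
```

Here taking the umbrella wins (EU 74 vs. 70) even though leaving it has the single best possible outcome; expected utility averages over all outcomes rather than betting on the best one.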
Chapter 1

The definitions on top are concerned with thought processes and reasoning, whereas the ones on the bottom address behavior. The definitions on the left measure success in terms of fidelity to human performance, whereas the ones on the right measure against an ideal performance measure called rationality: a system is rational if it does the "right thing," given what it knows. The Turing Test, proposed by Alan Turing (1950), was designed to provide a satisfactory operational definition of intelligence: a computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the written responses come from a person or from a computer. The interdisciplinary field of cognitive science brings together computer models from AI and experimental techniques from psychology to construct precise and testable theories of the human mind. An agent is just something that acts (agent comes from the Latin agere, to do). Of course, all computer...