46. The Q value. R: the reward. P_xy: the probability of reaching state y from state x by taking action α. Gamma (γ): the discount factor (between 0 and 1). V*(y): the expected total discounted return starting in y and following the policy π*. Policy: a sequence of actions.
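The slide's formula did not survive in this transcript. A standard form that is consistent with the symbols defined above would be the following; the subscript and argument placement (R_x(α), P_xy(α)) is an assumption, not taken from the slide:

```latex
% Q value of taking action \alpha in state x, then following the policy:
% immediate reward plus the discounted expected return of the next state y.
Q^{*}(x, \alpha) \;=\; R_{x}(\alpha) \;+\; \gamma \sum_{y} P_{xy}(\alpha)\, V^{*}(y)
```

Read term by term: the first part is the reward collected now, and the second weights the return V*(y) of every possible next state y by the probability of landing there, shrunk by the discount factor.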
47. The expected total discounted return V* for a state is the maximal Q value among all actions that can be taken in that state (with the rest of the policy followed afterwards).
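The relationship between Q and V can be sketched numerically. The example below is a minimal illustration, not the presenter's method: the two-state, two-action problem, the array layout, and the function names `q_values`, `state_values`, and `solve` are all hypothetical, and the repeated update in `solve` is plain value iteration, which the slides do not prescribe.

```python
import numpy as np

def q_values(R, P, V, gamma):
    """Q[x, a] = R[x, a] + gamma * sum_y P[x, a, y] * V[y]."""
    # P has shape (states, actions, states); P @ V sums over next states y.
    return R + gamma * (P @ V)

def state_values(Q):
    """V[x] is the maximal Q value among the actions available in x."""
    return Q.max(axis=1)

def solve(R, P, gamma, sweeps=500):
    """Alternate the two definitions until V settles (value iteration)."""
    V = np.zeros(R.shape[0])
    for _ in range(sweeps):
        V = state_values(q_values(R, P, V, gamma))
    return q_values(R, P, V, gamma), V

# Hypothetical toy problem: action 0 moves to state 0, action 1 to state 1.
R = np.array([[1.0, 0.0],   # rewards in state 0 for actions 0 and 1
              [0.0, 2.0]])  # rewards in state 1 for actions 0 and 1
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0  # action 0 always leads to state 0
P[:, 1, 1] = 1.0  # action 1 always leads to state 1
gamma = 0.9

Q, V = solve(R, P, gamma)
best_actions = Q.argmax(axis=1)  # greedy action per state
```

In this toy problem the larger reward is only available in state 1, so the greedy action in both states is the one that leads there; each entry of `V` is simply the largest entry in the matching row of `Q`.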