RL in a nutshell
state (of the game)
- board
- score
goal / reward
- highest score
policy (to learn)
- random placement
- leave fewest holes
- gravitate to one side
- try to clear rows (most human-like)
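
A minimal sketch of how the three pieces above could look in code, assuming a Tetris-style board stored as a grid of 0/1 cells. The names `State`, `reward`, `count_holes`, `random_policy`, and `fewest_holes_policy` are hypothetical and chosen only for illustration; the list of candidate states (one per legal placement of the current piece) is assumed to come from a game engine that is not shown. The policies here are hand-written heuristics, not learned ones.

```python
import random
from dataclasses import dataclass

# Assumed layout: rows listed top to bottom; 1 = filled cell, 0 = empty.
Board = list[list[int]]


@dataclass
class State:
    """State of the game: the board and the current score."""
    board: Board
    score: int


def reward(before: State, after: State) -> int:
    """Reward for one move: the score gained (goal is the highest score)."""
    return after.score - before.score


def count_holes(board: Board) -> int:
    """Count empty cells that have a filled cell above them in the same column."""
    holes = 0
    for col in range(len(board[0])):
        seen_block = False
        for row in range(len(board)):
            if board[row][col]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes


def random_policy(candidates: list[State], rng=random) -> State:
    """Policy 'random placement': pick any resulting state."""
    return rng.choice(candidates)


def fewest_holes_policy(candidates: list[State]) -> State:
    """Policy 'leave fewest holes': pick the state whose board has the fewest holes."""
    return min(candidates, key=lambda s: count_holes(s.board))
```

For example, given two candidate boards where one leaves an empty cell under a block and the other does not, `fewest_holes_policy` returns the second. A learned policy would replace these fixed rules with a choice tuned from the rewards it observes.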