
Commit d25532d

copy latest revisions

adzcai committed Aug 30, 2024
1 parent: 7af8b08
Showing 5 changed files with 716 additions and 230 deletions.
book/bandits.md (0 additions, 2 deletions)

@@ -950,5 +950,3 @@ regret bound. The full details of the analysis can be found in Section 3 of {cit
 +++
 
 ## Summary
-
-
book/control.md (1 addition, 0 deletions)

@@ -971,6 +971,7 @@ Local linearization might only be accurate in a small region around the
 point of linearization.
 :::
 
+(iterative_lqr)=
 ### Iterative LQR
 
 To address these issues with local linearization, we'll use an iterative
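
The hunk above cuts off before the algorithm itself. As a rough sketch of the iterative-LQR idea (re-linearize the dynamics around the current trajectory, solve the resulting LQR, and repeat), with every helper passed in by the caller, since none of the book's actual code appears in this commit:

```python
# Rough sketch of the iterative-LQR loop. `linearize`, `solve_lqr`, and
# `rollout` are hypothetical callables supplied by the caller; they are
# not identifiers from book/control.md.

def ilqr_sketch(linearize, solve_lqr, rollout, x_init, n_iters: int):
    # Begin with some nominal trajectory (e.g. from zero controls).
    trajectory = rollout(x_init, policy=None)
    for _ in range(n_iters):
        # Re-linearize the dynamics around the *current* trajectory,
        # so the approximation stays accurate where we actually travel.
        A, B = linearize(trajectory)
        # Solve the resulting time-varying LQR for a new policy...
        policy = solve_lqr(A, B)
        # ...and roll it out to get the next trajectory to linearize around.
        trajectory = rollout(x_init, policy)
    return policy
```

The key point is that linearization happens along the whole trajectory the current policy actually visits, rather than at a single fixed point.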
book/imitation_learning.md (2 additions, 1 deletion)

@@ -137,5 +137,6 @@ def dagger_pseudocode(
     return π
 ```
 
+How well does DAgger perform?
 
-
+<!-- TODO -->
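
The commit leaves the answer to that question as a TODO. For reference, DAgger (Dataset Aggregation) rolls out the current policy but labels the visited states with the expert's actions, then retrains on everything collected so far. A minimal sketch of that loop, with `env`, `expert_policy`, and `fit_policy` as hypothetical stand-ins rather than identifiers from the book:

```python
# A minimal DAgger sketch. `env`, `expert_policy`, and `fit_policy`
# are hypothetical stand-ins, not identifiers from book/imitation_learning.md.

def dagger(env, expert_policy, fit_policy, n_iters: int, horizon: int):
    dataset = []  # aggregated (state, expert action) pairs across iterations
    π = expert_policy  # initial policy; behavior cloning is another common start
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            # Visit states under the *current* policy π,
            # but record what the expert would have done there.
            dataset.append((state, expert_policy(state)))
            state = env.step(π(state))
        # Retrain on all data collected so far.
        π = fit_policy(dataset)
    return π
```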
book/mdps.md (2 additions, 0 deletions)

@@ -353,6 +353,8 @@ policy for that state and $0$ otherwise. In this case, the only
 randomness in sampling trajectories comes from the initial state
 distribution $\mu$ and the state transitions $P$.
 
++++
+
 ### Value functions
 
 The main goal of RL is to find a policy that maximizes the average total
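
The last context line is truncated mid-sentence (presumably "average total reward"). For reference, a standard finite-horizon definition of the state value function that a section like this typically introduces; the horizon notation here is an assumption, since the commit does not show the book's actual definition:

$$
V^\pi(s) = \mathbb{E}\left[\sum_{h=0}^{H-1} r(s_h, a_h) \,\middle|\, s_0 = s,\; a_h \sim \pi(s_h),\; s_{h+1} \sim P(s_h, a_h)\right]
$$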
(The diff for the fifth changed file did not load.)
