Hacker News

This is really nice, great idea. I am going to make a suggestion that I hope is helpful; I don't mean to be critical of this nice project.

After going through the MDP example, I have one comment on the way you introduce the non-deterministic transition function. In your example the non-determinism comes from the agent making "mistakes": it can mistakenly go left or right when trying to go up or down.

1) You could introduce the mistakes more clearly: the text doesn't really explain that the agent makes mistakes, so the comment about mistakes in the transition() function is initially a bit confusing.

2) I think the way this introduces non-determinism could be more didactic if the non-determinism came from the environment, not the agent. For example, the agent might be moving on a rough surface, where moving its tracks/limbs/whatever doesn't always produce the intended outcome. As you present it, the transition is a function from an action to a random action and then to a state, while the definition is just a function from an action to a random state.
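To sketch what I mean, here is a toy transition function where the randomness lives entirely in the environment dynamics, not in the action choice (the corridor world, state count, and slip probability are all made up for illustration, not taken from the notebook):

```python
import random

# Hypothetical 1-D corridor: states 0..4, actions -1 (left) and +1 (right).
# The ground is "slippery": the intended move succeeds with probability 0.8,
# and the agent stays put otherwise. The action itself is never replaced by
# a different action -- the non-determinism comes from the dynamics.

N_STATES = 5
SLIP_PROB = 0.2

def transition(state, action):
    """Sample a next state from P(s' | s, a): (state, action) -> random state."""
    if random.random() < SLIP_PROB:
        next_state = state           # tracks slip: no movement
    else:
        next_state = state + action  # intended move succeeds
    return max(0, min(N_STATES - 1, next_state))  # clamp to the corridor
```

Pedagogically this keeps the agent's action deterministic and well-defined, and all the uncertainty sits where the MDP formalism puts it: in the transition kernel.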



Thank you so much for this feedback! Indeed, this is definitely confusing in the notebook. I pushed a small commit to make it a little clearer that the non-determinism comes from the probabilistic nature of the environment dynamics (and not because the agent chooses a different action by mistake).

As a side note, I initially meant to go through it in a video to fill the gaps in the text with my voice. But since I didn't have time for that, I am fixing those gaps in the text first :) Thanks again!



