Reinforcement Learning for Programmers (Author's Direct Lecture) Course

Notice regarding errors (December 10, 2022)

The packages used in this course have changed considerably since I posted the lectures, so the example programs may fail as written.

There are three types of errors that can occur:

Error 1 is caused by a change in the protobuf package.

You can solve the problem by uninstalling the protobuf package and installing version 3.8.

Error 2 is a problem with the reset function provided by the gym package. Its return value is now a tuple of (state, info), where info is a dictionary, so the problem can be solved by adding the code state = state[0], which selects the first value.

Error 3 occurs because the step function provided by the gym package now returns one more value (five instead of four). You can solve it by adding one more variable, none2, on the receiving side.

1. When running the example program, the following error occurs:

TypeError: Descriptors cannot be created directly.

If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.

If you cannot immediately regenerate your protos, some other possible workarounds are:

1. Downgrade the protobuf package to 3.20.x or lower.

2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

1. Solution for the protobuf error

pip uninstall protobuf

pip install protobuf==3.8
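
Before reinstalling, it can help to check whether the installed protobuf is newer than the 3.20.x series named in the error message. The following is only a minimal sketch: needs_downgrade and installed_protobuf_needs_downgrade are hypothetical helper names made up for illustration, not part of protobuf or of the course code.

```python
from importlib.metadata import PackageNotFoundError, version


def needs_downgrade(version_string):
    """Return True if the version is newer than the 3.20.x series."""
    # Compare only the major and minor numbers, e.g. "4.21.9" -> (4, 21).
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) > (3, 20)


def installed_protobuf_needs_downgrade():
    """Check the installed protobuf; return None if it is not installed."""
    try:
        return needs_downgrade(version("protobuf"))
    except PackageNotFoundError:
        return None
```

Calling installed_protobuf_needs_downgrade() in the course environment should show whether the two pip commands above are needed.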

2. Solution for the reset return value

state = env.reset()

state = state[0]  # added line: keep only the first value returned by reset()
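
If the same program has to run under both older and newer gym releases, the indexing can be made conditional. The sketch below assumes that newer releases return a (state, info) tuple whose second element is a dictionary, and that older releases return the state alone. reset_state is a hypothetical helper name, not a gym function.

```python
def reset_state(reset_result):
    """Extract the state from whatever env.reset() returned."""
    # Assumption: the new-style result is a 2-tuple ending in an info dict.
    is_new_style = (
        isinstance(reset_result, tuple)
        and len(reset_result) == 2
        and isinstance(reset_result[1], dict)
    )
    return reset_result[0] if is_new_style else reset_result
```

It would be used as state = reset_state(env.reset()), replacing the two lines above.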

3. Solution for the extra step return value

state_next, reward, done, none, none2 = self.env.step(action)
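
In newer gym releases the five values are, in order, the next state, the reward, a terminated flag, a truncated flag, and an info dictionary, so in the line above done receives terminated, none receives truncated, and none2 receives info. A version-tolerant variant might look like the sketch below. step_result is a hypothetical helper name, and treating an episode as done when it is either terminated or truncated is an assumption of this sketch, not something the line above does.

```python
def step_result(result):
    """Normalize the return value of env.step() to four values."""
    if len(result) == 5:
        # Newer gym: (state, reward, terminated, truncated, info)
        state_next, reward, terminated, truncated, info = result
        done = terminated or truncated  # assumption: either flag ends the episode
    else:
        # Older gym: (state, reward, done, info)
        state_next, reward, done, info = result
    return state_next, reward, done, info
```

It would be used as state_next, reward, done, info = step_result(self.env.step(action)).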