I am playing with CartPole using the DQN method from RL, following an approach similar to the one outlined here. I then examined the greedy actions taken at various cart velocities; the results are shown below, and they show the decision boundary decreasing as velocity increases.
My physical intuition suggests these results are wrong: cart velocity should not influence the optimal action. The optimal action at (θ, 𝜔) should reduce or maintain the magnitude of θ, and reduce the magnitude of 𝜔 when θ is close to zero. Applying force to the cart changes the angular acceleration, which alters 𝜔 and, in turn, θ, and this change in 𝜔 and θ does not depend on the current cart velocity or position. The equations describing the cart-pole dynamics can be found here; in them, the cart's velocity and position have no effect on the angular motion.
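To make this concrete, here is a minimal sketch of the angular acceleration in the standard frictionless cart-pole model (the same form used by Gym's CartPole, with Gym's default parameter values assumed). Note that the cart's position x and velocity ẋ do not appear anywhere in the expression:

```python
import math

def theta_acc(theta, theta_dot, force,
              g=9.8, m_cart=1.0, m_pole=0.1, half_len=0.5):
    """Angular acceleration of the pole in the standard frictionless
    cart-pole model. The cart's position and velocity do not appear
    anywhere in this expression."""
    total_mass = m_cart + m_pole
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + m_pole * half_len * theta_dot**2 * sin_t) / total_mass
    return (g * sin_t - cos_t * temp) / (
        half_len * (4.0 / 3.0 - m_pole * cos_t**2 / total_mass))

# Same (theta, theta_dot, force) gives the same angular acceleration,
# whatever the cart velocity happens to be.
print(theta_acc(0.1, 0.2, 10.0))
```

Since `theta_acc` is a function of (θ, 𝜔, force) alone, the pole's angular motion is fully determined without reference to the cart's state.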
Another argument that the policy should be the same across cart velocities comes from changing the frame of reference to one moving with the cart. In that frame, an observer sees exactly the same dynamics as if the cart were stationary, so the same policy should apply as in the stationary case.
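This frame-of-reference argument can be checked with a quick simulation, sketched here assuming the standard frictionless Gym-style dynamics and Euler integration: starting from the same (θ, 𝜔) but different cart velocities, the same force sequence produces identical pole trajectories.

```python
import math

def step(state, force, dt=0.02, g=9.8, m_cart=1.0, m_pole=0.1, half_len=0.5):
    """One Euler step of the standard frictionless cart-pole dynamics."""
    x, x_dot, theta, theta_dot = state
    total_mass = m_cart + m_pole
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + m_pole * half_len * theta_dot**2 * sin_t) / total_mass
    theta_acc = (g * sin_t - cos_t * temp) / (
        half_len * (4.0 / 3.0 - m_pole * cos_t**2 / total_mass))
    x_acc = temp - m_pole * half_len * theta_acc * cos_t / total_mass
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)

def pole_trajectory(x_dot0, forces):
    """Record (theta, omega) over time for a given initial cart velocity."""
    state = (0.0, x_dot0, 0.05, -0.1)  # same (theta, omega) in both runs
    traj = []
    for f in forces:
        state = step(state, f)
        traj.append((state[2], state[3]))
    return traj

forces = [10.0 if i % 2 == 0 else -10.0 for i in range(50)]
traj_still = pole_trajectory(0.0, forces)   # cart initially at rest
traj_moving = pole_trajectory(5.0, forces)  # cart initially moving fast
# The pole's (theta, omega) evolution is identical in both runs,
# so the optimal action at a given (theta, omega) should be too.
print(traj_still == traj_moving)
```

Only the cart's x trajectory differs between the two runs; the pole's state evolution is unchanged, which is exactly the Galilean-invariance claim.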
Is there a flaw in my reasoning, or could the difference in the greedy policy at different velocities be an RL artifact, such as the agent lacking sufficient experience at high velocities?