This repository implements a modular Deep Deterministic Policy Gradient (DDPG) reinforcement learning (RL) framework that uses Linear Temporal Logic (LTL) formulas as high-level mission specifications.
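The following is a minimal, self-contained sketch of the modular idea, not the repository's actual code: one policy module is kept per automaton state, and the current automaton state selects which module acts and is trained, so each module specializes in satisfying the next part of the LTL task. The toy environment, toy automaton, and random "modules" below are illustrative placeholders; in the paper each module is a full DDPG actor-critic pair operating on continuous states and actions.

```python
import random

class ToyLineEnv:
    """1-D point mass on [0, 10]; the action is a bounded velocity command."""
    def reset(self):
        self.x = 5.0
        return self.x
    def step(self, action):
        self.x = min(10.0, max(0.0, self.x + action))
        return self.x
    def label(self):
        if self.x < 2.0:
            return "yellow"
        if self.x > 8.0:
            return "green"
        return "none"

class SequentialAutomaton:
    """Toy automaton for 'visit yellow, then green': states 0 -> 1 -> 2 (accepting)."""
    def reset(self):
        self.q = 0
        return self.q
    def step(self, label):
        if self.q == 0 and label == "yellow":
            self.q = 1
        elif self.q == 1 and label == "green":
            self.q = 2
        return self.q
    def accepting(self):
        return self.q == 2

class RandomModule:
    """Stand-in for one DDPG actor-critic pair; a real module would learn from stored transitions."""
    def act(self, obs):
        return random.uniform(-1.0, 1.0)
    def store_and_train(self, obs, action, reward, next_obs, done):
        pass  # replay-buffer insert and actor/critic gradient steps would go here

env, automaton = ToyLineEnv(), SequentialAutomaton()
modules = [RandomModule() for _ in range(3)]      # one module per automaton state

obs, q = env.reset(), automaton.reset()
for _ in range(500):
    action = modules[q].act(obs)                  # the automaton state selects the module
    next_obs = env.step(action)
    q_next = automaton.step(env.label())
    reward = 1.0 if automaton.accepting() else 0.0
    modules[q].store_and_train(obs, action, reward, next_obs, q_next == 2)
    obs, q = next_obs, q_next
    if q == 2:                                    # accepting state reached
        break
```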
This repository is under construction. If you have any questions, feel free to contact: mingyucai0915@gmail.com.
@article{Cai2021modular,
  title={Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic},
  author={Cai, Mingyu and Hasanbeig, Mohammadhosein and Xiao, Shaoping and Abate, Alessandro and Kan, Zhen},
  journal={IEEE Robotics and Automation Letters},
  volume={6},
  number={4},
  pages={7973-7980},
  year={2021},
  publisher={IEEE}
}
The tasks are performed in custom environments (DeepRL-LTL and CartPole) developed with OpenAI Gym.
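Interaction with these environments presumably follows the classic (pre-0.26) Gym loop; the snippet below uses the built-in CartPole as a stand-in, since the registered ids of the custom environments are not listed here and are an assumption.

```python
import gym

# Built-in CartPole used as a stand-in; the repo's custom task environments
# would be instantiated the same way once their ids are registered with Gym.
env = gym.make("CartPole-v1")

obs = env.reset()
for _ in range(200):
    action = env.action_space.sample()          # random policy, for illustration only
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
    if done:
        obs = env.reset()
env.close()
```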
In addition to preventing the CartPole from falling over, Task1 is a surveillance mission that requires the cart to visit the yellow region and the green region periodically (infinite horizon). Task2 requires the cart to visit the yellow region first and then the green region (finite horizon). The demos for Task1 and Task2 are shown on the left and right, respectively.
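In standard LTL syntax, these two missions could plausibly be written as follows; this is a sketch, and the exact formulas and atomic-proposition names used in the paper and code may differ.

```latex
% Task1 (surveillance, infinite horizon): always eventually visit yellow and green.
\varphi_1 = \square \lozenge \, \mathrm{yellow} \;\wedge\; \square \lozenge \, \mathrm{green}
% Task2 (sequential visit, finite horizon): reach yellow, and afterwards reach green.
\varphi_2 = \lozenge \left( \mathrm{yellow} \wedge \lozenge \, \mathrm{green} \right)
```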
In the Ball-Pass environment, Task1 is a surveillance mission that requires the ball to visit region 1 and region 2 periodically (infinite horizon). The modular DDPG (on the left) solves the specified task with a 100% success rate. The standard DDPG (on the right, worst-case scenario) fails for this repetitive pattern.
Task2 requires the ball to visit region 1 and then region 2 (finite horizon). The modular DDPG (on the left) solves the specified task with a 100% success rate. The success rate of the standard DDPG (on the right) is around 86%.
Here are the results (worst-case scenarios) using the standard product MDP, which cannot guarantee completion of repetitive tasks over the infinite horizon, for the CartPole and Ball-Pass problems, respectively.