# A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

Official repository for "A Joint Imitation-Reinforcement Learning (JIRL) Framework for Reduced Baseline Regret".
## Technical Report

The report contains a detailed description of the experimental settings and hyperparameters used to obtain the results reported in our paper.
## Objectives

- Leverage a baseline's online demonstrations to minimize regret with respect to the baseline policy during training
- Eventually surpass the baseline's performance
## Assumptions

- Access to a baseline policy at every time step
- The learner is trained with an off-policy RL algorithm
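The two assumptions above suggest the shape of the training loop: query the baseline at every step, let it act early on to keep regret low, and train the learner off-policy from the logged transitions. The sketch below is a toy illustration under those assumptions, not the paper's algorithm: the 1-D environment, the hand-coded baseline, the fixed phase switch, and the plain imitation update are all hypothetical stand-ins.

```python
import random

def baseline_policy(state):
    # Hypothetical hand-coded baseline: always push the state toward 0.
    return -1.0 if state > 0 else 1.0

def env_step(state, action):
    # Toy 1-D environment: reward is higher the closer the state stays to 0.
    next_state = state + 0.1 * action + random.gauss(0.0, 0.05)
    return next_state, -abs(next_state)

def jirl_sketch(episodes=50, horizon=20, switch_after=25, lr=0.01, seed=0):
    """Toy JIRL-style loop: the baseline acts during the early episodes
    (bounding regret w.r.t. the baseline), then control switches to the
    learner; the baseline's action is logged at every step either way."""
    random.seed(seed)
    buffer = []   # replay buffer for the off-policy learner
    w = 0.0       # linear learner: action = -w * state
    for ep in range(episodes):
        state = random.uniform(-1.0, 1.0)
        for _ in range(horizon):
            b_action = baseline_policy(state)           # assumption: always available
            l_action = max(-1.0, min(1.0, -w * state))  # learner's proposal
            action = b_action if ep < switch_after else l_action
            next_state, reward = env_step(state, action)
            buffer.append((state, b_action, action, reward, next_state))
            # Stand-in for the joint update: regress the learner's action
            # toward the baseline's (squared-error imitation gradient).
            w -= lr * 2.0 * (-w * state - b_action) * (-state)
            state = next_state
    return w, buffer
```

Calling `jirl_sketch()` runs both phases and returns the learned weight and the replay buffer. In the actual framework, the imitation signal would be combined with an off-policy RL loss computed from the replay buffer, rather than used alone as it is here.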
## Framework
## Experiment Domains

- Inverted pendulum
- Lunar lander
- Walker-2D
- Lane following (CARLA)
- Lane following (JetRacer)
## Results

### Performance

- Inverted pendulum
- Lunar lander
- Lane following (CARLA)
- Walker-2D
- Lane following (JetRacer)
### Baseline Regret

- Lunar lander (JIRL vs. TRPO)