Deep Reinforcement Learning for Autonomous Bearings-Only Tracking
A recent study published on arXiv presents a deep reinforcement learning approach for tracking moving targets using only bearing measurements. The framework formulates the observer's maneuver problem as a belief Markov decision process, with the belief state maintained by a cubature Kalman filter. A purpose-built reward function trades off the accuracy of the target position estimate against the reliability of the filter itself. A deep Q-network policy was trained over 50,000 episodes and evaluated through 5,000 Monte Carlo simulations, benchmarked against two existing methods: the perpendicular-to-bearing heuristic and D-optimal Fisher information maximization.
Key facts
- Paper develops deep reinforcement learning observer control for bearings-only tracking.
- Observer maneuver problem formulated as belief Markov decision process.
- Belief state represented by the cubature Kalman filter (CKF) posterior; a minimal update sketch follows this list.
- Reward function balances Euclidean distance (position estimation accuracy) against Mahalanobis distance (filter reliability).
- Reward is a geometric interpolation on the Pareto front between the two objectives, controlled by β ∈ [0,1]; see the second sketch after this list.
- Policy implemented as a deep Q-network (DQN) trained over 50,000 episodes; see the third sketch after this list.
- Evaluated over 5,000 Monte Carlo episodes.
- Compared against perpendicular-to-bearing heuristic and D-optimal Fisher information maximization.
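Since the belief state is the CKF posterior, it helps to see what one filter step looks like. Below is a minimal sketch of a CKF measurement update for a scalar bearing measurement, using the standard third-degree cubature rule with equal weights. The state layout, the constant-velocity usage comment, and the helper name `ckf_update` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ckf_update(x, P, z, R, h):
    """One cubature Kalman filter measurement update (sketch).

    x : (n,) prior state mean
    P : (n, n) prior state covariance
    z : scalar bearing measurement in radians
    R : scalar bearing-noise variance
    h : measurement function, state -> predicted bearing
    """
    n = x.size
    S = np.linalg.cholesky(P)                        # P = S @ S.T
    # 2n cubature points at +/- sqrt(n) along the columns of S, weight 1/(2n)
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    pts = x[:, None] + S @ xi                        # (n, 2n) cubature points
    zi = np.array([h(pts[:, i]) for i in range(2 * n)])
    z_hat = zi.mean()                                # predicted measurement
    Pzz = ((zi - z_hat) ** 2).mean() + R             # innovation covariance
    Pxz = ((pts - x[:, None]) * (zi - z_hat)).mean(axis=1)  # cross covariance
    K = Pxz / Pzz                                    # Kalman gain, shape (n,)
    innov = np.arctan2(np.sin(z - z_hat), np.cos(z - z_hat))  # wrap to (-pi, pi]
    return x + K * innov, P - np.outer(K, K) * Pzz

# Hypothetical usage with a [px, py, vx, vy] constant-velocity state:
# h = lambda s: np.arctan2(s[1] - observer_y, s[0] - observer_x)
```

The posterior mean and covariance returned here form the belief state the policy observes at each step.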
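This summary does not reproduce the paper's exact reward expression. One plausible reading of a geometric interpolation with β ∈ [0,1] is a weighted geometric mean of the two distances, negated so that smaller errors yield higher reward; the `eps` floor and the sign convention below are assumptions.

```python
def reward(d_euclid, d_mahal, beta=0.5, eps=1e-9):
    """Hedged sketch of a geometric interpolation between two objectives.

    beta = 1 rewards pure position accuracy (small Euclidean error);
    beta = 0 rewards pure filter consistency (small Mahalanobis distance).
    Negated so that smaller distances map to larger reward.
    """
    return -((d_euclid + eps) ** beta) * ((d_mahal + eps) ** (1.0 - beta))
```

At β = 0.5 this reduces to the negative geometric mean of the two distances, interpolating between the two single-objective extremes of the Pareto front.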
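A DQN over a discretized observer action space could be organized as below. The belief-state feature dimension, the two 128-unit hidden layers, and the seven candidate heading commands are hypothetical choices for illustration; the paper's architecture is not specified in this summary.

```python
import torch
import torch.nn as nn

N_ACTIONS = 7  # hypothetical: discrete observer heading commands

class QNet(nn.Module):
    """Maps belief-state features (e.g. CKF mean and covariance terms)
    to one Q-value per candidate observer heading."""
    def __init__(self, obs_dim: int, n_actions: int = N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def select_action(qnet: QNet, obs: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy choice over the discrete heading commands."""
    if torch.rand(()).item() < epsilon:
        return torch.randint(N_ACTIONS, ()).item()
    with torch.no_grad():
        return int(qnet(obs).argmax())
```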