Deep Reinforcement Learning for Autonomous Bearings-Only Tracking
A recent study published on arXiv presents a deep reinforcement learning approach for tracking moving targets using only bearing measurements. The framework formulates the observer's maneuver problem as a belief Markov decision process, with the belief state maintained by a cubature Kalman filter. A purpose-built reward function trades off the accuracy of the target position estimate against the reliability of the filter itself. A deep Q-network policy was trained over 50,000 episodes and evaluated through 5,000 Monte Carlo simulations, benchmarked against two existing methods: the perpendicular-to-bearing heuristic and D-optimal Fisher information maximization.
Key facts
- Paper develops deep reinforcement learning observer control for bearings-only tracking.
- Observer maneuver problem formulated as belief Markov decision process.
- Belief state represented by the cubature Kalman filter (CKF) posterior; a minimal update sketch follows this list.
- Reward function balances Euclidean distance (position estimation accuracy) against Mahalanobis distance (filter reliability).
- Reward is a geometric interpolation on the Pareto front between the two objectives, controlled by β ∈ [0,1]; see the second sketch after this list.
- Policy implemented as a deep Q-network (DQN) trained over 50,000 episodes; see the third sketch after this list.
- Evaluated over 5,000 Monte Carlo episodes.
- Compared against perpendicular-to-bearing heuristic and D-optimal Fisher information maximization.
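Since the belief state is the CKF posterior, it helps to see what one filter step looks like. Below is a minimal sketch of a CKF measurement update for a scalar bearing measurement, using the standard third-degree cubature rule with equal weights. The state layout, the constant-velocity usage comment, and the helper name `ckf_update` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ckf_update(x, P, z, R, h):
    """One cubature Kalman filter measurement update (sketch).

    x : (n,) prior state mean
    P : (n, n) prior state covariance
    z : scalar bearing measurement in radians
    R : scalar bearing-noise variance
    h : measurement function, state -> predicted bearing
    """
    n = x.size
    S = np.linalg.cholesky(P)                        # P = S @ S.T
    # 2n cubature points at +/- sqrt(n) along the columns of S, weight 1/(2n)
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    pts = x[:, None] + S @ xi                        # (n, 2n) cubature points
    zi = np.array([h(pts[:, i]) for i in range(2 * n)])
    z_hat = zi.mean()                                # predicted measurement
    Pzz = ((zi - z_hat) ** 2).mean() + R             # innovation covariance
    Pxz = ((pts - x[:, None]) * (zi - z_hat)).mean(axis=1)  # cross covariance
    K = Pxz / Pzz                                    # Kalman gain, shape (n,)
    innov = np.arctan2(np.sin(z - z_hat), np.cos(z - z_hat))  # wrap to (-pi, pi]
    return x + K * innov, P - np.outer(K, K) * Pzz

# Hypothetical usage with a [px, py, vx, vy] constant-velocity state:
# h = lambda s: np.arctan2(s[1] - observer_y, s[0] - observer_x)
```

The posterior mean and covariance returned here form the belief state the policy observes at each step.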
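This summary does not reproduce the paper's exact reward expression. One plausible reading of a geometric interpolation with β ∈ [0,1] is a weighted geometric mean of the two distances, negated so that smaller errors yield higher reward; the `eps` floor and the sign convention below are assumptions.

```python
def reward(d_euclid, d_mahal, beta=0.5, eps=1e-9):
    """Hedged sketch of a geometric interpolation between two objectives.

    beta = 1 rewards pure position accuracy (small Euclidean error);
    beta = 0 rewards pure filter consistency (small Mahalanobis distance).
    Negated so that smaller distances map to larger reward.
    """
    return -((d_euclid + eps) ** beta) * ((d_mahal + eps) ** (1.0 - beta))
```

At β = 0.5 this reduces to the negative geometric mean of the two distances, interpolating between the two single-objective extremes of the Pareto front.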
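A DQN over a discretized observer action space could be organized as below. The belief-state feature dimension, the two 128-unit hidden layers, and the seven candidate heading commands are hypothetical choices for illustration; the paper's architecture is not specified in this summary.

```python
import torch
import torch.nn as nn

N_ACTIONS = 7  # hypothetical: discrete observer heading commands

class QNet(nn.Module):
    """Maps belief-state features (e.g. CKF mean and covariance terms)
    to one Q-value per candidate observer heading."""
    def __init__(self, obs_dim: int, n_actions: int = N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def select_action(qnet: QNet, obs: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy choice over the discrete heading commands."""
    if torch.rand(()).item() < epsilon:
        return torch.randint(N_ACTIONS, ()).item()
    with torch.no_grad():
        return int(qnet(obs).argmax())
```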