Entropy (Basel). 2022 Aug 22;24(8):1168.
doi: 10.3390/e24081168

Detection of Static and Mobile Targets by an Autonomous Agent with Deep Q-Learning Abilities

Barouch Matzliach et al. Entropy (Basel). 2022.

Abstract

This paper addresses the problem of detecting multiple static and mobile targets by an autonomous mobile agent acting under uncertainty. It is assumed that the agent can detect targets at different distances and that detection is subject to errors of the first and second types (false alarms and missed detections). The goal of the agent is to plan and follow a trajectory that results in the detection of the targets in minimal time. The suggested solution implements a deep Q-learning approach to maximize the cumulative information gain regarding the targets' locations and to minimize the trajectory length on a map with a predefined detection probability. The Q-learning process is based on a neural network that receives the agent's location and the current probability map and outputs the agent's preferred move. The presented procedure is compared with previously developed sequential decision-making techniques, and it is demonstrated that the suggested algorithm substantially outperforms the existing methods.
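
The paper's implementation is not reproduced here; the following is a minimal sketch of the kind of Q-network the abstract describes, in which the state fed to the network is the agent's location together with the flattened probability map, and the output is one Q-value per candidate move. The grid size, layer widths, and four-move action set are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a Q-network of the kind the abstract describes: it takes
# the agent's location and the current probability map and scores each move.
# Architecture details (layer sizes, four moves, grid shape) are assumptions
# for illustration, not the authors' exact design.
import torch
import torch.nn as nn

GRID = 10          # assumed grid side length
N_ACTIONS = 4      # assumed moves: up, down, left, right

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: flattened probability map plus the agent's (x, y) position.
        self.net = nn.Sequential(
            nn.Linear(GRID * GRID + 2, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, N_ACTIONS),  # one Q-value per candidate move
        )

    def forward(self, prob_map, agent_xy):
        x = torch.cat([prob_map.flatten(1), agent_xy], dim=1)
        return self.net(x)

# Greedy move selection from the current state.
qnet = QNet()
prob_map = torch.full((1, GRID, GRID), 1.0 / (GRID * GRID))  # uniform prior
agent_xy = torch.tensor([[0.0, 0.0]])
move = qnet(prob_map, agent_xy).argmax(dim=1)  # index of the preferred move
```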

Keywords: autonomous agent; deep Q-learning; neural network; probabilistic decision-making; search and detection.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Receiving information and updating the probability map.
Figure 2. The neural network scheme used in the learning stage of the Q-max algorithm.
Figure 3. Scheme of the data flow in the training stage of the network.
Figure 4. The actions of the online model-free learning procedure of the Q-max algorithm.
Figure 5. The actions of the offline model-based learning procedure of the Q-max algorithm.
Figure 6. The change in the temporal-difference learning error with respect to the number of training epochs. The solid line is associated with the training stage, and the dashed line with the validation stage.
Figure 7. Discounted cumulative reward of detection by the Q-max algorithm (a) and cumulative payoff of detection by the SPL algorithm (b) compared with the results obtained by the random detection procedure. In both panels, the solid line corresponds to the suggested algorithm (Q-max in (a), SPL in (b)), and the dashed line to the random choice of actions.
Figure 8. Cumulative reward of detection by the Q-max algorithm for static targets (a) and cumulative payoff of detection by the SPL algorithm for static targets (b) compared with the results obtained by the COV algorithm.
Figure 9. The number of agent actions in detecting two static targets with the SPL/Q-max algorithms (black bars) and the COV algorithm (gray bars): (a) λ=15 and (b) λ=10.
Figure 10. Dependence of the detection probabilities on the number of planned actions for the SPL algorithm (solid line) and the DP algorithm (dotted line); the sensor sensitivity is λ=15, the false alarm rate is α=0.25, and the termination time is t=120 min.
Figure 11. Dependence of the detection probabilities on the false alarm rate α for sensor sensitivities λ=15 (dotted line) and λ=10 (dashed line). The solid line depicts the constant probability 0.95 achieved by the SPL algorithm for all values of α. The termination time is 120 min.
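
Figure 1 depicts the step in which a sensor reading is folded into the probability map. As a rough illustration of such a step, the sketch below applies a per-cell Bayesian update under a sensing model with both error types; the distance-dependent detection probability exp(-d/λ), the per-cell independence assumption, and the parameter values are all choices made for illustration, not the paper's exact formulas.

```python
# Illustrative Bayesian update of a probability map after one sensor reading.
# The sensing model below (detection probability exp(-d/lam), false alarm
# rate alpha) is an assumption for illustration, not the paper's exact model.
import numpy as np

GRID = 10
lam, alpha = 15.0, 0.25          # assumed sensitivity and false alarm rate

def update_map(p, agent_xy, detected):
    """One Bayes step: p[i, j] = Pr(target in cell (i, j) | observation)."""
    xs, ys = np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij")
    d = np.hypot(xs - agent_xy[0], ys - agent_xy[1])
    p_det = np.exp(-d / lam)     # Pr(detection | target in cell), assumed
    if detected:
        # Signal observed: either a true detection or a false alarm.
        num = p * p_det
        like = num + (1 - p) * alpha
    else:
        # No signal: either a missed detection or a correct rejection.
        num = p * (1 - p_det)
        like = num + (1 - p) * (1 - alpha)
    return num / like            # posterior probability per cell

p = np.full((GRID, GRID), 0.5)   # uninformative prior per cell
p = update_map(p, agent_xy=(0, 0), detected=False)  # one "no detection" read
```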

