Entropy (Basel). 2022 Aug 22;24(8):1168.
doi: 10.3390/e24081168

Detection of Static and Mobile Targets by an Autonomous Agent with Deep Q-Learning Abilities

Barouch Matzliach et al. Entropy (Basel). 2022.

Abstract

This paper addresses the problem of detecting multiple static and mobile targets by an autonomous mobile agent acting under uncertainty. It is assumed that the agent can detect targets at different distances and that detection is subject to errors of the first and second types (false alarms and missed detections). The goal of the agent is to plan and follow a trajectory that results in the detection of the targets in minimal time. The suggested solution implements a deep Q-learning approach to maximize the cumulative information gain regarding the targets' locations and to minimize the trajectory length on a map with a predefined detection probability. The Q-learning process is based on a neural network that receives the agent's location and the current probability map and outputs the agent's preferred move. The presented procedure is compared with previously developed sequential decision-making techniques, and it is demonstrated that the suggested algorithm substantially outperforms the existing methods.
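
The paper's implementation is not reproduced here; the following is a minimal sketch of the kind of Q-network the abstract describes, in which the state fed to the network is the agent's location together with the flattened probability map, and the output is one Q-value per candidate move. The grid size, layer widths, and four-move action set are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a Q-network of the kind the abstract describes: it takes
# the agent's location and the current probability map and scores each move.
# Architecture details (layer sizes, four moves, grid shape) are assumptions
# for illustration, not the authors' exact design.
import torch
import torch.nn as nn

GRID = 10          # assumed grid side length
N_ACTIONS = 4      # assumed moves: up, down, left, right

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: flattened probability map plus the agent's (x, y) position.
        self.net = nn.Sequential(
            nn.Linear(GRID * GRID + 2, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, N_ACTIONS),  # one Q-value per candidate move
        )

    def forward(self, prob_map, agent_xy):
        x = torch.cat([prob_map.flatten(1), agent_xy], dim=1)
        return self.net(x)

# Greedy move selection from the current state.
qnet = QNet()
prob_map = torch.full((1, GRID, GRID), 1.0 / (GRID * GRID))  # uniform prior
agent_xy = torch.tensor([[0.0, 0.0]])
move = qnet(prob_map, agent_xy).argmax(dim=1)  # index of the preferred move
```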

Keywords: autonomous agent; deep Q-learning; neural network; probabilistic decision-making; search and detection.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Receiving information and updating the probability map.
Figure 2. The neural network scheme used in the learning stage of the Q-max algorithm.
Figure 3. Scheme of the data flow in the training stage of the network.
Figure 4. The actions of the online model-free learning procedure of the Q-max algorithm.
Figure 5. The actions of the offline model-based learning procedure of the Q-max algorithm.
Figure 6. The change in the temporal-difference learning error with respect to the number of training epochs. The solid line is associated with the training stage, and the dashed line with the validation stage.
Figure 7. Discounted cumulative reward of detection by the Q-max algorithm (a) and cumulative payoff of detection by the SPL algorithm (b) compared with the results obtained by the random detection procedure. In both panels, the solid line corresponds to the suggested algorithm (Q-max in (a), SPL in (b)), and the dashed line to the random choice of actions.
Figure 8. Cumulative reward of detection by the Q-max algorithm for static targets (a) and cumulative payoff of detection by the SPL algorithm for static targets (b) compared with the results obtained by the COV algorithm.
Figure 9. The number of agent actions in detecting two static targets with the SPL/Q-max algorithms (black bars) and the COV algorithm (gray bars): (a) λ=15 and (b) λ=10.
Figure 10. Dependence of the detection probabilities on the number of planned actions for the SPL algorithm (solid line) and the DP algorithm (dotted line); the sensor sensitivity is λ=15, the false alarm rate is α=0.25, and the termination time is t=120 min.
Figure 11. Dependence of the detection probabilities on the false alarm rate α for sensor sensitivities λ=15 (dotted line) and λ=10 (dashed line). The solid line depicts the constant probability 0.95 achieved by the SPL algorithm for all values of α. The termination time is 120 min.
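
Figure 1 depicts the step in which a sensor reading is folded into the probability map. As a rough illustration of such a step, the sketch below applies a per-cell Bayesian update under a sensing model with both error types; the distance-dependent detection probability exp(-d/λ), the per-cell independence assumption, and the parameter values are all choices made for illustration, not the paper's exact formulas.

```python
# Illustrative Bayesian update of a probability map after one sensor reading.
# The sensing model below (detection probability exp(-d/lam), false alarm
# rate alpha) is an assumption for illustration, not the paper's exact model.
import numpy as np

GRID = 10
lam, alpha = 15.0, 0.25          # assumed sensitivity and false alarm rate

def update_map(p, agent_xy, detected):
    """One Bayes step: p[i, j] = Pr(target in cell (i, j) | observation)."""
    xs, ys = np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij")
    d = np.hypot(xs - agent_xy[0], ys - agent_xy[1])
    p_det = np.exp(-d / lam)     # Pr(detection | target in cell), assumed
    if detected:
        # Signal observed: either a true detection or a false alarm.
        num = p * p_det
        like = num + (1 - p) * alpha
    else:
        # No signal: either a missed detection or a correct rejection.
        num = p * (1 - p_det)
        like = num + (1 - p) * (1 - alpha)
    return num / like            # posterior probability per cell

p = np.full((GRID, GRID), 0.5)   # uninformative prior per cell
p = update_map(p, agent_xy=(0, 0), detected=False)  # one "no detection" read
```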

