Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
This paper presents a succinct review of attempts in the literature to use game theory to model decision-making scenarios relevant to defence applications. Game theory has proven to be a very effective tool for modelling the decision-making processes of intelligent agents, entities, and players. It has been used to model scenarios from diverse fields such as economics, evolutionary biology, and computer science. In defence applications, there is often a need to model and predict the actions of hostile actors, and of players who try to evade or outsmart each other. Modelling how the actions of competitive players shape each other's decision making is the forte of game theory. In past decades, there have been several studies that applied different branches of game theory to model a range of defence-related scenarios. This paper provides a structured review of such attempts, and classifies the existing literature in terms of the kind of warfare modelled, the types of games used, and the players involved. After careful selection, a total of 29 directly relevant papers are discussed and classified. In terms of the types of warfare modelled, we recognise that most papers applying game theory in defence settings are concerned with Command and Control Warfare, and can be further classified into papers dealing with (i) Resource Allocation Warfare, (ii) Information Warfare, (iii) Weapons Control Warfare, and (iv) Adversary Monitoring Warfare. We also observe that most of the reviewed papers are concerned with sensing, tracking, and large sensor networks, and that the studied problems have parallels in sensor network analysis in the civilian domain. In terms of the games used, we classify the reviewed papers into those that use non-cooperative or cooperative games, simultaneous or sequential games, discrete or continuous games, and non-zero-sum or zero-sum games. Similarly, papers are also classified into two-player, three-player, or multi-player game-based papers.
We also explore the nature of the players and the construction of payoff functions in each scenario. Finally, we identify gaps in the literature where game theory could be fruitfully applied to scenarios hitherto unexplored using game theory. The presented analysis provides a concise summary of the state of the art with regard to the use of game theory in defence applications and highlights the benefits and limitations of game theory in the considered scenarios.
Keywords: decision making, game theory, defence science, ground warfare, maritime warfare, aerial warfare, tracking, sensing
Game theory has become one of the conventional theoretical frameworks for modelling important decision-making processes in many aspects of our lives. Well-known examples can be found in economics, social sciences, finance, project management, computer science, civics, and epidemiology (see [1,2,3,4,5] and references therein). Since the seminal work of John von Neumann, John Nash, and others [6,7,8], it has been well recognised that, in the context of complex interactions (games) between two or more parties (players), there is an optimal strategy that can lead to a predictable outcome (payoff). In practical situations, this outcome can often be quantitative and amenable to arithmetic operations (cost, number of infected people, number of vaccinated people, etc.), but it can also be qualitative in nature (such as risk, readiness level, or health state).
The application of game theory and related mathematical approaches has recently attracted ever-increasing attention in the defence domain. This is due to two driving factors. First, game theory provides a natural framework to promptly translate a high-level policy decision into an optimal strategy by framing it in quantitative terms such as payoff, cost, gain or loss, and risk. This creates a unified platform for defence decision-makers, supporting them in arriving at a particular decision. Second, it provides a rigorous mathematical framework for the evaluation and optimisation of numerous scenarios in accordance with predefined criteria. This prompt evaluation often becomes the critical success factor in the defence operational context, leading to decision superiority under time pressure. It is also a critical step in the development and deployment of various Artificial Intelligence (AI) capabilities in defence operations.
The application of game theory in defence has a sustained and diversified history, ranging from the design of real-time military systems (e.g., missile interception) to the support of strategic decisions on large defence investments and acquisitions. There is extensive literature on specific theoretical methods and tools and their defence applications. We believe that a review of this literature is of interest to the community dealing with operational analysis and data-driven decision support. This is the main motivation for the presented study.
Game theory [9,10] enhances military strategies and decision-making processes with a holistic and quantitative analysis of situations [11]. For the military, the potential scenarios amenable to game-theoretic analysis include the rapidly growing applications of autonomous intelligent systems, and game theory provides a comprehensive mathematical framework that greatly enhances the decision-making capabilities of the people who use these systems. Because of this potential, research into game theory is burgeoning, with a growing number of papers beginning to emerge in this military research niche. This review aims to assist researchers in utilising the body of knowledge in game theory to develop smarter and safer decision-making systems for defence practitioners. Given that such research is still in an incipient phase, we do this by drawing connections between existing military knowledge and the nascent possibilities that game theory offers, so that it can become a more widely understood and considered framework in military control systems.
To understand the state of the art in the field of game-theoretic applications in defence, and to analyse the types of games used in such contexts, a review is needed. To the best of our knowledge, such a review, spanning different applications of game theory in a variety of military domains, is lacking. The goal of this paper is to present such a review, providing a better understanding of the multitude of defence problems in which game theory can be successfully applied. Moreover, the multidimensional classification of the types of games used in different contexts will provide researchers with insights into new ways of applying game theory to related problems. Finally, we present gaps in the literature which, we hope, will give rise to further research and the development of novel game-theoretic approaches to defence problems.
Although it is not overly extensive, the body of literature on game theory in the military covers a notable portion of the different forms of engagement and combat. These papers cover past, present, and future scenarios: from predictive strategies in potentially hostile situations to analytical assessments, in hindsight, of military standoffs thousands of years ago. Game theory has demonstrated the capacity to be useful in any such military scenario. However, rapid technological progress has continually opened new frontiers of military engagement, each possessing its own complex systems. The overarching areas that have been addressed are tracking systems (across all domains), aerial combat, ground combat, national security issues, cyberwarfare, and space systems. Notably, applications of game theory in naval warfare have been few, and future research into areas like this will be discussed later in the review. Within each of these areas, there are a myriad of possibilities for new and innovative systems: different agents, different weapons, different control structures, and each of these could be enriched with game-theoretic analysis. While Haywood's and Thunholm's treatises on game theory in military decision making cover several different game types [12,13], there does not seem to be a paper that addresses the use of game theory in the military across each of the respective fields in the new context of military systems built on high-performance computing and complex algorithms. We aim to present the literature in such a way that it addresses all of the functions of game theory in military control systems in each key domain.
This review has considered in detail a total of 29 papers after careful selection. It highlights the scope and utility of each analysed paper by presenting it in terms of the essential game-theoretic concepts: players, game types, strategies, and the key parameters of the payoff functions. It will act both as an annotated bibliography and as a framework for understanding and planning further research in the area. It will also lay out the fundamental considerations weighed by players in every military decision-making scenario, as well as how they impact the decisions made by military personnel and systems, whether competing with hostile players or cooperating with friendly players. This makes it possible for most military scenarios to be viewed as games and can provide, at the very least, an interesting new perspective on familiar military situations. The 29 papers reviewed here were selected from Scopus and Google Scholar by a team of experts with backgrounds in defence, academia, and industry, who identified the most pertinent papers based on their diverse experience and perspectives. Only papers written in English were considered. While it is acknowledged that an exhaustive search was not performed, the selected papers, to the best of our knowledge, cover a significant and representative section of the research niche discussed here, and sufficiently demonstrate the trends, overlaps, and gaps in its literature. We therefore expect that the presented analysis will provide a rigorous comparison of the analysed papers and highlight the strengths and weaknesses of each, while also highlighting the overall pros and cons of using game theory to model decision making in military contexts.
The rest of the paper is structured as follows. Section 2 discusses the basic defence principles elaborated by the papers that we review, and introduces basic concepts of game theory. Section 3 investigates and analyses the literature and summarises the findings and associations in each of the papers. Section 4 elaborates our multi-dimensional classification of the literature based on the observations made in the previous section, and also presents citation and other metrics related to the reviewed papers. Section 5 identifies the gaps in the literature and, based on this, highlights opportunities for future research in this niche, particularly areas of defence research that could benefit from game theory but where it has rarely been applied so far. Section 6 provides an in-depth discussion of the utility of the findings and the presented review in general. Finally, Section 7 summarises our findings and classifications and provides broad conclusions.
Ideologies, beliefs and knowledge about war have been shaping human knowledge and philosophy for centuries. The great works of Sun Tzu, Homer and Machiavelli [14,15,16] have not only established a foundation for knowledge etched into the essence of military decision making, but also provided insight into sociology and social psychology [17]. The military forms a core power bloc for many civilisations and is instrumental to both the growth of influence for existing nations, and the birth of new nations [18]. The military deals with conflicts in real-time, plans for the future, as well as reviews past engagements - and every single one of these activities has an impact on society [19]. This review therefore by necessity addresses many facets of military conflict across multiple physical domains, and the major decisions that need to be made in each of these domains will be summarised below. Across all of these domains, however, the value of targets, the value of resources and the priority of objectives are usually the key parameters that shape the payoff functions and strategies which in turn define the games that we use in modelling.
In this section, we discuss the concepts in defence science and technology, as well as game theory, which are necessary to understand and analyse the literature in the presented niche. First of all, let us consider the broad domains of defence and national security which are considered in this review. They can be summarised as shown in Table 1.
Table 1. Classification System used in this review.

| Focus Area | Command and Control Warfare |
|---|---|
| Mode | Traditional (T) or Modern (M) |
| Warfare Domain | Land (L), Sea (S), Air (A), Cyber (C), or Space (Sp) |
| Warfare Type | Resource Allocation Warfare (RAW), Information Warfare (IW), Weapons Control Warfare (WCW), or Adversary Monitoring Warfare (AMW) |
| Game Theory Categorisation 1 | Non-Cooperative (NCo) or Cooperative (Co) |
| Game Theory Categorisation 2 | Sequential (Seq) or Simultaneous (Sim) |
| Game Theory Categorisation 3 | Discrete (D) or Continuous (C) |
| Game Theory Categorisation 4 | Zero-Sum (ZS) or Non-Zero-Sum (NZS) |
| Number of Players | 2-player (2P), 3-player (3P), or more than three players (NP) |
As shown in Table 1 , in this review, the focus is primarily on ‘command and control’ warfare, where decision making is critical. However, command and control warfare has applicability in traditional domains of warfare, such as land, sea, and air warfare, as well as modern domains of warfare, such as space and cyber warfare. At an orthogonal level, command and control warfare could also be sub-divided into Resource Allocation Warfare (RAW), Information Warfare (IW), Weapons Control Warfare (WCW), as well as Adversary Monitoring Warfare (AMW). Since these concepts are extensively used in our classification of literature, let us briefly introduce them first.
Resource Allocation Warfare (RAW): the allocation of military resources to achieve military objectives.
Information Warfare (IW): the manipulation of information to achieve military objectives.
Weapons Control Warfare (WCW): the control of weapons to achieve military objectives.
Adversary Monitoring Warfare (AMW): tracking the behaviour of an adversary to fulfil military objectives.
Technology is a dictating force in warfare, although it is not as imperative to land warfare as it is to other domains [20]. The technology that has impacted land warfare has been relatively static and, where possible, avoids the exposure of human resources [21,22]. Interpersonal combat at a physical level is much less prevalent nowadays, making way for a greater focus on positioning strategy. The literature applying game theory to ground warfare includes a strong repository of weapon-target allocation papers (which touch upon Weapons Control Warfare and Resource Allocation Warfare in the modern context), as well as papers that address ancient ground engagements and guerrilla warfare. Where human lives are vulnerable, their protection is the most important element of these games, and the next priority is the protection of ground-based assets.
Given the importance of navies for the projection of power globally, there is a surprising paucity of publicly available literature on naval warfare—with or without the application of game theory. There is often mention of naval warfare in papers dealing with target tracking, but a discussion on military naval strategy is limited to outdated literature or discussion of bare essentials [23,24]. We will review the available papers in this regard, and highlight this as an area where there is a sizeable gap in the literature.
It was not long after the Wright Brothers invented the aeroplane that aerial warfare became a critical factor in combat and military campaigns [25]. In a combat medium rarely impeded by obstacles or dimensions, the nature of aerial combat is fast-paced, intuitive, and incredibly treacherous, with unpredictable 'rules' of engagement [26,27]. In the present day, the factors to consider are vastly more complicated than a century ago, and there is no shortage of resources, both human and machine, available to military forces conducting aerial combat [28,29,30]. The literature shows that, as a result of this abundance of arsenal, the intrinsic and potential value of both the targets and the resources used to engage them is of particular importance in aerial warfare scenarios. Decisions about those values, for both sides of the conflict, need to be made when evaluating strategies for combat. As such, several papers deal with the use of game theory in aerial warfare.
Cybersecurity is the protection of IT systems and networks from being damaged, disrupted, or subjected to information theft. Cyberwarfare deals with Information and Communication Systems being deliberately attacked to obtain a military advantage. While cybersecurity has been an important field in computer science for many decades, the literature on cyber warfare as such is scarcer and, in any case, heavily overlaps with applications of game theory in computer science in areas related to cybersecurity. This review presents and analyses some papers which are specifically concerned with cyber warfare.
While the notion of warfare in space has existed for almost a century, neither physical execution nor a body of theoretical strategies for space warfare has been established [31]. Nevertheless, this has not stopped military forces chasing the stars (literally and figuratively) [32,33] and has inevitably led to concepts from game theory being used in space warfare strategic thinking. This is currently mostly limited to satellite networks, where the key parameters of the game are optimised power use and signal strength across the network. The field is still quite young, and further military development in space seems to be inevitable, with which the corresponding literature dealing with applications of game theory in space warfare will also grow.
Several papers address specific niches of defence applications of game theory, and yet cannot be classified as papers analysing a certain type of warfare. In some of these papers, the focus is more on the technology that is used: for example, target tracking. In others, the nature of the hostile actors against whom the defence needs to be conducted changes: for example, national security operations which target domestic terrorism threats rather than an opposing military force. Several papers deal with the use of game theory in such scenarios.
Target tracking systems: Target tracking in the military is the observation of a moving target and the surveillance of its position and manoeuvres [34,35,36]. Success in this domain relies on accuracy in the observed metrics and data, as well as efficient distribution and processing of all collected information [37]. With the advent of intelligent targets, the military must also incorporate predictive methods to maintain ideal tracking performance. The literature reviewed in this regard covers topics from tracking strike missiles to theatre ballistic missiles and tracking unknown intelligent agents to enemy aircraft. Key considerations in this area that shape the games played involve whether or not the target is ‘intelligent’/can take evasive action, whether or not the target will have an optimal trajectory, and whether or not the target will have defenders [38]. Target tracking applications of game theory mostly occur in aerial and naval warfare, including underwater surveillance.
National Security applications: Game theory often finds application in national security and anti-terrorism related fields. This includes predicting and preparing for terrorist attacks, as well as resource allocation scenarios for the protection of key personnel and landmarks/other potential targets for terrorist activity. While the value of potential targets and the likelihood of attacks are obviously key parameters governing the payoff functions of games in this niche, the subsequent social, economic and political ramifications are equally instrumental in modelling games in this area [39,40]. Few military conflicts have as much exposure as those on the home front [41], and the fallout from terrorist attacks and their effect on public mood and confidence in the security apparatus are often taken into account in modelling payoff functions in this area.
Game theory, which is the study of strategic decision making, was first developed as a branch of microeconomics [6,10,42,43]. However, it has since been adopted in diverse fields of study, such as evolutionary biology, sociology, psychology, political science, project management, financial management, and computer science [1,2,5,9,11,44]. Game theory has gained such wide applicability due to the prevalence of strategic decision-making scenarios across different disciplines. It provides insight into distinctive behavioural interactions, such as cooperative interactions within groups of animals [45], bargaining and exchange in a marriage [46], or the incentivisation of Scottish salmon farmers [47]. A game typically consists of two or more players, a set of strategies available to these players, and a corresponding set of payoff values (also referred to as utility values) for each player (usually presented as a payoff matrix in the case of two-player games) [44,48].
A pure strategy in a game provides a complete definition of how a player will play a game. A player’s strategy set is the set of pure strategies available to that player [10].
A mixed strategy is a combination of pure strategies where a particular probability p (where 0 ≤ p ≤ 1 ) is associated with each of these pure strategies. Since probabilities are continuous, there are infinitely many mixed strategies available to a player. A totally mixed strategy is a mixed strategy in which the player assigns a strictly positive probability to every pure strategy. Therefore, any pure strategy is actually a degenerate case of a mixed strategy, in which that particular strategy is selected with probability 1 and every other strategy is selected with probability 0.
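To make this concrete, the expected payoff of a mixed strategy in a two-player game is a simple bilinear form over the payoff matrix. The following minimal sketch uses an illustrative payoff matrix (the numbers are hypothetical and not drawn from any reviewed paper):

```python
import numpy as np

# Row player's payoff matrix for a hypothetical 2x2 game
# (illustrative numbers only).
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])

def expected_payoff(A, p, q):
    """Expected payoff to the row player when the row player uses
    mixed strategy p and the column player uses mixed strategy q."""
    p, q = np.asarray(p), np.asarray(q)
    return float(p @ A @ q)

# A pure strategy is a degenerate mixed strategy: probability 1 on one action.
pure_row_0 = [1.0, 0.0]
mixed = [0.5, 0.5]

print(expected_payoff(A, pure_row_0, [0.5, 0.5]))  # 1.5
print(expected_payoff(A, mixed, [0.5, 0.5]))       # 2.25
```

Note how the pure strategy, expressed as the degenerate mixed strategy (1, 0), is handled by exactly the same computation as any other mixed strategy.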
The concept of Nash equilibrium is fundamental to game theory. It is a state (a set of strategies) in a strategic game from which no player has an incentive to deviate unilaterally, in terms of payoffs. Both pure-strategy and mixed-strategy Nash equilibria can be defined, and a strategic game can often have more than one Nash equilibrium [7]. It has been proven that every game with a finite number of players, in which each player can choose from finitely many pure strategies, has at least one Nash equilibrium in mixed strategies [7].
The formal definition of Nash equilibrium is as follows. Let (S, f) be a game with n players, where S_i is the strategy set of player i. The strategy profile space, consisting of the strategy sets of all players, is S = S_1 × S_2 × … × S_n. Let f(x) = (f_1(x), …, f_n(x)) be the payoff function for strategy profile x ∈ S. Suppose x_i is the strategy of player i and x_{-i} is the strategy profile of all players except player i. Thus, when each player i ∈ {1, …, n} chooses strategy x_i, the resulting strategy profile is x = (x_1, …, x_n), giving a payoff of f_i(x) to that particular player, which depends on both the strategy chosen by that player (x_i) and the strategies chosen by the other players (x_{-i}). A strategy profile x* ∈ S is in Nash equilibrium if no unilateral deviation in strategy by any single player would return a higher payoff for that player [10]. Formally put, x* is in Nash equilibrium if and only if:

∀ i, x_i ∈ S_i : f_i(x_i*, x_{-i}*) ≥ f_i(x_i, x_{-i}*)

Typically, games are taken to be played for the self-interest of the players, and even when players cooperate, it is because cooperation appears to be the best strategy under the circumstances for maximising their individual payoffs. In such games, cooperative behaviour, if it emerges, is driven by selfish goals and is transient. These games can be termed 'non-cooperative games'. They are sometimes referred to, rather inaccurately, as 'competitive games'. Non-cooperative game theory is the branch of game theory that analyses such games. On the other hand, in a cooperative game, sometimes also called a coalitional game, players form coalitions, or groups, sometimes due to external enforcement of cooperative behaviour, and competition, if it emerges, takes place between these coalitions [7,8,9]. Cooperative games are analysed using cooperative game theory, which predicts which coalitions will form and the payoffs of these coalitions. Cooperative game theory focuses on surplus or profit-sharing among the coalition [49], where the coalition is guaranteed a certain amount of payoff by virtue of being formed. Often, the outcome of a cooperative game played in a system is equivalent to the result of a constrained optimisation process [50].
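For small finite games, the defining inequality of Nash equilibrium can be checked exhaustively. The sketch below searches a hypothetical two-player bimatrix game for pure-strategy Nash equilibria; the payoff matrices are illustrative only (a Prisoner's-Dilemma-like game), not taken from any reviewed paper:

```python
import itertools
import numpy as np

# Entry [i][j] is a player's payoff when the row player plays i
# and the column player plays j (illustrative numbers only).
R = np.array([[3, 0],
              [5, 1]])   # row player's payoffs
C = np.array([[3, 5],
              [0, 1]])   # column player's payoffs

def pure_nash_equilibria(R, C):
    """Return all pure-strategy profiles (i, j) from which no player can
    gain by deviating unilaterally, i.e. f_i(x*) >= f_i(x_i, x*_{-i})."""
    equilibria = []
    for i, j in itertools.product(range(R.shape[0]), range(C.shape[1])):
        row_ok = all(R[i, j] >= R[k, j] for k in range(R.shape[0]))
        col_ok = all(C[i, j] >= C[i, k] for k in range(C.shape[1]))
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria

print(pure_nash_equilibria(R, C))  # [(1, 1)] -- mutual 'defection'
```

For these payoffs, the only pure-strategy Nash equilibrium is the mutual-defection profile, even though both players would be better off at (0, 0); this is the classic illustration of individually rational play producing a collectively inferior outcome.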
Zero-sum games are a class of competitive games in which the payoffs of all players sum to zero. In two-player games, this implies that one player's loss is equal to the other player's gain. A two-player zero-sum game can therefore be represented by a payoff matrix that shows only the payoffs of one player. Zero-sum games can be solved with the minimax theorem [51], which states that in a zero-sum game there is a set of strategies that minimises the maximum losses (or maximises the minimum payoff) of each player. This solution is sometimes referred to as a 'pure saddle point'. It can be argued that the stock market is a zero-sum game. In contrast, most valid economic transactions are non-zero-sum, since each party considers what it receives to be more valuable (to itself) than what it parts with [10].
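For a small zero-sum matrix game, the existence of a pure saddle point can be tested by comparing the row player's maximin value with the column player's minimax value. A minimal sketch, with an illustrative payoff matrix of our own choosing:

```python
import numpy as np

# Zero-sum game: M[i][j] is the row player's payoff; the column
# player receives -M[i][j] (illustrative numbers only).
M = np.array([[4, 1, 3],
              [2, 0, 2],
              [5, 1, 4]])

# Row player maximises the minimum of each row (maximin);
# column player minimises the maximum of each column (minimax).
maximin = M.min(axis=1).max()
minimax = M.max(axis=0).min()

print(maximin, minimax)  # 1 1 -- equal, so a pure saddle point exists
```

When the two values differ, no pure saddle point exists and the minimax solution must be sought in mixed strategies, for example by linear programming.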
In a perfect information game, each player is aware of the full history of the previous actions of all other players, as well as the initial state of the game. In imperfect information games, some or all players do not have access to the entirety of information about other players’ previous actions [52,53].
A simultaneous game is either a normal-form or an extensive-form game in which, on each iteration, all players make their decisions simultaneously. Each player is therefore forced to decide without knowing the decisions made by the other players (on that iteration). In contrast, a sequential game is a type of extensive-form game in which players make their decisions (or choose their strategies) in some predefined order [52,53]. For example, a negotiation process can be modelled as a sequential game if one party always has the privilege of making the first offer, and the other parties make their offers or counteroffers after that. In a sequential game, at least some players can observe at least some of the actions of other players before making their own decisions (otherwise, the game becomes a simultaneous game, even if the moves of the players do not happen simultaneously in time). However, it is not necessary that every move of every previous player be observable to a given player. If a player can observe every move of every previous player, the sequential game is said to have 'perfect information'; otherwise, the game has 'imperfect information' [10].
Differential games are often extensive-form games, but instead of having discrete decision points, they are modelled over a continuous time frame [10]. In such games, each state variable evolves continuously over time according to a differential equation. Such games are ideal for modelling rapidly evolving defence scenarios in which each player engages in selfish optimisation of some parameter. For example, in missile tracking problems, the pursuer and the target both try to control the distance between them: the pursuer constantly tries to minimise this distance, while the target constantly tries to increase it. In such a scenario, iterative rounds of decision making are much too discrete to model the continuous movements and computations of each player. Differential games are ideal for modelling such scenarios.
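A crude way to build intuition for such pursuit-evasion dynamics is to discretise time and let each player apply a greedy control at every small step. The sketch below is purely illustrative: the speeds, initial positions, and greedy controls are all assumptions of ours, and the simulation is not a solution of the underlying differential game.

```python
import numpy as np

# Crude time-discretisation of a pursuit-evasion scenario:
# the pursuer greedily closes the distance, the evader greedily flees
# (illustrative dynamics only, not an optimal differential-game solution).
dt, steps = 0.1, 200
pursuer = np.array([0.0, 0.0])
evader = np.array([5.0, 3.0])
v_pursuer, v_evader = 1.2, 1.0   # assumed speeds: pursuer slightly faster

for _ in range(steps):
    direction = evader - pursuer
    dist = np.linalg.norm(direction)
    if dist < 1e-6:
        break                       # capture
    unit = direction / dist
    pursuer += v_pursuer * unit * dt   # move toward the evader
    evader += v_evader * unit * dt     # flee along the same line

print(round(float(np.linalg.norm(evader - pursuer)), 3))  # about 1.831
```

Because the assumed pursuer is slightly faster, the inter-player distance shrinks at a constant rate here; a genuine differential-game treatment would instead derive each player's optimal control from the game's value function (e.g., via the Hamilton-Jacobi-Isaacs equation).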
Common interest games are another class of non-cooperative games in which there is an action profile that all players strictly prefer over all other profiles [52]. In other words, in common interest games, the interests of players are perfectly aligned. It can be argued that common interest games are the antithesis of zero-sum games, in which the interests of the players are perfectly opposed so that any increase in fortune for one player must necessarily result in the collective decrease in fortune for others. Common interest games were first studied in the context of cold war politics, to understand and prescribe strategies for handling international relations [54]. Therefore, it makes sense to classify non-cooperative games into common interest games and non-common interest games, just as much as it makes sense to classify them into zero-sum games and non-zero-sum games, as these two concepts (zero-sum games and common interest games) represent extreme cases of non-cooperative games.
A signalling game [52] is an incomplete-information game in which one player has perfect information and another does not. The player with perfect information (the Sender, S) relays messages to the other player (the Receiver, R) through signals, and the Receiver acts on those signals after inferring the information hidden in the messages. The Sender has several potential types, and the exact type t in a given play of the game is unknown to the Receiver; t determines the payoff for S. The Receiver has only one type, and R's payoff is known to both players.
The game is divided into a sending stage and an acting stage. S sends one of the messages M = {m_1, m_2, m_3, …, m_j}. R receives that message and responds with an action from the set A = {a_1, a_2, a_3, …, a_k}. The payoff that each player receives is determined by the combination of the Sender's type and message, as well as the action with which the Receiver responds. An example of a signalling game is the Beer-Quiche Game [52], in which Player B, the receiver, chooses whether or not to duel Player A. Player A is either surly or wimpy, and Player B would only like to duel the latter. Player A chooses to have either beer or quiche for breakfast. While Player A may prefer quiche, choosing quiche conveys information through the stereotype that quiche eaters are wimpy. Player B must analyse how each decision, duel or not duel, will give them a better payoff depending on which breakfast Player A chooses.
Behavioural Game Theory combines classical game theory with experimental economics and experimental psychology, and in doing so, relaxes many simplifying assumptions made in classical game theory which are unrealistic. It deviates from simplifying assumptions such as perfect rationality [55], the independence axiom, and the non-consideration of altruism or fairness as motivators of human decision making [56,57]. We will show in this review that the approaches related to behavioural game theory are crucial in modelling military scenarios, such as in signalling games.
Evolutionary game theory is an outcome of the adoption of game theory into the field of evolutionary biology [58]. Some of the critical questions asked in evolutionary game theory include: Which populations/strategies are stable? Which strategies can 'invade' (become popular in) populations where other strategies are prevalent? How do players respond to other players receiving, or being perceived to receive, better payoffs in an iterated game setting? Evolutionary games are often modelled as iterative games in which a population of players plays the same game repeatedly in a well-mixed or spatially distributed environment.
A strategy is identified as an evolutionarily stable strategy (ESS) if, when prevalent, it can prevent any mutant strategy from invading its environment. Alternatively, an ESS is a strategy that, if adopted by a population in a given environment, cannot be invaded by any alternative strategy. Hence, there is no benefit for a player in switching from an ESS to another strategy; essentially, an ESS is a refinement of the Nash equilibrium. For a strategy S1 to be an ESS against another 'invading' strategy S2, one of the two conditions below needs to be met, in terms of the expected payoff E.
E(S1, S1) > E(S2, S1): by unilaterally changing strategy to S2, the player will lose out against another player who sticks with the ESS S1.
E(S1, S1) = E(S2, S1) and E(S1, S2) > E(S2, S2): a player, by converting to S2, neither gains nor loses against another player who sticks with the ESS S1; but when playing against a player who has already 'converted' to S2, a player is better off playing the ESS S1.
If either of these conditions is met, the new strategy S2 is incapable of invading the existing strategy S1, and thus S1 is an ESS against S2. Evolutionary games are typically modelled as iterative games, whereby players in a population play the same game repeatedly [52].
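The two ESS conditions above can be checked mechanically for any symmetric two-strategy game. The sketch below uses the classic Hawk-Dove payoffs as an illustration; the values of V and C are hypothetical:

```python
# Checking the two ESS conditions for a symmetric 2x2 game, using the
# classic Hawk-Dove payoffs as an illustration (V = benefit of the
# resource, C = cost of a fight; the numbers are hypothetical).

V, C = 2.0, 3.0
# E[(row strategy, column strategy)]: expected payoff to the row player
E = {
    ("hawk", "hawk"): (V - C) / 2, ("hawk", "dove"): V,
    ("dove", "hawk"): 0.0,         ("dove", "dove"): V / 2,
}

def is_ess(s1, s2):
    """True if s1 is evolutionarily stable against invasion by s2."""
    if E[(s1, s1)] > E[(s2, s1)]:                 # condition 1
        return True
    return (E[(s1, s1)] == E[(s2, s1)]
            and E[(s1, s2)] > E[(s2, s2)])        # condition 2

# With C > V, neither pure strategy is an ESS against the other
# (the stable state is a mixed population):
print(is_ess("hawk", "dove"), is_ess("dove", "hawk"))  # False False
```

Setting C < V instead makes "hawk" an ESS, since condition 1 then holds for it.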
It should be noted that several other branches of game theory have not been mentioned in the above subsections, and there have also been several recent advances that are not covered here. Game theory is used in increasingly diverse scenarios and applications. For example, game theory has been used to determine the market share of competitors in the telecommunication industry [59], and in the planning and construction of biogas plants [60]. In some applications, the payoffs of matrix games are constructed to contain fuzzy elements, which, it is argued, makes the modelled scenarios more realistic [61,62]. Similarly, quantum game theory is an emerging field [63,64], which introduces superposed initial states, quantum entanglement of initial states, and superposition of strategies. Not all such advances can be summarised here; instead, this section has provided an elementary introduction to those game-theoretic concepts that are often used in the defence literature, and particularly in the papers that we review. Please see [10,52] for more elaborate treatments of the concepts presented.
With this background, we now review the available literature which deals with the applications of game theory in defence science and technology.
As mentioned earlier, the primary parameters that influence the payoff matrix in games modelling defence scenarios are the value of targets, the value of resources, and the priority of objectives. Beyond this, the games used in defence applications vary greatly, as we will see below. For this reason, this section is structured based on the domain (type of warfare) each paper covers. Where a paper covers more than one domain, it is included in the most relevant subsection/domain. For each paper, we analyse in detail the type of game used, the way the payoff functions were structured, the available strategies, and the equilibria.
In Land warfare related applications of game theory, most studies focus on defensive warfare, whereby the military makes decisions on how to best allocate their ground defences to multiple threats. Some studies also focus on historical land-based conflicts and provide game-theoretical analysis in hindsight, revealing how some decisions made from intuition in historical conflicts had a rational and mathematical justification. Land warfare can result in very heavy casualties, so understanding how to best minimise human losses is a key component (though not the only objective) of land warfare. Quite often, prioritising military resources is also fundamental to success and often features prominently in strategic decisions. Furthermore, often in scenarios involving ground warfare, it is important to assess the knowledge about opponents, their possible tactics, or terrain: it may become necessary to combat airborne forces being inserted at certain places, or it may be needed to traverse uncertain territory. In each of these situations, understanding where a force has imperfect information will help that force to make rational decisions.
Several papers use game theory to model land warfare in current and historical contexts. Bier et al. [65] design a game to optimally assign defensive resources to a set of locations that need to be protected. The attacker must then decide how to split their force to attack the different targets. The game is modelled as a two-player normal-form game. The payoff in this game is absolute: an attack on a location i is either a success or a failure, where the attacker gains a_i and the defender loses d_i. Since orders for an attack are confirmed ahead of the attack, attackers must use pure strategies. The game can be played either simultaneously or sequentially, depending on whether or not the attacker knows how the defender has assigned their resources before making their decision. A perhaps counter-intuitive conclusion is that the defender's ideal strategy can involve deliberately leaving some targets undefended in order to strengthen defences at key locations.
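A toy version of such a defender-attacker game can be written down directly in normal form. The payoffs below are illustrative and not taken from Bier et al.; the exhaustive search simply checks every pure-strategy profile for profitable unilateral deviations:

```python
# A toy defender-attacker allocation game in normal form (illustrative
# payoffs, not from Bier et al.). Each side picks one of two targets;
# a successful attack on target i gains a_i for the attacker and costs
# d_i for the defender; an attack on the defended target yields nothing.
from itertools import product

a = {"t1": 4, "t2": 1}   # attacker's valuation of each target
d = {"t1": 5, "t2": 2}   # defender's loss if that target falls

def payoffs(defend, attack):
    """Return (defender payoff, attacker payoff) for a pure-strategy pair."""
    if attack == defend:
        return (0, 0)                  # attack repelled
    return (-d[attack], a[attack])     # undefended target falls

strategies = ["t1", "t2"]
pure_nash = [
    (D, A) for D, A in product(strategies, strategies)
    if all(payoffs(D, A)[0] >= payoffs(D2, A)[0] for D2 in strategies)
    and all(payoffs(D, A)[1] >= payoffs(D, A2)[1] for A2 in strategies)
]
print(pure_nash)  # [] -- no pure-strategy equilibrium for these payoffs
```

For these particular valuations no pure-strategy equilibrium exists, which is one reason the sequential version of the game, where the defender commits first and the attacker responds, is also of interest.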
The next paper we review, Gries et al. [66], is a comprehensive investigation into the utility of game-theoretic principles in guerilla/destabilisation warfare. The significant factors they model are: insurgents in destabilisation campaigns often attack randomly, creating a continuous threat that must have a continuous mitigation and detection strategy; the duration of a war is important to consider, and will change the value that is assigned to targets and assets; and time preferences play a critical role in setting priorities, as judgements of value determine strategic decisions, which in turn determine success or failure. The game model they propose involves both a sequential non-cooperative game and a simultaneous non-cooperative game, in each of which the two players are the guerilla force and the government. For these conflicts, the economic and social impacts are much more significant than military losses and gains and therefore play a much more significant role in calculating the value of outcomes.
The game specifically models moments when each side looks to try and find peace or conflict with the other. At each of these moments, the government forces must consider the financial cost of each option, while the rebels examine the order of priority of the engagements, and what portion of their fighting force they will make available for each engagement. Figure 1 demonstrates an example of the decision tree to emerge from these moments in destabilisation warfare, where G represents the Government decisions and R represents the Rebel decisions.
Destabilisation Warfare game [66], where the government and rebel decision points are highlighted.
Krishnamurthy et al. [67] investigate game-theoretic control of the dynamic behaviour of an unattended ground sensor network (UGSN) to acquire information about intruders. Each sensor in this network is capable of taking measurements, with specified accuracy, of the range and bearing of nearby targets, which it then transmits to a local hub for data fusion. In this framework, while more sensor measurements and larger volumes of transmitted measurements may lead to better target awareness, they also result in the undesirable effect of greater consumption of limited battery power. Hence, the goal to which game theory is applied is to optimally trade off target awareness, data transmission, and energy consumption using a two-time-scale, hierarchical approach.
The authors demonstrate that the sensor activation and transmission scheduling problem can be decomposed into two coupled decentralized algorithms. In particular, the sensors are viewed as players in a non-cooperative game and an adaptive learning strategy is proposed to activate the sensors according to their proximity to targets of interest. This turns out to be a correlated equilibrium solution of this non-cooperative game. Next, the transmission scheduling problem, in which each sensor has to decide at each time instant, whether to transmit data and waste battery power or to wait and increase delay, is formulated as a Markov Decision process with a penalty terminal cost. The main result of this formulation is to show that the optimal transmission policy has a threshold structure which is then proved using the concept of supermodularity.
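The threshold structure of such an optimal transmission policy can be illustrated with a small value-iteration sketch. The Markov decision process below (backlog states, a fixed power cost for transmitting, a linearly growing delay cost for waiting) is a deliberate simplification with hypothetical constants, not the model of [67]:

```python
# A sketch of why an optimal transmit/wait policy can have a threshold
# structure. States 0..N count backlogged measurements; transmitting
# costs fixed power P and clears the backlog, while waiting incurs a
# delay cost proportional to the backlog. All constants are illustrative.

N, P, c, gamma = 10, 5.0, 1.0, 0.9   # states, power cost, delay cost, discount
V = [0.0] * (N + 1)

for _ in range(500):                                  # value iteration
    V = [min(P + gamma * V[0],                        # transmit: pay P, reset
             c * s + gamma * V[min(s + 1, N)])        # wait: delay cost grows
         for s in range(N + 1)]

policy = ["transmit" if P + gamma * V[0] <= c * s + gamma * V[min(s + 1, N)]
          else "wait"
          for s in range(N + 1)]
print(policy)  # "wait" below a backlog threshold, "transmit" at and above it
```

Because the cost of waiting is non-decreasing in the backlog while the cost of transmitting is constant, the optimal action switches from "wait" to "transmit" exactly once, which is the threshold structure the paper proves via supermodularity.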
There are several studies that analyse historical conflicts, which occurred predominantly on land, through a game-theoretic prism. For example, Cotton and Liu [68] describe two ancient Chinese military legends and model them as signalling games. In both games, legendary military leaders are faced with formidable opposing armies of much greater numbers and strength than their own, but instead of retreating, they prepare to engage, acting as if they are setting up an ambush. Their opponents, having imperfect information, are left only with the messages they can infer from their adversaries' actions; spooked by the perceived confidence and the reputation that these generals carried, the opposing armies, though in actuality of superior strength, choose not to engage. Through a brave and ingenious bluff, both generals achieve an equilibrium solution in their favour by standing their ground. They do this by creating deception without direct communication, which follows the template of the aforementioned Beer-Quiche signalling game.
The first game described by Cotton and Liu is the "100 Horsemen" game. They describe a piece of history in which a hundred Han horsemen travelling alone encounter a large Xiongnu force numbering in the thousands. The horsemen's available strategies are to retreat or engage. If they retreat and the enemy engages, they will very likely be run down and defeated; if they engage and the enemy also engages, they will be eliminated in battle. The best outcome for them is to somehow force an enemy retreat. The enemy is uncertain whether the horsemen are travelling with a greater army. They see the horsemen move to engage, decide not to take the risk, and retreat. The situation is translated into a two-player game with two strategies per player. It is represented in Figure 2 below:
The 100 Horsemen signaling game [68].
LG represents the decision point for General Li Guang, of the Han forces.
GenX represents the decision point for the opposing Xiongnu force.
Payoffs are listed as (LG, GenX)
λ ∈ (0, 1) represents the General's ability, as either strong or weak
α and β represent the proportion of Han horsemen killed in a retreat
w is a positive parameter
The second game is very similar to the first. In this game, a small city is guarded by the formidable General Zhuge Liang. He learns that a great hostile army is approaching the city. He is faced with two options. He could run, in which case he would cede the city and likely be chased down by the approaching army, or he could stay and defend the city. If he chose the latter, and the army were to engage, he would likely lose his life, his army, and the city. Faced with this dilemma, he orders his men to hide out of sight, so that the city appears empty from the outside. He climbs to the top of the foremost tower of the city and plays music. The opposing general, aware of General Liang's experience and prowess, suspects that the General has taken this unassuming position in the tower of the empty city to ambush his army, and moves his forces away from the city to avoid being ambushed. General Liang effectively sent two signals here. The first was his reputation, a signal encompassing his strategic and military strength. The second was his choice to stay and defend the city. With these two pieces of information, and nothing else about the whereabouts or magnitude of General Liang's army, the opposing army chooses the safe option of zero loss and leaves. This piece of history is modelled as another two-player signalling game, shown in Figure 3 below:
The Empty City signaling game [68].
ZL represents the decision point for the General Zhuge Liang
Payoffs are listed as (ZL, Opposing Army)
λ ∈ (0, 1) represents the General's ability, as either strong or weak
c represents the value of the city
w represents the gains if ZL’s army matched the opposing army
y represents the losses if ZL’s army is weaker than the opposing army, and y > c since it encompasses losing the city
Both pieces of history represent distinguished military decision making in the face of near-certain defeat, and are in fact examples of Generals with a strong understanding of the nuances of signals and rational decision making in strategic interactions forcing an outcome that is favourable to themselves.
Surprisingly, papers that directly and primarily apply game theory to naval warfare are comparatively rare, even though naval warfare predates aerial warfare in human history by a considerable margin. Levine [69] studies aspects of naval warfare in previous centuries using concepts from game theory. In the 18th and 19th centuries, the powerful nations of the time built warships with cannons positioned along their sides, meaning that ships could typically attack only to their sides. When sailing as an armada, the standard approach was to form a 'line of battle', i.e., a column of allied naval ships sailing in a direction such that their sides would face the enemy, also positioned in a line. The two parallel opposing fleets could then attack each other with a large number of cannons. The 'line of battle' strategy is considered a Nash equilibrium because neither fleet would gain from raking (a tactic of the era whereby an attacking ship would attempt to sail across its adversary's stern, concentrating cannon fire there while the enemy could respond only minimally, having fewer cannon placements in the stern; the attacking ship would damage both the stern and some of the broadsides of its adversary). According to Levine [69], raking was not preferred in a fleet action as it would mean having to first sail ahead of the enemy and then turn towards it, a challenging task when the ships' speeds were roughly equal and manoeuvring was difficult. As neither fleet would gain from turning towards the enemy and neither could get ahead, Levine concludes that this strategy, forming a line of battle and sailing parallel to the other fleet, was each fleet's best response, and thus represented a Nash equilibrium.
Levine goes on to mention battles in which English fleets deviated from the above strategy and sailed orthogonally towards a French and Franco-Spanish fleet. In the first battle Levine mentions, it was likely unplanned. In the second—the 1805 Battle of Trafalgar—it was by careful design: the English fleet divided itself into two columns, each of which sailed orthogonally to the Franco-Spanish line, raking fire for about 45 minutes before crashing through it and beginning a general melee. The English would go on to isolate the middle of the Franco-Spanish fleet to score a decisive victory. Levine considers both battles to be counterexamples to his thesis. However, in the Battle of Trafalgar, it is possible that the English strategy was a best response to the likely Franco-Spanish strategy of forming an orthodox line of battle. The English admiral, Lord Nelson, desired to keep the Franco-Spanish fleet from escaping—which they could if both fleets formed parallel lines of battle—thus reducing the reward he would get for forming his own fleet into a line of battle. Moreover, he may have estimated that the poor gunnery of the French and Spanish ships would lessen the effect of the raking fire, thus reducing the negative reward he would get for directly charging the Franco-Spanish fleet. In his eyes, this may have made the unorthodox option a better response to the likely Franco-Spanish strategy than the orthodox line of battle. While Levine did not explicitly attribute these strategies in naval battles of the era to game theory, the adopted strategies could nonetheless be justified by game-theoretic analysis: an example of ’intuitive’ application of game theory without formally studying it.
Maskery et al. 2007 (a) [70] study the problem of deploying counter-measures against anti-ship missiles using a network-enabled operations (NEOPS) framework, where multiple ships communicate and coordinate to defend against a missile threat. Here, the missile threats are modelled as a discrete Markov process; they appear at random positions within a fixed physical space and move towards the ships obeying known target dynamics and guidance laws. The ships, which are equipped with counter-measures (CM) such as decoys and electromagnetic jamming signals, are modelled as the players of a transient stochastic game, where the actions of the individual players include the use of CM to maximize their own safety while cooperating with other players pursuing the same objective. The optimal strategy of this game-theoretic problem is a correlated equilibrium strategy and is shown to be achievable via an optimization problem with bilinear constraints. This is in contrast to the Nash equilibrium solution proposed in [71] for a related problem, but one without player coordination. A noteworthy contribution of this paper is that it also quantifies the amount of communication necessary to implement the NEOPS equilibrium strategy. This paper highlights the utility of game-theoretic methods in analysing optimal strategies in network-enabled systems, which are critical in modern warfare.
In [71], Maskery et al. 2007 (b) consider the problem of network-centric force protection of a task group against anti-ship missiles. The decision-makers in this model are the ships equipped with hard-kill/soft-kill weapons (counter-measures), and these ships are also the players in the game-theoretic formulation of the problem. The platforms must independently make critical decisions on the optimal deployment of counter-measures while simultaneously working towards the common goal of protecting the members of the task group. Essentially, this is a decentralised missile deflection problem in a naval setting, formulated as a transient stochastic game for which the ships may compute a joint counter-measure policy that is in Nash equilibrium. Here, the ships play a game with each other instead of with a missile. This approach naturally lends itself to decentralised solutions, which may be implemented when full communication is not feasible. Moreover, this formulation leads to an interpretation of the problem as a stochastic shortest path game, for which Nash equilibrium solutions are known to exist.
Bachmann et al. [72] analyse the interaction between radar and jammer using a noncooperative two-player, zero-sum game. In their approach, the radar and jammer are considered ‘players’ with opposing goals: the radar tries to maximize the probability of detection of the target while the jammer attempts to minimize its detection by the radar by jamming it. Bachmann et al. [72] assume a Swerling Type II target in the presence of Rayleigh distributed clutter, for which certain utility functions are described for cell-averaging (CA) and order-statistic (OS) CFAR processors in different cases of jamming. This game-theoretic formulation is solved by optimizing these utility functions subject to constraints in the control variables (strategies), which for the jammer are jammer power and the spatial extent of jamming while for the radar the available strategies include the threshold parameter and reference window size. The resulting matrix-form games are solved for optimal strategies of both radar and jammer from which they identify conditions under which the radar and jammer are effective in achieving their individual goals.
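Matrix-form zero-sum games of this kind can be solved for optimal mixed strategies with a small linear program. The sketch below is generic rather than radar-specific: the payoff matrix is matching pennies, purely for illustration, and scipy is assumed to be available:

```python
# Solving a zero-sum matrix game by linear programming: the row player's
# maximin mixed strategy and the game value. The payoff matrix here is
# matching pennies, purely for illustration; scipy is assumed available.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Maximin mixed strategy and game value for the row player of A."""
    m, n = A.shape
    # Variables: x_1..x_m (row mixed strategy) and v (game value).
    # Maximise v subject to sum_i x_i * A[i, j] >= v for every column j.
    c = np.zeros(m + 1); c[-1] = -1.0          # linprog minimises, so use -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - (x^T A)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])                     # probabilities sum to 1
    bounds = [(0, 1)] * m + [(None, None)]     # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:m], res.x[-1]

A = np.array([[1.0, -1.0], [-1.0, 1.0]])       # matching pennies
strategy, value = solve_zero_sum(A)
print(strategy, value)                          # ~[0.5 0.5], value ~0
```

The same solver applies to any radar/jammer payoff matrix once its entries have been computed from the chosen utility functions.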
Aerial combat is often modelled as a normal-form game, where decisions about the resources to be utilised are made before the engagement, based on assumptions and knowledge about the strength of different elements of the arsenal. For example, Suppression of Enemy Air Defense vehicles (SEADs) are effective against ground-to-air defences and Surface-to-Air Missiles (SAMs), but will not be useful against fighter aircraft. Therefore, when military personnel decide which resources to use in an engagement, they need to weigh how valuable each of their resources is, as well as how important the objective is to both sides of the conflict. If the attacking force values a target much more than it is actually worth, their increased resource expenditure may be detrimental to their military campaign as a whole. Since aerial weaponry is usually operated by humans, the operators' respective abilities and skill sets, and the likelihood of them executing their mission, also need to be considered.
There is limited literature on aerial combat modelled with game theory. Hamilton [73] provides a comprehensive guide to applications of game theory to multiple aerial warfare situations. Hamilton suggests using game theory to devise strategies based not only on one's own military options but also on expectations of enemy actions. Game theory accounts for different interactions with the enemy, rather than simply considering which side has superior maximum-effort power. Nowadays, many military forces can adapt to instantly changing situations and adjust their actions based on new circumstances. As such, Hamilton suggests first determining all of the tactical options available to each side. As stated earlier, one of the most fundamental elements of using game theory for the military is understanding exactly how much value each asset holds, and detailing the inventory and strategic possibilities of both sides will best clarify all strategic options. For each option, Hamilton suggests assigning a numerical value, a Measure of Effectiveness (MoE). Accurate MoEs underpin the strategic choices that are made; incorrect MoEs can lead to incorrect strategic decisions, and perhaps also to a poor understanding of why a decision was wrong. An example of this (although not in the aerial warfare context) was the Vietnam War, where the early US strategy was to maximise the neutralisation of Viet Cong soldiers. Since the North Vietnamese leadership did not place great emphasis on their infantry, the US strategy ultimately led to a loss in the war. Next, Hamilton suggests calculating the combined value for all possible interactions between the strategies of both sides of the conflict. This generates a matrix of payoffs, from which it is possible to derive the optimum or dominant strategy for each player, and then an equilibrium solution.
Thus, ahead of any engagement a military leader may partake in, they have a well-formed idea of the expected result of the game. A caveat that Hamilton adds to these guidelines is to consider the length of a military campaign as a whole. The values that can be assigned to a resource for one battle or strike attack may be small if they are cheap to replace or large in number. However, depending on the number of such skirmishes throughout a campaign, those resources may become pivotal.
To illustrate these points, Hamilton applies them to a standard aerial warfare game of SEADs and time-critical targets. In this combat, the 'Blue' side is trying to eliminate some ground-based targets. To do this, they use SEADs. In response, the 'Red' side will fire SAMs, which SEADs struggle to avoid. However, expecting this response, the Blue side also has Strike aircraft, which can defend the SEADs and counteract the SAMs but are unable to attack the targets. The questions for the Blue team are: what is the value of the target, and what ratio of SEADs and Strike aircraft should be deployed for the targets? Likewise, for the Red team: how valuable is the target, and how many SAMs, if any, should be fired? Hamilton contends that the optimal Red strategy is to fire SAMs only for a fraction of the engagement equal to:
Fraction of engagement = Value of Target / (Value of Target + Value of SEAD × Pks + Value of SAM × PkA)
and the optimal Blue strategy is to assign a fraction of the planes as SEADs equal to:
Fraction of SEADs = Value of SAM × PkA / (Value of SAM × PkA + Value of SEAD × Pks + Value of Target)
where Pks is the probability of the SAMs destroying the SEADs, and PkA is the probability of the Strike aircraft destroying the SAMs.
This formulation gives a concise prediction of the likely outcome of an engagement given every possible assignment of aircraft and missile launches. It must be noted that it is incredibly difficult in practice to accurately quantify the numerical value of different targets and resources.
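Hamilton's two fractions translate directly into code; the input values below are purely illustrative:

```python
# Hamilton's optimal engagement fractions [73], coded directly from the
# formulas above. The numerical inputs are illustrative only.

def red_fire_fraction(v_target, v_seads, v_sam, pk_s, pk_a):
    """Fraction of the engagement for which Red should fire SAMs."""
    denom = v_target + v_seads * pk_s + v_sam * pk_a
    return v_target / denom

def blue_seads_fraction(v_target, v_seads, v_sam, pk_s, pk_a):
    """Fraction of Blue's planes to assign as SEADs."""
    denom = v_sam * pk_a + v_seads * pk_s + v_target
    return v_sam * pk_a / denom

f_red = red_fire_fraction(v_target=10, v_seads=4, v_sam=3, pk_s=0.5, pk_a=0.6)
f_blue = blue_seads_fraction(v_target=10, v_seads=4, v_sam=3, pk_s=0.5, pk_a=0.6)
print(f_red, f_blue)   # both fractions lie in (0, 1)
```

Note that both expressions share the same denominator, so the sensitivity of each fraction to a mis-estimated target value is easy to explore numerically, which underlines the paper's caveat about quantifying values accurately.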
Garcia et al. 2019 [74] investigate the problem of defending a maritime coastline against two enemy aircraft whose main objective is to invade the territory controlled by the defending aircraft. The defender, on the other hand, attempts to prevent this by trying to intercept both enemy aircraft in succession as far as possible from the border. This is a typical pursuit-evasion scenario and is representative of many important problems in robotics, control, and defence. In this paper, Garcia et al. formulate the problem as a zero-sum differential game, where the defender/pursuer tries to successively capture the two attackers/evaders as far as possible from the defended coastline, while the attackers cooperate to minimize their combined distance from the border before they are confronted. Garcia et al. then find the optimal strategies for the attackers and the defender in this one-defender two-attacker pursuit-evasion game by solving a set of nonlinear equations. The cooperative strategy discussed in this paper provides an important coordination approach for less capable (perhaps slower) agents when they are tasked to carry out a mission.
Garcia et al. 2017 [75] consider an air combat scenario in which a target aircraft, engaged by an attacking missile, launches a defending missile and attempts to escape by maximising the distance between itself and the attacker at the moment the defender gets as close as it can to the attacking missile. The game is referred to as an active target defence differential game (ATDDG). The authors extend previous work on this three-agent problem to develop a closed-form analytical solution for the ATDDG, in which the defending missile can defeat the attacker if the attacker enters a capture circle of specified radius rc > 0. Additionally, the closed-form optimal state-feedback solution holds even when the attacker employs an unknown guidance law, rather than assuming Proportional Navigation (PN) or pure pursuit (P). Finally, the authors provide the set of initial conditions for the target aircraft from which its survival is guaranteed if the target-defender team plays optimally, despite the unknown guidance law employed by the attacking missile.
Deligiannis et al. [76] consider a competitive power allocation problem in a Multiple-input multiple-output (MIMO) radar network in the presence of multiple jammers. The main objective of the radar network is to minimize the total power emitted by the radar while achieving a specific detection criterion for each of the targets. In this problem, the radars are confronted by intelligent jammers that can observe the radar transmitted power and thereby decide their jamming power to maximize interference to the radar. Here Deligiannis et al. treat this power allocation problem as a non-cooperative game where the players are the central radar controller and the jammers and solve this using convex optimization techniques. Moreover, they provide proof for the existence and uniqueness of the Nash equilibrium in this scenario, where no player can further profit by changing its power allocation.
Similarly, He et al. [77] consider the radar counter-measure problem in a multistatic radar network, where a game-theoretic formulation of joint power allocation and beamforming is studied in the presence of a smart jammer. The goal of each radar in this network is to meet the expected detection performance of the target while minimizing its total transmit power and mitigating the potential interferences. On the other hand, the goal of the jammer is to adjust its own transmit power to interfere with the radar to protect the target from detection. First, He et al. study the power allocation game with strategy sets of each player (radar and jammer) consisting of their respective transmit powers. They then proceed to solve the corresponding optimization problems to work out the best response function for the radar and the jammer and show the existence and uniqueness of the Nash equilibria. Next, they consider the joint power allocation and beamformer design problem in the presence of jammers again as a non-cooperative game and propose a power allocation and beamforming algorithm which is shown to converge to its Nash equilibrium point.
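The best-response dynamics underlying such power-allocation games can be sketched very simply. The response functions and constants below are hypothetical, chosen only so that the iteration converges to a unique fixed point (the Nash equilibrium of the toy game), and do not reproduce the models of [76,77]:

```python
# A minimal best-response iteration for a toy radar-vs-jammer power game.
# The response functions and constants are hypothetical illustrations.

SIGMA = 1.0      # background noise power
TAU = 2.0        # SINR the radar needs for detection
J_MAX = 3.0      # jammer power budget
P_MAX = 20.0     # radar power budget

def radar_best_response(j):
    """Cheapest radar power meeting the detection SINR against jamming j."""
    return min(P_MAX, TAU * (SIGMA + j))

def jammer_best_response(p):
    """Jam at full budget whenever the radar transmits at all."""
    return J_MAX if p > 0 else 0.0

p, j = 1.0, 0.0
for _ in range(50):                 # iterate best responses
    p, j = radar_best_response(j), jammer_best_response(p)
print(p, j)                         # fixed point: p = TAU * (SIGMA + J_MAX)
```

At the fixed point neither player can profit by unilaterally changing its power, which is exactly the Nash equilibrium property whose existence and uniqueness the reviewed papers establish for their richer models.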
McEneaney et al. [78] investigate the command and control problem for unmanned air vehicles (UAVs) operating against ground-based targets and defensive units such as Surface-to-Air Missile (SAM) systems. The motivation for this work arises from the requirement for operations planning and real-time scheduling in an unmanned air operations scenario. The problem is modelled as a stochastic game between blue players (UAVs) and red players comprising the SAMs and ground-based targets. There can be a number of objectives for each side: for example, a blue player may try to destroy a strategic target while minimizing damage to itself. The red players, on the other hand, may attempt to inflict maximum damage on the UAVs while protecting themselves from attack.
The control strategies for the UAVs consist of a set of discrete variables that correspond to the specific target or SAM to attack, while those for the SAMs are to switch their radar "on" or "off". Note that when the radar is "on", the probability of the SAM causing damage to the blue players increases, as does the probability of the blue players inflicting damage on the SAM. The solution to this stochastic game is obtained via dynamic programming and illustrated using numerical examples. The main contribution of this work is the analysis of a risk-sensitive control-based approach for stochastic games under imperfect information. In particular, this approach not only handles noisy observations due to random noise but also deals with cases that include an adversarial component in the observations.
Wei et al. [79] have developed a mission decision-making system for multiple uninhabited combat aerial vehicles (UCAVs) working together. The UCAVs' weapons are air-to-air missiles. In the paper, a red-team UCAV formation, consisting of an unmanned fighter-bomber flanked by two UCAVs, attempts to strike a blue-team ground target. The blue team has its own set of UCAVs that are directed to defeat the red team. The success of a given missile against its chosen threat is determined by the distance between the attacker and the threat, their relative speed, and their relative angles. The scenario is represented as a simultaneous normal-form game, with the strategies for each team corresponding to allocations of blue-team entities against red-team entities and vice versa. The payoff for the red or blue team is based on the effectiveness of a given allocation, which in turn depends upon the relative geometry between the opposing team's allocation groupings. Dempster-Shafer (D-S) theory is applied, with the D-S combinatorial formula harnessed to formulate the payoff. These payoffs, calculated for each strategy of each team, are then placed into bi-matrices (one for each team) and solved using a linear programming optimisation approach. If a pure-strategy Nash equilibrium is not present, a mixed-strategy approach is applied and solved. The authors then develop mission scenarios with differing geometries and illustrate the use of their game-theoretic allocation strategy. They use annotated diagrams of entity geometry, containing red and blue teams in proximity to one another, to demonstrate that the allocation strategy determined by their payoff formulation is satisfactory.
Ma et al. [80] have developed a game-theoretic approach to generate a cooperative occupancy decision-making method for multiple unmanned aerial vehicle (UAV) teams engaged against each other in a beyond-visual-range (BVR) air combat confrontation. BVR combat is enabled by developments in missile technology that permit long-range engagements. In the paper, the team on each side first decides the occupancy positions (cubes in Cartesian space) of its UAV entities and then selects targets for each UAV team member to engage. The goal is for each side to obtain the greatest predominance while experiencing the smallest possible threat. A zero-sum simultaneous bi-matrix game is applied to analyse the problem. For a given occupancy of a UAV, height and distance predominance formulae that factor in the range and weapon minimum/maximum performance criteria are used to generate payoff values for the utility functions. As the strategy space explodes in size as the number of occupancy cubes and UAVs per team increases, the authors augment the Double Oracle (DO) algorithm, designed in earlier works to solve large-scale zero-sum game problems, with a Neighbourhood Search (NS) algorithm, yielding a Double Oracle Neighbourhood Search (DO-NS) algorithm. Through simulations, the authors show that the DO-NS algorithm outperforms the DO algorithm in terms of both computational time and solution quality.
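The plain Double Oracle idea that DO-NS builds on can be sketched as follows: restricted strategy sets are grown by alternately solving the restricted game and adding each side's best response (computed over the full strategy set) against the opponent's current mixed strategy, terminating when neither set grows. This generic sketch, with a `scipy`-based matrix-game solver and a toy rock-paper-scissors payoff, is not the authors' DO-NS variant.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(B):
    """Row player's mixed strategy and value for a zero-sum payoff matrix B."""
    B = np.asarray(B, float)
    shift = 1.0 - B.min()           # make all payoffs positive for the LP trick
    C = B + shift
    m, n = C.shape
    res = linprog(np.ones(m), A_ub=-C.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    total = res.x.sum()
    return res.x / total, 1.0 / total - shift

def double_oracle(payoff, n_rows, n_cols):
    """Double Oracle for a zero-sum game given by payoff(i, j)."""
    R, C = [0], [0]                 # seed with arbitrary pure strategies
    while True:
        M = np.array([[payoff(i, j) for j in C] for i in R])
        x, v = solve_matrix_game(M)
        y, _ = solve_matrix_game(-M.T)   # column player's restricted strategy
        # best responses over the *full* pure-strategy sets
        br_row = max(range(n_rows),
                     key=lambda i: sum(y[k] * payoff(i, C[k]) for k in range(len(C))))
        br_col = min(range(n_cols),
                     key=lambda j: sum(x[k] * payoff(R[k], j) for k in range(len(R))))
        grew = False
        if br_row not in R:
            R.append(br_row); grew = True
        if br_col not in C:
            C.append(br_col); grew = True
        if not grew:
            return v, R, C

# Rock-paper-scissors: DO grows both sets to all three strategies, value 0
P = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
v, R, C = double_oracle(lambda i, j: P[i, j], 3, 3)
```

The appeal of the scheme is that the restricted games stay small even when the full strategy space is enormous, which is the regime the reviewed paper targets.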
The work of Başpınar et al. [81] focuses on modelling air-to-air combat between two unmanned aerial vehicles (UAVs) using an optimisation-based control and game-theoretic approach. In this work, vehicle motion is expressed in terms of specific variables, and trajectory planning for moving from one waypoint to another is solved by determining smooth curves satisfying defined conditions in flat output space. Once determined, the variables describing the smooth curve can be mapped back to the original state/input space. The result is a speed-up in trajectory optimisation through a reduction in the number of variables required. Game theory is then harnessed: the aerial combat between the two UAVs is modelled as a zero-sum game using a minimax approach. That is, each party tries to maximise its payoff when the opponent plays its best strategy. Here, the objective is for each UAV to get directly behind the other party and within a range threshold that satisfies the onboard weapon's effective range constraints.
In [81], the authors provide cost functions associated with the degree of being in a tail-chase with the target, based on aspect and bearing angles, as well as cost functions that generate a maximum score when the opponent is within some threshold of the optimum shooting range. The cost functions are multiplied together to create the total cost and embedded in a receding horizon control scheme, where trajectory planning through selection of the controls is performed over a given look-ahead period with both players utilising opposing strategies. Each player considers its opponent's reachable set within the horizon and uses this to select the controls that maximise its payoff. This process is repeated every few control steps. Unlike most other works in this area, the authors use the full set of control inputs within the performance envelope rather than a subset (e.g., turn, maintain heading, roll left at a particular angle, Immelmann turn, split-S or spiral dive), and thus can generate a better solution for each player's strategy. Two simulation scenarios are provided. In the first, neither UAV starts in an air-superiority position, and each exercises the receding horizon cost function optimisation to get into a tail-chase with its opponent within optimum firing range. The authors show that the speed, load factor and bank angle do not violate bounds during the flights when the controls are applied, and that feasible trajectories are generated. In the second simulation, the UAVs start in a tail-chase that does not yet satisfy the within-shooting-range criterion. The pursued opponent manoeuvres to escape by applying the cost function while the chaser continues chasing. At the end of the engagement, the within-shooting-range criterion is met and the target is directly in front but at a sub-optimal aspect, which leads to its escape.
These scenarios are used to demonstrate the validity of the control strategy developed, and thus to provide automatic selection of combat strategy for two unmanned aerial vehicles engaged against one another.
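A minimal receding-horizon minimax pursuit loop can be sketched as follows. Unlike the paper, which optimises over the full continuous control set, this illustration discretises the turn-rate controls, uses a one-step lookahead, and scores states with an invented distance-plus-aspect cost; all constants are assumptions chosen for the toy example.

```python
import math

SPEED, DT = 1.0, 0.1            # common speed and control step (assumed)
TURNS = [-0.5, 0.0, 0.5]        # discretised turn-rate options, rad/s (assumed)

def step(state, turn):
    """Advance a 2-D unicycle state (x, y, heading) by one control step."""
    x, y, h = state
    h = h + turn * DT
    return (x + SPEED * DT * math.cos(h),
            y + SPEED * DT * math.sin(h),
            h)

def cost(pursuer, evader):
    """Pursuer wants small separation and to point directly at the evader."""
    dx, dy = evader[0] - pursuer[0], evader[1] - pursuer[1]
    dist = math.hypot(dx, dy)
    aspect = abs(((math.atan2(dy, dx) - pursuer[2]) + math.pi) % (2 * math.pi) - math.pi)
    return dist + aspect

def minimax_controls(pursuer, evader):
    """One-step minimax: pursuer minimises the worst case over evader replies."""
    best = min(TURNS, key=lambda up: max(
        cost(step(pursuer, up), step(evader, ue)) for ue in TURNS))
    worst = max(TURNS, key=lambda ue: cost(step(pursuer, best), step(evader, ue)))
    return best, worst

p, e = (0.0, 0.0, 0.0), (2.0, 1.0, 0.0)   # pursuer starts misaligned
for _ in range(100):
    up, ue = minimax_controls(p, e)
    p, e = step(p, up), step(e, ue)
```

Over the run the pursuer's heading settles onto the line of sight while the evader flees, reproducing in miniature the tail-chase behaviour the paper's richer cost functions are designed to elicit.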
Casbeer et al. [82] consider a scenario where an attacking missile pursuing an unmanned aerial vehicle target is engaged by two defending missiles launched from entities allied to, and cooperating with, the target. This extends the typical three-party game scenario in which only a single defending missile cooperating with the target engages the attacker. The authors refer to this as an Active Target Defence Differential Game (ATDDG). Besides computing the optimal strategies for the players in the extended ATDDG, the paper attempts to determine the degree of reduction in the target's vulnerability when it uses two defenders rather than one. A constrained optimisation problem is formulated to solve this scenario. It is shown that the target, having the choice to cooperate with either defender, can more successfully escape the attacker. Additionally, the presence of two defenders enables the attacker to be intercepted more easily; when the two defender missiles are well-positioned, both can intercept the attacker.
Han et al. [83] present an Integrated Air and Missile Defence (IAMD) problem where Surface-to-Air-Missile (SAM) batteries equipped with Interceptor Missiles (IMs) engage Attacker Missiles (AMs) targeting cities. The problem is cast as a simplified two-party zero-sum game with perfect information and has three stages: the defender first sets up its allocation of SAMs to cities, the attacker then allocates its missile salvo against the cities, and finally the defender allocates interceptor missiles in response to counter the attacker's salvo. The simplifying assumptions are that at most one SAM is allocated near each city, with only one installed per site; that no more than one IM is launched against each AM; that each SAM has the same number and type of IMs; and that the AMs are identical and fired in a single salvo. The authors attempt to solve this tri-level game for a six-city network using an extensive form game tree, α − β pruning, and the Double Oracle (DO) algorithm. The DO algorithm is a heuristic and is not guaranteed to find the Subgame-Perfect Nash Equilibrium (SPNE). The efficiency with which the SPNE is reached by each choice of algorithm is studied. For the game-tree approach, the conclusion is that the strategy space grows to an intractable size because of the combinatorial nature of the problem. Compared to the DO algorithm, α − β pruning does not scale well in computational time as the numbers of SAM batteries, AMs and IMs grow, although the DO algorithm does fail to find the SPNE in a small number of instances.
The authors prefer the DO algorithm despite this, as it is shown to preserve monotonicity (non-decreasing payoff) and its solution-quality and computational-time trends (no exponential blow-up) even when the problem is scaled from 6 cities to 55 cities.
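The α − β pruning referred to above is the standard enhancement of game-tree minimax search: branches that provably cannot affect the result are skipped. A minimal sketch over a nested-list tree (a generic illustration, not the paper's IAMD tree) is:

```python
def alphabeta(node, alpha, beta, maximising):
    """Minimax with alpha-beta pruning over a nested-list game tree.

    Leaves are numeric payoffs; internal nodes are lists of children.
    Branches outside the (alpha, beta) window are pruned.
    """
    if not isinstance(node, list):
        return node
    if maximising:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break            # beta cut-off: minimiser avoids this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break                # alpha cut-off: maximiser avoids this branch
    return value

# Two-ply tree: the maximiser picks a branch, the minimiser then picks a leaf
tree = [[3, 5], [2, 9], [0, 7]]
v = alphabeta(tree, float("-inf"), float("inf"), True)  # → 3
```

On this tree the second and third branches are cut off after their first leaf, which is exactly the saving that makes pruning attractive before the combinatorial blow-up the paper reports.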
Papers that deal with applications of game theory in Cyber Warfare, as distinct from Cyber Security, are few. Significant among them, Keith et al. [84] consider a multi-domain (cyber combined with air-defence) security game problem. Two players engage each other in a zero-sum extensive form game: a defender, representing an integrated air-defence system (IADS) equipped with cyber warfare protection, and an attacker, capable of unleashing air-to-ground threats (missiles, bombs) as well as cyber-attacks (against the IADS network). Here, the payoff is the expected loss of life, which the defender wants to minimise and the attacker to maximise. The cyber security game to protect the IADS is nested within the physical security game. The actions of the players correspond to allocations to activate IADS/cyber security response nodes associated with population centres for the defender, and allocations to attack IADS nodes and their associated cyber-security nodes for the attacker. The realism of the game is increased by introducing imperfect information: the defender and attacker are not fully aware of the level of vulnerability of nodes. Additionally, the defender can only sense cyber attacks on nodes probabilistically, which implies that its allocation of cyber defence teams to particular IADS nodes is only probabilistically effective, while the attacker can only determine the effectiveness of its cyber-attacks after physically attacking a node. This work advances the security game literature by introducing the integrated domain, multiple periods for agent actions, and continuous mixed-form strategies for the players. The authors consider it the first work in which Monte Carlo (MC), discounted, and robust Counterfactual Regret Minimisation (CFR)-based approaches have been compared in security games.
Initially, for a small-scale version of the problem, the Nash equilibrium (NE) is determined for the defender in the form of a sequence-form linear program. The problem is then gradually scaled up to include additional population centres to be defended, up to an upper limit. Here, an approximate CFR algorithm is introduced to reduce computation time while preserving the optimality of a particular strategy as much as possible. When the scale is increased further, a discounted CFR is introduced, which reduces computation time further still.
The parameter space of the problem and algorithms is explored to select the best tuning parameters and extract the best performance from the algorithms. Bounded rationality is introduced so that the players do not necessarily make optimal responses; they can only manage approximate robust best-response moves. A robust best response for a player is defined as a compromise between the completely conservative NE strategy and the completely aggressive best-response strategy, and it introduces weaknesses into players' strategies. With respect to a player, the capability of their strategy to capitalise on an opponent's strategy is referred to as exploitation; conversely, the vulnerability of their strategy with respect to an opponent is referred to as exploitability. When running all the different algorithms introduced, the results show that the Nash equilibrium solution is the safest strategy, since the best moves being played are not exploitable; however, it does not produce the highest utility for a player. The performance charts show that the robust linear program generates the highest mean utility and the highest exploitation-to-exploitability ratio while also consuming the most computational time. The data-biased CFR offers the best trade-off, yielding a high mean utility and an exploitation-to-exploitability ratio in favour of exploitation while running in the lowest computational time.
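At the heart of every CFR variant is regret matching: each action is played in proportion to its positive cumulative regret, and the time-averaged strategy converges to a Nash equilibrium in zero-sum games. A minimal matrix-game sketch (using an invented asymmetric matching-pennies payoff, not the paper's nested IADS game) is:

```python
def regret_matching(payoff, iters=20000):
    """Two-player zero-sum regret matching, the core update inside CFR.

    payoff[a][b] is the row player's payoff. Both players update regrets
    against the opponent's current mixed strategy; the *average* strategy
    (not the current one) converges to a Nash equilibrium.
    """
    n, m = len(payoff), len(payoff[0])
    regret = [[0.0] * n, [0.0] * m]
    strat_sum = [[0.0] * n, [0.0] * m]

    def strategy(r):
        pos = [max(x, 0.0) for x in r]
        s = sum(pos)
        return [x / s for x in pos] if s > 0 else [1.0 / len(r)] * len(r)

    for _ in range(iters):
        s0, s1 = strategy(regret[0]), strategy(regret[1])
        # expected utility of each pure action vs the opponent's mix
        u0 = [sum(s1[b] * payoff[a][b] for b in range(m)) for a in range(n)]
        u1 = [sum(s0[a] * -payoff[a][b] for a in range(n)) for b in range(m)]
        ev0 = sum(s0[a] * u0[a] for a in range(n))
        ev1 = sum(s1[b] * u1[b] for b in range(m))
        for a in range(n):
            regret[0][a] += u0[a] - ev0
            strat_sum[0][a] += s0[a]
        for b in range(m):
            regret[1][b] += u1[b] - ev1
            strat_sum[1][b] += s1[b]

    return [[x / iters for x in strat_sum[p]] for p in range(2)]

# Asymmetric matching pennies: the unique NE mixes 0.4/0.6 for each player
avg = regret_matching([[2, -1], [-1, 1]])
```

The current strategies oscillate, but the averages settle near the 0.4/0.6 equilibrium, which is the behaviour the discounted and data-biased CFR variants discussed above are designed to accelerate.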
In the domain of space warfare, human presence and risk to personnel are much less prevalent, so the focus is more on network strength and interaction between independent autonomous agents, connected or otherwise. Ultimately, warfare in this domain will operate at a pace and in dimensions far beyond human cognitive capacity. Since the rapidity and complexity of decisions within engagements will almost certainly outscale military personnel's understanding, game theory will take the place of decision-makers as part of the overall software and control system, and imbue future technology with the ability to consider human and social factors when making calculations. With a greater focus on connectivity and networking, the key to success in this area relies on effective communication channels and a shared goal across the system. In this nascent area of research, papers that apply game theory are often concerned with satellite networks.
Zhong et al. [85] set the ambitious goal of optimising bandwidth allocation and transmission power across a satellite network. They base their research on bargaining game theory and must achieve a compromise across interference constraints, Quality of Service requirements, channel conditions, and the transmission and reception capabilities of satellites at every point in the network. Bandwidth and interference headroom are the surpluses negotiated in the bargaining game, with each satellite using different strategies to improve its utility, i.e., its share of resources. This quickly escalates in complexity; the most important takeaway from the model is the mapping of the problem onto the cooperative bargaining game framework.
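The cooperative bargaining framework invoked here is typically the Nash bargaining solution, which maximises the product of the players' gains over their disagreement payoffs. Under the simplifying assumption of linear utilities over a single divisible resource (far simpler than the paper's multi-constraint setting), it reduces to an equal split of the surplus:

```python
def nash_bargaining_split(total, d1, d2):
    """Two-player Nash bargaining over one divisible resource.

    With linear utilities, the Nash product (x1 - d1)(x2 - d2) subject to
    x1 + x2 = total is maximised by adding half the surplus
    (total - d1 - d2) to each player's disagreement payoff.
    """
    surplus = total - d1 - d2
    assert surplus >= 0, "no gains from cooperation"
    return d1 + surplus / 2, d2 + surplus / 2

# e.g. 100 MHz of bandwidth; fallback allocations of 20 and 30 MHz (invented)
x1, x2 = nash_bargaining_split(100, 20, 30)   # → (45.0, 55.0)
```

The symmetric surplus split is what makes the solution attractive for resource negotiation: each satellite is guaranteed at least its disagreement payoff, and the cooperative gain is shared fairly.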
Similarly, Qiao and Zhao [86] detail some key issues arising from the finite energy available to the nodes in satellite networks. Their paper offers a solution through a game-theoretic model of a routing algorithm and uses it to find an equilibrium solution to uneven network flow. The model locates network hot spots that consume a disproportionate share of energy and takes measures to distribute the load evenly. This is another case of a bargaining/cooperative game across multiple players in a network.
Since target tracking is an established research area, we found several papers applying game theory to tracking problems. Most of these span overlapping warfare domains and do not put much emphasis on demonstrating applicability in a particular domain. For example, Gu et al. [87] study the problem of tracking a moving target using a sensor network comprising sensors capable of providing some position-related target measurements. Each sensor node has a sensor to observe the target and a processor to estimate its state. While some communication is available among sensors, it is limited in the sense that each sensor node can only communicate with its neighbours. The problem is further compounded by the fact that the target is an intelligent agent capable of minimising its detectability by the adversary, and thereby has the potential to increase the tracking error of the tracking agent. Gu et al. [87] solve this problem within the framework of a zero-sum game: by minimising the tracking agent's estimation error against the worst case, a robust minimax filter is developed. Moreover, to handle the limited communication capability of the sensor nodes, they propose a distributed version of this filter in which each node only requires the current measurement and estimated state from its immediate neighbours. They then demonstrate the performance of their algorithm on a simulated scenario with an intelligent target and show that while the standard Kalman filter's errors diverge, the minimax filter, which takes the adversarial noise into account, significantly outperforms the Kalman filter.
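The failure mode that motivates the minimax filter can be illustrated with a scalar Kalman filter: against zero-mean noise its error stays small, but a target injecting a persistent bias of comparable magnitude drives a steady-state error the filter never removes. This toy sketch is only an illustration of that motivation, not the distributed minimax filter of [87]; all noise levels are invented.

```python
import random

def kalman_1d(zs, q=0.01, r=1.0):
    """Scalar Kalman filter for a random walk: x_{k+1} = x_k + w_k, z_k = x_k + v_k."""
    xhat, p = 0.0, 1.0
    estimates = []
    for z in zs:
        p += q                      # predict step (random-walk process noise)
        k = p / (p + r)             # Kalman gain
        xhat += k * (z - xhat)      # measurement update
        p *= 1.0 - k
        estimates.append(xhat)
    return estimates

rng = random.Random(1)
truth = 5.0
# zero-mean Gaussian noise vs an adversary injecting a persistent +2 bias
honest = [truth + rng.gauss(0.0, 1.0) for _ in range(500)]
adversarial = [truth + 2.0] * 500

err_honest = abs(kalman_1d(honest)[-1] - truth)
err_adversarial = abs(kalman_1d(adversarial)[-1] - truth)
```

Because the Kalman update assumes zero-mean noise, the biased measurements pull the estimate a full bias-width off the truth, whereas the honest run averages its noise away; a minimax design instead optimises against such worst-case disturbances.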
Qilong et al. [88] similarly address the issue of tracking an intelligent target, but they model a scenario where the tracking player is also in pursuit, and the focus is on protecting the target. Additionally, the target can fire a defensive missile at the attacker/tracker. The attacker has a line of sight to both the target and the defensive missile. The target plans to allow the attacker to slowly close the distance between them, all the while manoeuvring to develop an understanding of how the attacker reacts. When the attacker is close to collision, the defensive missile is released. The target and the missile then communicate, use the knowledge of the attacker's movement patterns, and adhere to an optimal linear guidance law to destroy the attacker. This is modelled as a zero-sum competitive game between the attacker on one side and the target and the defensive missile on the other. However, the paper also focuses on the cooperative, non-zero-sum game played between the target and the defensive missile. For them, the payoff is calculated from the miss distance to be minimised (ideally zero, i.e., a collision with the attacker) as well as the control effort required to guide the defensive missile.
Faruqi [89] discusses the general problem of applying differential game theory to missile guidance. They state that the missile trajectory follows Proportional Navigation (PN), a guidance law typically used by homing missiles. The performance of these systems is measured by a Linear System Quadratic Performance Index (LQPI). With respect to differential game theory, they model the missile guidance problem by representing the missile navigation and trajectory with a set of differential equations. The general form of this problem is