The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology

Abstract

The ongoing coronavirus disease 2019 (COVID-19) pandemic has heightened discussion of the use of mobile phone data in outbreak response. Mobile phone data have been proposed to monitor effectiveness of non-pharmaceutical interventions, to assess potential drivers of spatiotemporal spread, and to support contact tracing efforts. While these data may be an important part of COVID-19 response, their use must be considered alongside a careful understanding of the behaviors and populations they capture. Here, we review the different applications for mobile phone data in guiding and evaluating COVID-19 response, the relevance of these applications for infectious disease transmission and control, and potential sources and implications of selection bias in mobile phone data. We also discuss best practices and potential pitfalls for directly integrating the collection, analysis, and interpretation of these data into public health decision making.

Introduction

The coronavirus disease 2019 (COVID-19) pandemic caused by the novel coronavirus (SARS-CoV-2) has created an unprecedented challenge for governments, public health agencies, medical officials, and populations globally1,2. The public health response is seeking to effectively mitigate and contain the pandemic while balancing social and economic costs3,4,5. Control strategies thus far have primarily consisted of non-pharmaceutical interventions (NPIs), which have slowed down the epidemic in many settings. Most NPIs rely on reducing contact between infected and susceptible individuals through mass social distancing, including restrictions on social gatherings, closures of schools and businesses, shelter-in-place or stay-at-home orders or lockdowns, travel restrictions, active monitoring, and increased testing, contact tracing, and isolation measures6,7,8,9. These interventions are effective when they result in large-scale human behavioral changes that reduce the close contacts and mobility patterns that facilitate disease transmission, but are challenging to maintain10. Quantifying these patterns to assess NPI effectiveness, particularly on the spatial, temporal, and population scales necessary to fully inform public health response, is an important challenge for this pandemic response.

As a result of the rapid spread and grievous toll exacted by the COVID-19 pandemic, there has been increasing interest in developing innovative methods and tools to inform public health response through digital data, including mobile phone data both passively collected by mobile phone operators and actively collected via recently developed applications11. Mobile phone data remain one of the best sources of information on large-scale population behaviors12. These data can be collected in high- and low-income settings and can capture, in near real-time, changes in mobility and clustering patterns for large swaths of the population. We and others have previously used aggregated and anonymized geolocation information from passively collected mobile phone data to successfully inform and model the spatial and temporal dynamics of endemic and emerging infectious diseases, including malaria13,14,15,16, cholera17, measles18,19,20,21,22,23, dengue24,25, and Ebola26,27. Through these prior applications, an understanding of privacy-conscious ways to utilize these data and inform public health policy while forming productive collaborations with operators, public health officials, and academic partners has been developed.

Mobility analysis, quantifying clustering of social contacts, symptom tracking, surveying, and contact tracing applications have all been proposed and employed to some degree to inform the response to COVID-19 (see Fig. 1a). These applications, metrics developed to analyze these data, and proposed best practices have recently been reviewed by an interdisciplinary team of experts28. To build on this work, we examine the applicability of mobile phone data for public health response by reviewing the common applications of mobile phone data relevant to outbreak response; the kinds of behaviors captured within these data and proposed applications; the validity of these data for public health response and epidemiologic research, including sources and implications of selection bias; and potential concerns and best practices for direct integration of these data with public health response.

Fig. 1

a Over the course of the epidemic, mobile phone data and applications may be relevant to help answer a number of important epidemiological questions needed to guide the implementation and evaluation of various interventions. b However, these data should be considered in light of ownership and use biases that may or may not limit generalizability to the overall population. Mobile phone owners and users only represent a subset of the population and may have additional age (shown here for a synthetic population for illustrative purposes), socio-demographic, or geographic biases. Applications that require the use of a smartphone or application may further limit the generalizability of these data since they represent smaller subsets of the user population.

Utilizing mobile phone data to inform COVID-19 response

Mobile phone data can be used to inform different aspects of COVID-19 response (Table 1). At the population level, quantifying changes in human mobility or clustering can help evaluate the impact of an NPI and identify hotspots where additional or different interventions may need to be applied. At the individual level, mobile phone data may be used to understand patterns of individual contacts and enhance contact tracing.

Evaluating current interventions and monitoring their release

The most widely used application of mobile phone data in public health to date is the use of telecom geolocation data to track population movements11,12. Mobile phone operators routinely collect Call Detail Records (CDRs) that contain a timestamp and GPS location with a unique identifier for all subscribers. These data are thus typically readily available and offer high coverage for estimating mobility patterns of individuals using their mobile devices. We note that similar time-resolved GPS location data may be passively collected through certain applications, though typically for only a subset of subscribers, which may introduce further bias.

CDRs can be used to generate a number of metrics for characterizing large, population-level mobility patterns. Origin-Destination (OD) matrices reflect the number of times a trip is made between two locations (of varying spatial resolution) in a certain period. These matrices can be analyzed over time to detect temporal trends (e.g., holidays, seasonality, weekday vs weekend) and regular hotspots of attraction. These spatial and temporal flows of individuals between locations, including the magnitude and frequency of these movements, can be used to understand the risk of importation from areas with ongoing outbreaks to areas without sustained transmission where there is a risk of reintroduction and resurgence. Aggregate flows can also be used to retrace the likely introduction and spread of an outbreak in new areas and to inform future projections of disease risk or burden across space and decision making around the design and implementation of travel restrictions or increased surveillance.
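As a rough illustration of how an OD matrix can be derived from CDR-like records, the following minimal sketch counts a trip whenever consecutive records for the same subscriber fall in different regions. The record layout (subscriber identifier, timestamp, region label), the function name `od_matrix`, and the trip definition are assumptions made for this example; operational pipelines typically use more careful trip definitions and work only on aggregated, anonymized data.

```python
from collections import defaultdict


def od_matrix(records):
    """Count trips between regions from CDR-like records.

    records: iterable of (subscriber_id, timestamp, region) tuples
             (a hypothetical layout chosen for this sketch).
    Returns a dict mapping (origin, destination) to a trip count.
    """
    # Group each subscriber's records so they can be ordered in time.
    by_subscriber = defaultdict(list)
    for subscriber, timestamp, region in records:
        by_subscriber[subscriber].append((timestamp, region))

    trips = defaultdict(int)
    for events in by_subscriber.values():
        events.sort()  # chronological order
        # A "trip" is any pair of consecutive records in different regions.
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:
                trips[(origin, destination)] += 1
    return dict(trips)


# Synthetic records for three subscribers moving between regions A and B.
records = [
    ("u1", 1, "A"), ("u1", 2, "A"), ("u1", 3, "B"),
    ("u2", 1, "B"), ("u2", 5, "A"), ("u2", 9, "B"),
    ("u3", 2, "A"), ("u3", 4, "B"),
]
od = od_matrix(records)
print(od)
```

Only the aggregated counts, not the per-subscriber sequences, would be shared for analysis; the intermediate grouping is shown here solely to make the counting logic explicit.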

Aggregate mobility patterns may also be critical pieces of evidence when evaluating the effectiveness of various NPIs. Most NPIs are reliant on modifying physical behavior. Monitoring the volume, frequency, and average distance of flow during interventions can be used to directly quantify the adoption and effect of these interventions, and identify areas of high potential risk to target with different interventions. There are already identified associations between reductions in population-level mobility within and between different locations and COVID-19 incidence6,10,29, though further exploration of which population-level metrics are most closely related to changes in disease risk and whether these associations are sustained throughout an outbreak is needed30. These associations would ideally be interrogated to identify individual behaviors associated with mobility measures that are also associated with individual risk of COVID-19.

The effect of NPIs can also be monitored through subscriber density metrics that combine the recorded GPS location and timestamp of CDRs to capture the real-time population density and identify potential hotspots. When using finer-scale GPS location data, these density metrics may quantify the likelihood or frequency that users came into proximal contact. A third metric derived from CDR or GPS location data, the radius of gyration, quantifies the range over which a single person may travel in a specified time period. Importantly, the data required for these applications are non-identifiable; they cannot be used to identify any given individual’s interactions, but provide population-level insight into the average clustering and movement of individuals. These metrics, along with traditional OD matrix flows, were recently employed in Italy as a way to evaluate the impact of its national lockdown31. Traffic flow between provinces and probability of colocation were reduced initially in the northern provinces, where the COVID-19 outbreak was first observed, a clear signal of reactive social distancing. As the epidemic progressed, and especially once the national lockdown was enforced, the entire country saw a reduction in traffic between provinces; however, the probability of colocation remained highly dependent on province and was likely attributable to the number of cases reported in each province. Interestingly, the average distance traveled by individuals was significantly reduced across all provinces after the initial outbreak was confirmed.
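The radius of gyration can be made concrete with a short sketch. The version below uses one common definition, the root-mean-square distance of a user's recorded locations from their centroid, and assumes locations are already projected to planar coordinates in kilometers; both the definition and the coordinate handling are assumptions for illustration and may differ from those used in the studies cited here.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of recorded locations from their centroid.

    points: list of (x, y) planar coordinates in km (assumed pre-projected).
    """
    n = len(points)
    centroid_x = sum(x for x, _ in points) / n
    centroid_y = sum(y for _, y in points) / n
    mean_squared_distance = sum(
        (x - centroid_x) ** 2 + (y - centroid_y) ** 2 for x, y in points
    ) / n
    return math.sqrt(mean_squared_distance)


# Synthetic example: a wide range of locations before an intervention,
# and a much tighter set of locations afterwards.
before = [(0, 0), (6, 0), (0, 8), (6, 8)]
after = [(0, 0), (0, 0), (1, 0), (1, 0)]
rg_before = radius_of_gyration(before)
rg_after = radius_of_gyration(after)
print(rg_before, rg_after)
```

In practice the per-user values would be summarized (for example, as a regional median) before release, so that no individual's range of movement is exposed.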

The use of Bluetooth data (records of proximal interactions between Bluetooth-enabled devices) to quantify physical clustering or real-time density of subscribers at small spatial scales (e.g., zip codes) and fine temporal resolution has been explored for the purposes of contact tracing (see below). The use of these data has been considered less for population-level analyses, though it offers another source of information on behavioral changes under different NPIs. When activated, mobile phones will emit a Bluetooth beacon that is detected by other activated phones. When two Bluetooth-enabled devices are within range, the date, time, distance and duration of interaction can be recorded. The frequency or number of these interactions (analyzed anonymously to form, broadly, measures of clustering or proximal interaction rates over time) may be important given the role of sustained interaction or overcrowding of individuals32,33,34 and contact structure in SARS-CoV-2 transmission35. Furthermore, Bluetooth data in combination with GPS data or a network of Bluetooth sensors can be used to quantify the amount of time people spend at home or other identified locations when lockdown measures are in place to determine if policies are effective.

These data and measures of population-level mobility or clustering patterns would be exceedingly difficult to collect on a similar scale without mobile phone data. These data are often continuously collected, in near real-time, allowing for continued analysis as an outbreak unfolds. Importantly, though, a baseline understanding of contact or clustering patterns prior to any interventions is necessary to inform estimates of intervention impact.
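The role of a pre-intervention baseline can be sketched as follows. The example expresses each post-intervention day's trip count as a percent change from the mean of the baseline period; the function name, the choice of a simple mean, and the synthetic counts are assumptions for illustration. A real analysis would likely also account for day-of-week and seasonal effects in the baseline.

```python
from statistics import mean


def percent_change_from_baseline(daily_trips, baseline_days):
    """Percent change of each post-baseline day relative to the baseline mean.

    daily_trips: list of daily trip counts in chronological order.
    baseline_days: number of leading days that precede any intervention.
    """
    baseline = mean(daily_trips[:baseline_days])
    return [
        100.0 * (count - baseline) / baseline
        for count in daily_trips[baseline_days:]
    ]


# Synthetic series: three baseline days, then two days under an intervention.
daily_trips = [100, 110, 90, 50, 40]
changes = percent_change_from_baseline(daily_trips, baseline_days=3)
print(changes)
```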

Facilitating contact tracing

Opt-in applications (apps)36,37,38,39,40,41,42 that rely on digital approaches to enumerate and contact individuals who may have been in proximity with someone infected with COVID-19 have been proposed to increase efficiency and decrease the very large burden of manual contact tracing programs43,44,45. By enabling rapid tracing of perhaps higher proportions of affected individuals, these apps can reduce the amount of time that a potentially infected person would have to infect others, particularly in asymptomatic or pre-symptomatic phases of infection46. Most contact tracing apps collect Bluetooth and/or GPS location data to create trails of contacts over a moving time window (14-28 days). Unlike the data needed to understand population-level, aggregated behaviors described above, these data must be linked to single individuals and capture pairwise interactions with other identifiable individuals. Once a case has been identified, they are added to a list of infected users that is queried by the other phones in the network. If the infected user is detected in the trail of contacts, then the user and their contacts are alerted, either by the app or by a public health official, to initiate isolation and quarantine.

This contact tracing process occurs either in a centralized manner, where user information is sent to a remote computer where matching occurs, or in a decentralized manner, where the matching process occurs on the user’s phone. In order for these approaches to feed directly into public health decision making, a direct line between the developers, public health response teams, and users needs to be put in place. This will also be key to mitigating any privacy concerns, which should be dealt with in a transparent and direct manner. Although there has been little discussion to date, routinely collected, individually-identifiable Bluetooth or fine-scale GPS location data may also be used to infer and quantify high-resolution proximity network structures which may further inform contact tracing efforts, but will also raise additional privacy concerns47,48.
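The matching step at the core of these apps can be illustrated with a minimal sketch of the decentralized variant, in which a phone checks its own contact trail against a published list of infected users. The data layout (pairs of contact identifier and date), the function name `find_exposures`, and the 14-day default window are assumptions for this example; deployed protocols use rotating anonymous identifiers and cryptographic safeguards that are not shown here.

```python
from datetime import date, timedelta


def find_exposures(contact_trail, infected_ids, today, window_days=14):
    """Return identifiers in the local trail that match infected users.

    contact_trail: list of (contact_id, date_seen) pairs stored on the phone.
    infected_ids: set of identifiers published as infected.
    Only contacts seen within the moving time window are considered.
    """
    cutoff = today - timedelta(days=window_days)
    return sorted(
        {
            contact_id
            for contact_id, date_seen in contact_trail
            if date_seen >= cutoff and contact_id in infected_ids
        }
    )


# Synthetic trail: "a" and "c" were seen recently, "b" outside the window.
trail = [
    ("a", date(2020, 4, 25)),
    ("b", date(2020, 4, 1)),
    ("c", date(2020, 4, 20)),
]
matches = find_exposures(trail, infected_ids={"a", "b"}, today=date(2020, 4, 30))
print(matches)
```

In a centralized design the same comparison would run on a remote server holding the trails of all users, which is the main source of the additional privacy concerns.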

Frameworks to process and analyze mobile phone data

Luckily, computing resources and methods to analyze and extract these data will not likely be the limiting factor in these instances. Groups such as Flowminder and Telenor Research Group have worked for multiple years to develop more streamlined processes to analyze these data, particularly aggregate mobility data, that are able to directly interface with mobile phone operators. Flowminder has produced a suite of CDR aggregates, such as counts of active subscribers per region or counts of travelers, that can then be used to calculate indicators of mobility, such as crowdedness, population mixing, locations of interest, and intra-/inter-regional travel49. The code to extract these metrics is publicly available50. Telenor Research Group works directly with mobile phone operators to provide researchers with spatially aggregated CDR/mobility data51. Facebook’s Data For Good program provides aggregated mobility data to researchers that come from their subscribers, and companies like Cuebiq provided mobility data for a number of COVID-19 studies that summarize the distance users travel or the proportion of users that stay at home52. These existing frameworks – not only the analyses, but also the privacy considerations and data sharing agreements – will provide standardized methods that facilitate integrating mobility data into intervention assessments.

Data privacy

Various forms of identifiable personal information are generated when using mobile phones, including names, identification numbers, fine spatial and temporal data on where the device was used, other users’ identification numbers who may have been detected by Bluetooth, and personal details that might be entered into an app. In light of the growing number of digital privacy concerns and regulations, one must carefully consider the exact form and use of mobile phone data being collected against the legal and ethical need to protect users’ data security and confidentiality. While maintaining user confidentiality is often seen as a hindrance to the use of mobile phone data, in that it limits the use of individual-level data and typically requires aggregation to coarse spatial and temporal resolutions, there are a number of existing frameworks that can help provide guidance for the effective, privacy-conscious use of mobile phone data53.

Exactly which model of data privacy will best suit the use of mobile phone data for COVID-19 response will depend on the exact form and proposed use of the data. As discussed above, there already exist many data processing and analysis frameworks to provide anonymized indicators of population mobility. These standard procedures, though, could result in aggregated data with insufficient spatial and temporal resolution to be effective for monitoring the spread of SARS-CoV-2. Privacy regulations, such as the European Union’s General Data Protection Regulation (GDPR)54, offer exceptions for the use of non-anonymous data that may be needed for other response efforts. For example, opt-in applications for contact tracing may seek consent of the data subject to collect and analyze identifiable data, though the ability to scale opt-in approaches to a wide enough population and to maintain user compliance and participation remains unclear. GDPR and other regulations also provide an exception for anonymization of data to be used in public service, but the regulatory hurdles to gain this exception can be substantial and would require clear use policies and applications for these data. The potential public health benefit of using mobile phone data, particularly in forms such as those proposed for contact tracing applications, must be weighed against the possible infringements of privacy and civil liberties.

Capturing epidemiologically-relevant behaviors with mobile phone data

Both the ability to capture behavioral patterns in a large proportion of the population and the potential scalability of these approaches are some of the most promising aspects of mobile phone data11,12. The potential for broad expansion in the collection and availability of mobile phone data requires an understanding of exactly which behaviors are captured and whether these behaviors are valid measurements of interactions relevant for infectious disease transmission (Table 2). The validity of these behavioral metrics needs to be evaluated in the specific context of their application, including the spatial and temporal scales of the data and the proposed public health actions or policies informed by these data.

Many natural experiments are now occurring as various NPIs are implemented and lifted, which could be systematically and passively measured via mobile phone data to guide decision makers monitoring the effectiveness and implementation of various NPIs in real time. CDR data may offer one of the best assessments of changes to population-level mobility and clustering behaviors in response to NPIs at potentially fine spatial and temporal scales. These data, though, are only relevant to disease transmission if we can assume that these aggregate behaviors capture the movement of infected, and potentially infectious, individuals. Individual behavior may change in response to real or perceived illness in ways that are not easily captured in aggregate metrics. These aggregate measures are also unable to distinguish movement with high risk of transmission (e.g., shared public transit without appropriate protective equipment) from movement with low risk (e.g., travel by personal vehicle with appropriate social distancing), and therefore will not fully capture the spectrum of behavioral changes that may reduce disease risk.

All forms of mobile phone-based tracking are only able to capture proxies of movement, in that they track a device rather than an individual. Compared to other measures of human mobility (surveys, direct observation), mobile phone data tend to more completely capture the movement of individuals within the study population. However, differences in how individuals use their phone may introduce important biases, particularly when attempting to assess changes in behavior across time or across populations. For example, mobile phone data may be unable to capture an increasing proportion of individuals staying at home or at work following restrictions on non-essential travel, where they may be more likely to use Internet-based communication that does not generate CDRs. Similarly, there may be differences in how individuals of different ages or in different regions use their phones, which will affect the validity of mobile phone metrics in these populations. Mobile phone data cannot distinguish between multiple people using a single phone or SIM card (either of which may be used as a unique identifier), nor do they account for users with multiple phones or SIM cards, limiting the ability to make any inferences about the behavior of individuals from CDR data.

The spatial and temporal scales over which these aggregate data are collected also have important implications for their application. Mobility flows derived from CDRs are commonly used in metapopulation transmission models to parameterize the rates at which individuals move between various locations. This application requires that the origin and destination locations be spatial areas within which the exact social contact patterns of individuals can be estimated (e.g., through mass action assumptions or age-specific contact matrices) and which directly relate to relevant public health decisions (e.g., administrative units with a common public health authority). Data availability and privacy regulations often limit the spatial scales on which these locations can be defined, and it is not always clear which locations are most relevant to disease transmission processes. While mobility flows are useful for understanding potential transmission links between these locations, when they can be defined, the spatial aggregation naturally limits their utility in understanding or modeling transmission chains within these locations. Respiratory viruses like SARS-CoV-2 are diseases of close contact; the spatial scale (several meters) over which transmission occurs is many times smaller than what can be explored through aggregate mobility flows (typically aggregated to areas of at least 500 m²).
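To make the link between mobility flows and metapopulation models concrete, the following is a deliberately simplified sketch of a discrete-time SIR model in which locations are coupled through a mobility-derived mixing matrix. The matrix `P` (row i gives the assumed share of time residents of location i spend in each location), the parameter values, and the mass-action formulation within each location are all assumptions for illustration, not the specification of any model cited here.

```python
def metapop_sir_step(S, I, R, P, beta, gamma):
    """Advance a discrete-time metapopulation SIR model by one time step.

    S, I, R: lists of compartment sizes, one entry per location.
    P: row-stochastic mixing matrix (hypothetical, e.g. derived from OD flows).
    beta, gamma: per-step transmission and recovery rates (illustrative).
    """
    n = len(S)
    N = [S[k] + I[k] + R[k] for k in range(n)]
    prevalence = [I[k] / N[k] for k in range(n)]
    S_next, I_next, R_next = [], [], []
    for i in range(n):
        # Force of infection on residents of i: prevalence among residents of
        # each location j, weighted by the coupling between i and j.
        force = beta * sum(P[i][j] * prevalence[j] for j in range(n))
        new_infections = min(S[i], force * S[i])
        new_recoveries = gamma * I[i]
        S_next.append(S[i] - new_infections)
        I_next.append(I[i] + new_infections - new_recoveries)
        R_next.append(R[i] + new_recoveries)
    return S_next, I_next, R_next


def simulate(S, I, R, P, beta, gamma, steps):
    """Run the model for a fixed number of steps and return the final state."""
    for _ in range(steps):
        S, I, R = metapop_sir_step(S, I, R, P, beta, gamma)
    return S, I, R


# Two locations; the outbreak is seeded only in the first one.
start = ([990.0, 1000.0], [10.0, 0.0], [0.0, 0.0])
coupled = [[0.9, 0.1], [0.1, 0.9]]   # some movement between locations
isolated = [[1.0, 0.0], [0.0, 1.0]]  # no movement between locations

_, I_coupled, _ = simulate(*start, coupled, beta=0.5, gamma=0.2, steps=10)
_, I_isolated, _ = simulate(*start, isolated, beta=0.5, gamma=0.2, steps=10)
print(I_coupled[1], I_isolated[1])
```

With coupling, infection reaches the second location; without it, the second location remains unaffected. The sketch also shows what aggregation hides: within each location, contact is assumed to be homogeneous, which is exactly the within-location detail that aggregate flows cannot resolve.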

The temporal scale of aggregation is also important. For reasons of privacy and computational efficiency, movements are often calculated over time steps of several hours or a day, and multi-step journeys over several days are not measured. Similar privacy and efficiency concerns mean that individual trajectories (e.g., moving from A to B to C) are often impossible to measure. As such, aggregate data are used and, though the relative connections between places are typically robust (e.g., flows between A and B), the exact magnitude of travel occurring multiple times per time step or along specific routes (e.g., transit in and out of a capital city) is difficult to capture in aggregate data.

Contact tracing applications require the use of identifiable, fine-scale clustering and contact data to understand the proximal interactions individuals have that may result in disease transmission. The tolerance for missing data in these applications is low; any missed interaction might be a missed transmission event. Unlike aggregated metrics, contact tracing applications allow for measured behavior to be linked to an individual’s infection status, though it is still unclear how to translate the physical proximity of mobile devices to transmission-relevant interactions between individuals. Particularly in dense areas (e.g., apartment buildings), these applications may capture many proximal interactions which have very low risk of transmission, leading to high rates of unnecessary quarantine and associated social and economic costs. It further remains unclear whether SARS-CoV-2 transmission through non-proximal interactions (aerosols, fomites, fecal-oral) plays an important role, compounding the difficulty of defining transmission-relevant interactions that can be captured in mobile phone data.
