Synthetic Populations

Motionworks begins with a large, deterministic, device-level dataset that it curates into anonymized products. Relying on device data alone introduces avoidable bias into those products, bias that a synthetic population can systematically remove.

A synthetic population is an anonymized, digital population that is statistically representative of real people. You can think of it as a digital twin of the world as it relates to human mobility. In the Motionworks Synthetic Population, every individual in the country is accounted for.

Synthetic data is essential to population analytics because it preserves the privacy and anonymity of every individual while still providing broad utility. Many large enterprises, including Amazon, Google, and American Express, use similar approaches to improve their business outcomes for these reasons.

Gartner predicted in the Wall Street Journal that “By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.” We share this outlook: as computing becomes more accessible, why take a direct, less privacy-friendly approach when a synthetic approach grounded in data science can offer even more utility?

It is important to note that, as with most AI models, Motionworks begins with a large, deterministic, device-level dataset that we curate into our anonymized products. Used in isolation, however, device data introduces avoidable bias into the products, bias that a synthetic population can systematically remove.

Summary: Reasons to Use a Synthetic Twin

  1. Privacy – Motionworks is committed to providing insights in an anonymized, aggregated, and privacy-compliant way across all of our solutions. Motionworks data does not provide insights into individual behaviors and cannot be reverse-engineered or manipulated to provide such information.
    Utilizing synthetic populations to understand population behavior is recognized as a process that preserves the confidentiality of individuals while providing realistic, representative data outputs.
  2. Total Population Metrics – The Motionworks approach is developed to measure everyone, everywhere, all the time, rather than measuring device activity within a target polygon. There is no source of deterministic data available today that can provide persistent, accurate information on the location of every individual in the country. Only through the use of a synthetic population derived from a large deterministic data sample is total population measurement achievable.
  3. Granularity – The Motionworks data set produced from this synthetic population reflects, in aggregate, where everyone in the country is at any given moment, so the data can be cut in a variety of ways and at different geographic resolutions while maintaining its integrity.

Methodology

As discussed earlier, a digital population, also called a “synthetic population,” is a virtual representation of an actual population that can be used to analyze the behaviors of groups of people. You will not find an exact virtual copy of yourself or anyone else within a digital population. Instead, a digital twin of the population is created that is statistically representative of you and everyone else in the real population. Because of this, using digital populations has the added benefit of protecting individuals’ privacy.

To develop this digital population, Motionworks identifies the most important demographic variables on which to base its digital population and how people in each of those demographics are distributed in each neighborhood (Census block group in the United States). These demographics are called “controlled variables.” Motionworks obtains anonymized household and person records that include both the identified controlled variables as well as other demographic variables (uncontrolled variables). Analyzing these datasets, Motionworks builds correlations between all of the variables, controlled and uncontrolled, across the population.
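One simple way to picture the relationship between a controlled and an uncontrolled variable is a conditional distribution estimated from sample records. The sketch below is only an illustration under stated assumptions: the field names (age_band, commute_mode), the records, and the function name are hypothetical, and the source does not describe how Motionworks actually models these correlations.

```python
from collections import Counter, defaultdict


def conditional_distribution(records, controlled, uncontrolled):
    """Estimate P(uncontrolled value | controlled value) from sample records."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[controlled]][rec[uncontrolled]] += 1
    return {
        c_val: {u_val: n / sum(u_counts.values()) for u_val, n in u_counts.items()}
        for c_val, u_counts in counts.items()
    }


# Hypothetical anonymized person records (made-up values for illustration only).
sample = [
    {"age_band": "18-34", "commute_mode": "transit"},
    {"age_band": "18-34", "commute_mode": "car"},
    {"age_band": "35-64", "commute_mode": "car"},
    {"age_band": "35-64", "commute_mode": "car"},
]

dist = conditional_distribution(sample, "age_band", "commute_mode")
print(dist["18-34"])
```

With a table like this, an uncontrolled attribute could be assigned to a synthetic person by sampling from the row that matches that person's controlled attribute.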

With the demographic correlations understood, Motionworks employs an entropy maximization algorithm to generate the digital population. A primer on information entropy is available for those who are curious and want to dig deeper.
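The source does not say which entropy maximization algorithm Motionworks uses. As one common illustration of the idea, iterative proportional fitting (IPF) rescales a seed table until its row and column totals match known targets; the result is the table closest to the seed in relative entropy among all tables with those totals. The sketch below assumes a two-variable table with strictly positive seed cells, and the seed counts and targets are made up.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting on a 2-D table.

    Alternately scales rows and columns of `seed` until both sets of
    marginal totals match the targets (within `tol`).
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale each row so row sums match the row targets.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale each column so column sums match the column targets.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns now match exactly; stop once rows match as well.
        if np.allclose(table.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return table


# Hypothetical seed counts from a sample (rows: age bands, columns: household sizes).
seed = [[10, 5], [5, 10]]
# Hypothetical controlled totals for one neighborhood; both sum to 100.
fitted = ipf(seed, row_targets=[60, 40], col_targets=[50, 50])
print(fitted.round(2))
```

The fitted table keeps the association pattern of the seed while honoring the controlled totals, which is the sense in which this family of methods stays as close as possible to the observed sample.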

Building a Digital Population

Relative entropy measures how far a model system diverges from a target or reference system, in this case comparing the Motionworks Popcast™ Digital Pop with the actual population. The closer the relative entropy is to zero, the more closely the two populations align.
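For reference, the standard definition of relative entropy (also known as Kullback-Leibler divergence) between a reference distribution P and a model distribution Q over the same categories x is shown below. Which of the two populations Motionworks treats as P and which as Q is not stated in the source; the assignment here is an assumption.

```latex
D(P \,\|\, Q) = \sum_{x} P(x) \,\ln \frac{P(x)}{Q(x)},
\qquad D(P \,\|\, Q) \ge 0,
\qquad D(P \,\|\, Q) = 0 \iff P = Q .
```

The value is zero exactly when the two distributions match and grows as they diverge, which is why values near zero indicate close alignment.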

Motionworks measures the relative entropy in each neighborhood and adjusts the model until the relative entropy falls within accepted thresholds.
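A per-neighborhood check of this kind might look like the sketch below. It is a minimal illustration, not the Motionworks procedure: the block group IDs, the age-band shares, the threshold of 0.01, and the function names are all hypothetical, and the source does not state which thresholds are accepted or how the model is adjusted.

```python
import math


def relative_entropy(p, q):
    """D(p || q) in nats; p and q are aligned lists of shares that each sum to 1."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return math.inf  # model assigns zero share to an observed category
            total += p_i * math.log(p_i / q_i)
    return total


def flag_neighborhoods(reference, model, threshold):
    """Return IDs of neighborhoods whose model shares diverge beyond the threshold."""
    return sorted(
        block_id
        for block_id, ref_shares in reference.items()
        if relative_entropy(ref_shares, model[block_id]) > threshold
    )


# Hypothetical age-band shares per block group (made-up numbers).
reference = {"BG-001": [0.30, 0.50, 0.20], "BG-002": [0.25, 0.25, 0.50]}
model = {"BG-001": [0.31, 0.49, 0.20], "BG-002": [0.40, 0.30, 0.30]}

needs_adjustment = flag_neighborhoods(reference, model, threshold=0.01)
print(needs_adjustment)
```

Neighborhoods returned by the check would be the ones to revisit; those not returned already fall within the chosen tolerance.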

Validation

Detailed validation reports are available for every release of the Digital Populations from the Motionworks Popcast™ Digital Pop product pages.