Data Inputs and Cleansing

No single data source is enough to fully understand the population and its movements, so Motionworks combines, compares, and contrasts numerous data inputs to produce the most robust understanding of population activity.

Aggregated SDK Data

Motionworks’ primary source of mobile location data is a set of Software Development Kit (SDK) aggregators. These aggregators rely on location information collected by apps installed on mobile devices. Motionworks receives a continuous stream of data from multiple SDK aggregators while constantly evaluating existing partners and vetting new ones. Because the app ecosystem that provides the bulk of this data is constantly evolving, Motionworks works to ensure it is curating the most diverse, clean, and robust combined data streams available.

First-Party SDK Data

Motionworks has unique partnerships with mobile location providers that source robust data from first-party apps. These data streams also provide access to verified visits and transactional check-ins, which are extremely helpful for understanding visitation to smaller venues nested within larger locations, such as a coffee shop inside a mall.

Connected Car Data

Motionworks also obtains a significant amount of connected car data through its partnerships. This data provides detailed information on the vehicles, the routes people choose, and where they park. While en route to their destinations, vehicles report their location every few seconds.

Other Data Inputs

Motionworks enriches, validates, and calibrates its solutions using other datasets, including national surveys, traffic counts, public transport ridership, airport departures and arrivals, and port-of-entry reports, as well as Motionworks customers’ first-party data that provide people counts, transaction counts, and check-ins at thousands of locations nationwide.

Data Cleansing

Location data streams inherently contain significant noise and locations of varying quality. The data entering the Motionworks data pipeline covers hundreds of millions of devices, with billions of locations in an average week. Information at that scale requires substantial processing to eliminate bad data and retain as much good data as possible, so Motionworks has developed proprietary machine-learning technology to automate the work.

For example, suspected device duplication is identified during this process. Suspected duplicate devices, along with all data associated with them, are removed from consideration and no longer factor into downstream analysis of the device panel.
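The Python sketch below illustrates one simple way such a removal step could work. It is not Motionworks’ proprietary method: it assumes a hypothetical ping format of (device_id, unix_seconds, lat, lon) and swaps in a plain heuristic, flagging any two device IDs whose pings, coarsened to one-minute buckets and a roughly 100 m grid, overlap with a Jaccard similarity at or above an assumed threshold of 0.9. Every flagged device and all of its pings are then dropped.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical ping format: (device_id, unix_seconds, lat, lon).
Ping = tuple[str, int, float, float]


def _fingerprint(ping: Ping) -> tuple[int, float, float]:
    """Coarsen a ping to a one-minute bucket and a ~100 m grid cell (assumed resolution)."""
    _, ts, lat, lon = ping
    return (ts // 60, round(lat, 3), round(lon, 3))


def suspected_duplicates(pings: list[Ping], threshold: float = 0.9) -> set[str]:
    """Flag device IDs whose coarsened ping sets nearly coincide (Jaccard >= threshold)."""
    by_device: dict[str, set[tuple[int, float, float]]] = defaultdict(set)
    for p in pings:
        by_device[p[0]].add(_fingerprint(p))
    flagged: set[str] = set()
    # All-pairs comparison: fine for a sketch, far too slow at production scale.
    for a, b in combinations(sorted(by_device), 2):
        sa, sb = by_device[a], by_device[b]
        if len(sa & sb) / len(sa | sb) >= threshold:
            flagged.update((a, b))
    return flagged


def drop_devices(pings: list[Ping], flagged: set[str]) -> list[Ping]:
    """Remove every ping that belongs to a flagged device."""
    return [p for p in pings if p[0] not in flagged]
```

Because the pairwise loop is quadratic in the number of devices, a pipeline handling hundreds of millions of devices would need some form of blocking or hashing to limit which pairs are compared at all.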

Following this process, Motionworks has a clean, de-duplicated dataset, with the highest-quality locations identified and cataloged along with quality statistics that inform the curation of a selected panel of devices.
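As a minimal sketch of how per-device quality statistics might feed panel curation, the Python below assumes each ping carries a hypothetical reported horizontal accuracy in meters. It computes ping count, active days, and the share of pings at or under an assumed 100 m accuracy, then keeps devices that clear illustrative thresholds. The field names, statistics, and thresholds are all assumptions, not Motionworks’ actual criteria.

```python
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Ping:
    device_id: str
    ts: int            # Unix seconds
    accuracy_m: float  # hypothetical reported horizontal accuracy, in meters


@dataclass(frozen=True)
class DeviceStats:
    pings: int
    active_days: int
    high_quality_share: float  # fraction of pings at or under the accuracy cutoff


def device_stats(pings: list[Ping], max_accuracy_m: float = 100.0) -> dict[str, DeviceStats]:
    """Summarize each device's pings into simple quality statistics."""
    grouped: dict[str, list[Ping]] = defaultdict(list)
    for p in pings:
        grouped[p.device_id].append(p)
    stats = {}
    for device_id, ps in grouped.items():
        good = sum(1 for p in ps if p.accuracy_m <= max_accuracy_m)
        days = len({p.ts // SECONDS_PER_DAY for p in ps})
        stats[device_id] = DeviceStats(len(ps), days, good / len(ps))
    return stats


def select_panel(stats: dict[str, DeviceStats], min_pings: int = 2,
                 min_days: int = 2, min_share: float = 0.5) -> set[str]:
    """Keep devices whose statistics clear every (illustrative) threshold."""
    return {
        device_id
        for device_id, s in stats.items()
        if s.pings >= min_pings and s.active_days >= min_days and s.high_quality_share >= min_share
    }
```

Keeping the statistics separate from the selection rule means thresholds can be tuned without recomputing the per-device summaries.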

Device matching

The location data received is tied to a Motionworks identifier. Because Motionworks sources data from multiple providers, a single device may appear with a series of pings from one aggregator and additional pings from another, which together create a more holistic view of the locations the device visited over a period of time. Motionworks’ machine learning algorithms de-duplicate devices, unifying the datasets and tying all the information to individual devices.
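The sketch below shows only the final unification step. It assumes the identity resolution itself, the machine-learning part, has already mapped each provider's records onto a shared identifier (a hypothetical `device_id` field here). It unions the pings from every feed into one time-ordered timeline per device, using a set so that a ping reported identically by two providers is kept once. It illustrates the idea and is not Motionworks’ matching algorithm.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping

# Hypothetical record format: (device_id, unix_seconds, lat, lon), where device_id
# is assumed to already be the shared identifier produced by identity resolution.
Record = tuple[str, int, float, float]
Ping = tuple[int, float, float]  # (unix_seconds, lat, lon)


def merge_feeds(feeds: Mapping[str, Iterable[Record]]) -> dict[str, list[Ping]]:
    """Union every provider's pings into one time-ordered timeline per device."""
    timelines: dict[str, set[Ping]] = defaultdict(set)
    for records in feeds.values():  # feeds maps provider name -> records
        for device_id, ts, lat, lon in records:
            # A set keeps a single copy of a ping that several providers reported identically.
            timelines[device_id].add((ts, lat, lon))
    return {device_id: sorted(pings) for device_id, pings in timelines.items()}
```

Only exact duplicates collapse here; pings that differ by a second or a few meters would both survive, which is one reason a real matching step needs more than a set union.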

Noise reduction

Every mobile location dataset contains a lot of noise, including incorrect or impossible locations. After the data feeds have been normalized, Motionworks’ machine learning algorithms evaluate the location data both across all devices and over time for individual devices; this context is used to score the quality of each location and eliminate noise from the dataset.
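To make "impossible location data" concrete, the Python sketch below applies two simple rules to one device's time-sorted track: it drops coordinates outside valid latitude and longitude ranges, and it drops any ping whose implied speed from the last kept ping exceeds an assumed ground-travel limit of 90 m/s (about 324 km/h), using the haversine great-circle distance. This rule-based filter is a stand-in for illustration, not Motionworks’ machine-learning scoring, and the speed cap is a hypothetical parameter.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def drop_impossible(track: list[tuple[int, float, float]],
                    max_speed_mps: float = 90.0) -> list[tuple[int, float, float]]:
    """Filter a time-sorted (unix_seconds, lat, lon) track with two simple rules."""
    kept: list[tuple[int, float, float]] = []
    for ts, lat, lon in track:
        # Rule 1: coordinates must lie inside valid latitude/longitude ranges.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue
        if kept:
            prev_ts, prev_lat, prev_lon = kept[-1]
            dt = ts - prev_ts
            # Rule 2: reject repeated timestamps and implied speeds above the cap.
            if dt <= 0 or haversine_m(prev_lat, prev_lon, lat, lon) / dt > max_speed_mps:
                continue
        kept.append((ts, lat, lon))
    return kept
```

Because this filter is greedy and anchors on the first kept ping, a bad first point can cause valid later pings to be rejected, and a fixed speed cap would misclassify pings recorded in flight. Weaknesses like these are why evaluating data across devices and over time gives more context than a single per-ping rule.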