The congressional and European Parliament testimonies of Facebook’s CEO focused attention on Internet and ecommerce corporations whose business models rely on the collection and exploitation of big data, with personal data being a major component. Legislators and the public at large came to realize a) the leverage such companies now possess through the dominant positions of the free and frequently personalized services they offer in exchange for the data they collect, b) the risks associated with not properly safeguarding this data, which reinforced the lessons from the massive data breach Equifax had suffered a few months earlier, c) the legislators’ lack of understanding of how the data is collected and used by these companies and their partners, and d) how difficult it will be to regulate the collection, combination, AI-based exploitation, and use of this data in a way that is agreeable to both consumers and businesses. These issues are re-emerging as more connected vehicles are shipped, and they will become more critical as companies using autonomous vehicles for consumer transportation, logistics, or specialized services start to employ big data of unprecedented variety and detail in insights-enabled business models. It is therefore necessary to understand who the main generators and users of this data are, who owns each type of generated data, the risks that may arise from mishandling the collected data, and whether existing and proposed regulations relating to autonomous vehicles and, more broadly, next-generation mobility suffice or need to be augmented.
By 2025, 100% of the new vehicles sold worldwide will be equipped with 5G connectivity. In fact, according to a recent report, by 2022 125M vehicles will have Internet connectivity and will be able to transmit data captured by the vehicle and receive data from OEMs, automotive suppliers, digital services providers, and various infrastructures such as the transportation infrastructure. While every autonomous vehicle with driving automation of L3 or higher is connected, not every connected vehicle will be autonomous. Connected vehicles will generate data and utilize data transmitted to them. To appreciate the value of the data generated by connected vehicles, consider that Toyota recently invested an additional $1B in Grab in order to install sensors in Grab’s cars to collect driving data. Earlier, Intel had paid $15B to acquire Mobileye, whose ADAS sensors collect data from connected vehicles produced by 27 OEMs. But autonomous vehicles will produce and consume big data of unprecedented detail (impacting volume), variety, and velocity. In terms of volume, consider that when deployed each production vehicle will be able to produce 4 TB of data per day, and Waymo thus far has ordered 82,000 vehicles for its fleets. The value of the autonomous vehicle data will be even higher than that of the data from simply connected vehicles because it will include high-fidelity location-specific information and detailed personal information about the vehicle’s passengers.
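To put the volume figures above in perspective, the following minimal sketch simply multiplies the two numbers quoted in the text. It assumes, purely for illustration, that every ordered vehicle is deployed and produces the full 4 TB each day, and it uses decimal units (1 PB = 1,000 TB). The function and variable names are hypothetical.

```python
# Back-of-the-envelope fleet data volume, using the figures quoted above.
# Assumption: all ordered vehicles are deployed and each one produces
# the full 4 TB per day; decimal units (1 PB = 1,000 TB).
TB_PER_VEHICLE_PER_DAY = 4
VEHICLES_ORDERED = 82_000


def fleet_volume_pb_per_day(vehicles: int, tb_per_vehicle: float) -> float:
    """Return the total daily data volume of a fleet in petabytes."""
    return vehicles * tb_per_vehicle / 1_000


daily_pb = fleet_volume_pb_per_day(VEHICLES_ORDERED, TB_PER_VEHICLE_PER_DAY)
print(f"{daily_pb:.0f} PB per day")  # 328 PB per day
```

Under these assumptions a single operator's fleet would generate on the order of hundreds of petabytes per day, which is why volume alone sets autonomous vehicle data apart from the data of simply connected vehicles.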
In a previous post of this series I stated that, in addition to technology, the successful deployment of autonomous vehicles will depend on a number of different but highly interdependent factors, including instituting appropriate regulations. In the United States we are already seeing a variety of regulatory efforts (enacted and proposed) at the federal and state levels regarding the testing and deployment of autonomous vehicles. Similar efforts have started in other parts of the world. However, regulation that accounts for the specific characteristics of mobility data is still lagging despite the major ramifications of misusing such data.
Of the data involved in each of the six broad and distinct use cases for autonomous vehicles I had previously identified, of particular interest is the data utilized by two of these cases: ride-hailing and ridesharing (collectively referred to as ride services). The companies offering ride services using autonomous vehicles will not only generate big data, but they will also collect and combine data generated by the consumers using their services, and by their partners participating in the fleet-based on-demand shared mobility value chain. This data will be used for the vehicle’s autonomous navigation, cabin personalization, transportation planning, fleet optimization, and many other important applications. Over time we can even expect the monetization of these data collections under novel business models even in non-transportation-related uses.
Even though in 2014 twenty automakers signed the self-regulating Automotive Privacy Principles, government-mandated regulation of the data collected by the outward-facing, in-cabin, and V2X sensors of vehicles with L3 or higher driving automation will likely be necessary because the data contains important personal information that to date is not adequately protected by the established principles. For example, the data captured by the vehicle’s outward-facing sensors includes information such as license plate numbers, pedestrian faces, and other personal characteristics. Because of the ways the data can be utilized in various applications of next-generation mobility, and the inferences that can be generated from this data by applying AI techniques, the risks to personal privacy, safety/security, and reputation are high. In fact, in many instances they may be higher than those associated with the consumer data collected by Internet, ecommerce, and consumer credit reporting companies (see Equifax’s massive data breach).
In order to protect individual and corporate rights without stymying new business opportunities for next-generation mobility, it will be important to understand:
- How data from connected vehicles is currently being collected and exploited, as well as how these practices will change with the introduction of autonomous vehicles. It is important to understand the state-of-practice and state-of-the-art technologies for data collection, management, and exploitation. For example, blockchain may be an appropriate state-of-the-art technology for the management of the data collected from autonomous vehicles. Regulation based on misunderstood or little-understood technologies, models, etc., is extremely dangerous. Because of important similarities between the data and related technologies used in online advertising applications and the data in connected and autonomous vehicles, it may also be instructive to understand how companies that participate in the online advertising ecosystem, e.g., platforms like Facebook and Google, advertisers like Procter and Gamble and GM, data brokers like Experian, and application developers like Zynga exploit the data they collect in order to effectively market to consumers.
- The difference between a company selling the data it collects and partnering in order to provide access to that data. For example, an automotive OEM may not sell the data it captures but may allow its partners to access such data, for example, through the OBD port.
- How data-driven business models work today and how such models could transfer to products and services offered using autonomous vehicles. For example, Netflix analyzes consumer data not only to determine what to recommend to its subscribers but also to decide what original content to produce in order to increase its subscriber base and reduce attrition.
- Each regulation’s end-goal: the vehicle’s safe operation vs the consumer’s protection against various types of risk.
Data generators and data flows
In order to appreciate the risks to personal privacy, reputation, and safety/security it is important to recognize that companies involved in fleet-based on-demand mobility using autonomous vehicles generate and access large and diverse data sets, many involving various types of personal data. I foresee that there will be eight major categories of data generators in the value chain of on-demand ride services using autonomous vehicles:
- Consumers.
- Automotive OEMs.
- Platform providers. These are the companies providing the AV Operating Platform or the UX Platform. They may be OEMs, Tier 1 suppliers, or startups.
- Fleet leasing companies. These are companies that order and finance the acquisition of vehicles from OEMs and then lease a fleet of vehicles to fleet operators. The companies offering ride services may not have the financial ability to purchase autonomous vehicles outright, so the role of these companies becomes important.
- Fleet operators. The companies offering ride services using such vehicles.
- Fleet managers/maintainers. These companies are responsible for maintaining an operator’s fleet on a daily basis (from refueling/recharging each vehicle, to cleaning it, and repairing it, as appropriate) in order to maximize its uptime and reduce the cost of service.
- Digital services providers. Companies that are providing entertainment, productivity, commerce, mapping, traffic, insurance, and other types of digital services.
- Governments. Local, state, federal, and national.
It is important to understand not only what data each generator is creating but also how this data is shared among the companies participating in the value chain. Figure 1 shows the expected data flows between data generators for the ride-hailing and ridesharing use cases. With the exception of consumers, who only generate data, the other entities participating in this value chain have the opportunity to collect and exploit for their own benefit data that is generated by their partners. As we are also starting to see, in addition to automobile fleets, the fleet operator offering ride services may also own and/or have access to fleets of other vehicle types. For example, Uber now owns Jump, which operates fleets of dockless bicycles, and recently invested in and is partnering with Lime, a company that operates escooter fleets. Didi and Lyft are following a similar path towards multimodal transportation.
A number of different models for operationalizing ride services using autonomous vehicles will emerge across the Fleet-Based On-Demand Shared Mobility Value Chain. Each model offers data generators the opportunity to combine the data they create with data they receive from their partners and analyze the resulting databases using AI. For example, Figure 2 depicts the model that will be employed by GM’s Cruise division to offer ride services using autonomous vehicles. In addition to manufacturing these vehicles and transferring them from the Chevrolet division to the Cruise division, GM will be a platform provider (it is developing both an AV Operating Platform and a UX Platform), a fleet operator, and a fleet manager. This means that GM will have direct access to consumer data, vehicle performance data, various types of trip-related data (including video from inside and outside the vehicle), individual vehicle and fleet-wide maintenance records, and even digital services-related data (in addition to its OnStar service, GM will offer Passenger Commerce). Under the right partnership agreements, GM will also be able to access data from its government and digital services partners.
Using a different model (shown in blue in Figure 3 below), Waymo will be a platform provider (it is developing both an AV Operating Platform and a UX Platform), the fleet operator of the 62,000 Pacifica minivans and 20,000 I-PACE SUVs it has ordered directly from their OEMs, and a digital services provider (YouTube, Waze), but not a fleet manager (it is partnering with AutoNation and Avis for that role). In addition to the data it generates, Waymo will also be able to benefit from all the additional consumer data its parent Alphabet has been collecting through its other businesses. Finally, Waymo will be able to access the data generated by its OEM, fleet management, digital services, and government partners.
Uber will utilize a slightly different model from Waymo’s. Based on this model (shown in green in Figure 4), Uber will be an operator of multimodal transportation fleets that will include at least 24,000 Volvo SUVs it has ordered directly from the OEM, dockless bikes, and escooters, and potentially a platform provider (it is developing its own AV Operating Platform and UX Platform while also considering using Waymo’s AV Operating Platform). Similar to Waymo, Uber will augment the data generated by its autonomous ride services with the big data it has been collecting through the ride services it offers today. The company will also have access to data generated by its OEM, fleet management, digital services, and government partners, and potentially by a platform provider partner.
An example showcasing the data’s implications
To better understand the implications of collecting, combining, and analyzing ride services-related data, consider the following example. In exchange for an annual subscription to its multimodal transportation service, a fleet operator offers to consumers a daily transportation plan that uses its on-demand mobility services. The transportation plan is offered for free. The consumer is only charged for the utilized transportation according to a selected tier of service. During the sign up process the subscriber provides: a) personal data, including health-related information (used in order to determine, for example, whether bicycle or scooter transportation may be viable options for the consumer, or whether special vehicles will be needed as would be the case for a handicapped person), b) transportation preferences from those available in the selected tier of service, etc., and c) calendar access. The fleet operator also records a) video, audio, and passenger-specific biometric data from the vehicle’s cabin during every trip (most limo services and taxis in the US already collect video and audio data), b) all the data captured by the vehicle’s outward-facing sensors (data from and about other vehicles, the transportation infrastructure, pedestrians, cyclists, etc.), and c) the V2X communication data during every trip.
Using the provided and recorded data, data from every subscriber’s past interactions, and data provided by its partners (expected weather conditions, historic and projected traffic conditions, public transportation loads at times of interest, etc.), the fleet operator creates the best daily end-to-end ground transportation plan. There will be several different ways of specifying what is best for a particular consumer. It may be the cheapest plan, the one that takes the consumer from one destination to the next in the fastest possible time, or the one that uses the fewest modalities (ride-hailing, ridesharing, walk, escooter, public bus, subway, etc.).
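The idea that "best" depends on the consumer's chosen criterion can be illustrated with a minimal sketch. It is not how any particular operator plans trips; the `Plan` structure, the objective names, and the sample values are all hypothetical, and a real planner would weigh many more factors (weather, traffic, public transportation loads).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One candidate end-to-end plan (hypothetical structure)."""
    name: str
    cost: float        # total fare, in any consistent currency unit
    minutes: float     # door-to-door travel time
    modalities: tuple  # e.g. ("walk", "subway", "ride-hailing")


# Each notion of "best" is a key function; lower is better in all three.
OBJECTIVES = {
    "cheapest": lambda p: p.cost,
    "fastest": lambda p: p.minutes,
    "fewest_modalities": lambda p: len(set(p.modalities)),
}


def best_plan(plans, objective):
    """Return the plan that minimizes the chosen objective."""
    return min(plans, key=OBJECTIVES[objective])


# Made-up candidates for a single day.
candidates = [
    Plan("A", cost=12.0, minutes=55, modalities=("walk", "bus", "subway")),
    Plan("B", cost=30.0, minutes=25, modalities=("ride-hailing",)),
    Plan("C", cost=18.0, minutes=35, modalities=("escooter", "ridesharing")),
]

for objective in OBJECTIVES:
    print(objective, "->", best_plan(candidates, objective).name)
```

With these made-up candidates the cheapest plan is A, while B is both the fastest and the one with the fewest modalities, which shows how the same data yields different plans for different consumers.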
But in the process of creating value for the consumer by formulating the free daily transportation plans, the operator also learns:
- All the places the consumer visits each day and the order of the visits, including places visited routinely.
- The purpose of each visit (most of the time this can be inferred even if it is not described explicitly in the calendar).
- The transportation modalities selected by the consumer and how they may differ from the ones proposed by the plan. Depending on the consumer’s reputational score, the fleet operator may not make certain transportation options available, which could have a negative financial impact on the consumer. For example, if riders who shared rides with a particular consumer have complained about that consumer, ridesharing options, which are cheaper, may not be offered to the consumer in the future.
- Details about each destination. For example, all the businesses operating in a particular location.
From this data, the fleet operator may also be able to make inferences that impact the subscriber’s privacy and security. For example, the operator may infer that the subscriber has a medical problem because for a period of several days he did not select transportation options that utilize bikes and escooters, even though he has used such options in the past under fair weather conditions. Such an inference can be used in the daily transportation planning process but also by health insurance providers. It impacts the consumer’s privacy and even safety. The operator may also be able to infer the subscriber’s financial position from the transportation modalities he most frequently selects and the places he visits. This can impact the consumer’s privacy and reputation, as well as expose the consumer to fraud.
Therefore, regardless of the model employed by the companies offering ride services using autonomous vehicles the data and associated AI-based inferences can impact personal a) privacy because they provide detailed understanding about an individual beyond what may be necessary for offering the expected personalized transportation experience, b) reputation because the collected data may be incorrect or misused, and c) safety/security because the databases containing the detailed personal data may be breached exposing the subscriber to various types of fraud and even physical danger. As such, this data and associated inferences can be considered as the transportation incarnation of the cookie-based tracking that websites employ today.
Is existing data regulation adequate?
We must separate the regulation relating to the safe operation of autonomous vehicles from regulation relating to the data produced and consumed by connected or autonomous vehicles. For example, the AV START Act making its way through the US Senate is about the development, testing, and safe operation of autonomous vehicles, but doesn’t address data privacy and cybersecurity. Similarly, regulation governing the data captured by vehicle Event Data Recorders is inadequate for the data captured by autonomous vehicles. It is also important to understand that while vehicle safety is the responsibility of NHTSA, data privacy is the responsibility of the FTC. Therefore, in the US, the FTC and NHTSA must collaborate in order to address data-related issues for autonomous vehicles.
With regards to mobility data, it is important to determine whether:
- Existing data-related regulations suffice or will need to be extended to cover mobility under the business models used and envisioned;
- New regulation will need to be developed from scratch.
The European Union’s General Data Protection Regulation (GDPR) requires corporations to disclose to consumers what data they collect. It provides guidelines on what personal data must be anonymized, including license plates. Later this year or in 2019, the European Union’s ePrivacy Regulation will put in place more rigid requirements for individual consent for the sale and use of customer data in electronic direct marketing. These regulations can be extended to include transportation-related data, as well as facial features and other personal characteristics that can be utilized by computer vision software to identify an individual.
California recently passed the California Consumer Privacy Act of 2018, a data privacy bill that will go into effect in 2020. When it does, it will allow consumers to opt out of data sharing and to prohibit the sale of their data, including on-demand mobility data, to third parties. It is likely that this regulation will be used as a template by several other states that want to enact privacy protections. There are also several other pieces of legislation making their way through Congress, including the CONSENT Act, which will require consumers to opt in before technology companies can collect and use their information, and the SPY Car Act of 2017, which deals with cybersecurity protection of vehicle data and of personal data about the driver and passengers. Both of these are stalled and are not expected to be approved any time soon.
Companies in the fleet-based on-demand mobility value chain will need to state explicitly what data is necessary to collect for the business model being used to monetize the service offered. They will then need to describe what data is actually collected and how it is collected. This means that if a company collects more data than is necessary for the accomplishment of a stated transportation goal, the user should know it and consent to it.
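The principle in the paragraph above can be sketched as a simple comparison between what a company declares as necessary and what it actually collects. This is only an illustration under stated assumptions: the data category names and both helper functions are hypothetical, and a real service would define its own categories and consent records.

```python
# Hypothetical data categories; a real service would define its own.
NECESSARY = {"pickup_location", "dropoff_location", "payment_token"}
COLLECTED = {
    "pickup_location",
    "dropoff_location",
    "payment_token",
    "cabin_video",
    "calendar_entries",
}


def needs_consent(necessary: set, collected: set) -> set:
    """Return the collected categories that go beyond the stated necessity."""
    return collected - necessary


def may_collect(category: str, necessary: set, consented: set) -> bool:
    """A category may be collected if it is necessary or explicitly consented to."""
    return category in necessary or category in consented


extra = needs_consent(NECESSARY, COLLECTED)
print(sorted(extra))  # ['cabin_video', 'calendar_entries']
```

In this sketch the two extra categories are exactly the ones the user would have to be told about and agree to before collection; everything else is covered by the stated transportation goal.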
Quantifiable value exchange between consumer and service provider must drive the permissions. For example, the owner of a connected vehicle receives value through Over The Air (OTA) updates of a vehicle’s software because such updates are important to the vehicle’s safe operation or introduce new, value-enhancing features. Over time we may need to develop technology that establishes the value for each piece of generated data and for how long that value holds. The user of the data can then specify why the data may need to be kept for a particular length of time after it is captured.
Obviously, the data collector must make it easy for the user to opt out of the harvesting of certain types of data after initially opting in. For example, a user may initially decide to use a service that is monetized through online advertising. For this reason, the user gives permission (opts in) to the company offering the service to collect personal data in exchange for this service. Later on, the user decides to switch to a paid subscription for the same service. This means not only that it should be easy to opt out of the data collection that took place under the advertising-supported model, but also that the data that was previously collected must be erased.
Ultimately, consumers will need to develop four types of trust with the companies that are part of the fleet-based on-demand shared mobility value chain that uses autonomous vehicles:
- Trust that the autonomous vehicle will operate correctly while transporting a consumer to the intended destination.
- Trust that only data necessary for providing the desired mobility services will be collected by the companies involved in the on-demand consumer mobility value chain.
- Trust that the collected data will be properly safeguarded.
- Trust that the data won’t be used in a way that is nefarious and harmful to the consumer.
In considering regulating the data associated with autonomous vehicles it is important not to repeat what we are now facing with Facebook and other internet companies. These companies created the technology, built the business model, and now through regulation are trying to clean up the mess associated with privacy, reputation, safety, and security. We can’t afford to follow the same sequence with autonomous vehicles.
(Cross-posted @ Re-Imagining Corporate Innovation with a Silicon Valley Perspective)