Data privacy has often been an afterthought in software and platform development. Data breaches have increased consumer awareness, and laws such as GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act) have been enacted. Programmers and engineers need to think about what data they are collecting, how it is stored and accessed, and how it is shared, in order to protect their end users.
Today’s guest is Nishant Bhajaria. Nishant leads the technical privacy and strategy teams at Uber, which include data scientists, engineers, privacy experts, and others seeking to improve data privacy for consumers and the company. Previously, he worked in compliance, data protection, and privacy at Google. He was also the head of privacy engineering at Netflix. He is a well-known expert in the field of data privacy, has developed numerous courses on the topic, and has spoken extensively at conferences and on podcasts.
“Most companies tend to understand data at the back end of the process when data is at risk. The problem is, at that point, the volume of data has grown as has the usage, which means your ability to manage the outcomes is much more…”
Show Notes:
- [1:09] – Nishant shares his background and how he got started in the field of data privacy. He started at Intel and explains the changes in data collection in the early 2010s.
- [2:47] – Nishant started in the field “before it was cool,” because of his strengths as an engineer and writer.
- [3:33] – GDPR stands for General Data Protection Regulation and Nishant describes what this law means and how it came about in Europe.
- [4:47] – CCPA is the US’s approach and first step into data privacy laws.
- [5:53] – Consent is going to be a big topic in 2021. Nishant describes how the events of the last decade have led to data privacy laws.
- [6:56] – Nishant points out that a problem with data privacy laws as they stand right now is that they are not written by people who have a technological and engineering background.
- [8:39] – The data privacy issues that have arisen in recent years did not happen all of a sudden. Nishant explains that many mistakes across the board have led to them.
- [9:00] – Nishant lists some of the conundrums and ethical questions that come up when discussing data privacy.
- [10:23] – One of the biggest problems with data privacy is the different understanding of what that means. European countries and the United States do not have the same understanding of what privacy is.
- [11:46] – Security features exist for very good reasons, but people are generally very impatient with them.
- [12:12] – Nishant gives an example of the microdecisions that come into play when data gets into the hands of the wrong person.
- [14:17] – Nishant gives an example of how some decisions companies make in response to GDPR keep them in compliance but are not always consumer friendly, due to a lack of understanding of the law.
- [15:56] – The internet was not designed with privacy in mind. Privacy was an afterthought.
- [17:06] – Nishant describes the challenges that we face when consumers want to access apps and sites quickly and the domino effect that takes place.
- [18:29] – Nishant describes a huge systemic change in the data privacy and data collection workforce, driven by the fact that most people joined the field after 2009.
- [19:43] – A problem arose when engineers assumed they were always the ethical ones because they were collecting data, or designing apps and platforms to collect data, for the right reasons. But that isn’t always how the data ends up being used, which reinforces Nishant’s point that data collection needs to be regulated from the get-go.
- [21:03] – Privacy is all about not accessing or using data without the owner’s consent, but people don’t realize how much can be known about someone just by combining easily accessible data online.
- [22:10] – We have built the internet for fast access and use. Customers sign up for a lot of access to sites and apps and don’t think about the use of their data when they do.
- [24:31] – For companies that are small and don’t have the legal teams to handle a privacy problem, Nishant says the first thing to do is to make sure you really need the information you are asking for from your customers.
- [25:27] – It is much easier to look at what you’re collecting, the necessity of it all, and how that data could be compromised in the early stages because there’s not a lot of data to dig into.
- [26:06] – Another tip from Nishant is to lean on tooling to build privacy at scale. He describes what this means with examples.
- [27:36] – Nishant also explains the need to break down the wall between the legal team and the engineering/privacy team. Those teams need to work in harmony.
- [29:10] – Chris and Nishant discuss the pitfalls of deleting data and the importance of consistency.
- [31:07] – Many companies cannot afford to go through a data breach or legal problem with data privacy.
- [32:10] – There is an economic factor to consider when collecting too much data or duplicate data, which Nishant describes.
- [34:18] – When signing up for services, sites, or apps, consider why they are asking for the data they say they need. A social security number, for example, is not needed for a grocery delivery.
- [36:01] – As a result of GDPR, companies are starting to be required to disclose what consumers’ data is used for.
- [36:28] – Nishant says that the biggest piece of advice he has for consumers is to always ask questions. At the end of the day, it is your data and you need to know what’s happening with it.
- [37:56] – Apple specifically has built a really strong privacy standard for other companies to live up to.
- [40:01] – Covid and the US’s political events have changed the landscape of privacy and data collection, and Nishant is confident that great ideas and positive change come through times of unrest.
- [41:37] – Regulators and lawmakers need the engineering support and need to be a part of our conversations regarding data privacy.
- [43:24] – Nishant hasn’t met anyone that has thought that privacy is unimportant, but communicating the details and the prioritization is a different challenge.
- [45:16] – Privacy by Design is Nishant’s book, written to educate business owners, engineers, and CEOs so that privacy is taken care of at the start instead of as an afterthought in response to a problem.
- [47:31] – Regardless of your current understanding of technology, Nishant’s book is a great read to better understand privacy and data collection.
Thanks for joining us on Easy Prey. Be sure to subscribe to our podcast on iTunes and leave a nice review.
Links and Resources:
- Podcast Web Page
- Facebook Page
- whatismyipaddress.com
- Easy Prey on Instagram
- Easy Prey on Twitter
- Easy Prey on LinkedIn
- Easy Prey on YouTube
- Easy Prey on Pinterest
- Nishant Bhajaria on LinkedIn
- Privacy by Design by Nishant Bhajaria
- Nishant Bhajaria on Twitter
Transcript:
Nishant: I've been in this domain for a while now. I started off my career as an engineer at Intel back in the day. I wrote code at WebMD. As the early 2010s really took hold, you had this move towards data-driven innovation where companies connected with the customers much more directly. Collecting data from the customers enabled companies to understand the customers better. We were in this sweet spot where, for companies like Google and Facebook, more data was gushing in.
Eventually, as that moved forward, we were in this moment of reconciliation where there needed to be a balance between how much data we were collecting and how to make sure that the customers were getting something for it. How do you make sure that the companies collecting the data are being careful custodians of customer data? It was during this time that I was also working in the identity space, that is, how do you make it easier for customers to log into your platform, and that opened the door in the early 2010s to get into privacy. I got to work with some really smart folks during my Nike days in Oregon. To have a company of that scale, with that much brand reputation at stake, also in the global pantheon in terms of selling merchandise to folks, it's a very customer-driven retail and PR company all at once. That's how I got into privacy.
I had the product background, the engineering background. I also had my debating background and my ability to write, which suited a discipline that is, by its inherent nature, so cross-functional, so persuasive, so given to technology. A hybrid role, if you will. That's kind of how I got into privacy. There was no one role that I applied for. There wasn't a LinkedIn post or a Monster post. It was a series of conversations, a series of micro decisions. Then of course privacy became this cool thing to be in. I was in privacy before it was cool, and then it became cool. That's kind of the TL;DR.
Chris: There's definitely a lot going on in privacy, particularly in the last couple of years with the GDPR coming to fruition. A lot of us thought, is it really ever going to happen? There's this objective, but is it actually going to end up happening? Actually, I'll give you an opportunity: can you explain to the audience what GDPR is, and CCPA? Because those are kind of the two big ones to me.
Nishant: Correct, absolutely. Before I answer, I will stipulate that I'm not an attorney. Nothing I say here should be construed as a legal opinion. Attorneys don't like it when engineers offer legal advice any more than engineers like it when attorneys offer technical guidance. It's helpful to have that little demarcation. But the two laws that you mentioned are very germane because all of us have gotten emails. My mom kept asking me why she keeps getting emails about GDPR.
As I mentioned before, there has been this gradual ramp towards this moment of, how do you ensure that customers have rights: rights to their data, rights to what gets collected, how it gets used, how often it's collected, how long it's kept for, who it would be shared with, who accesses it? There are a lot of granular actions that happen when it comes to data collection. The GDPR was essentially Europe's contribution to that, basically saying, let's give customers some very clear enumerated rights so that they can access their data, and they can have a level playing field with companies that do the data collection.
Phrases like the right to be forgotten often come to mind, where customers are given a right to say, okay company, you collected all this data from me; give me the data you have, or delete all my data, forget me essentially. Of course, there are caveats to all of these. Nobody can be absolutely forgotten. There are reasons why you might want to keep that data: in case they come back as a customer, in case there's a lawsuit, things like that. But for the first time you had customers get very direct rights vis-a-vis companies in terms of data collection. CCPA is essentially the California response to that. I like to say the US response because, unlike GDPR, the US does not have a federal national privacy law. That might change.
As of this recording, we are literally two days away from a new administration, a new DOJ taking office in DC. That might change by the time the audience hears this, although I'm not sure a new privacy law is going to be foremost in the mind of President Biden when he comes in. But CCPA, essentially, is significant because California is one of the world's largest economies; depending on what you count, it's either the fifth, the sixth, or the seventh-largest economy in the world. It sets the tone for what counts as privacy law.
It sets the tone in terms of what companies are supposed to do with privacy before the product ships: privacy impact assessments, which is basically a fancy way of saying that before you ship a product, a company has to do some sort of checks to make sure that there won't be privacy harms. How do you make sure the data is deleted in time? How do you make sure that customers have consented? Customers have to consent before you collect their data or track them across different websites. In fact, it's important that you mentioned CCPA, because the phrase consent is going to be a big topic in 2021. Apple made a request, not really a request but essentially a mandate, that app developers collect consent much more explicitly from users before their data is collected. You can draw a straight line between decisions made back in the early 2010s in terms of data collection and the significant growth companies had in terms of understanding the customers and being able to monetize their data.
Some of the problems this industry has had in terms of companies misusing data and making mistakes, governments' misuse of data, GDPR, CCPA, Apple's move on the consent side, and what Google is going to do with the Chrome Privacy Sandbox: all of these are fairly interdependent events over the span of a decade.
Chris: They're all hopefully leading to the same consistent, common goal, although they're not coordinated, it sounds like.
Nishant: Correct, you made a great point. A common goal, and not coordinated. They have a common goal in terms of protecting customers and ensuring that companies are diligent, ethical players with customer data. The challenge is that all of these laws and regulations are often written by folks that may not have domain expertise on the technology side. They may not be data scientists. They may not know the difference between protecting data in an unstructured database like Cassandra versus what happens at the back end of Hadoop, for example. They may not understand that deleting data often hurts the customers more than it helps them, because, for example, if a customer gets upset at you as a company and deletes their data, but then comes back two days later because they've cooled down a bit, they want to be able to access their previous shopping history or their previously saved wishlist. There are context gaps here.
One of the opportunities of the moment, and the perils of the moment, is that if a company starts doing the right thing from a privacy perspective from the get-go, and at the same time governments write these laws in a way that represents some understanding of the technical choices at hand, we could collectively do a much better job of protecting the customers. That is where the uncoordinated nature is challenging, because things are moving pretty quickly. As an industry, as a country, as a society, we're making up for lost time, and sometimes that leads to suboptimal outcomes.
Chris: One of the funny anecdotes I've thought about with respect to the right to be forgotten is if someone submits a request, I want you to delete all my data. Once you've deleted all the data, do you have an obligation to delete the request and the history of the request? Because you're still retaining customer data if you retain the request.
Nishant: Correct. It’s almost like the classic chicken-and-egg conundrum. Everything we needed to know we learned in fifth grade; this is that fifth-grade question coming back right here. The laws tend not to be very prescriptive. They tend not to be very, very granular, and I give the regulators credit for that. They are trying to figure this out.
When people on the tech side in Silicon Valley circles complain about regulation, I tell them, we didn't get here all of a sudden. We got here because of several mistakes that people made across the board. This is true on all sides. I feel like it's going to take some trial and error to get this right.
It's easy to complain about some of the stuff. But the point you raised is a good one, which is what does retention look like? It's not just retention. That is, what do we retain? How long do we retain it? How do we tie it to a specific use case?
I'm not opposed to people collecting and retaining data. It's more about what's going to happen to it. Are you retaining data because you think someday in the future you might find a good use for it and therefore, let's keep it? I've got a problem with that, because you've created a security attack vector and also you haven't given the customer a choice. At the same time, if you delete data too quickly to demonstrate that you are an ethical player, you've also created an incentive for somebody to do stuff with the data very quickly and then delete it quickly, as well. There are trade-offs here.
I think this is where having some of these controls built early in the pipeline, which I've talked about in the book we're going to discuss, is critical, simply because that means you've built some sort of a fail-safe even if the law is not very clear.
Chris: We're talking about the evolution of these laws coming to fruition. Were there, in your mind, certain triggers that ultimately caused the legislation? Were there particular data breaches? Were there particular behaviors associated with data, like Facebook and Cambridge Analytica? Were there other kinds of industry things that you see as catalysts that brought GDPR and CCPA to the front of mind and caused the legislators to say, hey, we need to do something?
Nishant: Yeah. It's interesting you mentioned the Facebook issue with Cambridge Analytica that happened in 2018, but some of those decisions, some of the fixes, were already in place by the time that issue came to light. GDPR was already in the works at that moment. People often think coinciding events have causality associated with them. I don't think that's the case here. What brought us to this moment is, strategically, this realization that fundamentally the United States has a very different understanding of privacy, holistically as a concept, as a construct, and as a gesture, than some of our European partners.
In this industry, I'm often on these panels where the biggest disagreement on contextual gaps tends not to be between the engineers and the attorneys, but between the US counsel and the EU counsel in terms of the granularity, the understanding, the detailed nature of privacy obligations, et cetera. For a long time, we have always believed on the American side that the more data we have, the better it is for everyone. The customers don't need to make granular choices; they don't have as many toggles.
Can you imagine if you had to answer 15 different questions before a website actually materializes for you? People get upset on the phone when they have to validate themselves with their bank. You just want to get into your account, forgetting that those checks exist so somebody else can't get that kind of money from you. Those security measures exist for a reason, but people are also extremely impatient.
This is not just about the companies. What this moment enables us to do is get to a place where you can give customers some choice and, at the same time, enable the business to be more agile as well. We were getting to that moment for a long time. How do you make sure that the customers have a clear view of their data? How do you make sure that companies have governance in terms of data collection before something bad happens? Because everybody worth their salt tends to have a decent incident response team. Nobody wants to be the person that doesn't know how to respond when something bad happens. What you do beforehand determines how that response goes.
As an example, let's assume you make a stupid privacy decision and data ends up in the hands of the wrong third-party vendor. You go back to, should that data have been collected? Should the data have been logged? Should the data have been captured by some other database? Should a query have been able to access that data? Should some API have been able to send that data outside the company? Should there have been a tighter rate limit on that API?
There's a bunch of micro-decisions that get made before something bad happens. GDPR was about saying you need to have some sort of governance structure, rather than waiting for some breach or some privacy incident someplace. It just so happens, from a calendar perspective, that in May 2018 GDPR became the law of the land, so to speak. At the same time, we had the congressional testimony that I think you're alluding to right now. But I feel like GDPR and the motion towards privacy regulation were in place well before that.
Chris: That was kind of my perception, but in the public's eye, even though GDPR had a two-year ramp-in, people didn't hear about it until websites started popping up cookie consent forms to say, do you really want to do this. There were some websites that just started blocking EU visitors entirely because they didn't want to deal with figuring out how to handle consent and how to handle data. It was just easier to say, we don't want you as a customer. We don't even want you touching our website.
Nishant: Yeah. This is where sometimes these laws can be counterproductive if you aren't prepared for them, simply because a lot of these small companies just don't know what to do. I feel like, if you have enough of a privacy protection posture within your enterprise, it becomes easier to relate to the customers in a way where you can say, yes, we are okay from a legal perspective as well, if your attorneys sign off on that.
You gave a very good example of companies blocking EU users, so let me give you an example of my own. I used to go to a gym, three years ago, right about the time GDPR was passed. Until then, I would sign in at the kiosk at the front door and badge in. Essentially, all I then had to do was get to the treadmill, or an elliptical, and enter my code, and they would log me in; all my previous workout history, my previously saved workouts, would all be there. I didn't have to reconfigure anything. I would just say, workout number one, I would run for 60 minutes, and off to the races I go.
After GDPR, and I'm not sure how that decision was made, I still had to sign in with my badge at the kiosk up front, but then I also had to re-sign in to the treadmill, which meant entering my username and my password. I didn't know this was going to happen. I had picked my first name and my last name concatenated as my username, which, if you know my name, is not easy to enter in the morning when it's cold.
The customer-friendliness aspect fell by the wayside because the gym was taking a very conservative route to GDPR. My hope is that three years in, there is enough understanding, enough universality, enough application, and thankfully companies have matured enough, that we can have some consistency in terms of what we do in a given situation.
Chris: Got you. Were there some big mistakes that companies have made that have informed decisions here about how we need to start designing for privacy? You and I have been in IT long enough that we remember the days when the internet was not built with security in mind. All connectivity was trusted, and then at some point we realized, oh, there are bad players; we need to design for security. Security was not initially by design, and I think we got better about having security by design. Is the same thing going on with privacy, that there are these big mistakes that have happened that make us say, oh, we need to watch out because of that?
Nishant: Absolutely, yeah. One of the changes that took place in the early 2010s was this access revolution. To your point, all connectivity is good was the phrase you used. One of the paradigms when I was a programmer at Intel was, if I needed access to anything, I had to figure out how to get access to it. It was not granted by definition. I had to make a business case, and I had to enter multiple passwords to get access to what was known as Intel's […] Cookbook, which contained the formulas that Intel used to build out its test chips. I was one of the engineers on the advanced design team and I had access, and I just had to budget 10 minutes to get in properly.
Fast forward five, six years, and all access was open. Anybody could spin up an S3 bucket (not at Intel, at another company), put data in it, have an API, and ingest data. You could create copies of data. The goal was to ensure fast movement. This is when we also moved from structured key-value pair databases to the unstructured JSON blobs, Cassandra, and MongoDB. Customers wanted access to the app quickly, which meant engineers wanted access to operational data quickly, which meant the data scientists at the back end wanted access to queries quickly, which meant they could understand exactly what to do next for the customer. Velocity and throughput became the lay of the land, which meant, let's not gate any access. Let's build whales, not walls, was the phrase we used back then.
To your point, what that meant was some people got access that they should not have gotten. Any mistake that you look at in Silicon Valley, including at some of the companies I may have worked for, comes back to the fact that either too much was collected or too much was accessible. You want to figure out a way to manage access that respects people's legitimate need for it, where the customer and the engineer both benefit, but at the same time there is some sort of accountability: you can get access if required, and you can audit access once it was granted. In case somebody behaves badly, you can understand what was done and why it was done, and you can make sure that doesn't happen again.
That pivot was a painful one because a lot of engineers got used to this idea that it's always going to be this way. A lot of these engineers, you have to remember, joined the workforce after 2009, 2010. Lehman Brothers fell, the economy contracted, and then we had this data revolution where a lot of folks joined the workforce understanding that stock prices always go up, data always comes in, access is always granted. These qualifiers for security and privacy came much afterward.
We're talking about a societal, systemic change. The technical controls we're building are meant to essentially remedy some of those misconceptions that simply don't scale anymore. We simply cannot live in a world where everybody can access whatever data they want to, no questions asked. That doesn't work ethically. It doesn't work technically. It doesn't work for scalability. It doesn't work for security. It doesn't work for privacy.
Chris: Do you think part of the reason for this total access to data was that it was just easier? I don't want to have to think about building access control lists. I don't want to have to think about the business cases. There are vendors; we trust them. Let's just give them access to whatever they need so they can get the project done for us.
Nishant: I think Silicon Valley was the tech version of the go-go '80s. I'm not going to use Michael Douglas's quote from Wall Street because it would be too on the nose. Some of it was about just good faith. Every engineer, including me when I wrote code back in the day, believed that they are always ethical. It's always the other folks out there; they are the bad ones. The problem is, it's not my malintent that makes me dangerous. It's my clumsiness. It's my forgetfulness. It's my carelessness, where I create this data store someplace, I write this API, I make it available to everyone, and somebody else uses it for some purpose other than the one I wrote it for. That's the risk factor.
There was also this phenomenon where people just didn't know what this data meant. There are research studies out there that show that a combination of your online data makes you a lot more identifiable than even your fingerprint. When I became a US citizen, I had to give 10 sets of fingerprints to the US government before they'd let me become a citizen. You don't need anything nearing that to identify me with my online data.
Senator Romney from Utah was identified this way. He had a Twitter account called Pierre Delecto, which he used to make interesting comments that he wouldn't make as a US Senator. A journalist was able to identify him within a couple of hours just based on who was following him, who he was following, et cetera. When you have this preponderance of data available, people didn't foresee how identifiable that can make somebody, just based on easy access to the data. Privacy is all about, don't identify me, don't track me unless I give you consent. If my data is already out there, the cat's out of the bag. People just didn't foresee the potency of unmitigated access, as well as the danger of the combinations of data that are out there.
Chris: I remember reading a study that someone had put together where they were able to look at who you're connected to on Facebook, with reasonably publicly available data, and figure out your gender preference based on who you knew and who they knew. They were like, oh, we know the gender preferences of these people even if they haven't publicly discussed it. It was like, oh gee, we don't realize how easy it is. I think in general, we don't understand the implications of our own data. Even as consumers, we think, oh, that's a trivial piece of information that can't be used for anything. But in combination with a bunch of other things, or a few other things, it can.
Nishant: We disperse our data freely. When we want something as customers, we think of the most immediate use cases, like getting access to something, getting that free subscription, or getting that gift card. It's very similar to the student union building on graduation day, when people were signing up for credit cards to get that free pizza. There's no pizza anymore; there's just the pile of tools and apps that people have online. Unfortunately, customers are still signing up for that stuff.
I really want to be very careful here. I don't want it to seem like I'm blaming the customers, because we have built the internet based on fast movement. I want a fast app; I have a fast need. When the pandemic hit, I wanted to be able to get my groceries without going into a grocery store. We are expecting, as customers, the market to adapt to our needs. The market expects customers to remember what they sign up for. These are incompatible requirements.
Customers don't have the time to understand these details, but at the same time, application developers also have families to feed. They also need to build out apps to compete in a very high-visibility, high-touchpoint marketplace. I feel like, yes, everybody's made mistakes in the past. The only way we're going to get to a better place from a privacy perspective is to keep building out the tooling, keep building out the connective tissue between different teams that tend to be very siloed (we should talk more about how companies have to build these tools), and make sure that there is a real exchange of ideas between the regulators on the one side and the innovators on the other side. Otherwise, we're going to have this phenomenon of talking past each other, where we have a ton of laws and a ton of tools, but the customers are no better for it. Which I really want to point out here.
Chris: To bring this down to earth: let's say we're not an engineer at Google or a Fortune 500 company. We're an engineer for a mom-and-pop company that's got 20, 30, 50 employees, and we're building out some web applications or websites. We have data coming in and data going out. What are some of the gotchas, where people make mistakes? You talked about the carelessness and the clumsiness. What are some of the things that, if you get nothing else out of this episode and you're not an engineer, you need to look for? Not to say that those are the end-all be-all, but what are the most common things that you've seen in your experience where people are missing the mark?
Nishant: Perfect. I think I can narrow it down to the top three things you should definitely do if you're a small company, particularly companies that don't have the headcount or the big legal teams to deal with privacy issues waiting to happen. That's who would really benefit from the context we're talking about here. Chris, the first thing is to make sure that you know what you're collecting.
Think of data collection as a horizontal funnel: narrow on the left-hand side, where data enters, and much bigger, broader, on the right-hand side. The data you collect grows once it enters your system. People make copies of it; people infer things from it, to your point about being able to understand who somebody is based on a bit of data they have given you. Understand at the very early point what you have. Try and build technical controls to detect: am I collecting SSNs, for example? Am I collecting IP addresses? It doesn't take a whole lot of work to do it at that early stage, because the volume of data is much more manageable.
Based on that, you can build an understanding of who's collecting what data, where the data is going to go, who's going to access it, and what uses it's being put to. Most companies tend to do this at the back end of the process, when data is at risk. The problem is, at that point, the volume of data has grown, as has the usage, which means your ability to manage the outcomes and control the damage is much more limited. Try to catalog your data early in the process. It's like making plans for your day first thing in the morning; you don't go on an hour-by-hour basis, at least I don't, and most people don't. Treat your customers' data with the same respect you treat your time. That's number one.
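To make that first tip concrete, here is a minimal sketch in Python of the kind of early-stage detection control Nishant describes. The patterns and field names are illustrative assumptions, not any particular company's tooling; a real classifier would need checksums, context, and locale-aware formats.

```python
import re

# Hypothetical patterns purely for illustration.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def classify_record(record: dict) -> dict:
    """Map each field to the sensitive data types detected in it."""
    findings = {}
    for field, value in record.items():
        hits = [name for name, rx in PATTERNS.items() if rx.search(str(value))]
        if hits:
            findings[field] = hits
    return findings

# Run at the point of ingestion, while the volume of data is still manageable.
record = {"user": "jane", "contact": "jane@example.com", "note": "SSN 123-45-6789"}
print(classify_record(record))  # {'contact': ['email'], 'note': ['ssn']}
```

Running a check like this at ingestion, and keeping the findings in a catalog, is far cheaper than trying to discover what you hold after the data has been copied all over the back end.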
The second thing is to lean on tooling to build privacy at scale. The mistake these small companies often make, and I have made that mistake back in the day as well, is assuming that if they teach their engineers how to do privacy right, those engineers will just do it right. The problem is, engineers are not bad people, by the way; I will speak up for them because I used to be one of them, so I'm a little biased here. But they're extremely busy. They have to ship products. They have these aggressive timelines, these Gantt charts, these OKRs, these company mission statements, the growth targets they have to meet. When they have to make a critical choice between checking for privacy one last time or shipping that one extra feature, an engineer who has student loans to pay or a promo target to meet will always make one choice. Maybe not always, but most times.
It behooves you to build some central tools that multiple engineers can use at the same time. Rather than having 15 different engineering teams delete data on their own, maybe build a central data collection and data deletion service, via API, that these teams can use to delete their data for them. They can choose: in some cases, you can delete the data altogether; in some cases, you can de-link sensitive data from aggregated nonsensitive data; in some cases, you can obfuscate the data and make it available on a case-by-case basis. But make it available for them. If they would otherwise have to spend, say, four hours or four days to delete data, versus using an API and getting it done in 40 minutes, you've made that choice much easier for them. They will do the right thing because you've made the right thing easier for them. That's number two.
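Here is a rough sketch of what the core of such a central service could look like. The interface and strategy names are hypothetical, not Uber's or any specific company's API, but they mirror the three options Nishant lists: delete, de-link, obfuscate.

```python
from enum import Enum
import hashlib

class Strategy(Enum):
    DELETE = "delete"        # remove the value altogether
    DELINK = "delink"        # cut the link between identity and aggregates
    OBFUSCATE = "obfuscate"  # replace with a non-reversible token

def apply_privacy_action(record: dict, sensitive_fields: set, strategy: Strategy) -> dict:
    """One shared code path every team can call, instead of 15 ad-hoc deleters."""
    out = dict(record)
    for field in sensitive_fields & out.keys():
        if strategy is Strategy.DELETE:
            del out[field]
        elif strategy is Strategy.DELINK:
            out[field] = None  # aggregate row survives; identity link is gone
        elif strategy is Strategy.OBFUSCATE:
            # A real system would use a keyed hash or tokenization, not bare SHA-256.
            out[field] = hashlib.sha256(str(out[field]).encode()).hexdigest()
    return out

row = {"user_id": "u-17", "email": "jane@example.com", "total_rides": 412}
print(apply_privacy_action(row, {"email"}, Strategy.OBFUSCATE))
```

Exposing something like this behind an API is what turns the four-day chore into the 40-minute call, which is exactly the incentive shift Nishant is after.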
The third thing is to make sure that this wall that often exists between the legal team and the engineering privacy team is broken down. One of the things we've done very well at Uber, and the legal partners on the other side deserve a ton of credit for this, is that we work in harmony. If they intercept something that appears to have a privacy sensitivity to it, they pull us in, they pull my team in. We help the engineers write better ERDs and PRDs. We bake privacy into the design. Then, when the attorneys have to do a review at the tail end, they have the context because we prepped them, and the engineering product that they're reviewing has the privacy controls baked in.
Concurrently, if we see something that may not seem that significant from an engineering perspective but seems especially problematic from a legal perspective, we pull the attorneys in. They provide us the cover that we need to make sure that our privacy controls are adopted. Make sure that the attorneys and the engineers on the privacy side work hand-in-hand. This is a very cross-functional endeavor, Chris. This notion that you can stay in your silo, do things right from your perspective, and let everybody else fend for themselves doesn't hold, because somebody else's data collection could be your problem, and your data might become somebody else's problem. Those are the three things that come to mind.
Chris: Yeah, I really like your point about the tools for deleting data, because I was thinking back to how, with a lot of entities, what is collected grows over time. We've now added this new product, this new service, and we need to collect this additional data point, so now we've pulled it in. If I was the engineer who designed something to delete data and that data point didn't exist before, it might not get deleted by my routine. Whereas if you've got a single process to delete, and somebody owns that, and everybody uses that, you're assured that it's getting everything and not just some things.
Nishant: Exactly. Just to really double-click on that, Chris: when we talk to attorneys, not at my company but elsewhere, they want a level of certainty that there is consistency. I'm going to pick the example of a retail system. Let's assume you have a retail website. You have a service within the company that builds shopping carts. Your website takes care of the actual shipping of the products. You have another service that takes care of wish lists. It all maps back to the same customer. If each of them has its own deletion service, deletion might be handled very, very inconsistently.
When an attorney comes and says, this customer has requested that their data be deleted, and there is inconsistency, you might end up with a situation where the attorney says, yes, we're done, and then you find out that one of those services didn't do the deletion correctly, or forgot to run their tool, or forgot to check this other database. Whereas if there is a centralized service, you can make sure that it works the same way across all of the services and deletes the same data. You have a success-or-failure outcome that is much more manageable.
You reduce the level of redundancy. You reduce the level of risk, because you basically have one service to fix at all times. What people sometimes say is that having these central privacy tools is expensive, that it's going to cost money. Yes, it will in the beginning, but over time, three, four, five months out, it's going to be much cheaper, because you will prevent a bunch of bad stuff from happening. People often tell me privacy is expensive. I'm like, yeah, it is. But the lack of privacy is much more so.
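The consistency argument lends itself to a sketch as well: a single orchestrator that fans one deletion request out to every registered service and reports a single success-or-failure outcome. The service names and interface here are hypothetical, following Nishant's retail example.

```python
class DeletionOrchestrator:
    """Single place that knows every service holding customer data."""

    def __init__(self):
        self.handlers = {}  # service name -> callable(customer_id) returning bool

    def register(self, service: str, handler) -> None:
        # Every new service (cart, shipping, wishlist) must register here,
        # so the deletion routine can never silently miss it.
        self.handlers[service] = handler

    def delete_customer(self, customer_id: str) -> dict:
        results = {name: handler(customer_id) for name, handler in self.handlers.items()}
        results["all_deleted"] = all(results.values())
        return results

orchestrator = DeletionOrchestrator()
orchestrator.register("cart", lambda cid: True)      # stand-ins for real service calls
orchestrator.register("wishlist", lambda cid: True)
print(orchestrator.delete_customer("c-42"))
# {'cart': True, 'wishlist': True, 'all_deleted': True}
```

If the wishlist team's cleanup fails, the orchestrator's report says so: the attorney gets one answer instead of three.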
Chris: Yeah. A class-action lawsuit is way more expensive than an engineer.
Nishant: Yeah, not to mention the reputation damage, so to speak, the trust damage. There are some companies that, because they are so integrated into our lives, may be able to get by; they may be able to fix these issues and come back from that. Equifax, for example, had this breach, and I was impacted by it.
Chris: Who wasn’t?
Nishant: My wife wasn’t. Thank goodness.
Chris: That’s good.
Nishant: She reminds me how lucky she is compared to me. The problem in Equifax’s case is that I have to use them. I don’t apply for loans often, but if you apply for a loan, chances are Equifax will be part of the picture. I check my credit score every month on Credit Karma and they tell me what my Equifax score is. I don’t have a choice to back out. But if you’re a new company, a B2C or B2B company, what if the VC says, oh my goodness, this is just too risky, I’m going to back away? Do you want to be that entrepreneur right now? Those are some of the questions you want to ask yourself before making these choices. The decisions you make today will come back and bite you at a later point.
On the other hand, doing the right thing from a privacy perspective can be a great investment to make. If you delete data you don’t need, if you obfuscate data you don’t need, if you delete duplicate copies of data, you have less data to store. That is less money paid to your cloud providers. It’s an economic issue. You have an improved data quality picture, which means your queries won’t take as long to run. And heaven forbid you get breached: the amount of data you stand to lose has gone down significantly. If you have two copies of the same darn data, the potential impact of that breach is much higher. There is a business reason to do the right thing, as well. In politics, people often say good policy is good politics; in the same way, good privacy is good business.
Chris: For a lot of people, privacy was really not a thought. It wasn’t part of the design 10 years ago. But anyone who’s building a product now had better be thinking about privacy.
Nishant: Exactly. I feel like the challenge for people like me is to make the case not just from an altruistic perspective. If you make the case purely from the perspective of doing it because it’s the right thing, people are like, that’s great, but I have these vastly important business goals to meet. I feel the stronger, more honest business case to be made is in terms of how much money you can save, how many more customers you can add, how many more markets you can get into. There is a definitive, strong link between business benefit and correct privacy posture.
I feel like that is where the industry needs to go next: the more clearly you explain that link, the more success you’re going to have in building a good program and in demonstrating that it is the right thing to do.
Chris: Got you. From a consumer perspective, is there a way for us as consumers to know, short of data breaches that are disclosed, that companies are not handling our data well? Are there signs we should be watching out for? Things like, gosh, I’m just signing up for a company to ship me eggs and they want to know my social security number; something like that should be a red flag. Why do you need that piece of data? What should consumers be thinking when they’re dealing with apps and websites in terms of their own privacy? What should they be watching out for?
Nishant: Yeah. I think what you should be watching out for is precisely the examples you mentioned, which is, is there a disconnect between the use case for data collection, the volume of data collected, and the type of data collected? Why would you need an SSN for grocery delivery? You wouldn’t.
You also want to look at data collection and protection practices, if you can get any information about how exactly your data is being stored, how it is being collected, and who has access to it. Those sorts of details can be very helpful. Companies tend to be very good about that stuff, especially when they’re in the public eye: making sure that there are secure MPLS connections, making sure that any collection of data is protected very well by firewalls, making sure that there is obfuscation of data. One thing I also tend to do, if I’m talking to venture capitalists or advising a company, is look at whether they are investing in the security team. You can go on their LinkedIn profile, look at the number of employees they have, and check: do they have an investment in security and privacy?
What happens is, if companies invest in security and privacy after a growth spurt, you can be fairly certain that they are bringing the security and privacy folks in to clean up a mess that was made during the growth phase. On the other hand, if they invest in this stuff at the early stage of their innovation, that means they are serious. You want to check on that as well and see if the company’s investment in security and privacy corresponds to their growth stage. That’s number three.
The other thing you want to check is who the company is doing business with. Are they sharing a whole bunch of data with third-party vendors? I tend to look at their disclosures, and thankfully, because of some of the work Apple is doing in this space, companies are going to have to get out in front and say, here is what we are collecting your data for; give us consent. I think that has been one of the benefits of GDPR and CCPA, wherein, albeit in somewhat of a clumsy fashion, companies are having to explain for the first time what they will do with people’s data.
That explanation is not very simple. It takes a long time to read, like credit card statements, which tend to be very thick. I remember my first mortgage; it was 15, 20 pages long. But at least we’re getting to a point where the two sides are talking to each other.
The biggest bit of advice I would give to customers is to ask questions. Read up on this stuff, because at the end of the day, it’s your data. I get that I’ve been working for Silicon Valley tech companies for the last 15 years, but I also have a particular obligation, a responsibility towards the customers, where I have to be the customer advocate within the company. For me to build tools that genuinely make a difference, I have to think about the customer behind the data; that’s a human being. Somebody who lives someplace, who has kids, who has parents, who goes places, who eats stuff, who has an identity. Maybe they live in a country where their identity might get them killed; maybe their activities might put them in jeopardy. What we do with their data affects their well-being.
You want to look at how companies behave overall, make decisions accordingly, and hold accountable the companies that are not behaving well. As an employee, I have to think about it that way as well, making sure that when I talk to internal engineers and product managers, I remind them of that perspective. That’s kind of a long answer, but I just wanted to speak about the human being behind the data.
Chris: No, I appreciate and I like that. I haven’t really started to look at Apple’s privacy nutrition labels. Do you think that’s going to be a big shift in what you see from app stores, that you’ll start seeing the same sort of thing in the Google Store that here’s what this app is doing, here is the data it is collecting and here is what they’re doing with it?
Nishant: Yeah. The phrase first-mover advantage often gets used in the industry. Apple has built a brand when it comes to privacy. I have a lot of friends who work on the Apple privacy team, just like I have friends who work on pretty much every privacy team in the Valley, and they are among the smartest people I’ve known. So I believe so, and I think that they have put a marker on the ground saying, yes, we want there to be a more explicit consent model.
The paradigm that I’m not sure about, Chris, is how it develops. Apple serves a very specific kind of clientele, a very discernible clientele: people who pay a lot of money for their phones. The argument goes that these folks are going to expect a lot more privacy because they are paying for it. Now, I have problems with a world where you make privacy a privilege for the affluent, but let’s just go with Apple for the time being. The expectation is that once Apple, one of the two main platforms, does it, Google will follow, and then it becomes a customer expectation. I would say, even money, yes, there’s a very good chance that’s the direction we go in.
I think the opportunity here is, how do you make sure that the handshake is much more educational, where companies have a way to do this that is not overly disruptive to the point where the experience suffers? If you have this strong pop-up that doesn’t let you go forward unless you’ve checked 15 boxes, I’m pretty sure a lot of app developers and video game developers will have a problem with that. How do you make sure that the experience, the UI, keeps a customer-centric focus? Because the last thing you want, on the other extreme, is to make privacy this esoteric blocker that’s going to hurt small businesses, where they can’t reach their customers anymore. Then the only people that will benefit are the big companies that allegedly need to be held to a much higher standard.
Getting that balance right is going to be critical. I’ve talked to my friends across the Valley and we’re all trying to figure out how this happens. This is all happening during a very crazy time; in our industry right now, the world is not exactly settled. I wish we were having this conversation in early 2020 or late 2019 rather than during this time. But the cool thing, Chris, is that this is when innovation happens. Some of the smartest ideas come during times of unrest. My hope is that all the online data we’re generating, all this online commerce that is happening, will lead to a better understanding of our shared need for each other and a better solution for everyone.
Chris: From my perspective, I run a website. I get lots of people coming to the site. I don’t charge people for my website, but I still need to be able to make a living. There’s this balance of revenue with data that everyone has to work out on their own: how much money does my app need to earn for me to be able to continue to produce a good app? Do I charge for it, or do I fund it with some sort of data sharing, agreed upon with the user or previously not agreed upon?
Nishant: Definitely. If I had my druthers, and this is something I always push my teams to do, we would make the first offering. Let’s not wait for an Apple, the FTC, or the EU to come forward with something. Let’s basically say, here’s how we are doing it; here’s how we believe we can find the right balance between data collection, monetization, and personalization on the one side, and privacy and data management on the other. Let it be open source. This is what engineers do when they open source their stuff: it belongs to the community, the community adopts it, and it becomes something that is used significantly and is improved.
Let’s open source privacy. Let’s do it that way. Let’s make sure that the regulators and legislators are part of the conversation, because I’m fairly certain that, at the end of the day, people genuinely care about privacy, in the US and elsewhere. They want to do this right as well, and they need the engineering support that is available in the industry right now.
Chris: It’s one of those things that I’ve always struggled with: politicians creating laws to deal with tech when they don’t understand the tech. But I think this is one of those issues where the industry should lead and say, hey, this is what we need to do to protect user privacy. Okay, legislatures, here is what we need to protect, and here is how we need to protect it. As opposed to laws getting handed down by people who don’t know what it all means.
Nishant: The iconic moment in the 2018 congressional testimony with Mr. Zuckerberg was when a US Senator asked him, how do you make money if Facebook is offered for free? And he said, we run ads. People thought that was funny. I’m like, that’s not funny. That’s the disconnect between the people who write the laws and the people who build such incredible services at scale. I use Facebook aggressively for animal rescue. The only thing I care about at the time is engagement: I want to make sure that as many people as possible see a post so we can get a dog out of a high-kill shelter. From my perspective, it’s an incredible service. On the other side, I also need the government to be engaged and write a law that can be meaningfully implemented. This gap is not helpful. We need to narrow it somehow.
Chris: Have you found that same gap even within organizations where if you’re an engineer saying, oh my gosh, we need to do a better job with our privacy, how do I communicate with the C-level who’s like, no, privacy is not important.
Nishant: I have yet to run into a C-level executive who’ll say that privacy is not important. Making sure that it contextually makes sense to them is a different challenge. How do you come up with metrics? How do you assess what you will do? How do you make sure that you quantify risk? How do you multiply the likelihood and the impact of a risk, and how do you demonstrate that what you’re doing has moved the needle? How do you demonstrate the coverage of your privacy tooling? How many teams have onboarded the services that we build for deletion, for export, for preservation? My teams actually build those services, and it’s easy for me to demonstrate that these teams across the company have onboarded them and these other teams have not. Hey SVP, can you please help your teams prioritize it? That’s how you do it.
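The likelihood-times-impact arithmetic Nishant mentions is simple enough to sketch. The risk names, scores, and team counts below are invented purely for illustration.

```python
# Hypothetical risk register: name -> (likelihood 0-1, impact 1-10).
risks = {
    "unencrypted third-party export": (0.4, 9),
    "stale access grants": (0.7, 5),
    "missing deletion coverage": (0.5, 8),
}

# Rank by likelihood x impact so leadership sees the biggest needle-movers first.
for name, (likelihood, impact) in sorted(
        risks.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True):
    print(f"{name}: score {likelihood * impact:.1f}")

# Coverage metric: how many teams have onboarded the central privacy services?
onboarded, total_teams = 23, 40
print(f"coverage: {onboarded}/{total_teams} = {onboarded / total_teams:.0%}")
```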
One thing we also do on my side here at Uber, in partnership with legal, is write privacy policies and governance requirements within the company, in partnership with engineering. For example, what tools can we use for third-party data sharing? That enables us to onboard the right tools for third-party vendor data sharing. As an example, if I’m sharing data with a third-party entity that does not have an SFTP server, we can put the data on a cloud provider that has the appropriate security protections. That means we’re buying that tool for the right reason; we have a reason to put the data in there. It’s quantifiable. It’s auditable. We can set up a password-protected link. We can delete the file once the access is complete. We can audit that the access was, in fact, correctly done, which goes into our third-party audits, which means we can demonstrate compliance, which means we can get into a specific market.
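As a generic illustration of those last few steps (a protected, expiring link plus an audit trail), here is a sketch using only the Python standard library. The URL, secret handling, and event names are assumptions; a real deployment would lean on the cloud provider's signed-URL and audit features.

```python
import hmac, hashlib, json, time

SECRET = b"rotate-me-in-a-real-secret-store"  # assumption: a managed secret

def signed_link(path: str, ttl_seconds: int = 3600) -> str:
    """Build a time-limited, tamper-evident download link."""
    expires = int(time.time()) + ttl_seconds
    message = f"{path}:{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"https://files.example.com/{path}?expires={expires}&sig={signature}"

def audit(event: str, **details) -> None:
    """Append-only trail so third-party access can be reviewed later."""
    print(json.dumps({"ts": int(time.time()), "event": event, **details}))

link = signed_link("exports/vendor-a/run-17.csv")
audit("share_created", vendor="vendor-a", link=link)
# ...once the vendor confirms the download, remove the file and record it:
audit("file_deleted", vendor="vendor-a", path="exports/vendor-a/run-17.csv")
```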
I can tell that story to the board all day long, because preventing a privacy and security risk has a business benefit as well. Getting that context into a two-slide deck is the real challenge, because there’s a lot to do here. How do you pick your targets? My challenge is not a shortage of challenges. My challenge is the shortage of time.
Chris: Yup, that’s good. Can you tell me more about your book and why you wrote it?
Nishant: I started writing this book because of the challenge you mentioned.
Chris: What’s the name of the book?
Nishant: The book is named Privacy by Design, and it’s aimed at engineers, technical program managers, and senior architects in small businesses, maybe 30-50 people large, that need guidance on how to implement privacy in their company. How do you make sure that you do privacy right from the get-go? How do you make sure that you are doing it right from the beginning, so that you don’t have to clean up all this debt at the back end?
My recommendation is also to have CEOs, people in media, people like yourself, industry influencers, people who deal with data collection and data monetization, and people who deal with regulation and government read this book as well, because it will give you a sense of what it is like to implement these privacy controls. It is entirely possible that you understand only 30% of what it means to anonymize data. It is possible that you only understand 40% of what it means to catalog and classify data. But you will understand how the data journeys through your ecosystem: how you collect it, how it falls into the data pipeline, how it gets stored in the warehouses, how it gets onboarded across the board into third-party vendors, et cetera, and how you measure privacy. That’s going to give you a fuller view of why privacy is hard.
It will give engineers an understanding of how they should be building services so that onboarding privacy to their services downstream becomes easier. It will give CEOs, CSOs, and CPOs much better exposure to how to staff their teams and how to invest correctly. It will also give people an understanding of how to build privacy tools. There are companies like OneTrust and BigID that build privacy tools; it’ll give them an understanding of what the pain points in the industry are. Yes, I’ve written Privacy by Design primarily for small and medium-sized businesses, but I feel like if your livelihood has anything to do with data, this is a great book for you, because it will help you make decisions on what to do, and what not to do, from a privacy perspective, engineering-wise.
Chris: Is there value for consumers as well, just to understand how companies deal with their data?
Nishant: Absolutely. I feel like customers rarely understand what happens to their data. My mom and my dad often tell me this, my mom especially because she’s very tech-savvy. She’s like, I don't know what data gets collected. I don't know what these companies are going to do with my data. When I used to work for Google, she routinely asked me, how does Google make money? Whether you are extremely smart and savvy like my mom, or you’re like my dad who doesn’t really care, I think this is going to be a great book for you to understand how the world has changed, how a very simple, linear way of building tools has become so data-driven, the choices that companies have to make, and how you as a customer can make life easier for yourself by making smart decisions on the front end.
Chris: Awesome. As of the time of this recording, the book is not available to the public yet. I think the publisher is giving people an opportunity to get it electronically at a price, and we’ll also link to it on Amazon once it’s available. Here is where you can purchase Privacy by Design using the promo code podeasyprey21, before it's available on Amazon.
Nishant: Definitely. If I may, one last push, Chris: if people want to find me, I’m on LinkedIn. I’m always looking for smart engineers who work in security and privacy to help build some of these tools I’ve been talking about. If you are somebody in this space who’s looking for an opportunity or who wants to network, please connect with me. I’m always happy to add to the conversation and learn from people as well.