Building a Privacy-Compliant CDP

Digital advertising and targeting aren’t going anywhere. We simply need to be more transparent and responsible in the handling of personal data.

Privacy Law Impact on Martech

In May of 2018, the General Data Protection Regulation (GDPR) took effect across the European Union. I’m neither an attorney (thank god) nor a privacy professional, so I’ll leave it to you to read up on all the gory GDPR requirements. In a nutshell, GDPR aims to protect consumer data by giving “data subjects” (that means people like you and me) the ability to ask corporations what they’re doing with our data, to check whether the data they store is accurate, and, perhaps most importantly, to be forgotten. That means we can knock on Facebook’s or Google’s door and say: “delete everything you store about me.” At least that’s the intent.

As the United States has not (yet) enacted a GDPR-equivalent law, several states within the union are coming up with their own local laws as a stopgap measure. California has the California Consumer Privacy Act (CCPA), the state of New York has senate bill S5642, and Nevada passed senate bill 220. It’s becoming quite the patchwork of policies out there, and maintaining sales and marketing systems that house consumer data in good faith is getting more complex by the day. 

In theory, this is a huge win for consumers. In practice, implementing compliant solutions is a heavy lift for most companies. Aside from policies that vary state-to-state, the various data processing systems like advertising engines, customer relationship management (CRM) tools, and website trackers are inherently disjointed. Stitching together the bits and pieces of tracking data across multiple systems is difficult to say the least. 

Never Waste a Crisis

For some, all this privacy regulation may seem like a crisis: a proverbial rain falling upon a marketing and advertising parade that regulators left largely unencumbered for years.

Ironically, this compliance goal of monitoring customer activity across systems has actually been the holy grail of digital marketing managers since day-one. Enabling all sales, marketing, and support systems to share the full picture of customer activity makes each system exponentially smarter, as each system now has far greater situational awareness of both customers and prospects alike. With a little give-and-take, we can actually use these privacy regulations to our advantage. That is, we can build marketing systems that are smarter for businesses and safer for consumers at the same time. 

The Technical Building Blocks of a CDP

This type of system goes by many names: a “customer-360” database, a customer data platform (CDP), a customer data lake, and so on. Having designed several instances of these solutions, I’ll provide some tips on design decisions, technology choices, and pitfalls to avoid.

Define Your Near-Term Goals

To kill two birds with a CDP, meaning to satisfy both business and compliance goals with it, establish a short list of compliance and business milestones. An example 18-month roadmap could look something like:

  • 6 Month Milestones
    • Deploy capability for consumers to validate their data across all sales and marketing systems (Compliance)
    • Enable marketing teams to associate retroactive clickstream data to named users (LeadGen)
  • 12 Month Milestones
    • Deploy a compliance portal for consumers to manage communication preferences (Compliance)
    • Build lookalike audiences from existing customer bases for Google Adwords campaigns (Digital Advertising)
  • 18 Month Milestones
    • Enhance compliance portal to manage cookies as well as communication preferences (Compliance)
    • Feed sales CRM account objects with ticket statistics from support tool (Customer Success)

Keep the goals SMART with emphasis on specific and achievable. Do not try to “boil the ocean” by providing everything to everyone on day-one.

Choose Your Data Wisely

Let’s talk data for a minute. Specifically which data you need to make your customer systems smarter. 

Some folks believe in recording everything. Record user click streams. Record mobile activity. Record user location. Record, record, record! The rationale for this approach I hear over and over is this: “We don’t need the data today. But we may need it someday. So let’s record it!” It’s the build-it-and-they-will-come approach, which some people incorrectly refer to as building a “data lake.” I myself prefer to call this phenomenon a data landfill; a wasteland of raw data which slowly kills actionable insights by burying information under its own crushing weight while also creating enormous security and compliance risks. As noted in the prior section, defining your near and mid-term (think 6-18 month) goals will help to define the data to collect today. 

When thinking about data collection, you’ll need to clearly map the data to business processes and systems. Beyond common sense, this is a compliance requirement under multiple regulations. (You must document why data collection is necessary in order to conduct specific business processes.) Using the simple roadmap outlined in the prior section, one would need to obtain the following data:

| Source System  | Raw Data                                          | Eventual Metrics/Insights                             |
|----------------|---------------------------------------------------|-------------------------------------------------------|
| Websites       | Click streams, content downloads, key page visits | Lead scoring, audience building, user journey mapping |
| Mobile Apps    | App downloads, app activations                    | Activation rate, MAUs, churn rate                     |
| SaaS Solutions | CRM account data, order history, support history  | Retention                                             |
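
One way to keep this mapping honest is to hold it in a small, machine-checkable inventory and flag anything collected without a documented purpose. The sketch below is only an illustration, not a compliance tool: the `SourceEntry` class, the `undocumented` helper, and the "User location" example are hypothetical, and the entries simply mirror the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    """Documented data collection for one source system (hypothetical schema)."""
    source_system: str
    raw_data: frozenset   # what we collect
    insights: frozenset   # why we collect it


# Inventory mirroring the table above, keyed by source system.
INVENTORY = {
    entry.source_system: entry
    for entry in (
        SourceEntry(
            "Websites",
            frozenset({"Click streams", "Content downloads", "Key page visits"}),
            frozenset({"Lead scoring", "Audience building", "User journey mapping"}),
        ),
        SourceEntry(
            "Mobile Apps",
            frozenset({"App downloads", "App activations"}),
            frozenset({"Activation rate", "MAUs", "Churn rate"}),
        ),
        SourceEntry(
            "SaaS Solutions",
            frozenset({"CRM account data", "Order history", "Support history"}),
            frozenset({"Retention"}),
        ),
    )
}


def undocumented(source_system, collected):
    """Return collected items that have no documented entry for this source."""
    entry = INVENTORY.get(source_system)
    documented = entry.raw_data if entry else frozenset()
    return sorted(set(collected) - documented)


# "User location" is not in the inventory, so it gets flagged.
print(undocumented("Mobile Apps", ["App downloads", "User location"]))
```

A check like this could run whenever a new tracker or feed is proposed, so that nothing reaches the landfill without a stated reason.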

Application Reference Architecture

Shown below is a loose representation of a reference architecture I put together for a recent client.

In this architecture, Amazon Athena is the data warehouse virtualization layer. While there are plenty of alternative solutions out there, from Oracle to Snowflake, this client simply preferred to work within a predominantly AWS-centric ecosystem.

Essentially, raw data is collected from source systems, and stored within an S3 staging area. From there, batch and real-time ETL routines massage the data into a target data lake format. Complex analysis by business intelligence and data science teams may then be achieved through Athena’s query interface or via JDBC connections for tools like Tableau or AWS QuickSight. 
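
To make the staging-to-lake step concrete, here is a minimal sketch of a batch transform in plain Python. It is not the client's actual ETL: the raw field names (`uid`, `type`, `ts`), the `lake/events` prefix, and the function names are assumptions made for illustration. The one real convention it relies on is Hive-style `dt=YYYY-MM-DD` partition prefixes, which Athena can use to limit how much data a query scans.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def to_lake_record(raw_line):
    """Normalize one raw JSON event from staging into a flat lake record."""
    event = json.loads(raw_line)
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return {
        "user_id": str(event["uid"]),
        "event_type": event["type"].strip().lower(),
        "event_time": ts.isoformat(),
        "dt": ts.strftime("%Y-%m-%d"),  # partition column
    }


def partition_prefix(record, root="lake/events"):
    """Build the Hive-style partition prefix this record belongs under."""
    return f"{root}/dt={record['dt']}/"


def run_batch(raw_lines):
    """Group normalized records by target partition, skipping blank lines."""
    batches = defaultdict(list)
    for line in raw_lines:
        if not line.strip():
            continue
        record = to_lake_record(line)
        batches[partition_prefix(record)].append(record)
    return dict(batches)


staged = [
    '{"uid": 4983, "type": " Page_View ", "ts": 1577923200}',
    "",
    '{"uid": 9120, "type": "contact_page", "ts": 1578096000}',
]
for prefix, records in sorted(run_batch(staged).items()):
    print(prefix, records)
```

In a real pipeline each group would be written to object storage in a columnar format; the grouping logic is the part that carries over.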

CDP Data Model 1.0

In this very simple example, Fred and Wilma Flintstone are customers. Betty Rubble and Stoney Curtis, on the other hand, are known to us through one or more forms of marketing automation (perhaps a blog signup or whitepaper download), but they haven’t converted to customer status yet. Then there are two other users: user_id 9120 and user_id 4983. We’re tracking what they’re doing across our web (and potentially mobile) properties, but we’ve yet to “unmask” them. In other words, they’ve remained anonymous, lurking throughout our corporate properties without telling us who they are via a content download or product sign up.

The green fact table maps who-does-what-and-when over time. This very trivial 4-row table tells us that:

  • Anonymous user_id 4983 returned to our website for at least a second time on January 2, 2020. 
  • Wilma Flintstone downloaded a whitepaper on January 3, 2020
  • Stoney Curtis visited our pricing page on January 4, 2020
  • Anonymous user_id 9120 visited our contact page on January 4, 2020.
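
For readers who prefer to see the model as tables, the example can be approximated with Python's built-in `sqlite3` module. This is a sketch under assumptions, not the schema from the diagram: the table and column names are hypothetical, as are the numeric IDs given to the named users (the anonymous IDs are the two introduced at the start of this example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_user (
    user_id   INTEGER PRIMARY KEY,
    full_name TEXT,             -- NULL while anonymous; PII once populated
    status    TEXT NOT NULL     -- 'customer', 'lead', or 'anonymous'
);
CREATE TABLE dim_activity (
    activity_id INTEGER PRIMARY KEY,
    label       TEXT NOT NULL
);
CREATE TABLE fact_user_activity (   -- who does what, and when
    user_id     INTEGER NOT NULL REFERENCES dim_user(user_id),
    activity_id INTEGER NOT NULL REFERENCES dim_activity(activity_id),
    event_date  TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO dim_user VALUES (?, ?, ?)", [
    (1001, "Fred Flintstone", "customer"),   # IDs 1001-1004 are made up
    (1002, "Wilma Flintstone", "customer"),
    (1003, "Betty Rubble", "lead"),
    (1004, "Stoney Curtis", "lead"),
    (9120, None, "anonymous"),
    (4983, None, "anonymous"),
])
conn.executemany("INSERT INTO dim_activity VALUES (?, ?)", [
    (1, "return visit"),
    (2, "whitepaper download"),
    (3, "pricing page visit"),
    (4, "contact page visit"),
])
conn.executemany("INSERT INTO fact_user_activity VALUES (?, ?, ?)", [
    (4983, 1, "2020-01-02"),
    (1002, 2, "2020-01-03"),
    (1004, 3, "2020-01-04"),
    (9120, 4, "2020-01-04"),
])

# Join the fact table to its dimensions to read back the four events in order.
rows = conn.execute("""
    SELECT COALESCE(u.full_name, 'anonymous ' || u.user_id), a.label, f.event_date
    FROM fact_user_activity AS f
    JOIN dim_user AS u ON u.user_id = f.user_id
    JOIN dim_activity AS a ON a.activity_id = f.activity_id
    ORDER BY f.event_date, f.rowid
""").fetchall()
for row in rows:
    print(row)
```

Note that the fact table holds only keys and dates; every name lives in the user dimension.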

The possible combinations of fact-to-dimension table associations are endless (device to contact, contact to journey, etc.). Again, deciding which tables to build in the next 6-18 months should be driven by business goals.

As you can see in the star schema above, we have some PII on our hands. Effectively, whatever touches the pink dimension tables becomes in-scope for GDPR, CCPA, and other privacy-related regulation. That’s because the consumers of the green fact tables (typically other marketing systems) will likely have access to the PII stored in those dimension tables.

Linking processes to systems, and systems to data, for compliance gets a lot easier when the data is well-governed (and well-documented) under a lock-and-key approach such as this. While we’re liberal in our sharing of data among internal marketing systems, we’re strict about the data we share beyond that.
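
Here is a minimal sketch of the strict side of that lock-and-key idea: before a joined row leaves the internal marketing systems, PII columns are dropped and the user key is replaced with a keyed hash. The column names (`full_name`, `email`), the function names, and the secret are all hypothetical. Keep in mind that pseudonymized data is generally still treated as personal data under GDPR, so this reduces exposure rather than taking the data out of scope.

```python
import hashlib
import hmac

PII_FIELDS = {"full_name", "email"}  # hypothetical PII column names


def pseudonymize(user_id, secret):
    """Return a keyed hash of the ID; without the secret it cannot be recomputed."""
    return hmac.new(secret, str(user_id).encode(), hashlib.sha256).hexdigest()


def external_view(row, secret):
    """Copy a joined fact/dimension row, dropping PII and swapping the user key."""
    safe = {
        key: value
        for key, value in row.items()
        if key not in PII_FIELDS and key != "user_id"
    }
    safe["user_key"] = pseudonymize(row["user_id"], secret)
    return safe


internal_row = {
    "user_id": 1002,  # made-up internal ID
    "full_name": "Wilma Flintstone",
    "email": "wilma@example.com",
    "activity": "whitepaper download",
    "event_date": "2020-01-03",
}
shared = external_view(internal_row, secret=b"rotate-me")  # placeholder secret
print(sorted(shared))
```

Using a keyed hash rather than a plain one means a partner who knows the internal ID format still cannot rebuild the mapping.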

Master Data Management (MDM)

One aspect of this architecture I’ve conveniently glossed over thus far is master data management, or MDM. 

The ability to track subjects (be they users, devices, or machine API consumers) across multiple systems requires a complex orchestration of key-based matching among sources of truth and systems of reference.

While the scope of MDM is well beyond this article, a centralized location for human IDs, covering both known users and anonymous personas, is an important foundational item that should be addressed sooner rather than later. I wouldn’t make the deployment of a full-blown MDM capability a prerequisite to a CDP solution, but having a skeletal MDM solution (e.g. a source of truth for all human and device IDs) will really help in the long run.
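
As a rough picture of what "skeletal" could mean, the sketch below keeps one master ID per subject and lets an anonymous key be linked to a known one after the user is unmasked. It is an assumption-laden toy rather than an MDM design: the `IdRegistry` class, its methods, and the system and ID names are all hypothetical, and it ignores persistence, conflicts, and audit trails.

```python
import itertools


class IdRegistry:
    """Maps (system, local_id) pairs to one master ID per person or device."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._master = {}  # (system, local_id) -> master_id

    def resolve(self, system, local_id):
        """Return the master ID for a source key, minting one if unseen."""
        key = (system, local_id)
        if key not in self._master:
            self._master[key] = next(self._next_id)
        return self._master[key]

    def link(self, known_key, other_key):
        """Record that two source keys are the same subject.

        Every key that pointed at other_key's master ID is repointed to
        known_key's master ID, which is returned.
        """
        keep = self.resolve(*known_key)
        drop = self.resolve(*other_key)
        for key, master in list(self._master.items()):
            if master == drop:
                self._master[key] = keep
        return keep


registry = IdRegistry()
anon = registry.resolve("web", "cookie-4983")    # anonymous visitor
crm = registry.resolve("crm", "contact-77")      # known CRM contact
# The visitor fills in a form, so the two keys turn out to be one person.
registry.link(("crm", "contact-77"), ("web", "cookie-4983"))
print(registry.resolve("web", "cookie-4983") == crm)
```

Linking rather than overwriting is what lets earlier anonymous clickstream rows be attributed to the named user after the fact.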


The alphabet soup of martech acronyms is staggering, and to make matters worse, many of the solutions offer overlapping functionality, which makes neat-and-simple classification of tools practically impossible. That said, here’s my attempt to quickly disambiguate where these seemingly similar tools play.

  • CRM – A customer relationship management system. Examples include Adobe Marketo, HubSpot, and Salesforce Sales Cloud. CRM systems have been around for many years, and traditionally house a hierarchical collection of accounts (which typically represent organizations), contacts within those accounts, opportunities, win/loss opportunity status, contracts, and a myriad of metrics such as sales forecasts, account executive sales quotas, and so forth. In terms of size, a typical CRM may house dozens to hundreds of thousands of contacts which are tied to tens of thousands of accounts.
  • CDP – Customer data platform. A data platform that stores multiple dimensions of customer data, typically sourced from multiple systems such as CRM, website trackers, and customer support tools. The data is extracted, transformed, and optionally enriched. It’s then stored within a centralized database which can be used for system operations (such as website personalization or ad targeting) as well as analytic research. CDPs are often very large, easily housing millions of records. 
  • DMP – Data management platform. A DMP is typically used in conjunction with an advertising demand side platform (DSP). Put simply, a DMP is a key-value store used to associate anonymous IDs with audiences. For example, CNN may have a DSP and DMP, and you may be known as user_3012012as21l_3 to that DMP. From there, your user ID may be associated with the audience “young_working_males-21-30.” Given this demographic, CNN may decide to show you a tequila advertisement, a Ford F-150 truck ad, or some other product they feel has affinity to your demographic / psychographic profile. One important detail on DMPs is that they rarely store PII. In fact, storing PII within a commercial DMP from a vendor like Salesforce (Krux) or Oracle (BlueKai) may breach your contract.
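
The DMP's key-value idea is small enough to sketch in a few lines. This is a toy illustration, not how any commercial DMP or DSP works: the function names, the campaign list, and the ad names are hypothetical, and only the user ID and audience label come from the example above.

```python
from collections import defaultdict

# The "DMP": anonymous ID -> set of audience labels. No PII is stored.
audiences_by_user = defaultdict(set)


def tag(user_id, audience):
    """Associate an anonymous ID with an audience."""
    audiences_by_user[user_id].add(audience)


def pick_ad(user_id, campaigns, default="house_ad"):
    """Return the first campaign ad whose target audience includes the user."""
    memberships = audiences_by_user.get(user_id, set())
    for audience, ad in campaigns:
        if audience in memberships:
            return ad
    return default


tag("user_3012012as21l_3", "young_working_males-21-30")

# Hypothetical campaigns, in priority order: (target audience, ad to serve).
campaigns = [
    ("retirees-65-plus", "cruise_ad"),
    ("young_working_males-21-30", "tequila_ad"),
]
print(pick_ad("user_3012012as21l_3", campaigns))
print(pick_ad("user_unknown", campaigns))
```

The lookup works entirely on opaque IDs and audience labels, which is why a DMP can do its job without ever holding a name or an email address.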

So which tools do you really need? It depends on your business model. If you’re running a B2B business with a sales team leveraging account-based-marketing (ABM) protocols, then you’ll almost certainly require a CRM. If your business has a vast digital footprint, be it for marketing or core product purposes, you’ll probably need a CDP as well. The DMP, however, is only needed if you’re very hands-on with your digital advertising. If you have an agency advertising for you, or if you simply rely on the default advertising tools from Google and Facebook, you probably don’t need a DMP.


Building marketing and AdTech systems doesn’t need to be at odds with regulation. Use compliance laws to your advantage! Because following the law is mandatory, companies must fund marketing and advertising systems that are compliant. Use those dollars to build better systems: systems that advance business goals while simultaneously protecting consumer data.