Why Your Decentralized Data Practice Isn’t Scaling

The Rise of Departmental Analytics

Nobody likes to be beholden to others. After all, it feels good to be empowered and to get things done on your own. Because virtually any business user can now stand up a cloud application, SaaS software has sprawled across the enterprise, and each of those apps contains a silo of data just waiting to be blended with other information sources. But why blend, you ask? Because blended data paints a broader picture of business value, one whose comprehensive whole is greater than the sum of its parts. A prime example is merging customers’ product usage with their support ticket history to gauge cross-sell / up-sell propensity.

Many companies, typically of the technology startup variety, start out with a product offering that generates some amount of product data. As new departments such as finance and sales operations come online, logging of basic user telemetry must mature to meet business demand for key metrics such as monthly active users (MAUs) or annual recurring revenue (ARR). This is the pivotal moment when data is needed not by just one department, but by many.

Whether we call an analytics practice departmental, decentralized, or democratized, the outcome is essentially the same: a silo of operational data is generated primarily by one department, yet that same data needs to be consumed by other departments as well. This leaves the data-producing department entertaining many different requests from the various data-consuming departments. Finance needs the data in a more timely manner, whereas marketing wants to know what time of day users leverage a particular product feature. Sales wants support tickets merged with its CRM account database, and customer support wants to push common support requests to social media and community-driven channels. The list goes on and on.

The Perils of Decentralized Analytics

Data-producing departments overwhelmed with data consumption requests will typically “resolve” this issue of astronomical demand in one of two ways: (1) by simply telling the requesting department to wait, or (2) by dumping the data into some type of “self-service” data lake, a virtual buffet of raw information free for the taking, provided you know exactly what to look for!

Even in the best of scenarios where data is reliably syndicated to a secure, centralized repository for general corporate consumption, the real issues start to emerge on the data ingestion end of the equation. Namely:

  • Everyone is reinventing the same wheel – Individual departments must build their own tooling and/or web services to consume data. Departments must then invest in data warehouses and reporting tools. This equates to extensive technology duplication across the company.
  • Everyone has a different perspective of the data – Without standards for metrics, there will never be consensus. That means different departments, or even different analysts within the same department, will come up with different calculations for the same business metrics. For instance, a sales analytics team may count bookings as a “sale,” whereas the finance analytics team would not. The product team may consider every product consumer (including free-tier and trial users) a “subscriber,” whereas executives may want only paying customers counted as MAUs for Wall Street analysts’ consumption (see the sketch following this list).
  • Operational instability – With departmental data, every little microcosm of data is managed individually. Consequently, there are varying degrees of operational rigor applied to these fragmented data operations. While the more mature teams may have separate analysis and operational data engineering functions, other teams may have only analysts and data scientists to run both the numbers and the platform, which is expensive and exhausting for the team members.
  • Consultants ‘R’ Us – Whenever a department decides to handle analytics on its own, expensive consultants are never far behind. While there’s no shame in leveraging a consultancy to build out an analytics practice, there should be an enterprise-wide standard, or at least a deliberate consulting strategy, around data solutions. Otherwise, expect to see a data warehouse managed by vendor X in one department and a completely different solution managed by vendor Y in the department next door. This lack of information federation only hurts the enterprise in the end.
  • Data Politics – This is perhaps the ugliest peril of all. Ultimately, politics arise when data priorities are misaligned between departments. For example, finance needs rollup product metrics on an hourly basis, yet the product engineering team is “too busy” to help given other commitments. Finance then has to build its own data warehouse as a workaround. Another general indicator of this phenomenon is stall tactics, as when the data-producing group makes the data-consuming group jump through hoops, wait ridiculous amounts of time, or concoct outrageous “business justifications” for data access. It is in these trying situations that the data-consuming department decides to do everything on its own and bypass the data-producing team entirely.
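
To make the “different perspectives” peril concrete, consider the sketch below: two departments compute “monthly active users” from the same event log and arrive at different answers. This is a minimal, hypothetical Python example; the event records, field names, and tier values are assumptions for illustration, not any particular company’s schema.

```python
from datetime import date

# Hypothetical event log: one row per user action in a given month.
events = [
    {"user_id": 1, "tier": "paid",  "day": date(2023, 5, 2)},
    {"user_id": 2, "tier": "free",  "day": date(2023, 5, 9)},
    {"user_id": 3, "tier": "trial", "day": date(2023, 5, 17)},
]

# Product team's definition: every distinct user counts, free tiers included.
product_mau = len({e["user_id"] for e in events})

# Executive definition: only paying customers count toward MAUs.
exec_mau = len({e["user_id"] for e in events if e["tier"] == "paid"})

print(product_mau, exec_mau)  # 3 vs. 1 -- same data, two "truths"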

Getting teams aligned on data never happens overnight. The transitional state should therefore be a hybrid mode in which governance and autonomy are properly balanced.

Solution Paths

The solution to departmental data teams isn’t necessarily data centralization, nor is it a centralized BI practice. While a dedicated corporate BI function has merit, a hybrid approach that balances centralization of some resources with democratization of others can work quite well.

For this hybrid approach to succeed, there must be enterprise-wide consensus on how metrics are measured. Secondarily, certain solutions must scale to become shared services. That means the solution itself must not only be stable, but also be managed like a product with multiple customers spanning a diverse range of business requirements.

Process and Agreement

True cross-functional data practices start with agreement around metrics. A metrics certification process must be initiated, at least for cross-functional metrics of interest, so that everyone is in fundamental agreement on what’s being measured, and how. Scoping the process to a hand-picked set of metrics will also assist in prioritization so that certification work is properly focused.
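
One way to operationalize certification is to express each certified metric as versioned code or configuration in a single shared repository, so there is exactly one executable definition to argue about. The registry below is a hypothetical Python sketch; the metric names, owners, and SQL text are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CertifiedMetric:
    """A single, certified definition of an enterprise metric."""
    name: str
    owner: str           # department accountable for the definition
    version: str
    definition_sql: str  # the one blessed calculation

# Hypothetical registry: the sole source of truth for cross-functional metrics.
METRIC_REGISTRY = {
    "mau": CertifiedMetric(
        name="Monthly Active Users",
        owner="Product Analytics",
        version="1.2.0",
        definition_sql=(
            "SELECT COUNT(DISTINCT user_id) FROM events "
            "WHERE tier = 'paid' AND event_month = :month"
        ),
    ),
}

def get_metric(key: str) -> CertifiedMetric:
    """All consumers fetch the certified definition instead of rolling their own."""
    return METRIC_REGISTRY[key]
```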

Once the process for deriving an enterprise metric is certified, it’s time for other users to adopt it. But in order for others to adopt these “golden metrics,” information consumers need to be aware of them in the first place. Awareness campaigns around new enterprise measures don’t happen through just one internal marketing channel. There needs to be a data community where the metrics are socialized, a standard portal for certified metric consumption, and of course routine communications that educate new and existing employees about the data itself as well as the certification process.

Reusable Solutions

There will come a point in time when a departmental solution must either be scaled or sunsetted in order to meet the broader needs of the enterprise. This could mean choosing to mature the finance department’s pet project in data warehousing to accommodate additional data marts for sales and marketing, or perhaps investing in a dedicated data engineering team within the product development organization. Specific areas that lend themselves to shared services include:

Core Data Platform and Infrastructure – The core data platform itself must be a shared service from its inception, designed with multi-departmental use cases in mind from day one. The core data platform is where data ingestion occurs (think product logging and streaming tooling such as Fluentd, Kafka, or Amazon Kinesis), and it is where stability and security are a must. Of course, the shared solution doesn’t stop at the technology layer. Data operations (specifically staffing and operational processes) must be part of the equation as well.
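
As a sketch of what day-one, multi-departmental ingestion can look like, the snippet below publishes a product telemetry event to a shared Kafka topic using the kafka-python client. The broker address, topic name, and event schema are assumptions for illustration; the same idea applies to Fluentd or Amazon Kinesis.

```python
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

# One shared ingestion endpoint, rather than per-department pipelines.
producer = KafkaProducer(
    bootstrap_servers="kafka.internal.example.com:9092",  # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "user_id": 42,
    "action": "feature_used",
    "feature": "export_report",
    "ts": datetime.now(timezone.utc).isoformat(),
}

# All producers write to one governed topic; finance, marketing, and
# product all consume from the same stream instead of asking for extracts.
producer.send("product.telemetry.v1", value=event)
producer.flush()
```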

Data Integration Services – If the core data platform handles ingestion and centralization of enterprise data, getting that data to its target locations is where data integration comes into play. From basic data-movement pipelines to extract-transform-load (ETL) and enterprise service bus (ESB) frameworks, these solutions are all about getting data from point A to point B, with transformations and calculations happening along the way. This tier should be scaled into a shared solution, given that this is where much of the formally agreed-upon business metric logic lives. In other words, this tier is where the actual enterprise metric calculations are applied.
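
Below is a minimal sketch of this tier, assuming pandas and a raw events extract: the shared pipeline applies the certified MAU calculation during the transform step, so every downstream consumer receives the same number. The column names and paid-only rule are illustrative assumptions.

```python
import pandas as pd

def transform_mau(raw_events: pd.DataFrame) -> pd.DataFrame:
    """Transform step: apply the certified MAU definition per month."""
    paid = raw_events[raw_events["tier"] == "paid"]  # certified rule: paid users only
    mau = (
        paid.groupby("event_month")["user_id"]
        .nunique()
        .rename("mau")
        .reset_index()
    )
    return mau

# Extract (stubbed here), transform, then load to the shared warehouse table.
raw = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3],
        "tier": ["paid", "paid", "free", "paid"],
        "event_month": ["2023-05", "2023-05", "2023-05", "2023-06"],
    }
)
print(transform_mau(raw))  # one calculation, reused by every department
```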

Enterprise Data Warehouse Infrastructure – Once data is collected and massaged accordingly, it needs to be placed in a final destination for processed (versus raw) consumption. This can be a refined data lake or, more commonly, a relational structure such as an operational data store (ODS) or a subject-oriented data mart within a data warehouse. Data warehouse reusability is relevant for several reasons. First, from the infrastructure standpoint, it makes sense to centralize data marts (think facts and dimensions) under one logical roof that’s properly managed and secured with enterprise mechanisms. Second, at the logical data level, sharing facts and dimensions across departments can be easily achieved, even when those facts and dimensions are managed by different teams. The key is building a contract between those teams via interdepartmental data communities where standards are set and data agreements are made.
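
To illustrate that logical-level sharing, the sketch below joins one team’s fact table against a customer dimension owned by another team. The schema and column names are hypothetical, and the query runs through Python’s standard sqlite3 module purely so the example is self-contained; in practice this would target the shared warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension owned by the customer-data team; facts owned by sales analytics.
cur.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT)")
cur.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")
cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Acme", "enterprise"), (2, "Globex", "smb")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 1200.0), (1, 300.0), (2, 75.0)])

# A shared (conformed) dimension in action: sales facts joined to the
# customer dimension without sales re-creating customer attributes.
cur.execute("""
    SELECT d.segment, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_customer AS d USING (customer_key)
    GROUP BY d.segment
""")
print(cur.fetchall())  # e.g., [('enterprise', 1500.0), ('smb', 75.0)]
```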

Reporting Infrastructure – Reporting is typically where one will find an immense degree of variation, and rightfully so, as needs across departments are incredibly diverse. Such diversity stems from an array of factors, ranging from traditional BI versus data science methodologies to operational versus analytical reporting needs. Consequently, it’s not uncommon for an enterprise to boast a half dozen or more reporting tools. As a shared service, one or two reporting tools can be standardized for general, corporate-wide usage. However, these standard reporting tools must absolutely not preclude the use of other tools by other teams.

Summary

Absolute data centralization is practically unheard of, which means that every company must strike a balance between shared data solutions and departmental autonomy. Finding that balance requires taking a strategic look at the various processes and existing building blocks, then deciding what should scale and what should remain compartmentalized.