Sunday, August 06, 2017

A Modular Approach to Solving The Data Variety Problem

I have been working on a paper in my spare time on weekends for a number of months now. My goal with this paper is to change the thinking around data and ultimately bridge the chasm between how IT and the Business think about data management, in particular Data Warehousing and Business Analytics.

I am publishing the full paper here as a PDF and will publish portions of it piecemeal over the coming days and weeks, beginning with the Executive Summary.

I encourage readers to share this paper and discuss the ideas contained within.  I also encourage readers to send their feedback.  Since I am a human being and am as sensitive to criticism of my work as the next person, I only ask that you couch any negative criticism in a way that is civil.
Based on feedback, I may create new versions of this paper which you can easily distinguish by the paper's AS-OF date.

Before I sign off, I would like to thank Jane Roberts for her time in reviewing this paper and for her contributions. Thank you, Jane!

Here is the paper:
A Modular Approach to Solving The Data Variety Problem AS-OF 2017-08-06

Executive Summary
Big Data has fully captured the popular imagination. Companies like Google, Facebook, Apple, Amazon, and Microsoft process petabytes of data daily. Limits that once seemed impossible are now the new normal. In spite of this, analysts and managers still struggle to answer unexpected questions at executive speed.

The reason is that while much attention has been given to these remarkable data volumes, a different but related problem has come into sharp focus: The Data Variety Problem. Namely, organizations continue to struggle to manage and query the ever-increasing variety of data originating from sources including, but not restricted to: IT-controlled systems (e.g. ERPs, PoS systems, subscriber billing systems); third-party-managed systems (e.g. cloud CRMs, cloud marketing DMPs); and Business-controlled departmental tracking spreadsheets, grouping lookup tables, and adjustment tables.

The approach to locating, obtaining, and integrating these sources of data is highly manual. Case in point: In the Alteryx-commissioned study Advanced Spreadsheet Users Survey, published in December 2016, IDC discovered that “$60 billion [is] wasted in the U.S. every year by advanced spreadsheet users.” Yet this report provides only a small glimpse into the problem and misses the bigger opportunity: Organizations urgently need a one-size-fits-all ‘happy path’ for consuming and producing an ever-accelerating variety of structured data.

In this paper, drawing on algebra, computer science, systems thinking, history, psychology, and 20 years' experience as a data practitioner, Neil Hepburn posits that the best long-term approach to addressing this problem is to embrace a modular data warehousing system. Neil goes on to describe how such a system could be designed and built using readily available tools and technology, and what challenges must be overcome to realize this vision.

A Modular Approach to Solving The Data Variety Problem AS-OF 2017-08-06