This post is a follow-up to my original posting (and paper) titled “A Modular Approach to Solving the Data Variety Problem”.
In response to that posting, a LinkedIn commenter (Mark B.) asked the following [paraphrased] question about how he might use a modular approach to build a data analysis system for this scenario:
“As a digital marketer, I would like to see how variations in advertising images relate to responses from different audiences.”
Thank you for this question, Mark. Since you have identified two subjects, Images and Advertisements, this is an ideal jumping-off point for illustrating the benefits of taking a modular approach to analytics.
The short answer is that a modular approach lets us ask and answer cross-subject questions that would normally be prohibitively expensive to answer:
- “What images give me the best click-thru and conversion rates?”
- “Do older images have the same click-thru rate as newer images?”
- “Including the cost of image production, what is the overall cost of my Ad Campaigns?”
- “Is there any relationship between the cost of an image and its click-thru and conversion rate by gender?”
- “Do images with a positive sentiment perform better than those with a neutral or negative sentiment?”
How does a modular approach allow us to answer these questions so easily? It all comes down to being able to leverage the Dimensions and Measures already developed for each Subject on its own (i.e., Images and Ad Impressions), and then combine those Subjects into a unified multi-Subject Graph that can be easily queried.
To recap my paper: if you take a modular approach to analytics, you can decompose your analyses into separate “Subjects” (tables), and then further decompose those Subjects into Sub-Sets. Each of these sub-components can be developed independently of the others. Once these components (stored as portable data files) are “docked in” to the main repository, users can “lob” and “link” them together to form graphs that allow for cross-subject analyses.
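To make the “dock in” step more concrete, here is a minimal Python sketch of one way a repository could accept a Sub-Set file only if it conforms to a published schema. The schema, the column names, the `dock_in` function and the in-memory `repo` dictionary are all hypothetical illustrations, not the actual mechanism described in my paper.

```python
import csv
import io

# Hypothetical published schema for the "Image" Subject: column name -> type converter.
IMAGE_SCHEMA = {
    "image_id": str,
    "team": str,
    "production_cost": float,
    "created": str,  # ISO date kept as text for simplicity
    "sentiment": str,
}


def dock_in(repository, subject, schema, csv_text):
    """Accept a Sub-Set file into `repository` only if it conforms to the published schema.

    Returns the number of rows docked in; raises ValueError on a non-conforming file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != list(schema):
        raise ValueError(f"columns {reader.fieldnames} do not match {list(schema)}")
    # Convert every field with its schema type; a malformed value raises ValueError
    # before anything is stored.
    rows = [{col: convert(row[col]) for col, convert in schema.items()} for row in reader]
    repository.setdefault(subject, []).append(rows)
    return len(rows)


repo = {}
team_a_csv = (
    "image_id,team,production_cost,created,sentiment\n"
    "img-1,A,250.0,2017-01-15,positive\n"
    "img-2,A,120.0,2017-03-02,neutral\n"
)
print(dock_in(repo, "Image", IMAGE_SCHEMA, team_a_csv))  # 2
```

Rejecting non-conforming files at the door is what lets other users later find and combine Sub-Sets without re-checking their structure.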
Let’s first break this down into the two subjects at hand: Images and Ad Impressions.
We'll start with the first Subject, “Images”. We may have a team responsible for developing reports that analyze Image statistics. For example, this team may have developed a set of Dimensions and Measures that let them determine how much Images cost to produce, how old they are, and what type of sentiment they are intended to evoke. Since images would presumably be developed by different teams, each team would have its own reports (represented as tables). Since each team’s reports would conform to a standard published schema, they could be combined into a single cross-department report. For example, “Team A” and “Team B” could combine their image reports into a single “Image” Subject table.
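As a rough illustration of “lobbing”, the Python sketch below vertically stacks two team reports that share a published Image schema into one Subject table, then computes one example Measure (image age). The column names, sample rows and the `lob` and `image_age_days` functions are hypothetical.

```python
from datetime import date

# Hypothetical published "Image" schema that every team's report conforms to.
IMAGE_COLUMNS = {"image_id", "production_cost", "created", "sentiment"}

team_a = [
    {"image_id": "img-1", "production_cost": 250.0, "created": date(2017, 1, 15), "sentiment": "positive"},
    {"image_id": "img-2", "production_cost": 120.0, "created": date(2017, 3, 2), "sentiment": "neutral"},
]
team_b = [
    {"image_id": "img-3", "production_cost": 400.0, "created": date(2016, 11, 20), "sentiment": "negative"},
]


def lob(*sub_sets):
    """Vertically stack Sub-Sets that share the published schema into one Subject table."""
    subject = []
    for sub_set in sub_sets:
        for row in sub_set:
            if set(row) != IMAGE_COLUMNS:
                raise ValueError(f"row {row!r} does not match the Image schema")
            subject.append(dict(row))
    return subject


def image_age_days(row, as_of):
    """An example Image Measure: how old the image was on the `as_of` date."""
    return (as_of - row["created"]).days


images = lob(team_a, team_b)
print(len(images))                                   # 3
print(image_age_days(images[0], date(2017, 8, 26)))  # 223
```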
Moving on to the second Subject, “Ad Impressions”: again, there may be multiple teams running multiple advertising campaigns across multiple advertising platforms over several months. The teams responsible for these campaigns might even differ by Ad Campaign or by the Digital Advertising Platform serving the ads. Like the Image team, these advertising teams would have a set of Dimensions and Measures that let them determine how often an ad was clicked, how many conversions (e.g., goal actions) there were, the dollar value of those conversions, and how these metrics break out by gender and other demographic and psychographic variables (which may be specific to the ad platform). Since each team’s report would conform to a published schema, these reports could also be combined into a single report, and this combined “report” would constitute the “Ad Impression” Subject.
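Here is a minimal Python sketch of the Ad Impression side: two hypothetical platform Sub-Sets are lobbed into one Subject, and click-through and conversion rates are broken out by a chosen Dimension such as gender. The row grain (one row per ad and gender), the column names, and the definition of conversion rate as conversions per click are assumptions for illustration; some teams measure conversions per impression instead.

```python
from collections import defaultdict

# Hypothetical Ad Impression Sub-Sets from two platforms, both conforming to a
# published grain of one row per (ad, gender).
platform_x = [
    {"ad_id": "ad-1", "gender": "F", "impressions": 1000, "clicks": 50, "conversions": 5},
    {"ad_id": "ad-1", "gender": "M", "impressions": 800, "clicks": 24, "conversions": 2},
]
platform_y = [
    {"ad_id": "ad-2", "gender": "F", "impressions": 500, "clicks": 10, "conversions": 1},
]

ad_impressions = platform_x + platform_y  # "lob" the Sub-Sets into one Subject


def rates_by(subject, dimension):
    """Click-through rate (clicks / impressions) and conversion rate (conversions / clicks)."""
    sums = defaultdict(lambda: {"impressions": 0, "clicks": 0, "conversions": 0})
    for row in subject:
        bucket = sums[row[dimension]]
        for measure in bucket:
            bucket[measure] += row[measure]
    return {
        key: {"ctr": s["clicks"] / s["impressions"], "conversion_rate": s["conversions"] / s["clicks"]}
        for key, s in sums.items()
    }


print(rates_by(ad_impressions, "gender"))
```

Summing the raw counts before dividing (rather than averaging per-row rates) keeps the rates correctly weighted when Sub-Sets of different sizes are lobbed together.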
I have just described two different Subjects, each with its own set of Dimensions and Measures, and each composed of its own Sub-Sets of data. This is where the modular approach becomes relevant: users can now locate these Sub-Sets, “lob” them into larger Subjects, and then “link” those Subjects to form graphs that allow for cross-Subject analyses. Namely, we can now ask and answer the questions we raised near the beginning of this post:
- “What images give me the best click-thru and conversion rates?”
- “Do older images have the same click-thru rate as newer images?”
- “Including the cost of image production, what is the overall cost of my Ad Campaigns?”
- “Is there any relationship between the cost of an image and its click-thru and conversion rate by gender?”
- “Do images with a positive sentiment perform better than those with a neutral or negative sentiment?”
However, one piece is still missing from the picture: to make this possible, we need to define a simple “bridge” table that connects the image profiles to the ad impressions. This bridge table would be developed and maintained by the team that has access to the information required to link the two Subjects together.
The following diagram shows how Sub-Sets sharing the same schema (each schema depicted in its own colour) can be “lobbed” together to form larger Subjects, and how Subjects sharing a linking column can be linked with a SEMI-JOIN to form a graph for cross-subject analytics.
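To show how a bridge table lets us answer “What images give me the best click-thru rates?”, here is a small Python sketch. The bridge simply maps each ad to the image it used; the sample rows, the column names, and the assumption that each ad uses exactly one image are hypothetical.

```python
from collections import defaultdict

# Hypothetical Ad Impression rows (one per ad and gender).
ad_impressions = [
    {"ad_id": "ad-1", "gender": "F", "impressions": 1000, "clicks": 50},
    {"ad_id": "ad-1", "gender": "M", "impressions": 800, "clicks": 24},
    {"ad_id": "ad-2", "gender": "F", "impressions": 500, "clicks": 10},
    {"ad_id": "ad-3", "gender": "M", "impressions": 400, "clicks": 20},
]
# Hypothetical bridge table maintained by the team that knows which image each ad used.
bridge = [
    {"ad_id": "ad-1", "image_id": "img-1"},
    {"ad_id": "ad-2", "image_id": "img-2"},
    {"ad_id": "ad-3", "image_id": "img-2"},
]


def ctr_by_image(ad_rows, bridge_rows):
    """Rank images by click-through rate, rolling ad rows up to images via the bridge."""
    image_of = {b["ad_id"]: b["image_id"] for b in bridge_rows}  # assumes one image per ad
    totals = defaultdict(lambda: [0, 0])  # image_id -> [impressions, clicks]
    for row in ad_rows:
        image_id = image_of.get(row["ad_id"])
        if image_id is None:
            continue  # ad not yet linked to any image
        totals[image_id][0] += row["impressions"]
        totals[image_id][1] += row["clicks"]
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
    return [(image_id, clicks / imps) for image_id, (imps, clicks) in ranked]


ranking = ctr_by_image(ad_impressions, bridge)
print([image_id for image_id, _ in ranking])  # ['img-1', 'img-2']
```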
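Why a SEMI-JOIN rather than an ordinary join? The Python sketch below, using hypothetical rows, contrasts the two when answering “Including the cost of image production, what is the overall cost of my Ad Campaigns?”. An inner join repeats an image once per matching impression row, so its production cost is counted more than once (the “Fan Trap”); a semi-join keeps each image at most once. For simplicity the impression rows already carry an `image_id` resolved through the bridge table; this is an illustrative sketch, not the repository's actual linking code.

```python
# Hypothetical Image rows.
images = [
    {"image_id": "img-1", "production_cost": 250.0},
    {"image_id": "img-2", "production_cost": 120.0},
    {"image_id": "img-3", "production_cost": 400.0},
]
# Hypothetical impression rows, already resolved to images through the bridge table.
ad_impressions = [
    {"image_id": "img-1", "ad_id": "ad-1", "gender": "F"},
    {"image_id": "img-1", "ad_id": "ad-1", "gender": "M"},
    {"image_id": "img-2", "ad_id": "ad-2", "gender": "F"},
]


def inner_join(left, right, key):
    """Every matching (left, right) pair: left rows repeat once per match."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def semi_join(left, right, key):
    """Left rows that have at least one match on the right, each kept once."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]


joined = inner_join(images, ad_impressions, "image_id")
print(sum(r["production_cost"] for r in joined))  # 620.0: img-1 counted twice
linked = semi_join(images, ad_impressions, "image_id")
print(sum(r["production_cost"] for r in linked))  # 370.0: each used image counted once
```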
Astute readers might point out that there is nothing preventing a determined analyst with access to the underlying data from answering the same questions. While it is true that the end result can be achieved through current approaches, these approaches tend to be prohibitively expensive. Here is what is different about the modular approach:
- Users can integrate data through user-friendly graphical interfaces allowing them to vertically “lob” Sub-Sets into Subjects and then horizontally link those customized Subjects without fear of introducing duplicates through the common “Fan Trap” problem that bogs down most data integration efforts
- Users can independently develop new Subjects and Subject Sub-Sets, and then “dock in” those Subjects and Sub-Sets in a self-serve manner without relying on IT assistance, while still conforming to enterprise data governance rules that protect Metadata Integrity and Data Integrity, so that other users can safely locate and integrate the data
- Users can “time travel” by choosing an older “AS-OF” date and time and performing analyses across data that was current as of that date
- Data files are portable and can potentially be moved wherever they are needed for either analysis or downstream processing
- An example file name for the first Ad Impression Subject Sub-Set (as shown in the diagram above) might be: “AdImpression_V1_CAMDAPMON_SF-G-2017-04_AS-OF 2017-08-26 153100.csv”
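The “AS-OF” stamp in the example file name suggests one simple way time travel could work: for each Sub-Set, pick the most recent file whose AS-OF timestamp is no later than the requested point in time. The Python sketch below assumes names follow the pattern `Subject_Version_<code>_<code>_AS-OF YYYY-MM-DD HHMMSS.csv`; it treats the two middle segments (`CAMDAPMON`, `SF-G-2017-04`) as opaque Sub-Set codes without guessing their meaning. The second file name and the functions `parse_sub_set_file` and `files_as_of` are hypothetical.

```python
from datetime import datetime


def parse_sub_set_file(name):
    """Split Subject_Version_<code>_<code>_AS-OF YYYY-MM-DD HHMMSS.csv into its parts."""
    stem = name.removesuffix(".csv")  # Python 3.9+
    subject, version, code_1, code_2, as_of_part = stem.split("_")
    as_of = datetime.strptime(as_of_part.removeprefix("AS-OF "), "%Y-%m-%d %H%M%S")
    return {"subject": subject, "version": version, "sub_set": (code_1, code_2), "as_of": as_of}


def files_as_of(names, when):
    """For each Sub-Set, keep the most recent file whose AS-OF stamp is not later than `when`."""
    latest = {}
    for name in names:
        info = parse_sub_set_file(name)
        if info["as_of"] > when:
            continue  # written after the requested point in time
        key = (info["subject"], info["version"], info["sub_set"])
        if key not in latest or info["as_of"] > latest[key][0]:
            latest[key] = (info["as_of"], name)
    return sorted(name for _, name in latest.values())


files = [
    "AdImpression_V1_CAMDAPMON_SF-G-2017-04_AS-OF 2017-08-26 153100.csv",
    "AdImpression_V1_CAMDAPMON_SF-G-2017-04_AS-OF 2017-09-02 090000.csv",  # hypothetical later refresh
]
print(files_as_of(files, datetime(2017, 8, 31)))  # only the 2017-08-26 file
```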
On top of all this, other Subjects such as “Web Session” could be “docked in” to the larger repository, allowing Data Analysts to incorporate any Dimensions and Measures developed for the “Web Session” Subject (e.g., ‘Session Duration’) into analyses relating to Images and Ad Impressions. For example, we could ask and answer the question “What images have not been used in any Web Session over the past 7 days?”
This example offers a small glimpse of how a modular approach to data management opens up analytical opportunities that would not normally survive a cost/benefit analysis under current approaches.
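The “unused images” question is an anti-join: keep the images that have no matching Web Session inside the time window. Here is a minimal Python sketch; it assumes, hypothetically, that each Web Session row records the `image_id` it displayed and a `started` timestamp, which in practice might itself require another bridge table.

```python
from datetime import datetime, timedelta

# Hypothetical Image and Web Session rows.
images = [{"image_id": "img-1"}, {"image_id": "img-2"}, {"image_id": "img-3"}]
web_sessions = [
    {"session_id": "s-1", "image_id": "img-1", "started": datetime(2017, 8, 25, 10, 0)},
    {"session_id": "s-2", "image_id": "img-2", "started": datetime(2017, 8, 10, 9, 30)},
]


def images_unused_since(images, sessions, as_of, days=7):
    """Anti-join: images with no Web Session in the `days` leading up to `as_of`."""
    window_start = as_of - timedelta(days=days)
    used = {s["image_id"] for s in sessions if window_start <= s["started"] <= as_of}
    return [img["image_id"] for img in images if img["image_id"] not in used]


print(images_unused_since(images, web_sessions, datetime(2017, 8, 26)))  # ['img-2', 'img-3']
```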