Sunday, October 23, 2022

Happy Paths: Why I am Looking Forward to Azure Synapse Gen3

Before starting I should mention that my main contract ends in December this year (2022) and I am looking for contract work for the 2023 year. I am a lifelong learner and data & enterprise architect with 28 years of experience. I also love tinkering with new technologies.

I can be contacted at:  neilhepburnjob@gmail.com or on LinkedIn at: www.linkedin.com/in/costie

On with the article… 

If you have been paying attention to the data and analytics space you may have noticed a shift towards a concept often referred to as the “Data Lakehouse”. The technology essentially allows a structured database to be scaled without limits by completely isolating compute from storage. This solves old problems and opens up a host of new possibilities, including AI and ML.
If you think it’s a flash in the pan, or something akin to the Data Lakes that have proliferated since the release of Hadoop in the late 2000s, then you should take another look at what is happening now.

There are three technology trends that are converging to solve some big problems that have plagued data management since, well, the invention of the DBMS, going all the way back to Charles Bachman’s Integrated Data Store in the early 1960s.

What are these technology trends and what are these problems you ask?

The trends are:
  1. Data Lakehouse (which we have already mentioned)
  2. Analytic Workspace
  3. Infrastructure as Code

The challenges are mostly around bureaucracy and ‘happy paths’.
The hard technical problems themselves have been solved since at least the early 2010s - anyone who genuinely needs “Big Data” capabilities for whatever purpose can obtain these technologies to solve their problem. Companies like Google and Meta simply couldn’t exist if they hadn’t been solved.
So what are the challenges I speak of then?
It all comes down to a lack of happy paths (or too many dilemmas or choices).

We need:
  1. A better ‘happy path’ for deploying Analytic Workspaces
    1. Currently the act of deploying a new Analytic Workspace is at best a dreaded “Change Request” and at worst a big one-off project
  2. A better ‘happy path’ for onboarding users into those Analytic Workspaces
    1. Currently it can be difficult and time consuming to onboard a new user into a “big data” environment. 
      1. Often fiddly desktop software configurations are required, and a user with special needs faces a long uphill trek. Those who make it to the top of the hill are often praised as brave bureaucracy warriors - and serve as a warning to others to avoid the same journey
  3. A better ‘happy path’ for securely obtaining, collaborating, and sharing data sets
    1. This might be the hardest challenge of all. Sure, you might get some copy of the data. But good luck getting it refreshed or, better yet, ensuring you are working with an authoritative source respected by the business data owners (who are often disinclined to share their most precious assets and knowledge)

So why am I looking forward to Azure Synapse Gen3 then?
Well, before I answer that question I should point out that there are plenty of other alternatives out there (including Azure Synapse Gen2).
I will list these alternatives and then explain my thinking.
Oh and I should point out - I have no idea what Azure Synapse Gen3 will actually entail. Everything I am writing here is based on pure speculation and conjecture. But my hypothesizing is informed by these factors:
  1. Microsoft has a history of getting things right on their 3rd attempt, learning from their own mistakes (and the shortcomings of others)
  2. Microsoft understands corporate governance (i.e. bureaucracy) in ways that other big players like Google and Amazon seem to lack (or see as beneath them)
  3. All of the big players are aggressively investing in Data Lakehouse technologies:
    1. Meta (Facebook) is the pioneer here and has been investing in Apache Hive, Presto, and related open source technologies since around 2010, and this stack continues to improve
    2. Amazon is investing in Apache Hudi (and possibly Apache Iceberg)
    3. Google is investing in Google BigLake
    4. Snowflake is investing in, well, Snowflake (and Sigma) and probably other things I’ve yet to be made aware of
      1. I won’t mention Oracle here except to say that I tend to think of Snowflake as Oracle 2.0 (I have a lot of respect for Oracle - and many feelings of cognitive dissonance which extend to Snowflake)
    5. Microsoft (along with Databricks) has for quite some time been investing in Databricks’ Delta Lake

I think it’s entirely possible that one of those other vendors will release something that eclipses Azure Synapse, rendering it obsolete. The possibility of disruption is always around the corner.
But let me explain why Synapse Gen2 is quite impressive but also slightly lacking:

  1. The Synapse Workspace is a browser-based environment. Once configured (along with the requisite AAD security groups), users may be onboarded to a given Synapse Workspace by simply adding them to a single AAD group (see the onboarding sketch after this list).
  2. In one simple request, a Synapse Gen2 Workspace gives the user:
    1. Access to an MPP RDBMS (i.e. a super powerful SQL database optimized for analytical workloads)
    2. Access to a Data Lake and Data Lakehouse (Delta Lake)
    3. Access to Azure Data Factory (ELT) including Data Flow (ETL) and Wrangling Dataflows (business-friendly Data Prep based on Power BI’s Power Query ‘M’ language)
      1. These are truly best-of-breed tools which come with a “deep bench”
    4. Access to Spark Python Notebooks along with horizontally scalable clusters which can be scaled to virtually any size (assuming you have the $$$)
    5. Access to Power BI
      1. Another best-of-breed BI tool (the only tool that is better is Qlik Sense - as I have written about in the past. Both Power BI and Qlik share the more flexible “linked models” approach [as opposed to the cube-based models that Tableau and MicroStrategy rely on])
  3. Azure tools like Azure Data Factory, Azure SQL Database, and Azure Databricks all have a committed (some may say zealous - which is a positive here) developer base - that’s a good thing
    1. Unless you have a bit of a religion going with your technology, you will find yourself bowled over by people who are religious about theirs (this has always been the case in the tech industry, but it’s more explicit these days)
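As a minimal illustration of that single-group onboarding model, here is a Python sketch that adds a user to an AAD security group via the Microsoft Graph API. The group and user object IDs and the token acquisition are hypothetical placeholders; this is a sketch of the pattern, not a turnkey script.

import requests

GRAPH = "https://graph.microsoft.com/v1.0"
token = "<acquired via MSAL or the Azure CLI>"   # hypothetical placeholder
group_id = "<synapse-workspace-users-group-id>"  # hypothetical AAD group object id
user_id = "<new-analyst-object-id>"              # hypothetical AAD user object id

# Adding the user to the group is the entire onboarding step: the Synapse
# roles and storage permissions all hang off this one group membership.
resp = requests.post(
    f"{GRAPH}/groups/{group_id}/members/$ref",
    headers={"Authorization": f"Bearer {token}"},
    json={"@odata.id": f"{GRAPH}/directoryObjects/{user_id}"},
)
resp.raise_for_status()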

So what’s wrong with Azure Synapse Gen2 then?
  1. Deploying a new Synapse Workspace is a bit complicated and requires a lot of decisions around whether to use the SQL database or the Delta Lake
    1. Dilemmas are the enemy of the Happy Path
  2. Onboarding new users can be made easy if you have set up the AAD groups correctly, but I think there is room for improvement here. Again there are more choices than I think are necessary
  3. It’s difficult to share data with external parties
  4. It’s not obvious whether we should be using the “serverless” Data Lakehouse SQL database (i.e. Delta Lake) or the more mature MPP Dedicated SQL Pool
    1. This is in my view the biggest challenge of all for Microsoft to solve and the one I have highest hopes for

So what am I expecting for Gen3 then?
  1. Quickstart templates (maybe as Azure CLI scripts or Hashicorp Terraform scripts) for common Azure Synapse Workspace patterns (see the provisioning sketch after this list)
    1. One thing that would be great is to have as inputs the Data Lake folders and Delta Lake tables that users should have access to, along with the appropriate permissions
    2. Another thing would be some way of better managing all the AAD groups that need to be created to accommodate the various roles within the Workspace (e.g. Data Engineer, Data Scientist, etc.)
  2. A simpler onboarding experience for new users
    1. If we could do away with the requirement for Power BI Desktop (and any other lingering desktop software requirements), that would be great
      1. Hey I like Windows - but many Data Scientists work on Macs these days
  3. A better solution for sharing data (like Databricks’ Delta Sharing)
  4. A single unified DBMS based on Delta Lake - no secondary copies of data in MPP Dedicated SQL Pools (“single version of the truth”)
    1. This is the biggest challenge of all
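To make the quickstart-template idea concrete, here is a hedged sketch of what such a happy path might look like today, scripted in Python over the existing Azure CLI (all resource names are hypothetical, and Gen3 might well replace this with first-class templates):

import subprocess

def az(*args: str) -> None:
    """Run an Azure CLI command, raising on any failure."""
    subprocess.run(["az", *args], check=True)

# Hypothetical names; assumes the resource group, ADLS Gen2 account,
# and AAD group already exist.
az("synapse", "workspace", "create",
   "--name", "contoso-analytics-ws",
   "--resource-group", "rg-analytics",
   "--storage-account", "contosodatalake",
   "--file-system", "workspace-fs",
   "--sql-admin-login-user", "sqladmin",
   "--sql-admin-login-password", "<secret>",
   "--location", "canadacentral")

# Grant an entire AAD group a Synapse RBAC role in one step.
az("synapse", "role", "assignment", "create",
   "--workspace-name", "contoso-analytics-ws",
   "--role", "Synapse Contributor",
   "--assignee", "<aad-group-object-id>")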

On that last point - the single unified DBMS - I have a feeling MS is already moving in this direction.
I believe this because they have already built out something they call “Hyperspace” indexes.

Backing up a bit, in case you forgot what a Data Lakehouse is: it is basically the pure separation of compute from storage. At its core, everything must be managed through documents and trusted actors.
It’s a great idea, but comes with some trade-offs.

Sure, it’s possible for a single vendor like Databricks or Microsoft to ensure data consistency by coordinating within their own engines. But I am a wee bit skeptical that we have achieved this goal when it comes to multiple vendors writing to the same table at the same time at high frequency. Yes, it’s possible to take advantage of HDFS file locking and whatnot, but I have yet to see a demonstration that is as “bet your business” solid as what a traditional SQL RDBMS (like Azure SQL Database or Oracle or PostgreSQL) can provide.
In a sense this has always been the dirty little secret of NoSQL databases: NoSQL DBMSs lack managed integrity controls and rely on trusted client applications to manage integrity for them. This gives better performance, but you can’t manage data as a separate concern. Similar challenges follow the Data Lakehouse.
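To make the “documents and trusted actors” point concrete, here is a minimal PySpark sketch (assuming the delta-spark package is installed, with a local path standing in for the data lake) showing that a Delta table is ultimately just Parquet files plus a JSON transaction log that every reader and writer must honor:

import os
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

builder = (SparkSession.builder.appName("lakehouse-demo")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

path = "/tmp/lakehouse_demo"  # hypothetical path standing in for ADLS
spark.range(5).write.format("delta").mode("overwrite").save(path)

# The table's ACID guarantees live entirely in these JSON commit files;
# an engine that writes without honoring the log can corrupt the table.
print(sorted(os.listdir(os.path.join(path, "_delta_log"))))

# Commit history, reconstructed from that same transaction log.
DeltaTable.forPath(spark, path).history().select("version", "operation").show()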

To repeat: one of the primary goals of data management is to be able to manage data as a separate concern. So the Delta Lake needs to up its game, and Microsoft is doing this.
Those Hyperspace indexes, however, muddy the waters a bit.
Yes, they are indexes, and yes they will improve query performance.
But there is no guarantee that other vendors will maintain these indexes because they are not part of the core Delta Lake protocol.

Nevertheless they do point us in the right direction (assuming a single choice emerges).
Competition is good, but dilemmas… not so much.

But what about all those other technologies I just mentioned?
Well, the honest answer is that I have done some hands-on evaluation of those tools, and in the case of the open source Apache stack I have very much lived in that world for quite some time.
I’ve also dabbled a bit with Snowflake and Sigma and am very impressed with their data sharing capabilities.
Quite frankly, I could see Snowflake winning this game (if it is a winner-take-all game) based solely on their approach to data sharing.
And for the record, I would be more than happy to work with Snowflake and Sigma.

I suspect, though, that all five of the vendors I mentioned above will continue to push their own visions and tech stacks, and as a realist I would be happy to work with any and all of them. I would even love the opportunity to make them work together (with the appropriate ‘happy paths’).

As a reminder, I am currently seeking work for the 2023 year starting in January.
My contract ends in December, and I would love to hear from you if you are curious about this stuff or want to hire me on contract.
I can be contacted at:  neilhepburnjob@gmail.com or on LinkedIn at: www.linkedin.com/in/costie

Sunday, July 18, 2021

Cletus and Koriolis

Cletus and Koriolis (by Neil Hepburn)

Preface

This is a story about truth, beauty, and reality.

Cletus and Koriolis

A long long time ago, sometime after 1000 BCE, there was a Greek youth who went by the name of "Cletus". Cletus, like his father, was a hardworking and ambitious fisherman. But Cletus was also a fiercely independent thinker who was always searching for his own light - a curious autodidact who enjoyed tinkering and figuring things out on his own as much as possible, even if that meant making some mistakes.

One day while practicing some sailing maneuvers on his father's boat he sailed a bit too far from shore and was blown far from home, all the way to the north African coast. After spotting land and a natural beach he moored his boat and swam ashore to find someone who could help him find his way back home. With a few fish in hand [for payment in kind] he eventually found an Egyptian merchant who luckily happened to speak a little Greek and was willing to help him. Cletus asked the man if he could tell him where the source of the gods was so he could get his bearings. Cletus lived in a part of the Mediterranean where the winds constantly blew from the west in an eastward direction, so naturally the source of these gods was in the west. Cletus assumed the merchant knew his meaning. The Egyptian man looked confused and pointed in a certain direction and said "This is where the gods come from. This is where the sun rises. It is on the other side of the Nile." In this moment Cletus realized that this was indeed the direction he wanted to go, and that the sun generally rose in the opposite direction of the wind. But since he was mostly focussed on the winds while sailing he had not paid as much attention to this fact about the sun rising in the east. Cletus even vaguely remembered his own father mentioning something about this. Cletus thanked the man and returned to his boat.

After reflecting on the advice he had just received, Cletus could feel the hairs standing up on his back as it sunk in that the sun was a much better way to navigate than using the winds. Maybe he should have listened to his father more. But by figuring this out on his own he felt a certain connection to this knowledge that resonated with him. He felt this knowledge had brought him closer to the gods.

While sailing on his way back home Cletus noticed a person treading water in the middle of the sea.  As he sailed closer he could see it was a woman who appeared both frightened and exhausted. Cletus pulled the woman to safety and gave her some time to catch her breath and relax. After some time the woman began to speak in a raspy voice that Cletus could barely make out. She was Phoenician but was surprisingly fluent in Greek - much better than the Egyptian merchant. She asked Cletus where he was going and Cletus told her the name of his town. The woman recognized it immediately and said, "don't worry I will get you home. But first, I want to repay you. Would you be willing to stop off on a small island where I know there are some supplies and we can rest for the night; tomorrow you will be home."

Cletus considered her offer and thought of his worried father. As it started to rain, Cletus - in the moment - made a snap decision: he would take the woman to the island after all and receive his gift. Cletus knew in his heart-of-hearts that the rain was just an excuse for his father; he could not resist the woman's offer.

Upon arriving at the island the woman quickly ran off to a dense thorny area not far from shore. In that moment Cletus thought he had been duped. Fortunately, his luck changed again and the rain had begun to subside with the sun coming back out. But it was late in the day and Cletus began making plans for camping on the island by himself. Then, out of the corner of his eye he noticed something: The woman was returning, and in her arms was a small basket. She brought Cletus the basket and opened it to reveal an assortment of biscuits, and dried dates and figs. At once Cletus felt his appetite sated and immediately his mood changed from feeling bitter to feeling thankful and even joyful. "Thank you" he said "I thought you had abandoned me." "Oh," said the woman, "I would never abandon someone who just saved my life. This food is not the gift I was referring to earlier. This is just from a cache we Phoenicians keep in times like these in case of emergency. Your gift is coming later tonight, but it's not what you think it is - as you can see, I am too old for that  - so keep your clothes on. What I am about to show you is much much better than any of your childish fantasies."

Later, after the sun went down and twilight had passed into dusk, the woman stood up from the camp fire and said, "Are you ready for your gift?"
Cletus nodded yes. The woman then said, "What I am about to show you is perhaps the greatest piece of wisdom the Phoenicians have ever received. You must swear an oath to the gods that if you ever reveal this secret then you will be punished by Koriolis." Cletus had never heard of Koriolis. It sounded Greek but maybe it was Phoenician or perhaps Etruscan? In this moment it did not matter; Cletus had always wanted to partake in such a ceremony and quickly and solemnly swore the oath of secrecy to the woman and in witness to Koriolis.
 
She then took Cletus by the hand and walked him away from the fire towards the sandy shore where the night sky could be seen in all its majestic glory. 
The woman looked up and then over to Cletus and said "Please stand in this direction." Cletus stood as she said. She then told him to "turn a little to the left." He turned. 
"Now just a bit to the right." 
He turned again. 
"Perfect. Now look directly up in front of you. Do you see that cluster of stars over there?"
Cletus took a moment and pointed to the Big Dipper (Ursa Major).
"Do you mean that?" 
"Yes, exactly. That is Big Bear."
"Big Bear?"
"Yes, that is just a way of remembering it. You can see its body is there and its head is there. But call it what you want."
The woman continued: "Now, follow a line in this direction" - she drew a line in the sky from the Big Dipper (Ursa Major) to the Little Dipper (Ursa Minor) - "and this will take you to Little Bear". 
Cletus was feeling both excited and impatient. He just wanted the woman to get to the point.
"Now you see Little Bear is like Big Bear in reverse. Now that we have found Little Bear we are almost done. You just need to look for these two stars." The woman's finger moved back and forth so Cletus could follow.
"Now the last thing you need to do is focus on those two stars and find the point half way between them. That point in the sky will always point you north. If you move towards this point you will be going north and it will get colder and you will begin to feel cold and you may even find snow. If you move away from that point you will find heat and sand. Oh, and the higher in the sky Little Bear is the farther north you are. And when you go south, Little Bear will be lower in the sky."

The woman then explained that because of this Little Bear was more reliable than the sun and the moon or even the wind for navigation. It was what allowed the Phoenicians to navigate and trade at great distances.

Cletus had heard about the Phoenicians' mastery of the seas, but until then did not realize that it was connected to this. He once again felt the hairs rise on his back, but this time it was more intense. He could barely catch his breath. His mind was raging with possibilities. He not only felt closer to the gods but felt as though he was communing with them in this very moment as he was staring at Little Bear. He felt like a demigod; he felt like Hercules.

The next day Cletus reaffirmed his oath to the woman (and Koriolis) and returned home to console his father.

Over the years Cletus used his new-found knowledge to fish and trade at greater distances. He became legendary among his Greek companions but refused to give up his secret. He knew that this secret was both a blessing and a curse but refused to betray his oath. He wasn't actually afraid of the Phoenicians but was afraid of the gods, which kept him honest. It was also incredibly difficult to live with such a profound secret. This secret had become a curse but Cletus was still too afraid to break his oath for fear of retribution from Koriolis. 

In his frustration he did something he had not done in a long time - he decided he was going to sail as far away as possible. Instead of sailing past the Pillars of Hercules and then north (which was where the Phoenicians often headed), he considered sailing west or east. The idea of sailing west was in many ways the most intriguing to him, but he had never heard of a single sailor - even a Phoenician - who had ever returned; rumour had it that there was some deadly current or capricious god that prevented return. Instead, he turned his attention to the east.
He had heard many interesting stories about the far east and made up his mind to head in that direction. He decided to follow coastlines as opposed to sailing as directly as possible. This was because although he knew the direction he was moving in, and his north-south position by way of Little Bear, he could not know how far east he had actually gone. Eventually, after several months (and many interesting stops), he found himself in a very distant land with very unusual customs and languages unlike anything he had heard or seen before. He wondered if this was where the exotic shiny and smooth piece of cloth he had seen earlier on this journey had come from. The people in this land also looked different from anyone he had ever encountered.

His arrival upon shore was met with bemusement. Indeed this was the first time anyone had seen someone who looked like Cletus, and everyone was staring at him. Some were even touching his arms, which in that moment he realized looked quite sweaty and hairy. Cletus, being who he was, just went with the flow and found himself being led towards what appeared to be some kind of temple where he was offered some food and drink. The food was also a bit unusual - more steamed and boiled than he was used to - but tasty and satisfying all the same.

Over the course of the next few days Cletus began to learn the local language as they learned his Greek. Eventually an old man with long white hair appeared holding a box about the size of a melon. The old man asked Cletus if he knew what was inside the box. Cletus responded by shaking his head. The old man said "But how did you get here then?". Cletus was stunned. He had no idea what the old man was talking about. The old man then asked plainly "How did you get here then? What is your secret?"
Cletus froze. He was confused and frightened and didn't know what to do. 
Were the gods playing a trick on Cletus?

Cletus took a moment to collect his thoughts. He then responded to the old man: "Lord, I do indeed have a secret I will admit. However, I had sworn an oath to the gods that I would not reveal this secret. But if you already know my secret, then tell me what you think it is and I will confirm."
The old man considered this and took Cletus to be an honest man. So the old man said to Cletus "All right then, I will tell you this: 'I have a tool that can point us to the true source of all energy.'" Cletus didn't entirely understand what he meant by this but asked the man if he could use this tool to point in the direction of this energy source. The old man then opened the box to look inside and then pointed in a northerly direction.

Cletus could feel his chest tighten. He never imagined this would ever happen. The old man clearly knew his secret or at least how to extract its bounty which was good enough for him. Cletus felt for the first time that he had been released from his oath. Oh how long he had waited for this.
Cletus then said to the man "I think you know my secret. But I will make it clear to you how I came to this place." They both waited until the sun had gone down and the sky had cleared. Cletus then revealed Little Bear and how he used it to navigate such great distances. Cletus made it clear that although he could point himself east or west, he could never be sure how far east or west he had actually traveled. But Little Bear's guidance was enough to allow him to confidently make this long journey east and then back home again.

The old man looked up and then down at Cletus and exclaimed "Thank you! Thank you!" Then the old man began to weep. Cletus was confused. After a few minutes the old man regained his composure and explained his emotional state: "Cletus, we have always wondered if this day would come. You have solved the mystery of the lodestone - a most powerful and mysterious tool. You have guided us to its true source. We always knew these lodestones were divine. But we never realized until now that it was pointing to this Little Bear in the sky. It all makes perfect sense now. Thank you Cletus."
The old man then opened the box to reveal a small jar filled with water and a tiny needle floating on top. He then demonstrated that no matter what direction he turned the box the needle would always point in the same direction: back to the general direction of Little Bear.

Cletus could hardly believe his eyes. The hairs stood up on his back and shivers ran down his spine. In that moment Cletus knew he had just discovered something with immense power, but more importantly he had discovered a tool he could hold in his hands that would guide him to the source of the gods' power even during the day when Little Bear could not be seen. His hands were now shaking.

After learning how to locate more lodestones Cletus returned home with his newfound compass and the knowledge behind how to make more.

Upon returning home Cletus organized an assembly and proceeded to demonstrate and explain how the compass and Little Bear worked, and also how he had come to discover them through an exchange of secrets in a distant land. The crowd was skeptical at first but was quickly convinced after a few demonstrations coupled with observations of Little Bear. This day would never be forgotten. It was the day that changed history. From that day forward, dates were recorded as either "Before Cletus" or "After Cletus".

A new Age of Discovery commenced with many expeditions heading north - the most obvious direction given that this was where the source of the gods originated. While numerous sailors became lost or froze in the extreme cold, many others returned with exquisite treasures. Amber jewels were most highly prized, but there were many other exotic items that also commanded people's attention. A sled with skis was particularly fascinating to those who lived in the hills. Most people just loved the smoked fish and delicious honey, which quickly became new culinary staples. The economy was booming with trade.

One day Cletus heard a story from another town that concerned him. A young sailor, after drinking too much wine during a local festival, blurted out that he thought the gods had erred by giving away their secrets and that he was happy for this mistake. What the sailor did not realize - having been at sea for so long - was that not only had the economy changed but people's core beliefs and daily rituals had too; there were new stories and ceremonies and festivals that paid tribute to Little Bear. Priests and philosophers seemed to be in agreement for once, with priests extolling the poetry and beauty of the compass and Little Bear, and philosophers remarking on how elegant and simple they were. Harmony had been achieved, and anyone disrupting this harmony was an unwelcome muckraker. The young sailor who had too much to drink learned this the hard way and was drowned at sea for his blasphemy against the gods.

As time passed Cletus began to suspect the gods were playing tricks on him. He wondered if the god Koriolis was punishing him for revealing the secret of Little Bear. It wasn't Cletus' fault - he had good reason, and still did - for believing the old man already knew his secret. So Cletus began to wonder if Koriolis was playing a more devious trick on him. He constantly thought back to that fateful day when he swore that oath to the Phoenician woman and Koriolis. He thought back to the earlier part of the day, when he had made a connection between the winds blowing from the west and the sun rising from the east. He wondered if the gods had deliberately distracted him from these other phenomena. He thought back to the story of Jason and the Argonauts and how Jason's shipmates were nearly seduced and killed by the Sirens and their entrancing song. Perhaps, he thought, Little Bear and the compass - with their elegant symmetry - were a kind of elaborate siren song.

Over the next year Cletus returned to his trusty boat and set off on a new journey. He began travelling around with a special log book and started taking notes about the wind and sun. In particular Cletus noted down wind direction in various locations. He also began measuring the shadow cast by the mast of his boat and noticed its length changed even during the sun's highest point depending on how far north or south he happened to be. After several months surveying a new picture formed in Cletus's head and he returned once again home to relay his findings.

Being famous, Cletus was able to gather a much larger assembly than ever before. Most people were just glad to see him back in town after being away for so long. Cletus began.

"Good people who have joined me on this fine day. I have once again been on an incredible journey. But this journey was different from my first journey - which was indeed the will of the gods. However, this journey came from the will of myself. I do not know if the gods will even approve of what I am about to tell you, but as you know the gods can be deceptive and I believe I have uncovered their greatest deception."
Cletus took a deep breath and continued.
"You see I have been measuring the height of shadows and direction of wind from the farthest points north I could sail - even hitting ice - to the hottest parts of Africa I could tolerate. I have come to believe that we are not sitting on a flat plane as it would appear but rather the world is more shaped like an egg or melon perhaps, and that this egg or melon is rotating in a single direction. That is why the sun rises and falls and why the wind blows as it does. With this new understanding, we may be able to find a way of going east and west with the same confidence we have traveling north and south. We may even be able to venture into the great western mystery."
A man in the crowd shouted back - "Well show us the proof then!"
Cletus then held up a large round melon covered in knife-marked dashes, all in various directions.
He then held up an orange and began to rotate the melon.
He explained that this is why the farther north you go the longer the shadow and the farther south you go the shorter the shadow. He then explained that this would also cause the wind to blow in various directions, but that there was a pattern. Although he admitted that he didn't understand exactly why the pattern was the way it was - only that it probably had something to do with the rotation of the world - or melon in his demonstration.

The congregants were becoming restless. They could not understand what Cletus was saying and could not make these connections. One brave girl stepped forward and said in the most respectful voice: "Cletus, you are indeed a wise man. And we have benefited much from your wisdom. We want to believe you. But the gods have a purpose for us. We can see that purpose in Little Bear and the compass and behold its divine elegance and symmetry. But we are struggling to see the purpose or even the beauty in what you are showing us here. Perhaps you are correct, but to our eyes this just looks like clutter and noise - not the divine elegance you once revealed to us. The gods are beautiful; your pock-marked melon is not."

Cletus felt hurt. He had expected his discovery to be embraced with the same curiosity he felt years ago after realizing there was a connection between the wind and the sun. But as he looked at the melon in his hand - now beginning to rot under the heat of the sun - he realized in that moment that Koriolis had got the better of him.

Fin.

Sunday, May 30, 2021

Analytic Efficiency Part 3: Why Qlik's Associative Index is the Biggest Little Analytics Invention Over the Past 30 Years

The headline of this post borrows a quote from the Canadian author Witold Rybczynski who, when asked by the New York Times what the greatest invention of the past millennium was, responded that it was the screwdriver (I would have chosen the computer - but I'm not very good with home renovations), and went on to say that the Robertson screwdriver in particular was the "biggest little invention" of the past 100 years.



If you live outside Canada, or do not often handle screws, you may never have heard of the Robertson screwdriver or the story behind it. I will get into the details of that story near the bottom of this post. But for now all you need to know is that the Robertson screwdriver is widely used in Canada and is only just recently picking up traction in the US, even after being on the market for over 100 years. This is very odd since the Robertson screwdriver does the job of screwing much better than any other screwdriver out there.

Qlik, I would say, does a better job at analytics than any other tool out there, but like the Robertson it (or at least its underlying innovation) has also failed to take over the world, which is also very odd.

I believe the reason for this can be summed up as a lack of connoisseurship. Most people cannot concisely explain what analytics is or how it differs from related concepts like statistics and 'data science'.

Analytics is in essence a game of '20 questions', an endless stream of why why why, which can be boiled down to yes/no questions. While most analytic tools allow the analyst to interactively see the 'yes' answer to any given question, only Qlik's Associative Index allows the analyst to get to the 'no' answer. This allows the analyst to keep asking questions without having to pivot to Data Wrangling and Data Prep work, and may thus sustain a "flow" state of mind.

It is for this reason I see Qlik's Associative Index as akin to the Robertson screwdriver: they were both purpose-built for their task, more so than their competitors' offerings. Why is it then that both of these tools have not done better?

At this point you might be a bit confused. I hope by the end of this article this will make a bit more sense to you, and in the process I also hope to share with you how I have come to understand the meaning and essence of what "analytics" actually is.

This is the third and final post on this topic. In this post I wish to discuss three sub-topics:

  1. What history can teach us about the nature of tools
  2. The psychology of analytics – as I have seen it and experienced it
  3. How #1 and #2 can be understood through the lens of Power BI vs. Qlik


What History Can Teach Us About the Nature of Tools

I have published this as a standalone article here.


Later in this post I will elaborate on my analogy to screwdrivers, but if you read this other article I published a few weeks back, I hope to also convince you that Qlik is more like "WordStar" and the "Mongolian horse" whereas Power BI is more like "MS Word" and the "European horse", respectively.


The Psychology of Analytics – In My Travels

Harkening back to the first article in this series, I explained how I see Analytics as a game of twenty questions - or even a debate - that continually feeds back on itself: you ask a question which provides an answer that leads to another question and another answer, and so on, until you see no point in asking more questions. Is there a pattern to these questions that can help us choose a tool that fits them?

The overarching pattern in analytics is that when asking and answering questions, we are constantly making comparisons to find what we’re looking for. The syllogism itself – which underpins analytical thinking – is the process of comparing the Major Premise (e.g. ‘All persons are mortal.’) to the Minor Premise (e.g. ‘Socrates is a person’) via their shared middle term (e.g. ‘person’) to see if they connect and trigger the conclusion. All questions can be boiled down to ‘Yes’ or ‘No’ (this is known as the “law of excluded middle” in the so-called “laws of thought”). All computer code basically works like this.

However, unlike computer code – which tends to follow the logical trajectory of “true” statements, usually ignoring false or negative conditions – analytics is as concerned with negative outcomes as with positive ones.

The game of “20 Questions” is played out by making a series of comparisons where both the positive and negative answers guide us. Business Analytics is no different. 

Is there a pattern to the types of comparisons people make?

I have observed and participated in hundreds of analytical exercises – everything from quick ad hoc reporting, to working alongside statisticians, to building predictive analytics systems, to building enterprise data warehouses – and having observed the types of questions business decision makers ask and answer (how they play the game of Analytics), I believe there are two main patterns which stand out and are important to understand.

These two main analytical patterns roughly align with management level:

First, the “Drill-down and Compare” or simply “Drill” pattern: managers and above are generally interested in making comparisons between aggregates. Typically, a manager will start with the “big rocks” (e.g. Total Revenue) and then begin drilling down by whatever dimension (e.g. time, location, product category, etc.). When drilling down, the user is making comparisons between groupings. These comparisons are typically visualized in bar charts, line charts for trending, or pie or treemap charts for share-of-total. Most analytics at this level is concerned with finding outliers (e.g. a spike in sales for a given date) and then drilling down to the details to learn more, but this may require a hand-off to other analysts, which we will discuss in a moment.

Managers are also interested in patterns, especially trends over time. Most of the time the manager-analyst just wants to know that everything is running as usual – like a heartbeat monitor. If they look at the same report on a regular cadence (e.g. monthly), they will grow accustomed to how the charts should appear, so if there is a shift in the pattern they will notice and begin asking follow-up questions. These follow-up questions often get handed over to other analysts.

Managers and directors are for this reason fond of interactive dashboards and self-serve reports, and this is where most money is spent with respect to Business Analytics and Data Warehousing: tools like Power BI, Qlik, and Tableau, not to mention underlying data warehousing and data prep tools from vendors like Oracle, Amazon, Google, Informatica, Databricks, Cloudera, Teradata, Microsoft, and so forth. This BI+DW (Business Intelligence + Data Warehouse) spending accounts for a significant percentage of overall IT spend.

While managers and directors appreciate interactive dashboards, executives and board members often prefer more digested insights. People working at this level expect all questions to already be answered. If there is an outlier or pattern anomaly, the report will usually provide an explanation upfront. Tools used to provide these types of reports are often just presentation tools like MS PowerPoint or graphic design tools like Adobe Photoshop. The costs associated with these reports lie mostly in the human resources who develop the custom reports and infographics. Analytics at this level ultimately plays out in an executive boardroom with a presentation followed by a dialectic discussion and possibly debate.

Getting back to managers and directors: this is where most money is spent on analytical tools. As mentioned above, managers will routinely find outliers and pattern anomalies through a “Drill-down” analysis. But uncovering the deeper reason for the anomaly or change in pattern requires shifting gears. Managers will often either hand off the follow-up questions to other analysts or they themselves will switch gears and begin using… Excel.

Recalling our game of ‘twenty questions’, most tools do well when the answer to a question is ‘yes’, since we can keep drilling down into that selection. But when the answer is ‘no’, most of these tools run into a problem because they are not designed to examine the excluded values - the ‘negative space’, as it were.
What is negative space?

The poet John Keats coined a similar term “Negative Capability” to describe negative space as a source of mystique and wonder that poets and artists could explore and contemplate, often romantically. 
In the world of science, filling negative space drives all endeavours. It is the reason you keep hearing about “dark matter” and “dark energy” and why we are reminded that Einstein’s theory of relativity must ultimately be false (but like Newton's laws, is still useful).

Fortunately, however, the material nature of most organizations is recorded in structured datasets. These datasets are finite, and therefore it is physically possible to probe the ‘negative space’ when asking questions. However, most analytical tools – and data warehouses – are not purpose-built for this task. This means that humans are doing most of this work.

This brings me to the second pattern, the “Contrast and Compare” or simply “Contrast” pattern: the Contrast pattern normally occurs following the Drill pattern and is concerned with exploring and contrasting negative space. The Contrast pattern tends to follow one or both of these processes (a pandas sketch of both follows the list):
  1. Isolate relevant sub-population (e.g. Toronto and Ottawa) within parent grouping (e.g. Ontario)
  2. Contrast the sub-population to its complement: e.g. Average Home Price in Ottawa + Toronto vs. the rest of Ontario
    1. Allows us to see the difference between housing prices in urban metropolitan regions versus rest of Ontario
OR:
  1. Take two populations (e.g. Rentals during winter versus rentals during summer)
  2. Determine what is in the first population that is not in the second and vice versa
    1. Allows us to appropriately target summer-only customers or winter-only customers or all-season customers with specific and relevant ad campaigns
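Here is a small pandas sketch of both variants (the home-price figures and customer IDs are made up purely for illustration):

import pandas as pd

homes = pd.DataFrame({
    "city": ["Toronto", "Ottawa", "Windsor", "Sudbury"],
    "avg_price": [1_100_000, 650_000, 480_000, 410_000],
})

# Variant 1: isolate a sub-population, then contrast it with its complement.
metro = homes["city"].isin(["Toronto", "Ottawa"])
print(homes.loc[metro, "avg_price"].mean(),   # urban metropolitan regions
      homes.loc[~metro, "avg_price"].mean())  # rest of Ontario

# Variant 2: what is in one population but not the other, and vice versa.
winter = {"cust_a", "cust_b", "cust_c"}
summer = {"cust_b", "cust_d"}
print(winter - summer,  # winter-only customers
      summer - winter,  # summer-only customers
      winter & summer)  # all-season customers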

When it is easy to isolate sub-populations by simply filtering on a small number of fields, we need not go beyond the “Drill” pattern, and many analysts use dashboards in this way, although they will only be able to see differences in the aggregations (e.g. total number of sales by Month) - but this can still be revealing. For example, a telecom analyst who suspects that outdoor weather precipitation (e.g. rain) might be causing a spike in call volumes might filter on all the known dates over the past 12 months when it rained or snowed. The analyst would then filter on the complement – all days where it did NOT rain or snow – and then compare these sub-populations with respect to standard measures like Call Volume and Average Handle Time to understand the differences the change in weather pattern might generate. The answers to these questions could lead to cost-saving improvements in staffing and revenue-generating improvements in sales.

Another common variant of the “Contrast” pattern is when the analyst is handed two sets of data and simply told to find the deltas and then report on those deltas based on a comparison key. For example, an analyst might be given two lists: a list of customers from ten years ago and a list of customers from the present day. The analyst might be asked to produce a report comparing the loyal customers who appear on both lists to newer customers who are only on the present-day list, and to the previous customers who have since left. The report could lead to a reduction in costly customer “churn”.

There is another word – “parallax” – that also describes this pattern. In optics, the parallax effect is when an object that is nearer to us appears to move faster than an object that is far away. By exploiting this property, it is possible to measure the distance to faraway objects without traveling those distances. The philosopher/comedian Slavoj Žižek uses the word parallax to explain how “truth” can only be teased out through the types of comparisons that reveal differences. More practically, when we are making assessments about the world - like reading the news or researching a new car or other big-ticket purchase - we are attempting to find contradictions between perspectives and distill from these contradictions what is going on. For example, we might read some good reviews of a new TV and then one or two bad reviews, and by focusing on the differences we might find out that the good reviewer overlooked something that the bad reviewer complained about, and we might go on to see that the bad review was simply complaining about a bad installation experience. Or maybe the good reviewer only cared about price and was not as attuned to picture quality as the bad reviewer. It is from these contradictions, this parallax effect, this contrast, that we can uncover useful insights.

While this yes/no approach might seem binary or simplistic - and to be sure, Analytics can be wielded cynically (as it all too often is) - with enough questions you can begin to see a spectrum and other patterns emerging. With enough questions, deep insights and epiphanies begin to emerge.

The visual that comes to mind for the Contrast pattern is the Venn diagram, but the Contrast process tends to be table-centric because the analyst is mostly relying on either Excel or possibly a Data Prep tool like SQL, Python, Power Query, or Alteryx to perform the detailed comparisons between sets.
What we see, then, is analysts spending most of their time in Data Prep tools (especially Excel) pulling sub-populations through either delta comparisons (often using the ‘VLOOKUP’ function in Excel) or by filtering on subsets of data and then using a delta comparison (again using VLOOKUP) to extract the complement of the sub-population they were filtering on (e.g. the complement of all dates where there was precipitation).

Most Data Prep in the business world is done in Excel using the VLOOKUP function, but much of it is done in SQL as well using the NOT EXISTS or NOT IN clause. Data Prep can also be done in more business-friendly data prep tools like Power Query (built into Power BI), which has an “anti-join” operation as part of its “merge” step.
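As a point of reference, here is a hedged pandas sketch of the same reconciliation (the column names are made up); the indicator column plays the role of VLOOKUP’s #N/A results in Excel, NOT EXISTS in SQL, or the anti-join merge in Power Query:

import pandas as pd

old = pd.DataFrame({"customer_id": [1, 2, 3]})  # customers ten years ago
new = pd.DataFrame({"customer_id": [2, 3, 4]})  # present-day customers

merged = old.merge(new, on="customer_id", how="outer", indicator=True)
churned = merged[merged["_merge"] == "left_only"]   # have since left
loyal = merged[merged["_merge"] == "both"]          # on both lists
gained = merged[merged["_merge"] == "right_only"]   # newer customers
print(len(churned), len(loyal), len(gained))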

It should be pointed out, though, that Data Prep is essentially programming and requires a shift in thinking. When doing Data Prep the analyst/developer can certainly enter a flow state, but they are no longer in the Analytical flow state playing 20 questions. Instead their mind is in the world of syntax and code structure – even using VLOOKUP may require a reorganization of columns, which can be time consuming and distracting.

An analogy I use to explain the relationship between Data Prep (which is more like programming) and Analytics (which is more like a game of 20 questions) is based on this quote attributed to Abraham Lincoln: “If you give me six hours to chop down a tree I will spend four hours sharpening my axe.” You can think of Data Prep as axe sharpening and analytics as wood chopping. A ratio of 2:1 between Data Prep and Analytics would not be bad, but what I see is something closer to 5:1 or even 10:1. Ideally, we want to be spending more of our time chopping and less of it sharpening if possible. The reason is that with pure analysis (chopping) you can more easily achieve a flow state while also collaborating with others. Data Prep (sharpening), on the other hand, is a more solitary activity, and while you can attain a flow state, your focus is directed at answering technical rather than business questions.

This is the reason Qlik stands out: it is possible to contrast data sets interactively because Qlik’s indexing technology – the Associative Engine – was built from the ground up to support the selection of excluded values. With every other Business Intelligence tool I have used, negative values are an afterthought and not easily accessed.

Using our example from above, if we want to compare days with precipitation to days with no precipitation, in Qlik we can interactively select the complement of dates; we can even restrict the complement to a given subset (e.g. a given year and region). From here we can contrast customer patterns for days when there was precipitation against days when there was not.
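A toy model of that idea (emphatically not Qlik’s actual engine, just a pandas sketch with made-up data) shows how a selection propagates to linked records while the excluded values stay one click away:

import pandas as pd

weather = pd.DataFrame({"date": ["d1", "d2", "d3", "d4"],
                        "precip": [True, False, True, False]})
calls = pd.DataFrame({"call_id": [101, 102, 103, 104, 105],
                      "date": ["d1", "d1", "d2", "d3", "d4"]})

# The "yes" answer: select the days with precipitation...
selected = set(weather.loc[weather["precip"], "date"])
# ...and the "no" answer: the excluded dates remain directly selectable.
excluded = set(weather["date"]) - selected

# The selection propagates through the association to the linked table.
wet_calls = calls[calls["date"].isin(selected)]
dry_calls = calls[calls["date"].isin(excluded)]
print(len(wet_calls), "calls on wet days vs", len(dry_calls), "on dry days")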

Furthermore, QlikView (not Qlik Sense, unfortunately) allows users to save the subset of data that has been selected. This means that after selecting days with precipitation (or days with none), we can reduce the dataset to what is associated with those dates. From there we could choose another dimension – say “temperature” – to partition the data by in order to inspect that contrast. This would allow us to see the contrast of above-freezing versus below-freezing days and whether a pattern emerges. We could then go back up a level and segregate the original dataset using temperature to see if the same patterns hold as when contrasting within the precipitation subset.

By contrasting these patterns with respect to precipitation and temperature, as analysts we can begin to see the relationships between these dimensions and others. Depending on what differences emerge between the subsets, we can choose to partition, contrast, and then take a subsequent subset. We can keep playing this game of twenty questions until we have answered our questions or at least satisfied our curiosity.

Alternatively, if we want to compare multiple Customer lists to see what customers are only in one and not in another, we can do so interactively, and again we can easily isolate subsets of customers and compare them looking for patterns.

It is of course possible to accomplish any of these tasks using other tools like Excel or Power BI (via Power Query) or SQL or Python, but one cannot do so interactively.

I should point out that while performing a “Contrast and Compare” activity, on some occasions going back and “sharpening the axe” through Data Prep is required. But in many circumstances it is not, and if the data has been well prepped you can ask and answer many questions interactively by leveraging the “excluded values” supported by Qlik’s Associative Index.
In my own experience I have leveraged this power to significant effect, and there have been occasions where I was able to sustain a “flow state” for up to an hour of constant Q&A through a series of dimension selections, because I was able to easily contrast selections to one another and see what was exposed and what was hidden by doing so. On these occasions I would often find “needles in haystacks”, and the feeling was not unlike debugging a computer program - except I felt like I was debugging the business. And debug I did: some of these insights led to noticeable changes in how the business was run.

Taking the two patterns together – the “Drill” pattern and the “Contrast” pattern – there is an analogy that I find quite useful for thinking of them as part of a cycle.
Namely, there is a concept known as the OODA loop (credit goes to Alistair Croll for introducing me to it), developed by US Air Force Colonel John Boyd for fighter jet combat. Boyd realized that the rules of combat had changed with heavy supersonic jets and that a different mode of thinking was required to use them effectively in combat.

OODA stands for:
  1. Observe
  2. Orient
  3. Decide
  4. Act
In the context of fighter jet combat the pilot might “observe” an opponent and then instead of engaging them in the moment would begin “orienting” their plane and mindset such that the pilot has the best available information to “decide” and “act”. The decision might be to attack or just as likely to move to another defensive position.  If the pilot moves to a new position, they would repeat the OODA loop cycle until either defeating the opponent or retreating.

The most important step in the OODA loop – and what distinguishes it from other business cycles like the “plan-do-check-act” lifecycle – is the emphasis on orientation. It is the “orientation” step that brings to bear the subjective experience and talents of the pilot to maximize their strength.

Relating this back to the Analytics patterns I described above, I see the “Drill” pattern as akin to the “Observe” step and the “Contrast” pattern as akin to the “Orient” step.

It stands to reason that if we can make the Orient step more efficient – which is the bottleneck of the OODA loop – then we can make analytics more efficient.

There was another tidbit of wisdom that Alistair Croll also mentioned when discussing the OODA loop which relates back to my earlier points regarding horses and word processors: Once you cross a threshold of efficiency the game itself can begin to change.

The example Croll used was the steam engine James Watt invented. Watt’s new steam engine design, developed in the late 1700s, is generally regarded as one of the key factors that catalyzed the industrial revolution of the 19th century. However, when it was first introduced some people speculated that it would cause a depression for coal miners because only a quarter of the coal was required to produce the same amount of “horsepower” (a unit of measure that Watt also invented). Instead, because Watt’s steam engine had crossed an efficiency threshold, it could be used by all sorts of businesses that otherwise would not have been able to afford the cost. As a result coal mining began to boom like never before.

The rest as they say is history.

Difference Between Qlik and Power BI and Excel When Contrasting Data

In this section I will be contrasting the tools themselves to show how they differ when used to contrast data.

In this example we will continue with the Call Centre data, although I have created a new fictional dataset with more attributes to compare. 

Differences in Qlik, Power BI, and Excel Explained

The main difference between Qlik and Power BI when it comes to contrasting data is this: 

Qlik can be more efficient for the user than Power BI (or Excel) for contrasting data because Qlik allows the user to contrast data sets interactively by clicking buttons, which can be done while in a “flow” state of consciousness, staying focused on the business questions.

Power BI, on the other hand, can do some of this under some circumstances, but in other circumstances it requires the user to build new charts, and in some cases it may require a change to the data model itself. Because different types of solutions may be required – some taking longer than others – it’s harder to achieve a “flow” state of consciousness while playing the game of 20 questions.

Excel – through the VLOOKUP function – is more consistent in its approach to reconciliation than Power BI, although it requires the Excel user to effectively perform some basic Data Prep. Because of this consistency users can get into a “flow” state and focus on business questions, but unlike Qlik – which keeps the analyst in a Business Analytics flow state – Excel forces the analyst down into a Data Prep flow state, distracting from our game of 20 questions.

Let’s first explore these differences between tools using a very simple example and then move to a more realistic example in the next section.

For each of these demos we are comparing two lists [of calls] to find the difference between the two sets. We are using two pairs of datasets for this:

  1. Related one-to-one
  2. Related many-to-many

Let’s see how the “contrast” exercise appears for each tool, starting with Qlik, then Power BI, and finally Excel.




In this example with Qlik we need only click the “select all” icon (highlighted in yellow), denoted by the double-checkmark icon. Doing so instantly reveals the excluded values (3 missing from CCA and 3 missing from CCB).

If the lists were not so compact and instead had thousands (or millions) of values, we could click the “select excluded” button to instantly see the excluded values. Furthermore we could simply select these excluded values to explore their distribution with respect to other metrics.

The following screen caps show that there is no difference when the relationship is many-to-many:




As you can see there is no difference in behaviour. The user doesn’t have to think and can follow the same “happy path”.

Although it is not necessary in this example, if we wanted to analyze the subset of calls that are excluded, in QlikView we can perform a “reduction”, which effectively removes everything except our selections. The screen cap below shows how this option is selected and how the data appears following selection.


Here is how the data appears following the reduction.




Now let’s see how this compares with Excel.

We use the VLOOKUP function, which looks up a key value in another table and retrieves a corresponding value from the matching row. When no match exists, VLOOKUP returns #N/A – and those #N/A cells are precisely our deltas.

Here is how we have implemented the function as used in the screencaps below:
=VLOOKUP(A4,CCB!$A$2:$B$15,1, FALSE) 
This shows how the one-to-one linked tables appear when contrasted in Excel using the VLOOKUP function:



Here we go again with the many-to-many example:




Speaking for myself, I do not find the function intuitive to use, and if I haven’t used it for a while I will forget to hit ‘F4’ to make the second argument absolute so that it does not change from row to row when auto-filling the cells below. But most Excel jockeys have no problem with these quirks and can wield it relatively quickly. Given the initial difficulty with VLOOKUP and how popular it is, I’ve often thought of VLOOKUP as a kind of analytical rite of passage.

Regardless of how proficient you are with VLOOKUP, it is essentially a Data Prep task: because VLOOKUP can only search the leftmost column of the lookup range, you may need to reorganize the columns in the data itself – placing the key field first – before you can even begin.
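
For the curious, here is a rough Python analogue of what the VLOOKUP reconciliation does row by row. The IDs are made up and the behaviour is only approximated; rows that come back as #N/A are the deltas:

# Per-row lookup, mimicking =VLOOKUP(key, other_table, 1, FALSE)
cca_call_ids = [101, 102, 103, 104, 105, 106]
ccb_call_ids = {104, 105, 106, 107, 108, 109}  # the table we look up against

for call_id in cca_call_ids:
    # VLOOKUP returns the matched value, or #N/A when no exact match exists
    result = call_id if call_id in ccb_call_ids else "#N/A"
    print(call_id, result)

Unlike the set difference shown earlier, this keeps every row of the first list and merely flags the misses – which is exactly how the VLOOKUP column reads in the screencaps above.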

With Qlik the user can interactively select excluded records with a single click (or two at most), thereby conserving working memory (we can keep track of about seven things in our head at once – Miller’s magic number). This allows the user to let the business questions guide the selections (including selecting excluded calls), enabling a “flow” state of consciousness and thus optimal human performance.

Contrast this with Excel, where the user must change gears and switch into Data Prep mode – even if only for a few seconds. This breaks the stream of consciousness pertaining to the business questions and forces the analyst to context switch to a Data Prep exercise. If the Excel analyst is practiced and quick and can complete the VLOOKUP in under 10 seconds, they may be able to maintain their train of thought and return to a flow state. But every VLOOKUP and Data Prep task potentially breaks this flow and thus undermines the desired flow state.

The Qlik and Excel approaches are quite different – Qlik performs the task interactively while maintaining business-question-mode thinking, whereas Excel requires changing gears into Data-Prep-mode thinking – but what they both have in common, and do not share with Power BI, is consistency. Namely, regardless of the data cardinality – whether the data is linked one-to-one, one-to-many, or many-to-many – the operations the analyst performs are the same. It is this consistency in Excel that allows an experienced user to perform the Data Prep tasks almost subconsciously, keeping the train of thought on the business questions.

Having seen Qlik and Excel, let’s move to our third and final example: Power BI.
The two screen caps below show two pairs of Power BI Slicers. The first screencap shows how the pairs appear without any user selections.
In the second screencap, we can see what happens when the user selects ‘(Blank)’: Power BI reveals the missing Call IDs from the other table.

In this example we can see the user need only click ‘(Blank)’ to see the missing values. In Power BI, any time a one-to-one or one-to-many relationship is defined, the user can always click ‘(Blank)’ on the “one” side to see the missing values from the other side. Just like in Qlik, the user need only click a single button to reveal the deltas.






The one-to-one example is very similar to Qlik and can be performed with a single click, thus enabling the desired “flow” state of consciousness.

But what if the relationship is one-to-many or many-to-many? The differences between the relationships are shown below.




Due to the nature of Power BI’s VertiPaq indexing engine, columns in a table on the “many” side of any relationship do not allow users to simply select ‘(Blank)’ as we saw in the example above. Instead, the user must build a Table Chart and configure it to show blank values.

The example below shows the delta between the two lists, built as a simple Table Chart sorted with the smallest values at the top so the deltas (highlighted) are visible. You can also see that the Slicers no longer offer the option of selecting ‘(Blank)’.




As noted above, this chart requires an extra configuration step that is normally not required: I had to check the “Show items with no data” option.




Reflecting on this, we can see that reconciling across relationships with a “many”-sided cardinality on one or both sides requires a different and more complicated process.

It should also be pointed out that it makes no difference whether the link direction is Both or Single, or which way the link points: the only way to get the answer is through a chart we must build, which requires a more time-consuming dense calculation – explicitly evaluating all possible combinations – as opposed to the quick sparse calculation we got by selecting ‘(Blank)’ in the earlier example. Qlik, on the other hand, only ever makes quick sparse calculations, leveraging its Bi-directional Associative Index (BAI) to directly pinpoint deltas. This performance hit in Power BI adds another level of friction to reconciliations. But the main difference between the tools is that Qlik provides a consistent, quick, and interactive reconciliation experience, whereas Power BI requires changing gears to build tables and then waiting additional time for those densely calculated tables to render.

There is another approach we can take with Power BI: if we look to Power BI’s built-in Data Prep tool, Power Query, we find a more consistent “happy path” to reconciliation, much like Excel.

Namely, we can use Power Query’s ANTI-JOIN feature, accessed through the “Merge” step with the Join Kind set to ‘Left Anti (rows only in first)’, as shown below.


After completing both a ‘left anti’ and a ‘right anti’ merge, the screencap below shows the resulting delta datasets correctly revealed.



As it turns out, this ANTI-JOIN approach is closer to a ‘happy path’ because of its consistency of procedure – the cardinality and link direction of relationships is irrelevant – compared to navigating Slicers and Tables in a pre-built Tabular Model and performing different operations depending on the cardinality of relationships. Furthermore, this approach to reconciliation can easily be automated, and the resulting data can be selected and analyzed by the user. This is similar to how a Qlik user can “Select Excluded” values and analyze the complement set.
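
For readers who prefer code to screencaps, here is a sketch of the same anti-join reconciliation in pandas. The frames and the “Call ID” key are my assumptions, but the left-anti/right-anti mechanics mirror what Power Query’s Merge step is doing:

import pandas as pd

# Hypothetical call lists keyed on "Call ID"
cca = pd.DataFrame({"Call ID": [101, 102, 103, 104, 105, 106]})
ccb = pd.DataFrame({"Call ID": [104, 105, 106, 107, 108, 109]})

# An outer merge with an indicator column lets us peel off both anti-joins at once
merged = cca.merge(ccb, on="Call ID", how="outer", indicator=True)
left_anti = merged[merged["_merge"] == "left_only"]    # rows only in CCA
right_anti = merged[merged["_merge"] == "right_only"]  # rows only in CCB

print(left_anti["Call ID"].tolist())   # calls missing from CCB
print(right_anti["Call ID"].tolist())  # calls missing from CCA

Note that, just as in Power Query, the cardinality of the relationship is irrelevant here – the same merge works for one-to-one and many-to-many data alike, which is exactly the consistency this section is about.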

Thus, if your focus is on contrasting data and you need a surefire tool, it is better to use Power Query within Power BI as opposed to Power BI proper.

In summary, the name of the game is not merely to meet business requirements (any Turing-complete software can do that); the game of Analytics is about putting the user in a “20 questions” flow state of consciousness for efficient business problem solving.

Both Excel and Power Query do provide a “happy path” for contrasting data, but that happy path requires context shifting from business-analytical thinking to Data-Prep-tool thinking. Qlik, on the other hand, is well designed for performing interactive reconciliations at the record level, and with some training users can achieve a flow state of consciousness.

In the next and final section, we will compare Qlik and Power BI using a more realistic example.

Data and App Documents

Here is the data we used. It also includes the Excel VLOOKUP examples.

Here is the QlikView application document.
Here is the Power BI application document.

Differences in Qlik and Power BI in Practice


In this example we will analyze a telecom’s (e.g. Bell, Rogers, AT&T, Sprint) Call Centre data in order to answer the question: During February 2020, which Customers placed calls to the Call Centre that were answered by a non-authorized Customer Service Representative and that did not generate any Work Orders?

We can break this question down into a series of yes/no questions:

Question: Did the customer place a call in February 2020?
Answer: Yes.

Question: Was there an authorized Call Centre Representative on this call?
Answer: No.

Question: Does the call have an associated Work Order?
Answer: No.

With Qlik and Power BI we can ask these questions for our entire population of linked customers, calls, and work orders.

With Qlik we can answer these questions quickly and interactively while maintaining a “flow” state of mind, but with Power BI we must change gears and build and configure dense pivot tables which leads to mental context switching.

Before getting into the heart of this example I should point out that I am using fictional data that I generated using a normal random number generator. This data set lacks the novelty we find in the real world, so I must apologize that the “insights” revealed are not on the level we would get from a real-world data set. However, there is also a lesson here: publicly available data sets tend to be relatively simple in structure (e.g. most open data sets are just denormalized “fat” tables), whereas real business systems produce large normalized models with multiple fact tables. It is when we have multiple fact tables that both Power BI and Qlik really begin to shine, and it is also where we can see the most contrast between them. For privacy and confidentiality reasons, most organizations understandably don’t want to reveal these data sets to the general public. As a side note, this is one aspect I love about my job: I get to see these databases across so many industries in all their glory. But I cannot share those databases for obvious reasons. It’s for this reason that Microsoft often uses the “Adventure Works” database for demonstrations. But even that is a fictional dataset lacking real-world novelty.

But if you can suggest any datasets that have multiple fact tables and are useful for teaching, please let me know.
In the meantime I’m going to keep looking for – or possibly develop on my own – a realistic data set that can be used for teaching analytics. One thing I have found since I started writing this article is an app called “TrueEntropy” that generates random data using your smartphone’s camera. I haven’t had a chance to really dig into it yet, but it looks like an improvement over the standard random number generator I used for this example.

Sorry for all the preamble – let’s get into the examples now.
The data model we are going to explore is larger (in terms of the number of attributes and entities) than the one we explored in parts one and two of this three-part blog series.

The original data model had two tables: MyCalls and MyWorkOrders. This new data model extends this by adding:

  1. MyCustomers: The customers who placed the calls and for whom the work orders were created
  2. MasterCal: A master calendar allowing drill-down within a period hierarchy (i.e. Year -> Quarter -> Month -> Week -> Date); a minimal sketch of how such a table can be generated follows this list
  3. MyCSRs: The Customer Service Representatives (CSRs) working for the telecom. A CSR answers customer calls and interacts with customers, resulting in work orders for the customer
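
To make the MasterCal table concrete, here is a minimal pandas sketch of how such a calendar table can be generated. The date range and column names are my assumptions, not necessarily what is in the demo files:

import pandas as pd

# One row per day, with the drill-down attributes derived from the date
dates = pd.date_range("2019-01-01", "2020-12-31", freq="D")
master_cal = pd.DataFrame({"Date": dates})
master_cal["Year"] = master_cal["Date"].dt.year
master_cal["Quarter"] = master_cal["Date"].dt.quarter
master_cal["Month"] = master_cal["Date"].dt.month_name()
master_cal["Week"] = master_cal["Date"].dt.isocalendar().week
print(master_cal.head())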


As an aside: you may have noticed there is also a table called “WorkOrderTypeAND”, which I originally included to demonstrate a feature specific to QlikView: it allows the analyst to make ‘and’-based selections (as opposed to ‘or’-based selections) and is not found in Power BI (nor Qlik Sense). While this feature does allow us to ask more questions (e.g. who has Television AND NOT Internet), it is not essential to the point I am making here. For this reason I have decided to leave discussion of it out of this blog post, but if you’re curious about how it works you can download the demo and experiment with its functionality.

Here is how the model appears in QlikView:



Here is how the same model (minus the ‘AND’ table) appears in Power BI:




Now let’s get into the business questions.

To recap, our main question is: During February 2020 which Customers placed calls to the Call Centre without an authorized/known Customer Service Representative and did not generate any Work Orders?

For context, in our example there was a botched Chatbot roll-out in February 2020 that resulted in customers (who learned of the flaw through social media) creating non-authorized Work Orders using a special promo code. We want to know who probed the vulnerability but did NOT take advantage of the flaw. When we find these Customers we would like to thank them, and speak with them to help uncover other security issues being communicated through social media.

To get there we need to find Calls with no CSR and from there find Customers with no Work Orders in February 2020.

We are going to break this down into two exploratory (practice) questions to show how those can be answered, and then combine what we have learned to answer the third and final question: During February 2020, which Customers placed calls to the Call Centre without an authorized/known Customer Service Representative and did not generate any Work Orders?

By breaking our big question down into smaller exploratory questions we can begin to get a sense of what is going on in the data before diving into our third and final question.

Our two exploratory questions are:
  1. What is the distribution of calls with no authorized CSR?
  2. Which Customers did not have any Work Orders in February (of any year)?


Question #1: What is the distribution of calls with no authorized CSR?
In Qlik, to answer this question we:
  1. Select [All] in “CSR Name” 
  2. Select ‘Excluded’ in “Call ID”



In Power BI, we simply:
  1. Select ‘(Blank)’ in “CSR Name”



As can be seen in the above example, both Qlik and Power BI allow for interactive selection here, with Power BI actually edging out Qlik in this instance: only one click is needed in Power BI versus two in Qlik.

Now let’s move to our next question.

Question #2: Who are the customers with no Work Orders for February (of any year)?

In Qlik we can answer this question by making three selections in real time:
  1. Select ‘February’ in “Month”
  2. Select [All] in “Work Order ID”
  3. Select [Excluded] in “Customer Name”
The answer is revealed as ‘Noah Q. Moore’.



In Power BI however, because of the one-to-many relationship, we cannot rely on Slicers alone. Instead we must take extra time to construct a Table (remembering to check the “Show items with no data” setting), wait for the table to render, then sort it from smallest to largest (or we could have filtered on a blank “Num Work Orders”). The resulting set cannot be turned into a Slicer selection, so the user is tightly limited in the follow-up questions they may ask.



Having answered both of our exploratory questions we are now ready to begin answering our third and final question.

Question #3: During February 2020 which Customers placed calls to the Call Centre without an authorized/known Customer Service Representative and did not generate any Work Orders?

Or stated differently: who are my most honest customers – the customers who, when calling without a legitimate CSR in February 2020, never created a Work Order?

To answer this question in Qlik we proceed through two simple iterations:

Iteration 1:
  1. Find all calls without a CSR. Two clicks:
    1. Select [All] CSRs
    2. Select [Excluded] Calls
  2. Filter on year and month:
    1. Select ‘Feb’ in “Month”, and
    2. Select ‘2020’ in “Year”
  3. Isolate this filtered subset by selecting from main menu File->Reduce Data->Keep Possible Values


Iteration 2 (using the subset derived via ‘Keep Possible Values’ in Iteration 1):

  1. Find all Customers with no Work Order
    1. Select [All] in “Work Order ID”
    2. Select [Excluded] in “Customer Name”


FINISHED. We have answered our question. The five customers are:
  1. Brent Z. Skinner
  2. Charlotte E. Snow
  3. Hilel A. Maddox
  4. Malachi Perez
  5. Noah Q. Moore

If we want to see these Customers in their original context (as opposed to the isolated dataset), we can refresh the data, which reverses the data reduction while keeping the selections.
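
If you prefer to see the logic in code, here is a minimal pandas sketch of the same two-iteration flow, using made-up rows and my own guesses at the field names (the actual demo model differs):

import pandas as pd

# Hypothetical calls and work orders
calls = pd.DataFrame({
    "Call ID": [1, 2, 3, 4],
    "Customer Name": ["Noah Q. Moore", "Malachi Perez", "Ann B. Cole", "Noah Q. Moore"],
    "CSR Name": [None, None, "Dana Smith", None],
    "Date": pd.to_datetime(["2020-02-03", "2020-02-10", "2020-02-11", "2019-02-05"]),
})
work_orders = pd.DataFrame({"Call ID": [3]})  # calls that generated a Work Order

# Iteration 1: reduce to February 2020 calls with no authorized CSR
feb_2020 = (calls["Date"].dt.year == 2020) & (calls["Date"].dt.month == 2)
subset = calls[feb_2020 & calls["CSR Name"].isna()]

# Iteration 2: exclude customers whose remaining calls generated Work Orders
with_wo = set(subset.loc[subset["Call ID"].isin(work_orders["Call ID"]), "Customer Name"])
honest_customers = set(subset["Customer Name"]) - with_wo
print(sorted(honest_customers))  # ['Malachi Perez', 'Noah Q. Moore'] in this toy data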




How does Power BI compare?

Power BI can certainly answer this question but requires more Chart/Table building. In this scenario it is not enough to simply list all the Customers and sort by Num Work Orders, because we need to be sure we are only seeing Customers with Calls. Even if we “Select All” Calls, our Table will continue to show Customers who did not place any calls in Feb 2020, because we have not explicitly filtered our Customer list by the number of Calls.

This means we must add a Visual Filter on the Table itself where “Num Calls” ‘is greater than 0’.
Given that we are already filtering on “Num Calls”, we can also filter on “Num Work Orders” ‘is empty’ to create a cleaner-looking table.


Even with our clean table, it is still not easy to convert this list into a Slicer selection and then explore and ask follow-up questions about these five (5) Customers. Whereas in QlikView we already have the Customers selected and can easily continue our game of ’20 Questions’.

Normally in this situation the Power BI developer would likely go up to the Data Prep level and use either Power Query or perhaps Excel to answer this question.

Going back to our sharpening-and-chopping analogy borrowed from Abraham Lincoln: in Qlik you are chopping more than sharpening, while in Power BI you are sharpening more than chopping.
The ideal tool, in my view, behaves more like a musical instrument – something you play, like a piano.

Data and App Documents

Here are the data files:


Here are the application documents:

Conclusions


Given that Analytics involves exploring both positive and negative space, and that Qlik’s Associative Index explicitly supports “Excluded Values” while most other BI tools – like Power BI – do not, why have Power BI (and other tools like Tableau, for that matter) managed to outperform Qlik from a sales perspective?

More to the point, why is the ability to query “Excluded Values” – a core element of Analytics – not even brought up as a software requirement?
The simplest answer to this question is that the buyers of these tools lack connoisseurship.
I can break this answer down into three specific problems:

  1. IT tends to favour homogenous technology platforms and deal with as few vendors as possible. “One throat to choke” as they say.
  2. The “drill” pattern tends to be visual in nature and is what most senior managers – and laypersons - can quickly grasp. These are the same people who control spending.
    1. The “contrast” pattern on the other hand is more obscure in nature and does not lend itself to flashy sales demos
  3. The “contrast” pattern as I described it above is generally regarded as a Data Prep (as opposed to Business Intelligence/Data Viz) activity
    1. Qlik’s answer to the “contrast” defies traditional product categories
      1. There are no recognizable competitors to Qlik that also have the ability to query “Excluded Values”

The first problem – IT favouring homogeneous technology over “best of breed” vendors – is not specific to Qlik and is a fairly well understood and discussed problem. I have nothing to add to that discussion that has not already been said.

The second problem – Business Intelligence/Data Viz tools often being sold on the strength of a slick demo – is also nothing new. Although I will add some colour: on more than one occasion I have seen a senior executive extol the virtues of a Business Intelligence tool for the simple reason that they used a report built with it. Even if that report is nothing more than a rendered image file, the executive will continue to lavish praise on the tool, which quickly creates buying consensus.

Oh, but you may say that there are research firms and consulting companies that executives can look to for guidance. Unfortunately, these companies tend to be deeply compromised in one way or another. Without naming names: think of the most prestigious business & technology research and advisory firm you can, and I can show you how it doesn’t hold a candle to Consumer Reports. The main problem is that these companies have deep relationships with the very vendors they are supposed to referee. We see elements of this in journalism, whereby many esteemed newspapers and media outlets are more generous to certain institutions or individuals in order to maintain “access” to them. Consumer Reports, on the other hand, is entirely supported by a large consumer base. Even Consumer Reports is not perfect, but it is significantly more trustworthy than any of the research and consulting firms in the B2B world when it comes to evaluating technology.

The third problem can best be understood through the famous quote attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”
Putting our analytical hat on, if we contrast Ford and Qlik in this light we can see that Ford had the clear upper hand:

  1. Ford’s cars would work on any roadway (unlike the closed railway lines)
  2. The purchaser of the product was also the user, and a slick demo was not enough – the vehicle had to pass a test drive
  3. While Ford was an early manufacturer of automobiles, he was not the first or only one, and the presence of competitors created consumer demand

As a result, Ford was less of a “think different” underdog and more of a monopolistic bully. There is another lesson to be learned from Ford – a lesson that many Canadians are aware of:

During the early days of automobile production, Ford clashed with an inventor based here in Canada: P.L. Robertson, the inventor of the Robertson screwdriver, also known as the ‘square’ screwdriver. You can read the story here.

To quickly summarize: in 1908 the Canadian P.L. Robertson invented a new type of screw head and screwdriver that was square shaped, allowing for maximal torque while minimizing stripping. This allowed Robertson screws to be fastened more securely. Henry Ford realized the importance of this invention and demanded that Robertson relinquish the patent rights to Ford for a single lump sum. Robertson refused, and so Henry Ford went to a competitor, Henry Phillips, who had invented a similar but inferior screwdriver: the Phillips screwdriver. Phillips did maintain some patent rights, though, and thus the still-inferior flat-head screwdriver continued – and continues – to proliferate.

If we all switched to Robertson screws, we would need fewer screwdrivers and screws, and construction quality would improve.

And yet, if you asked an American, they would at first be oblivious and then confused as to why the United States remains committed to a demonstrably inferior product. The simple answer is the same reason we use Microsoft: vendor lock-in through network effects. But the fact that the Robertson screw remains in use also tells us that it is possible to survive and even thrive on the merit of the tool itself.

I come back to the word connoisseurship; for reasons of environment and circumstance, connoisseurship differs among people and organizations.
How does change come about then and why should anyone care?

For me the answer is simply emotion through experience: The experience of awe and wonderment and feelings of mastery and contentment from achieving flow states.

If you have ever discovered a powerful tool and wielded it to great effect, then you will know the feelings I am talking about, and you will yearn for them.

But if you have never had such a breakthrough – never experienced that awe and sense of mastery – then everything I am saying will sound like noise. But I think if you have learned to ride a bicycle, or swim, or ski down a hill, then maybe you can relate?

I have thus decided that my North Star going forward is to figure out ways that I – Neil Hepburn – can bring about a feeling of awe and wonderment for Analytics. I may or may not use Qlik to help me here; I have not decided. But I have no doubt that Qlik has helped me understand the essence and spirit of Analytics, and I am going to challenge myself to see if I can bring that experience of awe to others.

If this is something you are interested in hearing more about, you are welcome to drop me a line.