Tuesday, January 09, 2007

Implementing Data Governance

If you’ve ever purchased a home (or prepared a Last Will and Testament, negotiated a severance package, or gone through a divorce), it’s likely that you hired a lawyer to guide you through the process. If you managed to find a good lawyer, hopefully she did a decent job ensuring your best interests were being served. Hopefully she presented you with the key decisions that needed to be made, advised you on how to evaluate those decisions, and did not overwhelm you with the minutiae of the law.

You probably also hired a lawyer who was an expert in her field, whether it be real estate law, wills and trusts law, employment law, or family law. This person probably knew her subject pretty well, although she wouldn’t know your specific wishes until meeting with you. She would still have a good understanding of how your situation fits into the context of the law. To complete the required work, your lawyer may even rely upon other lawyers or legal workers to flesh out all the details and take care of some of the routine work. Continuing with our real estate example, your lawyer would then liaise with the Land Registry Office (that’s what they call the place where deeds are recorded in Ontario) to ensure that the ownership has been officially recognized and that there is a clear and unambiguous record stating your entitlement to the land. At the end of the process, assuming things went as planned, you would be responsible for and take ownership of your new home, while relinquishing ownership of your old home.

In the world of Data Governance things should work in a similar way. However, instead of the client taking ownership of a property, the client takes ownership of a particular Subject Area. Instead of working with a lawyer, the client would work with a Data Steward. Instead of storing your official records with the Land Registry Office, you would instead store pertinent information in a Metadata Registry (often referred to as a Metadata Repository). Finally, there would be a straightforward and unambiguous process to follow, as is the case when dealing with a real estate transaction. Now, I’ve probably gotten a little ahead of myself here.

What exactly does it mean to take ownership of a Subject Area? What is a Subject Area for that matter? What is a Data Steward? And what is a Metadata Registry?

A Subject Area is an area of interest or subset of data concerning something of importance to the enterprise. This could be Customers or Employees or Products or Stores or Financial transactions. To take ownership of a particular Subject Area not only entails taking ownership of the quality of data, but just as importantly entails taking ownership of how that Subject Area’s Data Elements are defined. (A Data Element is just a constrained piece of data, such as a credit card number or customer address. This is not to be confused with unconstrained or unstructured data, such as a paragraph in a blog.) Establishing ownership of a particular Subject Area is probably the most difficult and politically charged aspect of data management. It is also the reason why successful data management professionals must have strong interpersonal skills. Data management is not as much about technology as it is about PR, marketing, persuasion, and negotiation.
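
To make the idea of a constrained Data Element a little more concrete, here is a minimal Python sketch. The DataElement class, its field names, and the 13 to 19 digit length range are assumptions made purely for illustration; the only established technique used is the Luhn checksum that payment card numbers carry.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataElement:
    """A named, constrained piece of data that belongs to one Subject Area.

    Hypothetical structure for illustration only.
    """
    name: str
    subject_area: str
    definition: str
    is_valid: Callable[[str], bool]  # the constraint that every value must satisfy


def luhn_ok(number: str) -> bool:
    """Return True if the value passes the Luhn checksum used by card numbers."""
    compact = number.replace(" ", "").replace("-", "")
    # Assumed length range for a card number; adjust to your own definition.
    if not (compact.isascii() and compact.isdigit() and 13 <= len(compact) <= 19):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, char in enumerate(reversed(compact)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


credit_card_number = DataElement(
    name="Credit Card Number",
    subject_area="Customers",
    definition="Primary account number of the payment card a customer pays with",
    is_valid=luhn_ok,
)
```

A value such as "4111 1111 1111 1111" (a widely used test number) satisfies the constraint, whereas a paragraph of free text does not, which is the distinction between constrained and unstructured data drawn above.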

Wikipedia defines a Data Steward as a role assigned to a person that is responsible for maintaining a Data Element in a Metadata Registry. While this is not incorrect, I feel that the role also encompasses the following:
Is assigned to, and has an in-depth understanding of, one or more Subject Areas (e.g. Customers, Products, etc.).
Has good negotiation skills. This is necessary when attempting to extract information from both internal and external colleagues. The accuracy of information is only as good as its source, and Data Stewards must both work hard and use finesse to get people on their side to provide the most accurate and detailed data definitions.
Liaises with other Data Stewards.
Understands the Enterprise’s method of data classification, e.g. the various value domains, value domain types, value domain type classes, etc. (yeah, it’s a bit like zoology in more ways than one).
Understands how to relate Data Elements to business and IT entities, e.g. how the customer credit card data element relates to business rules, business cycles, or even which physical servers it is stored on.
Is well networked with, and liaises with, Information Security to establish the security classification of data.
May require management skills to manage more junior Data Stewards.

A Metadata Registry is a registry or repository used to store not only your data element definitions, but also how those data elements relate to all other entities in your enterprise. The Metadata Registry need not be a sophisticated enterprise application (many such products support ISO 11179 or OMG MOF data definitions, layered data models, Change Management user workflow, modelled business entities, integrations with ETL tools, BI tools, and IDEs, and more); it can be as simple as a spreadsheet or even a text document, as long as the document has enough structure to show each data element’s fully qualified definition and enough detail to distinguish between similar but different data elements. In my experience the vast majority of Metadata Registries are just simple Excel spreadsheets, or at the most a Microsoft Access DB.
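
As a rough illustration of how little structure a spreadsheet-style registry needs, the following Python sketch loads a registry from CSV text and indexes each row by a fully qualified name. The column names, the sample rows, the steward identifiers, and the "SubjectArea.ElementName" naming scheme are all assumptions invented for this example, not a standard.

```python
import csv
import io

# Hypothetical registry contents, as they might be exported from a spreadsheet.
REGISTRY_CSV = """subject_area,element_name,definition,data_steward
Customers,Address,Postal address used for billing a customer,steward_a
Stores,Address,Street address of a retail store location,steward_b
Customers,Credit Card Number,Primary account number of the payment card,steward_a
"""


def load_registry(text):
    """Index registry rows by fully qualified name: 'SubjectArea.ElementName'."""
    registry = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = f"{row['subject_area']}.{row['element_name']}"
        if key in registry:
            # Two rows with the same qualified name would make the definition ambiguous.
            raise ValueError(f"duplicate data element: {key}")
        registry[key] = row
    return registry


def similar_elements(registry, element_name):
    """Return the qualified names of all elements that share a short name."""
    return sorted(
        key for key, row in registry.items() if row["element_name"] == element_name
    )


registry = load_registry(REGISTRY_CSV)
```

Here similar_elements(registry, "Address") lists both "Customers.Address" and "Stores.Address", which is the kind of similar but different pair that the registry has to keep apart.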

Alternatively, if you already have another type of repository such as a Configuration Management Database (CMDB), it may be possible to extend this registry to include data element definitions. By doing so, you extend the value of your original repository while at the same time solving the problem of where to store your data element definitions. From what I can tell, this is one of the best strategies you can follow, as you are only building what you need while extending your IT value chain. By the way, I shouldn’t take credit for this. It is really just my interpretation of a methodology and approach that has been proposed by Charles T. Betz and which is described in his book "Architecture and Patterns for IT Service Management, Resource Planning, and Governance: Making Shoes for the Cobbler's Children". You can also read his blog at: www.erp4it.com.

All right, so I’ve described what a Subject Area is, what Data Stewards are responsible for, and what a Metadata Registry is (you can also read more about Metadata in my earlier blog post: “Metadata Defined”). So, how does this all pertain to Data Governance? And how do I implement Data Governance?
When talking about any form of implementation, we are talking about the HOW and not the WHAT (I covered that in my last post). As we all know, there are a million ways to skin a cat, so anyone who claims to know “the way” is fooling themselves. So, the steps described below are merely “a way”, but a way that I have seen work in other companies with great success. To be perfectly honest, the steps I’m outlining below are a composite of many ways that I have observed, and I’ve just tried to find the common themes. So, in simple bullet form below, I am describing in ultra-simplistic detail HOW to implement data governance:

Determine which Subject Areas you would like to implement data governance for. It is recommended that you choose subject areas that have the greatest number of data contention issues, and therefore the greatest ROI for your governance of those Subject Areas, as this will provide you with the greatest opportunity for funding. Securing funding, of course, is often the hardest part in any new venture. However, given that you will be working with the most contentious data, it’s difficult to wade into this slowly. Alternatively, if you can get funding for a pilot project that covers less contentious data, then this might be the way to go from a risk perspective.
Put together a business case for project funding. This will involve determining the ROI for the initiative. For an outside consultant, this is difficult as you are not aware of the specific problems caused by a lack of Data Governance. What you can attempt is a Data Governance Audit. As a matter of fact, IBM has recently announced a Data Governance Service. The IBM service is essentially an 11-point audit, which I suspect is followed up with appropriate recommendations. Selling an audit is often tough to do, but if you are going to pitch one, the best time to do so is at the end of the budget year when departments are looking for ways of spending their remaining cash (to ensure the same or greater budget for next year). Additionally, there are many factoids being published on a regular basis which you can use to justify Data Governance in general. For example, Accenture recently completed a study showing that middle managers waste two hours each day just searching for the right information, and once they get it, nearly half of it is useless.
Assuming you get funding, the next step is to determine the Data Owners of the respective Subject Areas. Determining who owns what data is probably the hardest part as there will be people who don’t want to take ownership of data, and others who want to take more ownership than they should be entitled to, and then of course there are those messy grey area situations where there may need to be multiple owners of data. The best way of convincing someone that they should own a Subject Area is to ask them how much they have to lose if the data quality is poor, or if someone else who doesn’t have as vested an interest in the data instead takes ownership. Yes, it’s a negative way of thinking but sometimes you need to paint a picture of anarchy and chaos to mobilize people into owning their data. This is especially why it’s important for you [the Data Management professional] to have excellent negotiation and persuasion skills.
After identifying and establishing Data Owners, it is now time to round up the Data Stewards who will work on the Data Owners’ behalf to ensure that the Data Element definitions are correct. It is generally a good idea to select people who have a good knowledge of the data to begin with (as this will save time in sourcing and documenting Data Definitions), but who also have strong analytical skills. This could be a business analyst, a software developer, a DBA, a project manager, or potentially even a Customer Service Rep. Since the Data Steward role is just a role, it need not be a full time job. In fact most Data Stewards spend less than half of their time on Data Stewardship.

While selecting your Data Stewards you will also need to consider the reporting structure, and depending on how far you want to go, you may want to consider multiple tiers of Data Stewards. I have witnessed organizations with three levels of Data Stewards, but they had over 150 Data Stewards, which is something that clearly does not happen overnight.
You now have clearly established in-scope Subject Areas with corresponding Data Owners. You also have Data Stewards assigned to work on behalf of those Data Owners and who are Subject Matter Experts (SMEs) for those Subject Areas. Now you need to select your Metadata Registry. Software selection in general is never a trivial task, so I’m not going to pretend that selecting a Metadata Registry is any easier. There are best practices particular to selecting a Metadata Registry, which I’ll cover in a later blog entry; for now I’ll leave this one open and just assume that it will happen. One thing I can reveal is that if you’re just getting started with Data Governance, then you’re best off keeping the Metadata Registry simple and easy to use. Something like an MS Access DB or even an Excel spreadsheet will do. At a later point you can always migrate to a more robust solution. Alternatively, as I mentioned earlier, you can think about extending an existing repository such as a CMDB. However, if you choose this route, it will likely be a lot more difficult to migrate to a different solution, or change what you have.
Your ducks are all lined up now! You just need to tie it all together with policy and procedure. This is where you need to put your Soft Systems thinking hat on and figure out the best way of ensuring people can do their job building systems, repairing systems, and decommissioning them without too much policy and procedure getting in their way, while at the same time ensuring that the Data Owners’ best interests are being preserved. For this you will need to determine the following:
Which technology domains are in scope and which are out of scope. For example, you don’t want to waste your time governing how people modify a tiny workgroup MS Access DB. Nor do you necessarily want to govern how a hash file is maintained. My advice is to concentrate first on data that is most likely to be shared across technology domains and business units. In particular I recommend focussing on RDBMS data elements (e.g. data contained within an Oracle, DB2 or MS SQL Server DB). To this day, the relational model is still the only complete data model. No other data model provides built-in guarantees of: referential integrity; value constraints; and access and update times. Even with XML, a document can point to any other document, even if that document doesn’t exist. Additionally, the RDBMS has more adaptors than any other data store, so more people have the ability to easily connect to it. Finally, there is already a great deal of rigour around the RDBMS, from back-up regimes, to security regimes, to access regimes. So data stored within the RDBMS is already perceived as more of a hardened asset than data stored in other forms.
You will need to determine what the new procedures will be for:
i. Adding new Data Elements
ii. Changing existing Data Elements
iii. Removing or decommissioning Data Elements
How this procedure will work in a multi-tier environment, i.e. how the process would work when there is a Development environment, a Test environment and a Production environment (or however many tiers you happen to have).
How to integrate the procedure with existing Change Management procedures and processes to ensure the correct approvals take place, while also ensuring that no more people are required to make a change than necessary.
Who will be involved when executing the aforementioned procedures. Creating a RACI chart is a good way of documenting this.
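
The procedure questions above can be sketched in a few lines of Python. Everything here is hypothetical and meant only to show the shape of the idea: the role names, the RACI assignments, the three tiers, and the rule that only the Accountable role gates promotion are assumptions for illustration, not a prescribed process.

```python
from dataclasses import dataclass, field

# Assumed tiers; substitute however many environments you actually have.
ENVIRONMENTS = ["Development", "Test", "Production"]

# Hypothetical RACI chart: procedure -> role -> R, A, C or I.
RACI = {
    "add": {"Data Steward": "R", "Data Owner": "A", "DBA": "C", "Developer": "I"},
    "change": {"Data Steward": "R", "Data Owner": "A", "DBA": "C", "Developer": "I"},
    "remove": {"Data Steward": "R", "Data Owner": "A",
               "Information Security": "C", "Developer": "I"},
}


@dataclass
class ChangeRequest:
    procedure: str                     # "add", "change" or "remove"
    data_element: str                  # e.g. "Customers.Credit Card Number"
    approvals: set = field(default_factory=set)
    environment_index: int = 0         # starts in Development


def accountable_roles(procedure):
    """Roles marked 'A' in the RACI chart for this procedure."""
    return {role for role, code in RACI[procedure].items() if code == "A"}


def promote(request):
    """Move the request to the next tier if every Accountable role has approved."""
    missing = accountable_roles(request.procedure) - request.approvals
    if missing:
        raise PermissionError(f"missing approval from: {sorted(missing)}")
    if request.environment_index >= len(ENVIRONMENTS) - 1:
        raise ValueError("request is already in the final environment")
    request.environment_index += 1
    return ENVIRONMENTS[request.environment_index]
```

Gating promotion on the Accountable role alone is one way of keeping the number of people needed for a change to a minimum; Consulted and Informed roles are recorded in the chart but do not block the change in this sketch.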

If you are working for a smaller organization, then the above methodology can be greatly simplified. You will be able to quickly and efficiently determine who the Data Owners and Data Stewards are. Coming to an agreement on how the procedures will work will also be a lot faster. As a smaller organization you would also be wise to keep your Metadata Registry very simple, such as an Excel spreadsheet or MS Access DB.

If you are working for a large organization, none of what I described above will come easy, but don’t look for silver bullets. Yes, good software can perhaps automate some of the manual procedures through people workflows and tool integrations. Good software may even help in the initial discovery process to document your data flows (especially when ETL tools are involved). However, most of the hard work comes back to the right people being properly engaged to make well informed decisions.

Let’s go back to my first example, where I talked about a lawyer working with you to complete a real estate transaction. Yes, you and the lawyer may be able to use technology to work more efficiently. For example, in the future (or maybe the present by the time you read this) your lawyer will be able to submit your deed to the Land Registry on-line, saving some time. Or maybe you and your lawyer can collaborate on-line without you having to make the trek down to her office to sign documents. Regardless, I would still say that the most critical work that is being done is you making sound decisions, and allowing your lawyer to interpret these decisions while ensuring that all the legal details are being taken care of in your best interests. Until artificial intelligence can make some great strides (which I’m not seeing), we’re going to depend on lawyers to help us make decisions when the law is concerned, and Data Stewards when data is concerned.