Saturday, March 03, 2007

Rebooting Repositories: Are Wikis more viable for Metadata, CMDB, Document Management, and other forms of Enterprise Repositories?

I have been grappling with the general problem of knowledge repositories. It’s been driving me nuts. While metadata repositories have been around for a while, new breeds of repositories are emerging at an increasing rate. In particular CMDB repositories (Configuration Management Databases), Enterprise Architecture repositories, Business Rules repositories, and so on are turning up all the time. The problem of course is that:
a) The information in each of these repositories is related in some way, and those relationships are relevant, and are themselves information (i.e. derived facts)
b) The repositories all have their own data models which cannot be easily integrated.

One approach is to build your own repository, and extend it as needed. Another approach is to take an “anchor” repository and attempt to extend it: for example, taking a CMDB and extending it to include entities and attributes for a metadata repository. However, both of these approaches require a great deal of effort to build and maintain, and in the attempt to create a cohesive view of knowledge, we invariably get bogged down in the plumbing of the repository itself. An excellent article that examines this problem is Repositories Build or Buy by Malcolm Chisholm.

What I feel the problem really boils down to is rigid data models that cannot be dynamically changed, and in turn require integration projects. I am a big fan of the relational model, and to my knowledge, it is the only complete data model that exists. Data that has been properly normalized, constrained, and indexed can answer pretty much any question about itself.

While integrating any two relational data models (i.e. repositories) across heterogeneous systems is always possible, it is usually very difficult. While there are numerous reasons for this, I’ll point out two major issues that will never go away:

  1. Links in the relational model are represented as foreign key to primary key relationships. Such linkages presuppose a single system [the RDBMS] overseeing both the primary key entity and the foreign key entity. Contrast this to the world wide web, where anything can link to anything, and there is no single system enforcing referential integrity. While this would not be acceptable for a “bet-your-business” operational system, when it comes to knowledge management, I think it’s fair to relax the rules a bit in favour of agility.
  2. Most software developers treat the RDBMS as a “bit-bucket” to store data, and not as a system in its own right that is capable of managing data on its own. Furthermore, any entities used by the developers are thought of as “black boxes” to be accessed only through the developers’ [typically hidden] interfaces. As such, going in and adding even a single column is fraught with peril. In any enterprise, changing a data model in production is typically the riskiest operation you can perform, and carries the greatest cost due to the amount of analysis and regression testing required.

There is also a major marketing problem with traditional repositories: The average person just sees them as obscure “black boxes” that are the domain of techy geeks. I have argued in the past that data governance and data stewardship coupled with a well-structured metadata repository is necessary if you want to achieve data interoperability, and the purist in me will always believe this, but the pragmatist in me also knows that an 80/20 solution that can be sold and implemented is better than no solution at all.

Thus without further ado, I propose that as enterprise architects, IT service managers, and data managers, we seriously consider a Wiki approach to managing and integrating our knowledge, i.e. a Wikipedia for the enterprise. Now, I’m well aware that like anything that’s popular out there in the internet world, someone is trying to apply it to the corporate world. In other words, I don’t think the idea of an enterprise “Wiki” is anything new. However, I feel that people view Wikis in a very narrow light that does not do justice to their potential, and I’d like to point out some alternative ways in which we could marry Wikis to enterprise repositories, like a metadata repository or CMDB.

Wikis in a corporate sense are often thought of as a combination document management system cum message board. It’s a place where you could put a document about a procedure for backing up a server, followed by a tape retention process. Users could go in and edit the Wiki every time the procedure itself changed. They could then record what they changed about the procedure in the edit notes. Anyone who is familiar with a document management system knows that this is nothing new, but for the uninitiated, a Wiki is more approachable and easier to digest. I used to work on developing integrations and add-ons for DOCSOpen (the most popular Document Management System of its time), and while I could argue the merits of a document management system (primarily its third party integrations), I would have no problem recommending a Wiki approach if a client was interested. But I digress…

I believe a Wiki could be extended to hold and maintain corporate documents, Metadata, CMDB data, and all other enterprise repository data, if the following shortcomings could be addressed:

  1. We need to have more powerful editing tools. The current way of editing a Wiki reminds me of when I used to write essays in university using LaTeX. It was always very precise, and you could get beautiful layouts, and once you knew your way around the mark-up language it was very easy to put together slick looking documents. But I had to create a Makefile just to “compile” my documents, and the idea of asking my peers to edit a LaTeX file was not feasible as it was just too techy for the average person. I was always a big supporter of LaTeX since it worked for me, but I acknowledged that it was basically useless for the average person until user friendly LaTeX editing tools came around.
  2. We need to have more experience and tools to create Directed Folksonomies. A Folksonomy is basically just a taxonomy that has been created by a user community. For example, you could create a classification system for comic books referring to various genres and subgenres. Of course the problem with a Folksonomy is that it expects the person doing the classifying to know what the various genres and subgenres are to begin with and that they are also using these classifications correctly. A Directed Folksonomy on the other hand simplifies this task for the classifier as it allows them to pick and choose the correct genre and subgenre, and ideally it should provide concise definitions of categories and subcategories. This leads me though to the third shortcoming of Wikis.

  3. We need more granular security. We need to ensure that select parts of Wikis can be edited or viewed by select users and in only select ways. We would also need to ensure that, for Directed Folksonomies, only select users (Data Stewards) could create and edit the Folksonomy definitions, while perhaps allowing a greater number of people to tag information using those Folksonomies.

  4. We need autonomous agents that can modify sections of Wikis on their own. Taking a CMDB example, it would be nice to check a single server page to see what servers are currently up, and for how long. That same page could have multiple sections: some sections being edited by people; and other sections that are only edited by autonomous agents. By allowing both people and autonomous agents to edit the same page, we no longer calcify those data models, as each agent would always be aware that the full Wiki is not its dominion, and only a section of it is. Compare this to how RDBMS tables are currently treated, and what the consequences might be if we were to add even a single column to a single table.

  5. We need better audit tools and processes to ensure Wiki integrity. It would be nice for example to ensure that when a metadata element is pointing to a server entity/Wiki, it’s actually pointing to a server entity/Wiki and not just a dead link. While I don’t believe we need to have such integrity enforced by some overruling system (like an RDBMS), it would be nice to have spiders that could crawl enterprise Wikis as an a posteriori batch process that could point out issues that need to be rectified. I feel that this approach would work better, as it would behave the same way should there be a company merger, and could quickly and effectively assist in integrating both sets of knowledge, without ever actually preventing such an integration from happening due to overly strict a priori constraints.
  6. We need better reporting tools to allow BI-style reporting. Although I’m suggesting a Wiki approach to entering and maintaining knowledge, there’s no reason why we couldn’t suck this data into a data warehouse for reporting. Such reports could tell us where change is happening the fastest, and by whom, or which areas of knowledge are old and creaky. The possibilities are endless.
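
Shortcomings 4 and 5 above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not any real Wiki engine’s API: the Wiki, Page and Section classes, the “uptime-agent” editor, and the page titles are all hypothetical. It shows a page whose sections have separate owners (so an agent can only touch its own section), and an after-the-fact batch audit that reports dead links instead of preventing them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Section:
    owner: str  # hypothetical: "human" or the name of an autonomous agent
    text: str = ""
    links: list[str] = field(default_factory=list)  # titles of linked pages


@dataclass
class Page:
    title: str
    sections: dict[str, Section] = field(default_factory=dict)


class Wiki:
    def __init__(self) -> None:
        self.pages: dict[str, Page] = {}

    def add_page(self, page: Page) -> None:
        self.pages[page.title] = page

    def edit_section(self, title: str, section: str, editor: str, text: str) -> None:
        """Let an editor change a section only if that editor owns it."""
        target = self.pages[title].sections[section]
        if target.owner != editor:
            raise PermissionError(f"{editor} does not own {title}#{section}")
        target.text = text

    def audit_links(self) -> list[tuple[str, str, str]]:
        """Batch check run after the fact: list links that point at missing pages."""
        dead = []
        for page in self.pages.values():
            for name, section in page.sections.items():
                for link in section.links:
                    if link not in self.pages:
                        dead.append((page.title, name, link))
        return dead


wiki = Wiki()
wiki.add_page(Page("server-01", {
    "Description": Section(owner="human", text="Hosts the ODS."),
    "Uptime": Section(owner="uptime-agent"),
}))
# Nothing stops this page from linking to "server-02", which does not exist yet.
wiki.add_page(Page("customer-table", {
    "Storage": Section(owner="human", links=["server-01", "server-02"]),
}))

wiki.edit_section("server-01", "Uptime", "uptime-agent", "up for 12 days")
print(wiki.audit_links())  # [('customer-table', 'Storage', 'server-02')]
```

The point of the sketch is the ordering: the link to the missing page is accepted when it is written, and only surfaces later in the audit report, which is the opposite of how a foreign key constraint behaves.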

Before wrapping up, I’d like to paint a picture of a Wikified enterprise through a simple use case scenario:
You’ve only been with the retail company for a couple of weeks, and have only just got to know the DBAs and a few developers. You’ve been asked as part of a SOX audit to find out where the credit card data is physically stored for its corporate customers, and confirm [or deny] that the server is properly backed-up, and the back-ups are encrypted.

In your past experience you would start making the rounds. You’d be calling people, waiting for responses, following up with more questions, requesting documentation, and not always getting it. In the end you’d get your answer, but it could take you the better part of a week just to track down the right people and get them to locate the correct documentation.

In my dream Wikified enterprise, things would instead go like this:

  1. You go to the Wiki portal where you search for “credit card”.

  2. You find various results for pages with credit cards, but near the top you find a “credit card” data type page. You decide to click on it.

  3. The page brings up a definition of the credit card type, and includes links to all the various entities that utilize this type. The information on these pages is structured within sections, and has been edited and maintained by knowledgeable data stewards.

  4. You click through all the entities that have the credit card attribute (there would be links from this page). You scan the definition of each entity until you find those entities that pertain to corporate customers. The definitions of the entities should provide this information, or at least provide links to other Wikis which would provide this information.

  5. You have now located all entities that hold corporate credit cards. You now need to determine where these entities are physically stored in production. The entities themselves would have links back to which DBMS they are stored in.

  6. You then click on the DBMS Wiki link for each entity to find out where they’re stored. From here you have located three DBMSs: a data warehouse; an ODS; and a third DBMS.

  7. For each DBMS Wiki, you click on its link to find out more about it. One of the sections has links to the physical hardware that the DBMS runs on. You then go to this server Wiki. You read up on the server’s security and make notes. The server Wiki has a link to the data centre Wiki. You then click on the Data Centre Wiki to read about it, where it’s located, and its security policies. You bookmark this Wiki, while taking notes.

  8. You go back to the DBMS Wiki to read about the back-up and retention policy, which is contained in its own section. You note that there is no mention of whether the back-up tapes are encrypted or not (even after checking the respective back-up server Wikis), and decide to call the person who last edited the back-up section. After getting in touch with this DBA she informs you that the tapes are in fact not encrypted. Good to know. No problem, you quickly go into the Wiki and make a note that the tapes are not encrypted, citing where you got the information from.

  9. You now take the information that you have collected and produce a report. You look back and realize that it only took you a couple of hours to collect and digest all the information, including the call to the DBA. The report took another hour to draft and format, and you feel confident if challenged that you can corroborate your facts. A job well done, in a complex environment.
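
The click-through in the steps above is really just a walk over links between pages. Here is a rough sketch of that walk in Python, assuming the Wiki’s links could be exported as a simple mapping from page title to linked titles. The LINKS table, the page titles, and the “server:” naming prefix are all made up for illustration.

```python
from collections import deque

# Hypothetical export of Wiki links: page title -> titles it links to.
LINKS = {
    "type:credit-card": ["entity:corporate-customer", "entity:retail-customer"],
    "entity:corporate-customer": ["dbms:warehouse", "dbms:ods", "dbms:billing"],
    "entity:retail-customer": ["dbms:pos"],
    "dbms:warehouse": ["server:dw-01"],
    "dbms:ods": ["server:ods-01"],
    "dbms:billing": ["server:bill-01"],
    "dbms:pos": ["server:pos-01"],
    "server:dw-01": ["datacentre:east"],
    "server:ods-01": ["datacentre:east"],
    "server:bill-01": ["datacentre:west"],
    "server:pos-01": ["datacentre:west"],
}


def reachable(start, links):
    """Breadth-first walk: every page reachable from start, in visit order."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Which servers sit behind the corporate customer entity?
pages = reachable("entity:corporate-customer", LINKS)
servers = [p for p in pages if p.startswith("server:")]
print(servers)  # ['server:dw-01', 'server:ods-01', 'server:bill-01']
```

A person would still read each page along the way, as in the scenario, but a walk like this could at least produce the list of pages worth reading.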

This to me is what the agile enterprise is all about. Clearly we’re still a ways off. However, I’m optimistic that we’ll be there sooner than we think.

There is one last thing I’d like to leave you with on this topic. I have been blabbing on about metadata for quite some time now. While most people “in the know” are quick to agree that metadata is essential to getting to the root of IT failures, it has yet to capture the popular imagination and I fear it never will. On the other hand, Wikis have captured the popular imagination. Both my parents know what they are, and can envision their use in many different ways. I mention metadata and I get blank stares. I mention Wikipedia, and I’m always in for a lively discussion. So, as a pragmatist I feel that even though there are technology hurdles to clear in making Wikis robust enterprise repositories, clearing them is a cinch in comparison to convincing people about the merits of metadata. So, the next time you start to talk about metadata or knowledge management, try instead starting off talking about “Wikipedia for the corporation”, and go from there. I bet you will have a much better chance of engaging the person you’re speaking with, whether they agree with you or not. Dialogue is only the beginning.