Sunday, October 02, 2005

Shared vs. Mobile Data

Alex Bunardzic writes about "Shared Data and Mobile Data" in response to his article on "Should Database Manage The Meaning?" He makes an important distinction in the debate. There are people who think the database should be the central point, and then there are those of us who think data should be mobile (not centralized). I think the folks who believe in centralized databases should read William Kent's "Data And Reality" to see the problems with that view. It basically boils down to meaning. Different projects will view what might seem to be the same data differently. We might even use the same terms, but have slightly different meanings. It's the slight difference that kills the shared database model. Not to mention that, over time, shared database models become unchangeable and unmaintainable because of the massive amount of coordination it takes to make a change (even with voluminous documentation, which is usually out of date anyway). Who wants to make a change and be held accountable for every system they break? It becomes "let's just hack this field that only we use so we don't have to go pull teeth". Ouch. It's easier to just change your code than to burden everyone else (who may not want to make ANY changes to their code at all...it's not in their budget). I've seen this scenario more times than I care to count.

Mr. Kent's book exposes these problems and lays them out. It is great for showing that the problems basically boil down to humans having ambiguous language. Even though our computer languages are strict, meaning is still king, and that's the disconnect. It's the same reason that common business objects that cross enterprise divisions are problematic as well: each division has its own meanings for the words used to label things. It's best for each system to have its own database, with distinct boundaries where systems exchange information. Everyone keeps their own definitions (which make sense to their users) and makes conversions at the boundaries.
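
To make that concrete, here's a minimal sketch (in Java, with made-up class and field names): billing and shipping both say "customer", but each system keeps its own definition, and the two meanings never share a class.

// Billing's "customer" is an account that gets invoiced.
class BillingCustomer {
    String accountNumber;
    String legalName;       // the name that appears on the invoice
}

// Shipping's "customer" is whoever receives the package --
// same word, different meaning, so it gets its own class.
class ShippingRecipient {
    String displayName;     // the name that goes on the label
    String deliveryAddress;
}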

Eric Evans' excellent "Domain-Driven Design" offers solutions to the problem. In fact, he has an example at the beginning of Chapter 14 (Maintaining Model Integrity) that explains it. Different systems might be working on what seems to be the same model, but they are looking at it from different contexts. Each system should have its own "bounded context" of the model. Here's an excerpt from "Domain-Driven Design" that explains it clearly:
Explicitly define the context within which a model applies. Explicitly set boundaries in terms of team organization, usage within specific parts of the application, and physical manifestations such as code bases and database schemas. Keep the model strictly consistent within these bounds, but don't be distracted or confused by issues outside.

He goes on to make the point of keeping the boundaries distinct (don't allow them to blur) and being explicit in communication with outside systems (by using "context maps" for conversion to and from outside systems).
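
In code, a context map can be as plain as a translation class that lives on the boundary. Continuing the hypothetical billing/shipping sketch from above, the conversion happens explicitly, in one place, and neither model leaks into the other:

// Lives on the boundary between the Billing and Shipping contexts.
// Nothing else in either context references the other's classes.
class BillingToShippingTranslator {
    ShippingRecipient toRecipient(BillingCustomer customer, String address) {
        ShippingRecipient recipient = new ShippingRecipient();
        // Billing's legal name becomes Shipping's label name. Billing
        // knows nothing about delivery addresses, so the boundary
        // supplies that piece from its own source.
        recipient.displayName = customer.legalName;
        recipient.deliveryAddress = address;
        return recipient;
    }
}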

The bottom line: keep your systems distinct and avoid the allure of sharing common domain/data models. Keep the boundaries distinct and exchange information with outside systems. I wouldn't even allow common exchange formats across systems (they run into the exact same meaning problems). Define the exchange for each system (a different DTD for each one if you use XML; I'm not saying don't use standards, just don't define one common DTD for all system exchanges across the enterprise). Don't be lazy. Let your system keep its meaning and don't let it get watered down. Sharing allows meaning to be blurred (and misused), and this lets the bugs in. Data warehouses sound like a good idea, but really you're duplicating the meaning to pull everything together. A better answer would be an agent system that gathers the information from each of the systems at their boundaries, since the agent would be its own system. It's a harder path to follow, but it's the one that allows your systems to continue to grow and not stagnate. The minute you buckle one system to another, the stagnation will begin and the cost curve will start to rise.
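
To illustrate the per-system exchange idea (again a hypothetical sketch, building on the classes above): instead of one enterprise-wide export, each outside system gets its own exporter, producing whatever format was agreed on at that particular boundary.

// One exporter per exchange partner -- each boundary has its own
// agreed-upon format (and its own DTD, if you're using XML).
interface PartnerExport {
    String export(BillingCustomer customer);
}

// The exchange format the warehouse system expects.
class WarehouseExport implements PartnerExport {
    public String export(BillingCustomer c) {
        return "<recipient><name>" + c.legalName + "</name></recipient>";
    }
}

// The exchange format the accounting system expects -- different
// tags and different meaning, negotiated at a different boundary.
class AccountingExport implements PartnerExport {
    public String export(BillingCustomer c) {
        return "<account><number>" + c.accountNumber + "</number>"
             + "<holder>" + c.legalName + "</holder></account>";
    }
}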

4 comments:

Andrés said...

Just a quick comment... Mr. Kent's book is really about people having different intentions and thus distinguishing things differently, not so much about vague language.

For each particular team of users of a central database, it is possible for them to articulate what they are modeling pretty well. The thing is that different intentions will lead to different distinctions. It's not vague language, it's a different point of view.

Blaine Buxton said...

Then you misunderstood my comments, or I wasn't clear enough. The point I was trying to make was exactly that. The language is vague because we can affix multiple meanings to the same label, owing to the different perspectives. Language is not exact because of this.

Ravi Venkataraman said...

It is unfortunate that many people in the software industry do not understand that applications come and go, while data remains for much longer.

If, as you are suggesting, every application defined its own terms and essentially created its own data model, how could the enterprise ever know anything? Say, for example, every department and system defined "customer" in its own way; then how could you answer the simple question, "How many customers do we have?"

The fact that data modellers create unchangeable, unmaintainable data models is a reflection on their lack of competence. Most data modellers know zilch about data modelling. Just like the vast majority of Java developers know nothing about OO concepts.

The right approach is to build the data model from an enterprise point of view, then use database views to show the application-specific representation. That insulates the application developers from the actual implementation of the data model and takes care of all the problems you mention.
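
For instance (a minimal sketch, assuming a hypothetical enterprise-owned customer table; any JDBC-compliant database would do), the order-entry application could be handed a view in its own vocabulary:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class OrderEntryViewExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver"); // any JDBC driver works
        Connection conn =
            DriverManager.getConnection("jdbc:hsqldb:mem:enterprise", "sa", "");
        Statement stmt = conn.createStatement();
        // Stand-in for the table owned by the enterprise model.
        stmt.execute("CREATE TABLE customer (customer_id INT, "
            + "legal_name VARCHAR(100), status VARCHAR(10))");
        // The order-entry application sees only the slice it needs,
        // under its own names; the underlying model can change
        // without touching the application.
        stmt.execute("CREATE VIEW order_entry_customer AS "
            + "SELECT customer_id AS id, legal_name AS name "
            + "FROM customer WHERE status = 'ACTIVE'");
        stmt.close();
        conn.close();
    }
}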

The main purpose of an RDBMS is to enforce data integrity. Applications used to enforce integrity, but that failed more than three decades ago. That is why the shift to RDBMS occurred. The network model and the hierarchic model, along with using files for storing data rather than a higher-level abstraction, created many problems.

You seem to be suggesting that we should go back to the 1970s and try things that were known to have failed at that time. Why do you feel it should succeed this time?

Ravi Venkataraman said...

The original weblog article is flawed in so many respects that one is filled with despair.

If the suggestion is that each application develop its own data model, then what happens to data integrity? How are the common integrity and business rules expected to be applied across all applications? Duplication? What if one application modifies the code in a manner inconsistent with the business rules?

I suggest that you learn a little bit about RDBMS before talking about them.