Monday, October 10, 2005

Application vs. Data

Ravi Venkataraman wrote:
It is unfortunate that many people in the software industry do not understand that applications come and go, while data remains for much longer.

Is that really true? I've been in the business for over 10 years now, and I've never seen the lights turned off on ANY application. In fact, I've seen many mainframes persist when they should have been replaced. The application and the data together are the whole picture. The application has the business logic and knows what to do with the data. I know our experiences will be different and our mileage will vary.
If, as you are suggesting, every application defined its own terms and essentially created its own data model, how can the enterprise ever know anything? Say, for example, every department and system defined "customer" in its own way, then how can you answer the simple question, "How many customers do we have?"

I think we're looking at the same problem through different lenses. I think applications should have access points to get at the information inside them. I believe in encapsulation. It's the classic black box vs. white box. I see the problems in both and I wonder if we could come up with a mutual solution. A common database makes it easier to find "how many customers do I have?", but what if all applications had common access points and you could answer the same question? They both require expertise in different areas. Is one better than the other? The more I think about it, the more I come to the same conclusion: both require planning and thought. You can certainly paint yourself into a corner with either. Maybe we should have a mixture of both. Can we have a mixture of both? What would that look like? I would love to discuss this further...
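To make the "common access points" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `CustomerSource` interface, the two stand-in applications, and above all the assumption that the applications already agree on a shared customer identifier, which is exactly the hard part under discussion.

```python
from typing import Iterable, Protocol


class CustomerSource(Protocol):
    """Hypothetical access point that each application would expose."""

    def customer_ids(self) -> Iterable[str]:
        ...


class BillingApp:
    """Stand-in application; its internal storage stays private."""

    def __init__(self) -> None:
        self._accounts = {"C1": "Acme", "C2": "Globex"}

    def customer_ids(self) -> Iterable[str]:
        return self._accounts.keys()


class SupportApp:
    """Second stand-in application with a different internal layout."""

    def __init__(self) -> None:
        self._tickets = [("C2", "login issue"), ("C3", "refund")]

    def customer_ids(self) -> Iterable[str]:
        return {customer_id for customer_id, _ in self._tickets}


def count_customers(sources: Iterable[CustomerSource]) -> int:
    """Count distinct customers across every application's access point."""
    seen = set()
    for source in sources:
        seen.update(source.customer_ids())
    return len(seen)


total = count_customers([BillingApp(), SupportApp()])
print(total)
```

The caller never sees how either application stores its data. It only works, though, because both sides use the same identifier for the same customer, so the agreement problem does not go away; it moves into the interface.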
The fact that data modellers create unchangeable, unmaintainable data models is a reflection on their lack of competence. Most data modellers know zilch about data modelling. Just like the vast majority of Java developers know nothing about OO concepts.

I think if you look at any human endeavor, you will find a bell curve. Few will be awful, few will be great, and most will be average. I think data modeling often gets into discussions of performance way too early (much like getting into performance issues too early in OO design causes problems). Good data models are hard, much like good OO models are hard. I think I'm a pretty good data modeller, but I could be wrong. Do you have any good books that I could read? I'm always into bettering my craft.
The right approach is to build the data model from an enterprise point of view, then use database views to show the application specific representation. That insulates the application developers from the actual implementation of the data model; and takes care of all the problems you mention.

I believe this is a slippery slope. Common OO models meant for reuse across enterprises have been tried, and I've seen them get stumped on "simple" questions. What seems straightforward for one part of the enterprise might not be for another. It's the difference in context between organizations. Vague language is the main culprit. Common terms might mean different things to different parts of the organization, and getting them to agree on the terms and use them consistently is another matter. An RDBMS answers the problem of making sure my tables have integrity, but what about the values in fields? For example, what about ranges? Most business rules are complex beasts, and they usually go beyond the integrity constraints of an RDBMS. And getting different parts of a large organization to agree on the business rules for each segment is a hard problem. Like I said before, maybe we can discuss this to come up with a better solution.
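For reference, the view-based approach from the quoted comment might look something like the sketch below, using Python's built-in `sqlite3` module. The `customer` table, the `sales_customer` view, the region filter, and the credit limit range are all made up for illustration. A simple range rule can be written as a `CHECK` constraint; whether the complex, context-dependent rules fit that mold is the open question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical enterprise-wide table; the CHECK is a simple range rule.
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        region TEXT NOT NULL,
        credit_limit INTEGER NOT NULL
            CHECK (credit_limit BETWEEN 0 AND 100000)
    );

    -- Application-specific representation for a hypothetical sales app.
    CREATE VIEW sales_customer AS
        SELECT id, name, credit_limit
        FROM customer
        WHERE region = 'EMEA';
    """
)

insert_sql = "INSERT INTO customer (name, region, credit_limit) VALUES (?, ?, ?)"
conn.execute(insert_sql, ("Acme", "EMEA", 5000))
conn.execute(insert_sql, ("Globex", "APAC", 9000))

# The sales application reads only its own view of the shared model.
sales_rows = conn.execute(
    "SELECT name, credit_limit FROM sales_customer"
).fetchall()

# An out-of-range value is rejected by the database itself.
try:
    conn.execute(insert_sql, ("Initech", "EMEA", -1))
    range_rule_enforced = False
except sqlite3.IntegrityError:
    range_rule_enforced = True
```

The view hides the underlying table from the application, and the constraint covers the single-column range. A rule that spans departments, or that means different things in different contexts, would still need agreement on the rule itself before it could be written down anywhere.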

The main purpose of an RDBMS is to enforce data integrity. Applications used to enforce integrity, but that failed more than three decades ago. That is why the shift to RDBMS occurred. The network model and the hierarchic model, along with using files for storing data, rather than a higher level abstraction, created many problems.

Oh, I'm glad we don't store things in files anymore! I just think applications should have access layers that allow outside applications to integrate their information. Allowing an outside application to access a public object model is powerful stuff. It even allows the outside application to update my system, because I can enforce my business rules while keeping my data properly encapsulated. Anyway, thanks for the history lesson, but I know the reasons why the RDBMS came into being.
You seem to be suggesting that we should go back to the 1970s and try things that were known to have failed at that time. Why do you feel it should succeed this time?

I am in no way suggesting we go back to the 70's. Most mainframe systems provided no way for outside systems to access their internal data. Most of those mechanisms were added afterward. I'm suggesting a layer of abstraction above the database. It takes a little more effort to add a component layer to an application, but it's absolutely needed in enterprise settings. But, let's discuss and not give history lessons or make bad assumptions:
The original weblog article is flawed in so many respects that one is filled with despair.

If the suggestion is that each application develop its own data model, then what happens to data integrity? How are the common integrity and business rules expected to be applied across all applications? Duplication? What if one application modifies the code in a manner inconsistent with the business rules?

I suggest that you learn a little bit about RDBMS before talking about them.

I suggest you don't assume what my background is. But, let's talk like educated and intelligent beings. I believe applications should manage their internal data integrity and provide a model to the outside world. I believe in encapsulation at the application component level as well as the object level. I'm opening up the discussion. You make a lot of valid comments, and I think we can come up with a cool solution if we concentrate on our goal and not on tearing each other down.
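As a rough illustration of what I mean by an access layer, here is a minimal Python sketch. The `OrderComponent` class, its methods, and the quantity limit are all invented for the example; the point is only that outside callers go through public methods that enforce the rules, and never touch the internal data directly.

```python
class BusinessRuleError(ValueError):
    """Raised when an update would break one of the application's rules."""


class OrderComponent:
    """Hypothetical public access layer; the storage behind it stays private."""

    MAX_QUANTITY = 100  # assumed business rule, purely illustrative

    def __init__(self) -> None:
        self._orders = {}  # internal data, not exposed directly

    def place_order(self, order_id: str, quantity: int) -> None:
        """Let an outside application add an order, subject to the rules."""
        if order_id in self._orders:
            raise BusinessRuleError(f"order {order_id} already exists")
        if not 1 <= quantity <= self.MAX_QUANTITY:
            raise BusinessRuleError(
                f"quantity must be between 1 and {self.MAX_QUANTITY}"
            )
        self._orders[order_id] = quantity

    def quantity_for(self, order_id: str) -> int:
        """Read access for outside applications."""
        return self._orders[order_id]

    def order_count(self) -> int:
        return len(self._orders)


orders = OrderComponent()
orders.place_order("A-1", 5)

# An update that violates the rule is refused at the component boundary.
try:
    orders.place_order("A-2", 500)
    rejected = False
except BusinessRuleError:
    rejected = True
```

This does nothing to stop a rule from being duplicated or drifting if another application reimplements it, which is the concern raised in the quoted comments. It only shows where the rule would live if the component is the single way in.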
