Tuesday, February 28, 2006

String: How I Loathe Thee

There, I said it! I hate strings. I really do. Nothing screams "BROKEN WINDOW!" louder than unnecessary abuse of strings. But, you say, "Blaine, you can't be serious! We need strings! How else would we represent names and labels?" OK, you got me. We need strings; perhaps it's the abuse that I loathe. I think an example would be good here. Let's say we had a client who wanted to keep track of his albums and group them by artist. So, we quickly come up with the following class definition:
class Album {
    String title;
    String artist;

    // Assume the usual getters/setters and general behavior

    public static Album[] byArtist(String anArtistName) {
        ...
    }
}

This looks pretty harmless, doesn't it? You might even say that it is "the simplest possible thing that could work". You would be right. Most of our methods would live in the Album model object. But what we're missing here is the power of objects: spreading behavior out amongst cooperating objects. With just strings, we would end up with methods on Album that really shouldn't be Album's responsibility. The obvious solution might be to use:
class Album {
    String title;
    Artist artist;
}
class Artist {
    String name;
    Album[] albums;
}

At least we could move that search method from above into an instance method of Artist, and it would be easier to traverse the object model. We wouldn't have to worry about Album validating the artist name. Our behavior would be spread out amongst several objects and sit closer to the data. We would be programming closer to the language of our user, which should always be the goal. Can you sit your client down and have them understand your code? If not, there's too much geek string talk going on.
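Here is a rough sketch of what that could look like. It is only one way to do it: the constructors, the record method, and the use of List instead of an array are my assumptions for illustration, not part of the classes above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Artist {
    private final String name;
    private final List<Album> albums = new ArrayList<>();

    Artist(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Takes over from the static Album.byArtist(String) lookup:
    // the artist already knows its own albums, so no name matching is needed.
    List<Album> getAlbums() {
        return Collections.unmodifiableList(albums);
    }

    // Hypothetical helper: creates an album that belongs to this artist
    // and keeps both sides of the relationship in sync.
    Album record(String title) {
        Album album = new Album(title, this);
        albums.add(album);
        return album;
    }
}

class Album {
    private final String title;
    private final Artist artist;

    Album(String title, Artist artist) {
        this.title = title;
        this.artist = artist;
    }

    String getTitle() {
        return title;
    }

    Artist getArtist() {
        return artist;
    }
}
```

Asking an artist for its albums is now artist.getAlbums(), and going the other way is album.getArtist(). No string comparison anywhere, and the code reads the way the client talks.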

But, you say, "Blaine, the first class is SO EASY TO MAP TO OUR PERSISTENCE FRAMEWORK! Surely, it should be the way to go!" Yeah, let's make our whole model pay the unreadability tax simply because we want to make one aspect of our system easy to implement. A model that speaks the language of the user is easier to maintain and increases understanding of the domain. It's always easier to read the message directly than to decipher what you did six months ago. Strings allow behavior to be ill-placed and meaning to be lost or, worse, obfuscated. It's the slippery slope to design smells and broken windows.

"OK, Blaine, you're going off the deep end here! The example you gave sucks! I don't see your point. Strings are simple. Objects are more complicated. Why go through the extra hurdle if my model doesn't demand it?" Alright, alright, alright! Strings are primitives, like numbers and dates. They should be treated as such. It's a matter of taste, but I believe you should use strings only for primitive things. You might see artist as simply a String, but what happens when your client wants to know more about an artist? We could easily add this new information to an Artist object for little cost.

In the first example, what really is the instance variable "artist"? Is it the name of the artist? It could be, and that would be the assumption. But it could be anything. Hell, someone might even stick XML into the artist variable. Nothing stops them. Our intention is not specific. An Artist object reveals exactly what we mean and adds little complexity, and it lets us respond more quickly to our client. Our readability is increased. We also have lower cognitive friction because behavior is in the right place. Strings, if abused, force behavior into unnatural spots.
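To make that concrete, here is a minimal sketch of an Artist that guards its own name. The blank-name rule and the trimming are assumptions I made up for the example; your client's rules will differ. The point is only that the rule has one obvious home.

```java
class Artist {
    private final String name;

    Artist(String name) {
        // Assumed rule for illustration: an artist must have a non-blank name.
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("artist name must not be blank");
        }
        this.name = name.trim();
    }

    String getName() {
        return name;
    }
}

class Album {
    private final String title;
    private final Artist artist; // the type says what this is; a String could hold anything

    Album(String title, Artist artist) {
        this.title = title;
        this.artist = artist;
    }

    String getTitle() {
        return title;
    }

    Artist getArtist() {
        return artist;
    }
}
```

With a String field, every method on Album that touches the artist has to wonder what is actually in there. With an Artist, the question is answered once, in the constructor, and Album doesn't have to care.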

I've seen too many examples where strings are overused and make refactorings difficult. The reason is that meaning is generally lost, and in spaghetti code it's hard to know what the String truly represents. I've even seen people hack around them (like putting XML in a field and parsing it in methods). I've seen it all. The point here is not to avoid Strings, but to look at them for what they are. Primitives. Sure, the above example could have stayed like the first. And it might have enjoyed a nice happy life. The minute the client needs additional information is when we should make an Artist object and refactor. Use common sense. Overuse of strings makes me take out my code reviewer magnifying glass, because generally there are sinister bugs lying underneath. Strings are used too often by lazy modelers who don't want to create an extra class because it wasn't "the easiest possible thing", oops, I meant, "simplest" (yes, there is a HUGE difference: one bases the answer on thought, the other on laziness). Now, don't get me started on strings in test cases as short-cuts to comparing objects. That's another blog entry...
