Saturday, August 25, 2007

Promises And String Concatenation

You've read about it a million times. Beware of multiple string concatenations. They make your code slow and consumes memory. You've read it in the Java books (known as "use StringBuilder or StringBuffer instead of +"). But, why? Why isn't this handled at a lower level? Why can't the VM or compiler just do the right thing?

Messages are the power of objects. So, why not make a new object that when sent the + message, it simply returns an object that waits to do the concatenation until it is needed. This new kind of object should understand the same protocol as string. This could all be handled underneath the covers. If it was done at the VM or compiler level, programmers would never have to know.

I thought I would do a sample implementation. It's rather easy (in a dynamic language). First, we need to implement a Promise class and here's the Ruby code complete with a simple test:
require 'rubyunit'

class Promise
def initialize(&block)
@calculation=block
@value=nil
end

def value
return @value if @calculation.nil?
@value=@calculation.call()
@calculation=nil
freeze()
return @value
end

def value?
@calculation.nil?
end
end

def promise(&block)
Promise.new(&block)
end

class PromiseTest < Test::Unit::TestCase
def test_simple
promise = promise { 3 + 4 }
assert(!promise.value?)
assert(7 == promise.value)
assert(promise.value)
end
end

Pretty simple, huh? Create a new Promise object on a block (or closure or lambda or whatever you like to call it) and it will only call the block once when the message "value" is sent to it. If the message "value" is never sent, the block is never evaluated. Can you think where that might come in handy? I can think of several, but the best one is when trying to create a message to log. If you don't log the message, you wouldn't need to do the concatenation. Again, not doing the computation upfront can not only allow us to manage memory better, but also not to do needless calculations.

Enough talk, let's get to the good stuff, right? Here's my implementation of delaying concatenations and check the tests out at the bottom:
require 'stringio'

class DelayString
def initialize(oneString, anotherString)
@strings = [oneString, anotherString]
@promise = promise do
stream = @strings.inject(StringIO.new) do |output,each|
output << each
end
@strings = nil
freeze()
stream.rewind
stream.read
end
end

def +(another)
return another.concatBeforeDelayString(self)
end

def concatBeforeString(another)
@strings.unshift(another)
self
end

def concatAfterString(another)
@strings.push(another)
self
end

def concatBeforeDelayString(another)
strings_each do |each|
another.concatAfterString(each)
end
another
end

def concatAfterDelayString(another)
another.concatBeforeDelayString(self)
end

def to_s()
@promise.value()
end

private
def strings_each(&block)
@strings.each(&block)
end

end

class String
def +(another)
return another.concatBeforeString(self)
end

def concatBeforeString(another)
another.concatAfterString(self)
end

def concatAfterString(another)
DelayString.new(self, another)
end

def concatBeforeDelayString(another)
another.concatAfterString(self)
end

def concatAfterDelayString(another)
another.concatBeforeString(self)
end
end

class DelayStringTest < Test::Unit::TestCase
def test_simple
add = "3" + "4"
assert_equal("34", add.to_s)
end

def test_string
add = "3" + "4"
add = "2" + add
add = add + "5"
assert_equal("2345", add.to_s)
end

def test_delay
add_before = "1" + "2"
add_after = "3" + "4"
add = add_before + add_after
assert_equal("1234", add.to_s)
#make sure to get same answer twice
assert_equal("1234", add.to_s)
end
end

One new class called DelayString handles not doing the concatenation until absolutely necessary. It does this by creating a Promise that calculates the string by using a StringIO object (Stream or StringBuilder in Java terms). All it does is keeps a collection of all the strings it needs to append to one another. The power is now that we get the nice succinct message "+" and all of the benefits of using a stream object (or StringBuilder). Of course, we would need to add more methods on our DelayString so that it has the same protocol as String. A little more work to make our implementation seamless.

Below is the test method I added to find the times it took to run for delayed and normal concatenation:
    def test_performance
add = ''
1000000.times do |iteration|
add = add + 'a'
end
add.to_s
end

The new delayed implementation ran at 23.8 seconds. Not bad to do a million additions and a lot of little ones at that. Now, what were the results the old way? Well, all you have to do is comment out the + message:
#  def +(another)
# return another.concatBeforeString(self)
# end

It took 4565.68 seconds to run the normal way. It performed poorly and took up a bunch of memory. Yuck. It's what the books warned us about right? It's what we expected somewhat. I didn't expect how much of a performance gain I really got. Pretty cool, huh? Amazing.

It's unlikely that we'll do something to this extreme in the real world. But, wouldn't it be nice to not worry about performance in our regular code? If we find that our implementation is sub-par, one of the new benefits is that we can change it in one place.

Wait a minute. We just got better performance and got to keep the simple way of doing things? Not one lick of our already existing code had to change. The power of messages is powerful indeed!

1 comment:

Sammy Larbi said...

Excellent post. Have you thought about contributing it to Ruby?