Finding a Ruby program running like a three-legged dog on tranquilisers, I set about improving its performance. Searching the internet, I could find no rules of thumb for this already documented in one place. So on Ruby-Talk:154733 I asked:
Subject: Ruby-specific performance heuristics?
From: Hugh Sasse <hgs dmu.ac.uk>
Date: Fri, 2 Sep 2005 21:34:17 +0900

I've been doing some stuff with CSV recently, having data in one flat file that I'm trying to normalize into tables. Thus, I'm trying to ignore things I've seen before so I don't create 2 entries for them, and that sort of thing.

Jon Bentley points out that "Data Structures Programs" -- i.e. how the data is "shaped" determines how the program will be designed and perform. Often the best solution is not immediately obvious, as he discusses.

So, for example, one can test for "seen before" by

    array = []
    unless array.include? item
      array << item
    end

which is much slower for clumped data than

    unless array.include? item
      array.unshift item
    end

or one could use a hash, or maybe a set.

So, my question is [a member of the set of "How long is a piece of string?" type questions]:

Are there any heuristics for performance of *Ruby* data structures available, which guide one to designing code to *often* have good performance?

The answer will be very much "it depends, you need to test it, it's a function of data size, ...", but I suspect the implementation, i.e. Ruby, puts useful bounds on the problem.

My searching has not turned up anything, though there are good performance hints in the PickAxe. I'm wondering if the info exists or if it would be worth trying to create it, or if it is just too broad a problem to be worth it.

However, since we Rubyists often speak of the ability to put code together quickly, having hints around on how to do so and yield efficient code seems of some worth.

        Hugh
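The message above mentions a hash or a set as alternatives to scanning an array. A minimal sketch of the set variant follows, assuming the items can be used as hash keys; the sample data and variable names are made up for illustration and do not come from the thread.

```ruby
require 'set'

# Hypothetical sample data: repeated values, as in one column of a flat CSV file.
items = %w[alice bob alice carol bob alice]

# Array version from the message: include? scans the array on every test.
seen_array = []
items.each do |item|
  seen_array << item unless seen_array.include?(item)
end

# Set version: membership is a hash lookup rather than a scan.
seen_set = Set.new
unique_in_order = []
items.each do |item|
  # Set#add? returns nil when the item is already present.
  unique_in_order << item if seen_set.add?(item)
end

p seen_array
p unique_in_order
```

Both loops keep the first occurrence of each item; which one is faster for a given file still has to be measured, as the message says.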
Contributions are as follows, but see the thread for full information.
- freeze a string that you are going to use as a hash key, because a String key will be dup'ed on insertion if it is not frozen, to prevent errors caused by aliasing.
- "foo" + "bar" is more expensive than "foo" << "bar", for example. There are only some exceptions to this rule (Fixnums, for example).
- find('.'){|f| ... }
  Dir['*/**'].each{|f| ... }
- n = nil
  obj.each{|elem| n = elem ** 2 and p n}

  obj.each{|elem| n = elem ** 2 and p n}
- Use the o flag on interpolated regexps (/#{var}/o) where you can.
- method_missing: permanently add the method to the object instead of generating it every time.
- attr_accessor methods and an initializer seems to be faster than using Struct to do this.
- String#split(Regexp) is faster than String#split(String), and certainly better than trying to use String#trim to simplify the regexp therein.

Created by Hugh Sasse on 02-SEP-2005
Last Modified by Hugh Sasse on 13-JUN-2006
$Id: index.html,v 1.5 2006-06-13 16:41:59+01 hgs Exp hgs $
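The tip about freezing hash keys can be checked by comparing object identity. This is only a sketch of the behaviour in the standard interpreter; the key text and variable names are illustrative, not taken from the thread.

```ruby
# An unfrozen String key: Hash#[]= stores a frozen copy, not the original object.
unfrozen = String.new("colour")
h1 = {}
h1[unfrozen] = 1
copied = !h1.keys.first.equal?(unfrozen)

# A frozen String key: the hash can store the very same object.
frozen = String.new("colour").freeze
h2 = {}
h2[frozen] = 1
shared = h2.keys.first.equal?(frozen)

puts "unfrozen key was copied: #{copied}"
puts "frozen key was shared:   #{shared}"
```

The copy protects the hash from later changes to the original string, which is the aliasing problem the tip refers to.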
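The comparison of "foo" + "bar" with "foo" << "bar" comes down to whether a new object is allocated. A small sketch, with illustrative variable names:

```ruby
base = String.new("foo")

# String#+ allocates and returns a new String; base is left unchanged.
joined = base + "bar"
plus_made_new_object = !joined.equal?(base)

# String#<< appends in place and returns the receiver itself.
appended = base << "bar"
append_reused_object = appended.equal?(base)

puts "+  made a new object:   #{plus_made_new_object}"
puts "<< reused the receiver: #{append_reused_object}"
```

Because << changes the receiver, every other reference to that string sees the change, so it is only a safe substitute where the original value is no longer needed.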
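The /#{var}/o tip has a consequence worth seeing in action: with the o flag the interpolation is done only the first time the literal is evaluated, so later values of the variable are ignored. A sketch, using a hypothetical helper method:

```ruby
# Hypothetical helper: the regexp literal below is built once, on the first call.
def matches_once?(pattern, text)
  !!(text =~ /#{pattern}/o)
end

first  = matches_once?("cat", "a cat")  # builds /cat/ and matches
second = matches_once?("dog", "a dog")  # still uses /cat/, so no match

puts "first call matched:  #{first}"
puts "second call matched: #{second}"
```

So the flag suits patterns whose interpolated value does not change for the life of the program.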
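The method_missing tip could look something like the following. This is a sketch under assumptions: the Settings class and its keys are invented for illustration, and define_singleton_method is one of several ways to add the method to the object.

```ruby
# Hypothetical class: answers reader calls for any key in its values hash.
class Settings
  def initialize(values)
    @values = values
  end

  def method_missing(name, *args, &block)
    # Only handle zero-argument calls that name a known key.
    return super unless args.empty? && @values.key?(name)

    # Add a real method to this object so later calls skip method_missing.
    define_singleton_method(name) { @values[name] }
    send(name)
  end

  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

config = Settings.new(host: "localhost", port: 8080)
first  = config.host  # goes through method_missing, which defines #host
second = config.host  # dispatches straight to the defined method

p first
p config.singleton_methods
```

Only the methods actually called get defined, so unused keys cost nothing.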