The not so low cost of calling dynamically defined methods

There are a couple of mantras that exist across programming communities. One of them is to avoid duplication, or to keep it DRY. Programming languages equip us with different tools to avoid duplication, and in Ruby a popular one is metaprogramming. Methods are dynamically defined to get rid of all duplication and we rejoice - yay! There might be other problems with metaprogrammed solutions, but at least we are sure that the performance is the same as if we had written that duplicated code by hand. Or are we?

As the title suggests, this post examines the call performance of these metaprogrammed methods. If you are looking for method definition performance or pure call overhead, you’ll find this information in this post by Aaron Patterson.

Before we get into the details I want to quickly highlight that this is not some theoretical micro benchmark I pulled out of thin air. These examples are derived from performance improvements on an actual project. That work was done by my friend Jason R. Clark on this pull request over at Shoes 4. As he doesn’t have time to write it up - I get to, so let’s get to it!

Let’s look at some methods!

(Please be aware that this example is simplified. Of course the real code has varying parts, most importantly the name of the instance variable, which is the reason why the code was metaprogrammed in the first place.)

class FakeDimension
  def initialize(margin_start)
    @margin_start = margin_start
    @margin_start_relative = relative? @margin_start
  end

  def relative?(result)
    result.is_a?(Float) && result <= 1
  end

  def calculate_relative(result)
    (result * 100).to_i
  end

  define_method :full_meta do
    instance_variable_name = '@' + :margin_start.to_s
    value = instance_variable_get(instance_variable_name)
    value = calculate_relative value if relative? value
    value
  end

  IVAR_NAME = "@margin_start"
  define_method :hoist_ivar_name do
    value = instance_variable_get(IVAR_NAME)
    value = calculate_relative value if relative? value
    value
  end

  define_method :direct_ivar do
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end

  eval <<-CODE
  def full_string
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
  CODE

  def full_direct
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
end

Starting at the first define_method, these are all more or less the same method. We start with a fully metaprogrammed version that even converts a symbol to an instance variable name, and end with the directly defined method without any metaprogramming. Now with all these methods being so similar you’d expect them all to have roughly the same performance characteristics, right? Well, such is not the case, as demonstrated by the following benchmark. I benchmarked these methods both for the case where the value is relative and for when it is not. The results are not too different between the two cases. The full benchmark:

Source: https://gist.github.com/PragTob/17f7dcab51ad98a3f064

File: meta_benchmark.rb

require 'benchmark/ips'

class FakeDimension
  def initialize(margin_start)
    @margin_start = margin_start
    @margin_start_relative = relative? @margin_start
  end

  def relative?(result)
    result.is_a?(Float) && result <= 1
  end

  def calculate_relative(result)
    (result * 100).to_i
  end

  define_method :full_meta do
    instance_variable_name = '@' + :margin_start.to_s
    value = instance_variable_get(instance_variable_name)
    value = calculate_relative value if relative? value
    value
  end

  IVAR_NAME = "@margin_start"
  define_method :hoist_ivar_name do
    value = instance_variable_get(IVAR_NAME)
    value = calculate_relative value if relative? value
    value
  end

  define_method :direct_ivar do
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end

  eval <<-CODE
  def full_string
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
  CODE

  def full_direct
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
end

def do_benchmark(description, margin_start)
  puts description

  Benchmark.ips do |benchmark|
    dim = FakeDimension.new margin_start
    benchmark.report("full_meta")             { dim.full_meta }
    benchmark.report("hoist_ivar_name")       { dim.hoist_ivar_name }
    benchmark.report("direct_ivar")           { dim.direct_ivar }
    benchmark.report("full_string")           { dim.full_string }
    benchmark.report("full_direct")           { dim.full_direct }
    benchmark.compare!
  end
end

do_benchmark('Non relative margin start', 10)
do_benchmark('Relative margin start', 0.8)

Running the non relative version on CRuby 2.2.2 with benchmark-ips I get the following results (higher is better):

full_meta            1.840M (± 3.0%) i/s -      9.243M
hoist_ivar_name      3.147M (± 3.3%) i/s -     15.813M
direct_ivar          5.288M (± 3.1%) i/s -     26.553M
full_string          6.034M (± 3.2%) i/s -     30.179M
full_direct          5.955M (± 3.2%) i/s -     29.807M

Comparison:
     full_string:  6033829.1 i/s
     full_direct:  5954626.6 i/s - 1.01x slower
     direct_ivar:  5288105.5 i/s - 1.14x slower
 hoist_ivar_name:  3146595.7 i/s - 1.92x slower
       full_meta:  1840087.6 i/s - 3.28x slower

And look at that, the full_meta version is over 3 times slower than the directly defined method! Of course direct_ivar is also pretty close, but it’s an unrealistic scenario as the instance variable name is what is really changing. You can interpolate the string of the method definition in the full_string version, though. This achieves results as if the method had been directly defined.

But what’s happening here? It seems that there is a higher than expected cost associated with calling instance_variable_get, with creating the necessary string, and with calling methods defined by define_method overall. If you want to keep the full performance but still alter the code, you have to resort to the evil eval and stitch your code together with string interpolation. Yay.
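To make the trade-off concrete, here is a minimal sketch of how both approaches could look once the instance variable name really varies. The class SketchDimension and the names margin_start and margin_end are hypothetical and only serve as illustration; this is not the actual Shoes 4 code. It also uses class_eval with a heredoc where the benchmark uses a plain eval.

```ruby
# Hypothetical sketch: SketchDimension and its attribute names are
# illustrative assumptions, not taken from the Shoes 4 code base.
class SketchDimension
  NAMES = [:margin_start, :margin_end].freeze

  def initialize(margin_start, margin_end)
    @margin_start = margin_start
    @margin_end   = margin_end
  end

  def relative?(result)
    result.is_a?(Float) && result <= 1
  end

  def calculate_relative(result)
    (result * 100).to_i
  end

  NAMES.each do |name|
    # Variant 1: define_method plus instance_variable_get. As in
    # hoist_ivar_name, the name string is built once, outside the method body.
    ivar_name = "@#{name}"
    define_method "#{name}_meta" do
      value = instance_variable_get(ivar_name)
      value = calculate_relative value if relative? value
      value
    end

    # Variant 2: interpolate the name into source code and evaluate it, so
    # the resulting method reads the instance variable directly, as in
    # full_string.
    class_eval <<-CODE, __FILE__, __LINE__ + 1
      def #{name}_string
        value = @#{name}
        value = calculate_relative value if relative? value
        value
      end
    CODE
  end
end

dim = SketchDimension.new(10, 0.8)
puts dim.margin_start_meta   # non relative value, returned unchanged
puts dim.margin_start_string
puts dim.margin_end_meta     # relative value, converted by calculate_relative
puts dim.margin_end_string
```

Both variants return the same values. The difference is only in how the method is defined and therefore, going by the benchmark above, in how fast it can be called.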

So what, do we all have to eval method definitions for metaprogramming now?

Thankfully no. The performance overhead is constant - if your method does more expensive calculations the overhead diminishes. This is the somewhat rare case of a method that doesn’t do much (even the slowest version can be executed almost 2 million times per second) but is called a lot. It is one of the core methods when positioning UI objects in Shoes. Obviously we should also do the hard work and try not to call that method that often. We’re working on that and have already made some nice progress. But, to quote Jason, “regardless what we do I think that Dimension is bound to always be in our hot path.”

What about methods that do more though? Let’s take a look at an example where an object has an array set as an instance variable and a method that concatenates another array and sorts the result (full benchmark in this gist):

Source: https://gist.github.com/PragTob/c843d4f5aca6113d94e9

File: more_meta.rb

require 'benchmark/ips'

class Try

  def initialize(array)
    @array = array
  end

  define_method :meta_concat_sort do |array|
    value = instance_variable_get '@' + :array.to_s
    new_array = value + array
    new_array.sort
  end

  def concat_sort(array)
    new_array = @array + array
    new_array.sort
  end
end

BASE_ARRAY        = [8, 2, 400, -4, 77]
SMALL_INPUT_ARRAY = [1, 88, -7, 2, 133]
BIG_INPUT_ARRAY   = (1..100).to_a.shuffle


def do_benchmark(description, input)
  puts description
  Benchmark.ips do |b|
    try = Try.new BASE_ARRAY

    b.report('meta_concat_sort') { try.meta_concat_sort(input) }
    b.report('concat_sort')      { try.concat_sort(input) }
    b.compare!
  end
end

do_benchmark('Small input array', SMALL_INPUT_ARRAY)
do_benchmark('Big input array', BIG_INPUT_ARRAY)

class Try
  def initialize(array)
    @array = array
  end

  define_method :meta_concat_sort do |array|
    value = instance_variable_get '@' + :array.to_s
    new_array = value + array
    new_array.sort
  end

  def concat_sort(array)
    new_array = @array + array
    new_array.sort
  end
end

We then benchmark those two methods with the same base array but two differently sized input arrays:

BASE_ARRAY        = [8, 2, 400, -4, 77]
SMALL_INPUT_ARRAY = [1, 88, -7, 2, 133]
BIG_INPUT_ARRAY   = (1..100).to_a.shuffle

What’s the result?

Small input array
Calculating -------------------------------------
    meta_concat_sort    62.808k i/100ms
         concat_sort    86.143k i/100ms
-------------------------------------------------
    meta_concat_sort    869.940k (± 1.4%) i/s -      4.397M
         concat_sort      1.349M (± 2.6%) i/s -      6.805M

Comparison:
         concat_sort:  1348894.9 i/s
    meta_concat_sort:   869940.1 i/s - 1.55x slower

Big input array
Calculating -------------------------------------
    meta_concat_sort    18.872k i/100ms
         concat_sort    20.745k i/100ms
-------------------------------------------------
    meta_concat_sort    205.402k (± 2.7%) i/s -      1.038M
         concat_sort    231.637k (± 2.5%) i/s -      1.162M

Comparison:
         concat_sort:   231636.7 i/s
    meta_concat_sort:   205402.2 i/s - 1.13x slower
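The earlier claim that the overhead is roughly constant can be checked against these numbers. The sketch below is my own back-of-the-envelope calculation, not part of the original benchmark: it converts the reported iterations per second into microseconds per call and subtracts the two. The helper names are made up for this example.

```ruby
# Iterations per second, copied from the benchmark output above.
RESULTS = {
  'Small input array' => { meta: 869_940.1, direct: 1_348_894.9 },
  'Big input array'   => { meta: 205_402.2, direct: 231_636.7 }
}.freeze

# Converts iterations per second into microseconds per call.
def microseconds_per_call(iterations_per_second)
  1_000_000.0 / iterations_per_second
end

# Extra time the metaprogrammed version needs per call, in microseconds.
def overhead_per_call(meta_ips, direct_ips)
  microseconds_per_call(meta_ips) - microseconds_per_call(direct_ips)
end

RESULTS.each do |description, ips|
  puts format('%-18s direct: %.2f us/call, overhead: %.2f us/call',
              description,
              microseconds_per_call(ips[:direct]),
              overhead_per_call(ips[:meta], ips[:direct]))
end
```

With these numbers the extra cost comes out at roughly 0.4 microseconds per call for the small input and roughly 0.55 for the big one, while the direct call itself grows from about 0.74 to about 4.3 microseconds. Given the reported error margins of 2-3%, that is consistent with a constant overhead, although it does not prove it.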

With the small input array the dynamically defined method is still 1.55 times slower than the non metaprogrammed method, which gets through over 50% more iterations per second! With the big input array (100 elements) the metaprogrammed method is still 1.13 times slower, which I still consider very significant.

I ran these with CRuby 2.2.2. In case you are wondering whether this is implementation specific: I ran the same benchmark with JRuby and got comparable results. JRuby is usually 1.2 to 2 times faster than CRuby, but the slowdowns were about the same.

So in the end, what does it mean? Always benchmark. Don’t blindly optimize calls like these, as in the grand scheme of things they might not make a difference. This will only be really important for you if a method gets called a lot. If there is such a method in your library/application, then replacing the metaprogrammed method definitions might yield surprising performance improvements.

UPDATE 1: Shortly after this post was published, coincidentally, JRuby 9.0.0.3.0 was released with improvements to the call speed of methods defined by define_method. I added the results to the benchmark gist (https://gist.github.com/PragTob/17f7dcab51ad98a3f064). It is 7-15% faster for full_meta and hoist_ivar_name, but now direct_ivar is about as fast as its full_direct and full_string counterparts thanks to the optimizations!

UPDATE 2: I wrote another benchmark:

Source: https://gist.github.com/PragTob/fdeb46356702bf9943ca

File: instance_variabe_get.rb

require 'benchmark/ips'

class Test
  def initialize
    @ivar = 'test'
  end

  def direct
    @ivar
  end

  IVAR_NAME = "@ivar"
  def hoist_ivar_name
    instance_variable_get IVAR_NAME
  end

  def instance_variable_get_symbol
    instance_variable_get :"@ivar"
  end

  def instance_variable_get_frozen
    instance_variable_get('@ivar'.freeze)
  end

  def instance_variable_get_frozen_symbol
    instance_variable_get('@ivar'.freeze.to_sym)
  end
end

Benchmark.ips do |benchmark|

  instance = Test.new
  benchmark.report("direct") {instance.direct}
  benchmark.report("hoist_ivar_name") {instance.hoist_ivar_name}
  benchmark.report("ivar get symbol") {instance.instance_variable_get_symbol}
  benchmark.report("ivar get frozen") {instance.instance_variable_get_frozen}
  benchmark.report("ivar get frozen sym") {instance.instance_variable_get_frozen_symbol}
end

It targets what I think is the bottleneck here: instance_variable_get. The benchmark is missing the slowest case, yet instance_variable_get is still up to 3 times slower than direct access.
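As a guess at what the missing slowest case looks like, the sketch below adds a method that rebuilds the variable name on every call, the way full_meta does. That reading of "slowest case" is my assumption. The class IvarAccess and its method names are hypothetical, and the timing uses Ruby's bundled Benchmark module instead of benchmark-ips, so treat the numbers as a rough indication only.

```ruby
require 'benchmark'

# Hypothetical class for illustration; not part of the gist above.
class IvarAccess
  IVAR_NAME = '@ivar'

  def initialize
    @ivar = 'test'
  end

  def direct
    @ivar
  end

  def hoist_ivar_name
    instance_variable_get IVAR_NAME
  end

  # Assumed "slowest case": build the name string on every single call.
  def build_ivar_name
    instance_variable_get('@' + :ivar.to_s)
  end
end

instance = IvarAccess.new
iterations = 200_000

cases = {
  'direct'          => -> { instance.direct },
  'hoist_ivar_name' => -> { instance.hoist_ivar_name },
  'build_ivar_name' => -> { instance.build_ivar_name }
}

# Each case pays the same lambda call overhead, which narrows the relative
# differences compared to calling the methods directly.
cases.each do |label, access|
  seconds = Benchmark.realtime { iterations.times { access.call } }
  puts format('%-16s %.4fs for %d calls', label, seconds, iterations)
end
```

All three methods return the same value, so any difference in the timings comes from how the instance variable is looked up.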