Know your closures: blocks, procs, and lambdas
Monday, 7 January 2013
While Ruby might not be a functional programming language in the sense that Haskell or Clojure are, it does support a lot of the paradigms for which functional languages are known. One important characteristic of functional languages is their support of first-class functions:
In computer science, a programming language is said to have first-class functions if it treats functions as first-class citizens. Specifically, this means the language supports passing functions as arguments to other functions, returning them as the values from other functions, and assigning them to variables or storing them in data structures.
Ruby methods themselves aren't first-class functions, as evidenced by the following:
def method_a
  def method_b
    puts "I'm method b"
  end
end
x = method_a
x() # => NoMethodError: undefined method `x' for main:Object
If methods were first-class functions, this would work — method_a would return method_b, which would then execute when x() was called. This doesn't work because Ruby methods aren't objects.
However, Ruby provides three different ways to define functions in ways that do behave as first-class functions (that is, they can be created dynamically, passed to and returned from other functions, and assigned to variables), giving you virtually all of the power of languages with 'proper' first-class function support. These three — blocks, procs, and lambdas — are all closures:
In computer science, a closure (also lexical closure or function closure) is a function or reference to a function together with a referencing environment — a table storing a reference to each of the non-local variables (also called free variables) of that function. A closure — unlike a plain function pointer — allows a function to access those non-local variables even when invoked outside of its immediate lexical scope.
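To make the two definitions above concrete, here is a minimal sketch using lambda (covered in more detail further down). The method names make_counter and call_twice are made up for illustration. The lambda is returned from one method, stored in a variable, and passed to another method, and it keeps access to the local variable count even after make_counter has returned.

```ruby
# Returns a function object; the lambda closes over the local variable `count`.
def make_counter
  count = 0
  lambda { count += 1 }
end

# Receives a function object as an ordinary argument.
def call_twice(fn)
  fn.call
  fn.call
end

counter = make_counter        # assigned to a variable
call_twice(counter)           # passed to another method
puts counter.call             # prints 3

other = make_counter          # a fresh counter with its own `count`
puts other.call               # prints 1
```

Each call to make_counter creates a new count, which is why the second counter starts again from one.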
We'll examine each of these separately, and then step back and look at how they're implemented under the hood.
Blocks
Blocks are the most Ruby-like way to use closures. You see them most frequently with enumerators, where a function is applied to a series of values one at a time, and in wrapper functions where you want to do something, then execute the block, then do something else (e.g., transactions, or opening and closing file handles).
Here's a trivial example of the first, where the range (1..10) is iterated over with each, which passes each value to the block in turn. The block here is just a function that takes a single argument x and prints that argument to the screen.
(1..10).each do |x|
  puts x
end
An example of the latter use, where a function wraps another function that is passed in as a block, can be seen with file handles.
contents = File.open('/path/to/file', 'r') do |file|
  file.read
end
If you want to write a function that takes a block, you use the yield keyword. Here's a simple function that wraps around a passed block and prints how long the block took to execute, then returns the value returned by the block.
def time
  start = Time.now
  result = yield
  puts "Completed in #{Time.now - start} seconds."
  result
end
time do
  sleep 2
end
# => Completed in 2.001109 seconds.
A couple of things to notice:
- The yield keyword returns the block's return value.
- The block isn't indicated in the function's arguments (it can be, though, and we'll look at when it is a bit later).
In this case no values were yielded into the block. Frequently, though, a block is yielded one or more values that it then uses to do whatever it is that it does. For instance, each yields a single value into the block, which is bound to a variable at the beginning of the block (the do |x| bit). Here's a simple function that takes an argument, and then yields that argument into a block.
def contrived_example(n)
  yield n
end
contrived_example(3) do |x|
  x + 3
end
# => 6
As you can see the argument n is yielded into the block, where three is added to it, resulting in six.
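A method can also yield more than once, and with more than one value, which is roughly how iterators like each are put together. Here's a small sketch; each_with_position is a made-up name, not a built-in method.

```ruby
# Hypothetical helper: yields each element together with its position.
def each_with_position(items)
  position = 0
  while position < items.length
    yield items[position], position   # two values go into the block
    position += 1
  end
  items
end

lines = []
each_with_position(%w[a b c]) do |item, position|
  lines << "#{position}: #{item}"
end
puts lines
# prints:
# 0: a
# 1: b
# 2: c
```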
Procs
Blocks are neat, but they're not the only game in town. What if you want to write a function once, and then pass it to lots of different functions? Because blocks are defined in place, using the same block in different locations forces you to repeat yourself. What you need is an object that encapsulates the function and can then be passed repeatedly to different functions.
From the Proc documentation:
Proc objects are blocks of code that have been bound to a set of local variables. Once bound, the code may be called in different contexts and still access those variables.
When you need to store a block of code and pass it around, use a Proc. Here's an example that shows how the binding works.
def proc_maker
  local_var = " is great!"
  return Proc.new { |s| puts s + local_var }
end
p = proc_maker
p.call "Ruby"
# => Ruby is great!
Here the method proc_maker returns a new Proc object that prints the argument it is passed concatenated with the value of the local variable local_var. Later, when the returned proc is called, despite being outside of local_var's scope, everything works as expected. That's because the binding occurs when the proc is defined, so when it is later called it still knows the value of local_var.
Notice, too, that the proc is actually executed with the call method, which is passed arguments that are then passed to the proc.
Why use a proc instead of a block? One, it's more explicit: because the proc must be included in the passed arguments, it's obvious that the called method is receiving and doing something with a proc, whereas you have to search through the method's definition to find a yield statement before you can be sure that it takes a block. Secondly, you can pass more than one proc to a method, whereas you can only have a single block. This is useful for callbacks: for example, a method that downloads a file and passes it to one proc when the download is finished, or passes an error to another proc if the download fails.
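Here's a rough sketch of that callback idea. The fetch method is a hypothetical stand-in that never touches the network; it just pretends an empty URL is a failed download so that both procs get exercised.

```ruby
# Hypothetical stand-in for a real download; no network access happens here.
def fetch(url, on_success, on_failure)
  if url.empty?
    on_failure.call(ArgumentError.new("empty URL"))
  else
    on_success.call("contents of #{url}")
  end
end

on_success = Proc.new { |body| "Downloaded: #{body}" }
on_failure = Proc.new { |error| "Failed: #{error.message}" }

puts fetch("http://example.com/file", on_success, on_failure)
# prints: Downloaded: contents of http://example.com/file
puts fetch("", on_success, on_failure)
# prints: Failed: empty URL
```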
Lambdas
lambda is a method defined in the Kernel module, which makes it globally available. It is also used to create closures, very much like Proc.new. In fact, here's the above example rewritten for lambda:
def lambda_maker
  local_var = " is great!"
  return lambda { |s| puts s + local_var }
end
p = lambda_maker
p.call "Ruby"
# => Ruby is great!
lambda and Proc are almost identical, though not entirely, as will be discussed below.
Blocks, Lambdas = Procs?!
It might seem silly to have three different ways of doing virtually the same thing, and it would be, except there aren't. Under the hood, blocks, lambdas, and procs are all the same — namely, instances of the Proc class. Proving this to be true for lambdas is easy:
lambda { |x| x }.class
# => Proc
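Both kinds of object report the same class, but Ruby does keep track of which flavour it is dealing with. A quick way to check is the Proc#lambda? method, as in this small sketch:

```ruby
from_proc   = Proc.new { |x| x }
from_lambda = lambda { |x| x }

puts from_proc.class      # prints Proc
puts from_lambda.class    # prints Proc
puts from_proc.lambda?    # prints false
puts from_lambda.lambda?  # prints true
```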
Showing that it's true for blocks is a bit more involved. Remember above when I said that “[t]he block isn't indicated in the function's arguments (it can be, though, and we'll look at when it is a bit later)”? Well, now is the time to look. Here's the exemplary method:
def blocky(&block)
  return block.class
end
blocky do
  # epic blocky things
end
# => Proc
When the last argument in a method's definition starts with an ampersand, it gives a name to the block that can then be referenced within the method. Note that this still isn't a real argument — you can't explicitly pass in a Proc to it.
blocky(lambda {})
# => ArgumentError: wrong number of arguments (1 for 0)
When you give a name to a passed block, though, you can either treat it as a Proc object (invoking it with call) or use yield as usual.
# these two methods are the same
def blocky_yield(&block)
  yield
end
def blocky_call(&block)
  block.call
end
In effect, the yield keyword allows you to execute an otherwise impossible-to-reference Proc that is passed into the method as a block. If you bind the Proc to a name with the ampersand, then you can reference it directly. It should be noted, though, that using yield in this case has typically been faster than using call, although the size of the gap depends on the Ruby version.
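If you'd like to see how the two compare on your own machine, a rough timing sketch along these lines should do. The helper seconds_for and the iteration count are my own choices for illustration, and the numbers will vary by Ruby version and hardware, so no expected output is shown.

```ruby
def with_yield(&block)
  yield
end

def with_call(&block)
  block.call
end

# Hypothetical helper: wall-clock seconds taken to run the given block.
def seconds_for
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 200_000
yield_time = seconds_for { iterations.times { with_yield { 1 + 1 } } }
call_time  = seconds_for { iterations.times { with_call  { 1 + 1 } } }

puts "yield: #{yield_time.round(4)}s"
puts "call:  #{call_time.round(4)}s"
```

This is a crude measurement rather than a proper benchmark, so treat small differences with suspicion.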
Differences
We said before that blocks, procs, and lambdas are almost identical, but not quite. The differences are actually between procs and lambdas — blocks and procs are indeed identical. There are two differences between procs and lambdas:
- Lambdas check the number of arguments passed to them, whereas procs do not.
- Lambdas return differently than procs do.
The first one is straightforward. Here's an example:
p = Proc.new { |x, y| puts "x: #{x.class} y: #{y.class}" }
p.call 2
# => x: Integer y: NilClass (x: Fixnum on Ruby versions before 2.4)
l = lambda { |x, y| puts "x: #{x.class} y: #{y.class}" }
l.call 2
# => ArgumentError: wrong number of arguments (1 for 2)
I honestly have no idea why this is the case, but it's something to remember.
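The looseness of procs goes a little further than missing arguments. As a small sketch of the behaviour: a proc also drops extra arguments and unpacks a single array across its parameters, while a lambda complains in the same way a method would.

```ruby
pr = Proc.new { |x, y| [x, y] }
la = lambda   { |x, y| [x, y] }

p pr.call(1)          # prints [1, nil]   (missing argument becomes nil)
p pr.call(1, 2, 3)    # prints [1, 2]     (extra argument is dropped)
p pr.call([4, 5])     # prints [4, 5]     (a single array is unpacked)

begin
  la.call(1, 2, 3)
rescue ArgumentError => e
  puts "lambda raised: #{e.class}"   # prints lambda raised: ArgumentError
end
```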
The other difference is a bit harder to explain, but it leads to some interesting behavior. When a return is used inside a proc, it does something called a non-local return: instead of returning only from the code executing inside the proc, it returns from the method in which the proc was defined. Here's an example that shows how the behaviors are different.
def return_proc
  p = Proc.new { return "Now you see me" }
  p.call
  return "Now you don't!"
end
return_proc
# => "Now you see me"
def return_lambda
  l = lambda { return "Now you see me" }
  l.call
  return "Now you don't!"
end
return_lambda
# => "Now you don't!"
The reason for this is that a return inside a lambda returns only from the lambda itself, handing control back to whatever called it, whereas a return inside a proc returns from the context in which the proc was defined (blocks do too, since they're just procs underneath).
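The same behavior is what makes an early return from inside an ordinary block work. Here's a minimal sketch, with first_even as a made-up method name:

```ruby
def first_even(numbers)
  numbers.each do |n|
    return n if n.even?   # returns from first_even, not just from the block
  end
  nil                     # reached only when no even number was found
end

p first_even([1, 3, 4, 5, 6])   # prints 4
p first_even([1, 3, 5])         # prints nil
```

If the block behaved like a lambda here, the return would only end the current iteration and each would carry on to the end of the array.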
Making procs out of methods, and blocks out of procs
If blocks are just syntactic sugar for procs, then what happens if you already have a proc that you'd like to use and want to pass it to a method that expects a block? Reenter our good friend, the ampersand, from stage left.
def ampersandy(n)
  yield n
end
ampersandy(10) do |x|
  x + 10
end
# => 20
l = lambda { |x| x + 10 }
ampersandy(10, &l)
# => 20
Here, the ampersand is prefixed to the lambda, and converts it into a block for the ampersandy method. Of course the traditional block syntax can still be used.
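The same trick works with built-in methods that expect a block, which is where defining a function once and reusing it starts to pay off. A short sketch, with add_ten and is_large as illustrative names:

```ruby
add_ten  = lambda { |x| x + 10 }
is_large = Proc.new { |x| x > 11 }

p [1, 2, 3].map(&add_ten)                    # prints [11, 12, 13]
p [1, 2, 3].map(&add_ten).select(&is_large)  # prints [12, 13]
```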
While this is useful, there is another much more interesting context in which the ampersand operator often comes up. As we said near the beginning of this article, Ruby methods are not objects, and thus can't be passed directly to other methods as arguments. However, the ampersand operator lets us take a method and convert it into a Proc object, which can then be used like any other proc.
The usage is similar, but for already-defined methods it works via the Symbol class's to_proc method. In Ruby, symbols (which are prefixed with colons) can be used to reference the name of a method. For instance, if for some reason you wanted a convoluted way to call a method on an object, you could use the send method (which is defined on the Object class and is thus available to every object) and pass it a symbol corresponding to the method you want called, and it will be called.
10.send(:to_s).class
# => String
# this is the same as
10.to_s.class
# => String
Once you know that a symbol can be used to reference a method, it makes sense that the ampersand operator can be used to convert a symbol to a Proc. The ampersand expects a Proc object in the first place, and when it finds something that's not a Proc it calls to_proc on that thing to try to convert it to a Proc object. Since the Symbol class has to_proc defined, the result is a Proc object that can then be used like any other — including being passed in place of a block.
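To get a feel for what is going on, here is a rough approximation of what Symbol's to_proc does, written as a plain method. This is only a sketch for illustration (symbol_to_proc and Doubler are made-up names), not how Ruby actually implements it.

```ruby
# Rough, hypothetical equivalent of Symbol#to_proc, for illustration only.
def symbol_to_proc(name)
  Proc.new { |receiver, *args| receiver.send(name, *args) }
end

upcase = symbol_to_proc(:upcase)
p %w[a b].map(&upcase)       # prints ["A", "B"]
p %w[a b].map(&:upcase)      # prints ["A", "B"]

# Any object that defines to_proc can follow the ampersand.
class Doubler
  def to_proc
    Proc.new { |n| n * 2 }
  end
end

p [1, 2, 3].map(&Doubler.new)   # prints [2, 4, 6]
```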
Here we use the ampersand operator to send the addition function to inject (which folds over a collection) to sum up a range of numbers.
(1..100).inject(&:+)
# => 5050
# which is the same as
(1..100).inject do |acc, x|
  acc + x
end
# => 5050
You can also call to_proc yourself to get a Proc object to work with directly. Here's an example with the exponentiation method **:
:**.to_proc.call(2,5) # 2^5
# => 32
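Symbols aren't the only route from a method to a Proc. Although the examples above don't use it, Ruby also provides Object#method, which wraps an existing method in a Method object that responds to call and to_proc. A small sketch, with shout as a made-up method:

```ruby
def shout(text)
  text.upcase + "!"
end

shouter = method(:shout)          # a Method object wrapping shout
p shouter.call("hello")           # prints "HELLO!"
p %w[hi there].map(&shouter)      # prints ["HI!", "THERE!"]
p shouter.to_proc.class           # prints Proc
```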
Wrapping up
I hope this has been a useful introduction to closures in Ruby. I plan on writing a lot more about functional programming topics in Ruby, and being able to define functions as objects and pass them around is obviously central to this. It also opens up new ways of solving problems, and is useful even if you never step outside the comfortable confines of Ruby's object-oriented, imperative shell.