July 17, 2018
This blog is part of our Ruby 2.6 series.
Before Ruby 2.6, String#split always returned an array of the split strings.
In Ruby 2.6, a block can be passed to String#split. Each split string is yielded to the block instead of being collected. This avoids creating the intermediate array and is therefore more memory efficient.
We will define a method, is_fruit?, to show how split works with a block.
def is_fruit?(value)
  %w(apple mango banana watermelon grapes guava lychee).include?(value)
end
The input is a comma-separated string of vegetable and fruit names. The goal is to fetch the fruit names from the input string and store them in an array.
input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"
splitted_values = input_str.split(", ")
=> ["apple", "mango", "potato", "banana", "cabbage", "watermelon", "grapes"]
fruits = splitted_values.select { |value| is_fruit?(value) }
=> ["apple", "mango", "banana", "watermelon", "grapes"]
With a plain split, an intermediate array containing both fruit and vegetable
names is created. Passing a block avoids it:
fruits = []
input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"
input_str.split(", ") { |value| fruits << value if is_fruit?(value) }
=> "apple, mango, potato, banana, cabbage, watermelon, grapes"
fruits
=> ["apple", "mango", "banana", "watermelon", "grapes"]
When a block is passed, split returns the string it was called on and does
not create an array. String#split yields each split string to the block,
which in our case pushes the fruit names into a separate array.
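The behaviour described above can be checked with a small sketch. It assumes Ruby 2.6 or later; the variable names (pieces_from_array, pieces_from_block, result) are made up for illustration and are not part of the original example.

```ruby
input_str = "apple, mango, potato, banana"

# Array form: returns a new array holding every piece.
pieces_from_array = input_str.split(", ")

# Block form (Ruby 2.6+): each piece is yielded to the block.
# We collect the pieces here only so the two forms can be compared.
pieces_from_block = []
result = input_str.split(", ") { |piece| pieces_from_block << piece }

# Both forms produce the same pieces in the same order.
puts pieces_from_block == pieces_from_array

# The block form hands back the receiver itself, not an array.
puts result.equal?(input_str)
```

Because the return value is the original string, the block form cannot be chained with array methods such as select or map; results have to be accumulated inside the block.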
We created a large random string to benchmark the performance of split
with and without a block.
require 'securerandom'
test_string = ''
100_000.times do
  test_string += SecureRandom.alphanumeric(10)
  test_string += ' '
end
require 'benchmark'
Benchmark.bmbm do |bench|
  bench.report('split') do
    arr = test_string.split(' ')
    str_starts_with_a = arr.select { |str| str.start_with?('a') }
  end
  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end
end
Results
Rehearsal ----------------------------------------------------
split              0.023764   0.000911   0.024675 (  0.024686)
split with block   0.012892   0.000553   0.013445 (  0.013486)
------------------------------------------- total: 0.038120sec
                       user     system      total        real
split              0.024107   0.000487   0.024594 (  0.024622)
split with block   0.010613   0.000334   0.010947 (  0.010991)
We ran another round of benchmarks using benchmark/ips (from the benchmark-ips gem).
require 'benchmark/ips'
Benchmark.ips do |bench|
  bench.report('split') do
    splitted_arr = test_string.split(' ')
    str_starts_with_a = splitted_arr.select { |str| str.start_with?('a') }
  end
  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end
  bench.compare!
end
Results
Warming up --------------------------------------
split 4.000 i/100ms
split with block 10.000 i/100ms
Calculating -------------------------------------
split 46.906 (± 2.1%) i/s - 236.000 in 5.033343s
split with block 107.301 (± 1.9%) i/s - 540.000 in 5.033614s
Comparison:
split with block: 107.3 i/s
split: 46.9 i/s - 2.29x slower
This benchmark shows that split with a block is about 2.3 times faster than
split without one (107.3 vs. 46.9 iterations per second).
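The benchmarks above measure time only. As a rough, hedged sketch of the memory side, the snippet below uses ObjectSpace.memsize_of from the standard objspace library to report the size of the intermediate array that the block form never builds. The string is smaller than the one used in the benchmarks, the variable names are illustrative, and the reported byte count will vary between Ruby versions and platforms.

```ruby
require 'objspace'
require 'securerandom'

# Build a test string similar to the benchmark input (smaller here).
# SecureRandom.alphanumeric never emits spaces, so each word stays intact.
words = Array.new(10_000) { SecureRandom.alphanumeric(10) }
test_string = words.join(' ')

# Without a block: split materializes every substring in one array.
intermediate = test_string.split(' ')
with_array = intermediate.select { |str| str.start_with?('a') }

# With a block: only the matching substrings are collected.
with_block = []
test_string.split(' ') { |str| with_block << str if str.start_with?('a') }

# memsize_of reports the array object's own footprint,
# not the strings it references.
intermediate_bytes = ObjectSpace.memsize_of(intermediate)
puts "intermediate array: #{intermediate.size} elements, ~#{intermediate_bytes} bytes"
puts "same result either way: #{with_array == with_block}"
```

This is an approximation rather than a full memory profile: both forms still allocate every substring, but in the block form the non-matching ones are never retained by an array and can be collected right away.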
Here are the relevant commit and discussion for this change.
The Chinese version of this blog is available here.