New liberal_parsing option for parsing bad CSV data

Ershad Kunnakkadan

November 22, 2016

This blog is part of our Ruby 2.4 series.

Comma-Separated Values (CSV) is a widely used data format, and almost every language has a module to parse it. In Ruby, we have the CSV class to do that.

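According to RFC 4180, a double quote may appear inside a field only if the field is enclosed in double quotes and the inner quote is escaped by doubling it. CSV input with an unescaped double quote does not conform, so a strict parser cannot parse it.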

We get a CSV::MalformedCSVError when the CSV data does not conform to RFC 4180.
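To make the rule concrete, here is a minimal sketch that compares a properly escaped line with one containing a stray quote. The helper `strictly_parsable?` is a hypothetical name introduced for illustration; it is not part of the CSV library and only wraps `CSV.parse_line` with its default, strict settings.

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): reports whether a
# single line parses under the default, strict rules.
def strictly_parsable?(line)
  CSV.parse_line(line)
  true
rescue CSV::MalformedCSVError
  false
end

# A quote inside a quoted field, escaped by doubling it: conforms.
escaped = 'one,"two""",three,four'
# A stray quote inside an unquoted field: does not conform.
stray = 'one,two",three,four'

puts strictly_parsable?(escaped) # prints true
puts strictly_parsable?(stray)   # prints false
p CSV.parse_line(escaped)        # prints ["one", "two\"", "three", "four"]
```

Both lines are meant to carry the same four fields. Only the first one follows the quoting rules, so only the first one parses.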

Ruby 2.4 added the liberal_parsing option to parse such bad data. When it is set to true, Ruby tries to parse the data even when it does not conform to RFC 4180.

> require 'csv'

# Before Ruby 2.4

> CSV.parse_line('one,two",three,four')

CSV::MalformedCSVError: Illegal quoting in line 1.

# With Ruby 2.4

> CSV.parse_line('one,two",three,four', liberal_parsing: true)

=> ["one", "two\"", "three", "four"]
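One possible way to use the option in a script is to keep strict parsing as the default and fall back to liberal parsing only when a line is rejected. This is a sketch under that assumption, not a recommended or official pattern. The method name `parse_line_with_fallback` is hypothetical, and the code needs Ruby 2.4 or later.

```ruby
require 'csv'

# Hypothetical helper: try strict parsing first and use liberal parsing
# only as a fallback. Returns the fields plus a flag that tells the
# caller whether the fallback was needed.
def parse_line_with_fallback(line)
  [CSV.parse_line(line), false]
rescue CSV::MalformedCSVError
  [CSV.parse_line(line, liberal_parsing: true), true]
end

fields, used_fallback = parse_line_with_fallback('one,two",three,four')
p fields        # the same four fields as in the example above
p used_fallback # prints true
```

Returning the flag lets the caller log or review the lines that were not valid CSV instead of silently accepting them.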
