Avoiding "Invalid byte sequence in UTF-8" with Ruby and CSV files

If you’re having trouble reading, say, an ISO-8859-1 encoded CSV file into your (probably UTF-8) Ruby or Rails application, and the error you get is “Invalid byte sequence in UTF-8” even though you’re giving CSV.open the correct encoding options, here’s a solution.

The example CSV file is a tab-separated, ISO-8859-1 encoded file with CRLF line endings. You’d expect the following to work:

CSV.open(@infile, "r:ISO-8859-15:UTF-8", :col_sep => "\t", :headers => :first_row)

But it fails mysteriously! Even though the conversion to UTF-8 goes through without problems, you get an ArgumentError complaining about an invalid byte sequence. If you dig deeper, you might find (in this case) a complaint about "\r\n", the CRLF line ending. The solution is very, very non-obvious: you need to specify the row separator in addition to your encodings!

mjtko from the #rubyonrails channel on Freenode discovered this. If we change the line to the following:

CSV.open(@infile, "r:ISO-8859-15:UTF-8", :col_sep => "\t", :row_sep => "\r\n", :headers => :first_row)

Boom, there’s your working CSV object, with working encodings.
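
If you want to try the fix without your real data, here is a minimal, self-contained sketch. The sample columns and values, the temp file, and the `rows` variable are made up for illustration; only the CSV.open options come from the line above. It writes a small tab-separated, ISO-8859-15 encoded file with CRLF line endings and reads it back as UTF-8 with an explicit row separator.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data: tab-separated, CRLF line endings,
# transcoded to ISO-8859-15 so the file is not valid UTF-8.
sample = "name\tcity\r\nJos\u00E9\tM\u00FCnchen\r\n".encode("ISO-8859-15")

rows = Tempfile.create(["sample", ".csv"]) do |file|
  file.binmode          # write the bytes untouched
  file.write(sample)
  file.flush

  # External encoding ISO-8859-15, internal encoding UTF-8,
  # plus the explicit row separator.
  CSV.open(file.path, "r:ISO-8859-15:UTF-8",
           :col_sep => "\t", :row_sep => "\r\n", :headers => :first_row) do |csv|
    csv.map(&:to_h)     # one Hash per data row, keyed by header
  end
end

puts rows.first["city"]
```

Each value in `rows` should come back as a UTF-8 string, so it can be mixed freely with the rest of your application's strings.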