Patch #25861

CSV Importer - handle UndefinedConversionErrors

Added by Jens Krämer 3 months ago. Updated 3 months ago.

Status:ClosedStart date:
Priority:NormalDue date:
Assignee:Jean-Philippe Lang% Done:

0%

Category:Importers
Target version:3.2.7

Description

The CSV import already handles a couple of Exceptions that may happen when the user-selected encoding does not match the real encoding of the CSV (or it contains otherwise illegal byte sequences). We recently encountered Encoding::UndefinedConversionError, which is not yet caught by that error handling. The attached patches add a test case and a fix for that.

0001-adds-failing-test-case-for-import-with-wrong-encodin.patch Magnifier - test case (1.69 KB) Jens Krämer, 2017-05-16 11:06

0002-adds-rescue-from-Encoding-UndefinedConversionError.patch Magnifier - fix (1.05 KB) Jens Krämer, 2017-05-16 11:06

Associated revisions

Revision 16572
Added by Jean-Philippe Lang 3 months ago

Rescue any EncodingError exceptions (#25861).

History

#1 Updated by Go MAEDA 3 months ago

  • Target version set to 3.2.7

We can reproduce the problem with the test in 0001-adds-failing-test-case-for-import-with-wrong-encodin.patch. Setting target version to 3.2.7.

By the way, the encoding of the file invalid-Shift_JIS.csv is valid CP932. Shift_JIS and CP932 are almost identical, but CP932 includes more characters.

$ iconv -f cp932 -t utf8 test/fixtures/files/invalid-Shift_JIS.csv
①

#2 Updated by Jean-Philippe Lang 3 months ago

  • Status changed from New to Needs feedback

Go MAEDA wrote:

We can reproduce the problem with the test in 0001-adds-failing-test-case-for-import-with-wrong-encodin.patch. Setting target version to 3.2.7.

The test passes for me with ruby 2.0 (I would get a Encoding::InvalidByteSequenceError). Maybe we could simply rescue EncodingError which is the base class for encoding errors?

#3 Updated by Go MAEDA 3 months ago

Jean-Philippe Lang wrote:

Maybe we could simply rescue EncodingError which is the base class for encoding errors?

I agree, caching EncodingError is better. We can simplify the code and catch all errors caused by iconv.

#4 Updated by Jean-Philippe Lang 3 months ago

  • Status changed from Needs feedback to Resolved
  • Assignee set to Jean-Philippe Lang

#5 Updated by Jean-Philippe Lang 3 months ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF