Defect #40020

closed

ScmData.binary? incorrectly considers UTF-8 text as binary

Added by Go MAEDA 4 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: SCM
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Resolution: Fixed
Affected version:

Description

Currently, the binary? method in Redmine::Scm::Adapters::ScmData often misclassifies Unicode text as binary, because what it actually checks is whether the given data is ASCII text.

The new implementation in the attached patch checks for control characters excluding tabs, newlines, and carriage returns, and calculates their proportion in the data. It ensures accurate detection of binary data while properly handling Unicode text.
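As a rough illustration of the heuristic described above, here is a minimal sketch. It is not the attached patch or the committed code: the module name BinarySniff, the CONTROL_SET constant, and the threshold keyword are hypothetical, and treating DEL (0x7F) as a control character is an assumption.

```ruby
# Hypothetical sketch of a control-character ratio check.
# Not the actual Redmine::Scm::Adapters::ScmData implementation.
module BinarySniff
  # Assumed definition of "control character": bytes 0x00-0x1F and 0x7F,
  # with tab, newline, and carriage return excluded by the second set.
  CONTROL_SET = ["\x00-\x1f\x7f", "^\t\n\r"].freeze

  # Returns true when the share of control bytes exceeds the threshold.
  def self.binary?(data, threshold: 0.1)
    return false if data.empty?

    bytes = data.b # work on raw bytes so invalid encodings cannot raise
    bytes.count(*CONTROL_SET).fdiv(bytes.bytesize) > threshold
  end
end

BinarySniff.binary?("こんにちは、世界\n") # => false (multibyte UTF-8, no control bytes counted)
BinarySniff.binary?("\x01\x02\x03abc")    # => true  (3 of 6 bytes are control bytes)
```

Because multibyte UTF-8 sequences consist only of bytes at or above 0x80, they never fall into the counted range, which is why Unicode text passes a check of this kind.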


Actions #1

Updated by Go MAEDA 4 months ago

I have lowered the threshold, that is, the proportion of control characters above which the method considers the data to be binary, from 0.3 to 0.1.

Since the number of control characters is approximately 11% of the 256 values that can be expressed in a single byte, I believe that a threshold value of 0.1 is enough.
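The "approximately 11%" figure can be checked with a few lines of Ruby. This is only a sketch under assumptions: the note does not say exactly which byte values count as control characters, so both variants below (with and without DEL, 0x7F) are guesses, and the variable names are illustrative.

```ruby
# Assumption: control characters are bytes 0x00-0x1F, minus the three
# whitespace bytes the check ignores (tab, LF, CR).
ignored = [0x09, 0x0a, 0x0d]
c0_controls = (0x00..0x1f).to_a - ignored

# Variant that also counts DEL (0x7F) as a control character.
with_del = c0_controls + [0x7f]

puts c0_controls.size.fdiv(256).round(3) # 29 / 256 => 0.113
puts with_del.size.fdiv(256).round(3)    # 30 / 256 => 0.117
```

Either way, uniformly random bytes would contain roughly 11 to 12% control characters, slightly above the 0.1 threshold.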

Actions #2

Updated by Go MAEDA 3 months ago

  • Target version set to Candidate for next major release

Actions #3

Updated by Go MAEDA 3 months ago

  • Target version changed from Candidate for next major release to 6.0.0

Setting the target version to 6.0.0.

Actions #4

Updated by Go MAEDA 3 months ago

  • Subject changed from ScmData.binary? incorrectly identifies UTF-8 data as binary data to ScmData.binary? incorrectly considers UTF-8 text as binary
  • Status changed from New to Closed
  • Assignee set to Go MAEDA
  • Resolution set to Fixed

Committed the fix in r22664.
