Defect #20730

Fix tokenization of phrases with non-ascii chars

Added by Jens Krämer over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: Jean-Philippe Lang
% Done: 0%
Category: Search engine
Target version: 3.0.6
Resolution: Fixed
Start date:
Due date:
Affected version:

Description

\w only matches ASCII characters; we should either use [:alnum:] instead, or simply match every character other than " inside the phrase. Test case included.
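The distinction can be seen directly in Ruby, where \w is ASCII-only by default while the POSIX bracket class [[:alnum:]] is encoding-aware and matches non-ASCII letters (a minimal illustration, not Redmine's search code):

```ruby
# frozen_string_literal: true

phrase = '日本語 テスト'

# \w matches only [a-zA-Z0-9_] in Ruby, so Japanese text yields no tokens:
p phrase.scan(/\w+/)          # => []

# [[:alnum:]] respects the string's UTF-8 encoding and matches kanji/kana:
p phrase.scan(/[[:alnum:]]+/) # => ["日本語", "テスト"]
```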

fix-tokenization-for-phrases-with-non-ascii-characte.patch (1.39 KB) Jens Krämer, 2015-09-13 05:53

Associated revisions

Revision 14662
Added by Jean-Philippe Lang over 2 years ago

Fix tokenization of phrases with non-ascii chars (#20730).

Patch by Jens Krämer.

History

#1 Updated by Go MAEDA over 2 years ago

  • Tracker changed from Patch to Defect
  • Target version set to 3.1.2

+1

Search keyword '"日本語 テスト"' (written in Japanese) matches both "日本語 テスト" and "日本語テスト" in the current trunk, but it should not match the latter.

expected:

Redmine::Search::Fetcher.new('"日本語 テスト"', ...).tokens => ['日本語 テスト']

actual:

Redmine::Search::Fetcher.new('"日本語 テスト"', ...).tokens => ['日本語', 'テスト']

This behavior can be fixed by this patch.
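A phrase-aware tokenizer along the lines the patch describes can be sketched as follows. This is an illustrative simplification, not Redmine's actual Fetcher implementation: quoted phrases are kept whole by matching any run of non-quote characters, while bare words use the encoding-aware [[:alnum:]] class.

```ruby
# frozen_string_literal: true

# Split a search query into tokens, keeping double-quoted phrases intact.
# Hypothetical helper for illustration; Redmine's real code lives in
# Redmine::Search::Fetcher#tokens.
def tokens(question)
  # '"[^"]*"' grabs a whole quoted phrase (any chars except the quote);
  # '[[:alnum:]]+' grabs a bare word, including non-ASCII letters.
  question.scan(/"[^"]*"|[[:alnum:]]+/).map { |t| t.delete('"') }
end

p tokens('"日本語 テスト"')  # => ["日本語 テスト"]  (phrase preserved)
p tokens('日本語 テスト')    # => ["日本語", "テスト"] (two separate words)
```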

#2 Updated by Jean-Philippe Lang over 2 years ago

  • Status changed from New to Closed
  • Assignee set to Jean-Philippe Lang
  • Target version changed from 3.1.2 to 3.0.6
  • Resolution set to Fixed

Patch applied, thanks.
