Defect #6551

Highlighting in search results is case sensitive for cyrillic pattern

Added by Alexey Ivlev almost 7 years ago. Updated almost 4 years ago.

Status: New
Start date: 2010-10-01
Priority: Normal
Due date:
Assignee: -
% Done: 50%
Category: Search engine
Target version: -
Resolution:
Affected version: 1.0.1

Description

I am sorry for my persistence; I posted the same problem on the forum but have had no response there.

When I search for a pattern in English, everything works fine: highlighting in the search results is case insensitive. When I search for a pattern in Russian, the search itself is case insensitive, but the highlighting of the pattern in the results is case sensitive.

For example, if I search for the pattern "ам" (all lowercase), I see the following:

all lowercase symbols - all letters in the search output are lowercase and highlighting works fine

one uppercase symbol - but if one or more letters are uppercase, the highlighting does not appear.

In the source code of the search results page, the tag <span class="highlight token-0">ам</span> exists for the first image and does not exist for the second.

I use a MySQL 5.1.41 database with the utf8_general_ci collation, and Apache + Passenger on Ubuntu 10.04 with Rails 2.3.5 and Ruby 1.8.6. Please help me resolve this small issue. Thanks!

with_highlighting.png - all lowercase symbols (550 Bytes) Alexey Ivlev, 2010-10-01 09:57

without_highlighting.png - one uppercase symbol (623 Bytes) Alexey Ivlev, 2010-10-01 09:57


Related issues

Related to Redmine - Defect #10134: Case insensitive search is not working with postgres 8.4 ... Confirmed
Blocked by Redmine - Feature #4050: Ruby 1.9 support Closed 2009-10-18

History

#1 Updated by Etienne Massip over 6 years ago

  • Target version set to Candidate for next minor release

#2 Updated by Alexey Ivlev over 6 years ago

Thank you very much!

#3 Updated by Etienne Massip over 6 years ago

  • Target version deleted (Candidate for next minor release)

Sorry, but the underlying issue seems to be in Ruby's Regexp, as the Redmine code in SearchHelper#highlight_tokens seems fairly safe in the way it handles case: source:trunk/app/helpers/search_helper.rb#L22.

Added #4050 as blocker.
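
The Regexp behaviour described above can be checked with a small script. This is only a sketch: it assumes the helper builds a case-insensitive pattern from the search tokens (the linked source is not reproduced here), and the sample strings and variable names are made up for illustration. On Ruby 1.9 or later, with UTF-8 strings, both checks should report a match; this thread reports that on Ruby 1.8 the uppercase Cyrillic text is not matched.

```ruby
# Sketch only: sample strings and variable names are illustrative, not from Redmine.
pattern = /ам/i  # case-insensitive pattern for the lowercase token "ам"

lower = "там"    # text containing the token in lowercase
upper = "ТАМ"    # the same text in uppercase

# =~ returns the match position, or nil when nothing matches.
lower_match = !(lower =~ pattern).nil?
upper_match = !(upper =~ pattern).nil?

puts "lowercase text matched: #{lower_match}"
puts "uppercase text matched: #{upper_match}"
```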

#4 Updated by Alexey Ivlev over 6 years ago

In other words, the problem will be solved only when Ruby's Regexp is fixed?

#5 Updated by Etienne Massip over 6 years ago

That's what I think, yes.

#6 Updated by Yuriy Sokolov almost 6 years ago

  • % Done changed from 0 to 50

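Actually, I made a fix: it downcases both the tokens and the text with mb_chars before matching, instead of relying on a case-insensitive Regexp.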

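module SearchHelper
  # Wraps each occurrence of a search token in a highlight span, ignoring case.
  def highlight_tokens(text, tokens)
    return text unless text && tokens && !tokens.empty?
    # Downcase the tokens with mb_chars so non-ASCII letters (e.g. Cyrillic) are folded too.
    re_tokens = tokens.collect {|t| Regexp.escape(t.mb_chars.downcase)}
    regexp = Regexp.new "(#{re_tokens.join('|')})"
    result = ''
    position = 0
    text = text.mb_chars
    # Split the downcased text, then read each piece back from the original text by position.
    text.downcase.split(regexp).each_with_index do |words, i|
      if result.length > 1200
        # maximum length of the preview reached
        result << '...'
        break
      end
      words = text[position ... (position + words.size)]
      position += words.size
      if i.even?
        # Even pieces are the plain text between matches; long ones are abbreviated.
        result << h(words.length > 100 ? "#{words.slice(0..44)} ... #{words.slice(-45..-1)}" : words)
      else
        # Odd pieces are matched tokens; the CSS class cycles through token-0..token-3.
        t = (tokens.index(words.downcase) || 0) % 4
        result << content_tag('span', h(words), :class => "highlight token-#{t}")
      end
    end
    result
  end
end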
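
The technique in the patch above (match against a downcased copy, then copy each piece back from the original text by position) can be tried outside Rails. The following is a minimal sketch, not the Redmine helper: `highlight_plain` is a hypothetical name, square brackets stand in for the highlight span, and it assumes a Ruby version whose `String#downcase` handles Unicode (2.4 or later) and that downcasing does not change the length of the text.

```ruby
# Hypothetical standalone helper; not part of Redmine.
# Assumes String#downcase is Unicode-aware and preserves string length.
def highlight_plain(text, tokens)
  return text if text.nil? || tokens.nil? || tokens.empty?

  # Build a case-sensitive pattern from the downcased tokens.
  escaped = tokens.map { |t| Regexp.escape(t.downcase) }
  pattern = Regexp.new("(#{escaped.join('|')})")

  result = +''
  position = 0
  # The capture group makes split return the matched tokens as well,
  # so odd-indexed pieces are matches and even-indexed pieces are plain text.
  text.downcase.split(pattern).each_with_index do |piece, i|
    original = text[position, piece.length] # same span, original case
    position += piece.length
    result << (i.even? ? original : "[#{original}]")
  end
  result
end

puts highlight_plain("Там и АМ", ["ам"])
```

With the sample input this prints `Т[ам] и [АМ]`: both the lowercase and the uppercase occurrence are marked, and the original case of the text is preserved.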

#7 Updated by Jean-Philippe Lang almost 4 years ago

  • Subject changed from Highlighting in search results is case sensitive for cyrillic pattern to Highlighting in search results is case sensitive for cyrillic pattern
