Project

General

Profile

Actions

Defect #43838

open

User mentions using the @login syntax cause performance issues when rendering content on instances with a large number of users.

Added by Marius BĂLTEANU 20 days ago. Updated 8 days ago.

Status:
New
Priority:
Normal
Category:
Performance
Target version:
Resolution:
Affected version:

Description

The performance issue was first reported by Jean-Philippe last year when he looked for solutions to optimize redmine.org website. One fix was to replace all the mentions made using @login syntax with user:#id syntax in the following news post.

In ApplicationHelper#parse_redmine_links, the lookup for user mentions is currently hardcoded to use a case-insensitive search: User.visible.find_by("LOWER(login) = :s AND type = 'User'", :s => name.downcase)

This causes two major issues on large instances (e.g., redmine.org with ~500,000 users):
1. Missing Index : The users table does not have an index on the login column.
2. Sequential Scans : The use of LOWER(login) prevents the database from using a standard index even if one were present, forcing a sequential table scan.

Benchmark Results (10000 Users):
I have used the attach file to test the performance.

1. The current trunk

root@f42152843160:/work# rails r bench_mentions.rb 
DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16)
ruby=3.4.8 db=Mysql2 users=10000 mentions_per_text=100 iter=10
Inserting users...
.
Users inserted: 10000
Assigning sample users to project...

Benchmarking @login...
login_ms=3838.64 avg_login=383.86

Benchmarking user#id...
user_link_ms=3856.31 avg_user_link=385.63

Benchmarking @login (worst-case non-existent, forces full scan)...
invalid_login_ms=14433.17 avg_invalid_login=1443.32

Running explains:

mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND `users`.`id` = 20008 AND `users`.`type` = 'User' LIMIT 1;
+----+-------------+-------+------------+-------+--------------------------------------------------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys                                          | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+-------+--------------------------------------------------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | users | NULL       | const | PRIMARY,index_users_on_id_and_type,index_users_on_type | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+--------------------------------------------------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND (LOWER(login) = 'bench_user_59' AND type = 'User') LIMIT 1;
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys       | key                 | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | index_users_on_type | index_users_on_type | 1023    | const | 5046 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN  SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND (LOWER(login) = 'missing_user_100' AND type = 'User') LIMIT 1;
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys       | key                 | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | index_users_on_type | index_users_on_type | 1023    | const | 5046 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

2. Add index to users.login, results the same:


root@f42152843160:/work# rails r bench_mentions.rb 
DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16)
ruby=3.4.8 db=Mysql2 users=10000 mentions_per_text=100 iter=10
Inserting users...
.
Users inserted: 10000
Assigning sample users to project...

Benchmarking @login...
login_ms=4017.54 avg_login=401.75

Benchmarking user#id...
user_link_ms=3860.56 avg_user_link=386.06

Benchmarking @login (worst-case non-existent, forces full scan)...
invalid_login_ms=14659.24 avg_invalid_login=1465.92

3. Apply the attached patch


DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16)
ruby=3.4.8 db=Mysql2 users=10000 mentions_per_text=100 iter=10
Inserting users...
.
Users inserted: 10000
Assigning sample users to project...

Benchmarking @login...
login_ms=3918.89 avg_login=391.89

Benchmarking user#id...
user_link_ms=3636.01 avg_user_link=363.6

Benchmarking @login (worst-case non-existent, forces full scan)...
invalid_login_ms=2576.47 avg_invalid_login=257.65
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND `users`.`login` = 'missing_user_100' LIMIT 1;
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys                            | key                  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | index_users_on_type,index_users_on_login | index_users_on_login | 1022    | const |    1 |     5.00 | Using where |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND `users`.`login` = 'bench_user_92' LIMIT 1;
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys                            | key                  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | index_users_on_type,index_users_on_login | index_users_on_login | 1022    | const |    1 |     5.00 | Using where |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

Running the bench test with 500000 users will just increase the differences between looking by id or by login.

The proposed changes:
  • fix the issues only for the mention feature, method User.find_by_login(username) will still fail over to case-insensitive if none was found which will not use any index.
  • breaks mentioning feature for those instances where the username is not lowercase.

Another solution is to add an index for LOWER(login), but I didn't run a test yet.


Files


Related issues

Related to Redmine - Feature #13919: Mention user on issues and wiki pages using @user with autocompleteClosedMarius BĂLTEANUActions
Related to Redmine - Feature #4179: Link to user in wiki syntaxClosedJean-Philippe Lang2009-11-08Actions
Actions #2

Updated by Marius BĂLTEANU 20 days ago

  • Description updated (diff)
Actions #3

Updated by Marius BĂLTEANU 20 days ago

  • Description updated (diff)
Actions #4

Updated by Holger Just 17 days ago

For MySQL installations, functionally it should not matter if you explicitly call LOWER() or not since it uses a case-insentitive collation by default.

Still, why couldn't we just use User.visible.find_by_login(name) for the mentions? This method is also used during login and handles case-differences with an explicit fallback. This should be fast(er) for the happy path, but still suffer from long queries for the case-insensitive or failed-lookup cases (but retain complete feature parity). If we improve the performance here, it should help the entire system.

For correctness, it may still be worthwhile to add a UNIQUE index for LOWER(login) to ensure that even with races, there can not be users with duplicated logins (User.find_by_login handles this today with an implicit order by id). With such an index, we need to handle groups and internal users without a login though.

Actions #5

Updated by Marius BĂLTEANU 16 days ago

  • Related to Feature #13919: Mention user on issues and wiki pages using @user with autocomplete added
Actions #6

Updated by Marius BĂLTEANU 16 days ago

Actions #7

Updated by Marius BĂLTEANU 16 days ago

Thanks Holger.

I don't remember why I didn't use User.visible.find_by_login(name) from the beginning (#4179), but when I worked at this performance issue reported here, I tried the method, but it requires an additional index on lower(login) in order to behave performantly in all cases.

Do you see any issue if we add both indexes as in the proposed patch? From what I read, functional indexes are available starting MySQL 8, should we guard this in the migration for older installations? Right now, we don't have anything to require a minimum version of the db.

With both indexes applied, the bench with 500000 users returns the following:

root@f42152843160:/work# rails r bench_mentions.rb 
DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16)
ruby=3.4.8 db=Mysql2 users=500000 mentions_per_text=100 iter=10
Benchmarking @login...
login_ms=3701.41 avg_login=370.14

Benchmarking user#id...
user_link_ms=3786.34 avg_user_link=378.63

Benchmarking @login (worst-case non-existent, forces full scan)...
invalid_login_ms=3697.16 avg_invalid_login=369.72
root@f42152843160:/work# 
Actions #8

Updated by Marius BĂLTEANU 16 days ago

On a longer term, maybe it is better to migrate all the logins to lowercase and use only one query to retrieve the user, but we cannot do it now because it will break the LDAP logins with are case sensitive? At least this is what I understood from the old issues and commits.

Actions #9

Updated by Holger Just 16 days ago

I think we could guard this based on database used. For MySQL, the collation is generally case-insensitive anyway, so WHERE comparisons should return data in any case. SQLite is special in other ways...

As such, for MySQL we don't need a separate index. For PostgreSQL, we can use an lower index. For SQLite, there is a special NOCASE flag for indexes (but I don't know how this behaves with columns who are case-sensitive by default). Another option for SQLite would by to use a regular LIKE query which is case-insensitive for ASCII characters. Here, we would have to ensure to escape any "special" placeholders though.

With a special case for MySQL, this could look like this if we add a matching index for databases other than MySQL (but I have not tested or benchmarked this).

def self.find_by_login(login)
  login = Redmine::CodesetUtil.replace_invalid_utf8(login.to_s)
  if login.present?
    users = where(:login => login)

    # First look for an exact match
    user = users.detect { |u| u.login == login }

    # Fail over to case-insensitive search if none was found
    unless user
      if Redmine::Database.mysql?
        # MySQL is case-insensitive by default, we can search in the existing results
        user = users.detect { |u| u.login.casecmp?(login) }
      else
        user = find_by("LOWER(login) = ?", login.downcase)
      end
    end
    user
  end
end
Actions #10

Updated by Holger Just 16 days ago

Marius BĂLTEANU wrote in #note-8:

On a longer term, maybe it is better to migrate all the logins to lowercase and use only one query to retrieve the user, but we cannot do it now because it will break the LDAP logins with are case sensitive? At least this is what I understood from the old issues and commits.

That would probably be a good idea. We would also have to deal with regular existing "duplicated" logins during this migration though (i.e. logins which only differ by case). Also, as we do not have a unique database index now, it's possible for multiple users to have the exact same login due to races. As mentioned, User.find_by_login solves this consistently due to the implicit order by id. When changing the data model, we still have to handle this, e.g. if just by offering a tool to fix this (by showing and optionally renaming conflicting logins)

My preferred option in the long term would be to allow user-settable logins with any case but to ensure that we only allow unique case-insensitive logins, i.e. case-preserving logins, which in the end could be enforced with a unique lower(login) index after we have solved all the blockers :)

Actions #11

Updated by Marius BĂLTEANU 14 days ago

Thanks Holger, it became clearer now for me. I try to summarize the changes to fix the performance issue on a short term:

1. The first patch (0001-Look-for-mentions-by-login-using-the-existing-find_b.patch) changes application_helper.rb to look up for mentions by login using the existing find_by_login in order to have a consistent behaviour between login and user mentions.

2. The second patch (0002-Add-an-index-to-users-login-to-speed-up-find-user-by.patch) brings two changes:
  • Add an index to users login to speed up find user by login on all databases
  • For database which are case-insensitive by default, it search in the existing results instead of firing up a new query using lower(login).

Having both patches applied, we have fixed the performance issue when looking by exact login on all databases, and on MySQL and SQLServer in all cases because both databases are case-insensitive by default. Bench results confirms this

3. The third patch (0003-Adds-index-on-users-lower-login-on-PostgreSQL-and-SQ.patch ) adds an index on lower(login) only on PostSQL and SQLite databases and it should fix the performance issue on non exact or non existing matches. I didn't run yet the bench script against these databases, but I will do it before committing these changes, I only confirmed the migration pass on both.

Please let me know if I missed anything.

Actions #12

Updated by Marius BĂLTEANU 14 days ago

  • File deleted (0002-Moves-wikiImageMimeTypes-to-meta-tag.patch)
Actions #14

Updated by Marius BĂLTEANU 8 days ago

  • Target version changed from Candidate for next minor release to 7.0.0

Because database migrations are required, this issue will be fixed only in 7.0.0.

Actions

Also available in: Atom PDF