Defect #43838
openUser mentions using the @login syntax cause performance issues when rendering content on instances with a large number of users.
Description
The performance issue was first reported by Jean-Philippe last year when he looked for solutions to optimize redmine.org website. One fix was to replace all the mentions made using @login syntax with user:#id syntax in the following news post.
In ApplicationHelper#parse_redmine_links, the lookup for user mentions is currently hardcoded to use a case-insensitive search: User.visible.find_by("LOWER(login) = :s AND type = 'User'", :s => name.downcase)
This causes two major issues on large instances (e.g., redmine.org with ~500,000 users):
1. Missing Index : The users table does not have an index on the login column.
2. Sequential Scans : The use of LOWER(login) prevents the database from using a standard index even if one were present, forcing a sequential table scan.
Benchmark Results (10000 Users):
I have used the attach file to test the performance.
1. The current trunk
root@f42152843160:/work# rails r bench_mentions.rb DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16) ruby=3.4.8 db=Mysql2 users=10000 mentions_per_text=100 iter=10 Inserting users... . Users inserted: 10000 Assigning sample users to project... Benchmarking @login... login_ms=3838.64 avg_login=383.86 Benchmarking user#id... user_link_ms=3856.31 avg_user_link=385.63 Benchmarking @login (worst-case non-existent, forces full scan)... invalid_login_ms=14433.17 avg_invalid_login=1443.32
Running explains:
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND `users`.`id` = 20008 AND `users`.`type` = 'User' LIMIT 1;
+----+-------------+-------+------------+-------+--------------------------------------------------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+--------------------------------------------------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | users | NULL | const | PRIMARY,index_users_on_id_and_type,index_users_on_type | PRIMARY | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+--------------------------------------------------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND (LOWER(login) = 'bench_user_59' AND type = 'User') LIMIT 1;
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | users | NULL | ref | index_users_on_type | index_users_on_type | 1023 | const | 5046 | 10.00 | Using where |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND (LOWER(login) = 'missing_user_100' AND type = 'User') LIMIT 1;
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | users | NULL | ref | index_users_on_type | index_users_on_type | 1023 | const | 5046 | 10.00 | Using where |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
2. Add index to users.login, results the same:
root@f42152843160:/work# rails r bench_mentions.rb
DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16)
ruby=3.4.8 db=Mysql2 users=10000 mentions_per_text=100 iter=10
Inserting users...
.
Users inserted: 10000
Assigning sample users to project...
Benchmarking @login...
login_ms=4017.54 avg_login=401.75
Benchmarking user#id...
user_link_ms=3860.56 avg_user_link=386.06
Benchmarking @login (worst-case non-existent, forces full scan)...
invalid_login_ms=14659.24 avg_invalid_login=1465.92
3. Apply the attached patch
DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16)
ruby=3.4.8 db=Mysql2 users=10000 mentions_per_text=100 iter=10
Inserting users...
.
Users inserted: 10000
Assigning sample users to project...
Benchmarking @login...
login_ms=3918.89 avg_login=391.89
Benchmarking user#id...
user_link_ms=3636.01 avg_user_link=363.6
Benchmarking @login (worst-case non-existent, forces full scan)...
invalid_login_ms=2576.47 avg_invalid_login=257.65
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND `users`.`login` = 'missing_user_100' LIMIT 1;
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | users | NULL | ref | index_users_on_type,index_users_on_login | index_users_on_login | 1022 | const | 1 | 5.00 | Using where |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('User', 'AnonymousUser') AND `users`.`status` = 1 AND `users`.`login` = 'bench_user_92' LIMIT 1;
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | users | NULL | ref | index_users_on_type,index_users_on_login | index_users_on_login | 1022 | const | 1 | 5.00 | Using where |
+----+-------------+-------+------------+------+------------------------------------------+----------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Running the bench test with 500000 users will just increase the differences between looking by id or by login.
The proposed changes:- fix the issues only for the mention feature, method
User.find_by_login(username)will still fail over to case-insensitive if none was found which will not use any index. - breaks mentioning feature for those instances where the username is not lowercase.
Another solution is to add an index for LOWER(login), but I didn't run a test yet.
Files
Related issues
Updated by Marius BĂLTEANU 20 days ago
Updated by Holger Just 17 days ago
For MySQL installations, functionally it should not matter if you explicitly call LOWER() or not since it uses a case-insentitive collation by default.
Still, why couldn't we just use User.visible.find_by_login(name) for the mentions? This method is also used during login and handles case-differences with an explicit fallback. This should be fast(er) for the happy path, but still suffer from long queries for the case-insensitive or failed-lookup cases (but retain complete feature parity). If we improve the performance here, it should help the entire system.
For correctness, it may still be worthwhile to add a UNIQUE index for LOWER(login) to ensure that even with races, there can not be users with duplicated logins (User.find_by_login handles this today with an implicit order by id). With such an index, we need to handle groups and internal users without a login though.
Updated by Marius BĂLTEANU 16 days ago
- Related to Feature #13919: Mention user on issues and wiki pages using @user with autocomplete added
Updated by Marius BĂLTEANU 16 days ago
- Related to Feature #4179: Link to user in wiki syntax added
Updated by Marius BĂLTEANU 16 days ago
- File 0001-Adds-index-on-user-login-and-lower-login.patch 0001-Adds-index-on-user-login-and-lower-login.patch added
- Assignee set to Marius BĂLTEANU
Thanks Holger.
I don't remember why I didn't use User.visible.find_by_login(name) from the beginning (#4179), but when I worked at this performance issue reported here, I tried the method, but it requires an additional index on lower(login) in order to behave performantly in all cases.
Do you see any issue if we add both indexes as in the proposed patch? From what I read, functional indexes are available starting MySQL 8, should we guard this in the migration for older installations? Right now, we don't have anything to require a minimum version of the db.
With both indexes applied, the bench with 500000 users returns the following:
root@f42152843160:/work# rails r bench_mentions.rb DEPRECATION WARNING: `config.active_support.to_time_preserves_timezone` is deprecated and will be removed in Rails 8.2 (called from <top (required)> at /work/config/environment.rb:16) ruby=3.4.8 db=Mysql2 users=500000 mentions_per_text=100 iter=10 Benchmarking @login... login_ms=3701.41 avg_login=370.14 Benchmarking user#id... user_link_ms=3786.34 avg_user_link=378.63 Benchmarking @login (worst-case non-existent, forces full scan)... invalid_login_ms=3697.16 avg_invalid_login=369.72 root@f42152843160:/work#
Updated by Marius BĂLTEANU 16 days ago
On a longer term, maybe it is better to migrate all the logins to lowercase and use only one query to retrieve the user, but we cannot do it now because it will break the LDAP logins with are case sensitive? At least this is what I understood from the old issues and commits.
Updated by Holger Just 16 days ago
I think we could guard this based on database used. For MySQL, the collation is generally case-insensitive anyway, so WHERE comparisons should return data in any case. SQLite is special in other ways...
As such, for MySQL we don't need a separate index. For PostgreSQL, we can use an lower index. For SQLite, there is a special NOCASE flag for indexes (but I don't know how this behaves with columns who are case-sensitive by default). Another option for SQLite would by to use a regular LIKE query which is case-insensitive for ASCII characters. Here, we would have to ensure to escape any "special" placeholders though.
With a special case for MySQL, this could look like this if we add a matching index for databases other than MySQL (but I have not tested or benchmarked this).
def self.find_by_login(login)
login = Redmine::CodesetUtil.replace_invalid_utf8(login.to_s)
if login.present?
users = where(:login => login)
# First look for an exact match
user = users.detect { |u| u.login == login }
# Fail over to case-insensitive search if none was found
unless user
if Redmine::Database.mysql?
# MySQL is case-insensitive by default, we can search in the existing results
user = users.detect { |u| u.login.casecmp?(login) }
else
user = find_by("LOWER(login) = ?", login.downcase)
end
end
user
end
end
Updated by Holger Just 16 days ago
Marius BĂLTEANU wrote in #note-8:
On a longer term, maybe it is better to migrate all the logins to lowercase and use only one query to retrieve the user, but we cannot do it now because it will break the LDAP logins with are case sensitive? At least this is what I understood from the old issues and commits.
That would probably be a good idea. We would also have to deal with regular existing "duplicated" logins during this migration though (i.e. logins which only differ by case). Also, as we do not have a unique database index now, it's possible for multiple users to have the exact same login due to races. As mentioned, User.find_by_login solves this consistently due to the implicit order by id. When changing the data model, we still have to handle this, e.g. if just by offering a tool to fix this (by showing and optionally renaming conflicting logins)
My preferred option in the long term would be to allow user-settable logins with any case but to ensure that we only allow unique case-insensitive logins, i.e. case-preserving logins, which in the end could be enforced with a unique lower(login) index after we have solved all the blockers :)
Updated by Marius BĂLTEANU 14 days ago
- File 0001-Look-for-mentions-by-login-using-the-existing-find_b.patch 0001-Look-for-mentions-by-login-using-the-existing-find_b.patch added
- File 0002-Moves-wikiImageMimeTypes-to-meta-tag.patch added
- File 0003-Adds-index-on-users-lower-login-on-PostgreSQL-and-SQ.patch 0003-Adds-index-on-users-lower-login-on-PostgreSQL-and-SQ.patch added
Thanks Holger, it became clearer now for me. I try to summarize the changes to fix the performance issue on a short term:
1. The first patch (0001-Look-for-mentions-by-login-using-the-existing-find_b.patch) changes application_helper.rb to look up for mentions by login using the existing find_by_login in order to have a consistent behaviour between login and user mentions.
- Add an index to users login to speed up find user by login on all databases
- For database which are case-insensitive by default, it search in the existing results instead of firing up a new query using
lower(login).
Having both patches applied, we have fixed the performance issue when looking by exact login on all databases, and on MySQL and SQLServer in all cases because both databases are case-insensitive by default. Bench results confirms this
3. The third patch (0003-Adds-index-on-users-lower-login-on-PostgreSQL-and-SQ.patch ) adds an index on lower(login) only on PostSQL and SQLite databases and it should fix the performance issue on non exact or non existing matches. I didn't run yet the bench script against these databases, but I will do it before committing these changes, I only confirmed the migration pass on both.
Please let me know if I missed anything.
Updated by Marius BĂLTEANU 14 days ago
- File deleted (
0002-Moves-wikiImageMimeTypes-to-meta-tag.patch)
Updated by Marius BĂLTEANU 8 days ago
- Target version changed from Candidate for next minor release to 7.0.0
Because database migrations are required, this issue will be fixed only in 7.0.0.