Where to start with performance analysis?

Added by Michael Vance over 10 years ago

We have a small Redmine installation that we're testing which seems very, very slow. Some info:

- Based off a version of trunk from 8 Apr because of some issues with repo support in older versions.
- It contains 9 projects, around 50 issues, and has a few P4 and git respositories with low traffic.
- There are only 3 users (concurrency is naturally not a problem).
- The server is running Windows Server 64-bit with 8 3Ghz cores, 32G of RAM, and 6TB of storage (so it's not a hardware class problem I would hope). The server has low load (usually less than 10%).
- It is running through Mongrel with an Apache 2.2 proxy/forward on top of that.
- It is using MySQL 5.1 for the backend.
- It is running in production environment.

I'd give you the output of script/about but it dies with:

D:\Redmine\trunk-acme>ruby script\about
c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- rails/info (MissingSourceFile)
from c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from c:/ruby/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/dependencies.rb:155:in `require'
from c:/ruby/lib/ruby/gems/1.8/gems/rails-2.2.2/lib/commands/about.rb:2
from c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from script/about:3

The meat of the problem is that pages take forever to load, and it looks like it's spending most of its time in view calculation (ie, it's not the db), although there's a big chunk leftover:

Processing IssuesController#show (for 10.90.240.43 at 2009-05-13 13:22:55) [GET]
Parameters: {"action"=>"show", "id"=>"10", "controller"=>"issues"}
Rendering template within layouts/base
Rendering issues/show.rhtml
Completed in 3426ms (View: 2064, DB: 94) | 200 OK [http://acme.<foo>.com/issues/10]

Processing MyController#page (for 10.90.240.43 at 2009-05-13 13:06:20) [GET]
Parameters: {"action"=>"page", "controller"=>"my"}
Rendering template within layouts/base
Rendering my/page
Completed in 747ms (View: 511, DB: 7) | 200 OK [http://acme.<foo>.com/my/page]

Processing ProjectsController#index (for 10.90.240.43 at 2009-05-13 13:06:28) [GET]
Parameters: {"action"=>"index", "controller"=>"projects"}
Rendering template within layouts/base
Rendering projects/index
Completed in 519ms (View: 287, DB: 4) | 200 OK [http://acme.<foo>.com/projects/]

I've tried bypassing Apache but it doesn't seem to affect much:

Processing IssuesController#show (for 127.0.0.1 at 2009-05-13 13:00:59) [GET]
Parameters: {"action"=>"show", "id"=>"10", "controller"=>"issues"}
Rendering template within layouts/base
Rendering issues/show.rhtml
Completed in 2624ms (View: 1503, DB: 85) | 200 OK [http://localhost/issues/10]

What are some recommendations for how I can diagnose the underlying problem?

Thanks,

m.

Replies (12)

RE: Where to start with performance analysis? - Added by Eric Davis over 10 years ago

You should sign up for something like New Relic. Even their free plan will let to track down performance issues.

Eric

RE: Where to start with performance analysis? - Added by Michael Vance over 10 years ago

Eric Davis wrote:

You should sign up for something like New Relic. Even their free plan will let to track down performance issues.

Thanks, I'll take a look, but I'm going to assume we have some fundamental configuration problem somewhere given the minimal data set we have; not sure how good the tool will be at diagnosing that.

m.

RE: Where to start with performance analysis? - Added by Wade Brainerd over 10 years ago

Hi Eric, I work with Michael and have some information to add to this topic.

I used Chrome's page load profiler to watch the page load. Interestingly: the document loads in about 2 seconds, but CSS and JavaScript resources push the time out to nearly 7 seconds.

  1. Why aren't these being cached?
  2. Why does it take so long to serve static files?

Screenshot showing Chrome's page load profiler.

We're using Mongrel running through an Apache proxy. I'm now not sure if this is now a Redmine issue or simply a Rails issue... Any thoughts would be greatly appreciated.

Thanks,
Wade

redmine-timing.jpg - Screenshot showing Chrome's page load profiler. (121 KB)

RE: Where to start with performance analysis? - Added by Michael Vance over 10 years ago

I just redid the test of bypassing the proxy and the weird 3-6s stalls serving pngs/css/jscript seem to have disappeared. I also noticed a weird warning in Chrome that the css and js were being sent as MIME type 'text/plain' instead of what they should be; not sure if that's related. We're going to look at redoing our Apache proxy setup to see if we can fix things.

RE: Where to start with performance analysis? - Added by Eric Davis over 10 years ago

It does sound like a proxy issue. The static assets (CSS, JS, and images) are in the public/ directory where the web server should access directly. So Mongel shouldn't ever send those files out, Apache should read them and send them right out.

Eric

RE: Where to start with performance analysis? - Added by Wade Brainerd over 10 years ago

Hi Eric,

How exactly is this static file serving supposed to be set up? I tried to do it with mod_rewrite, and it appears to be working, but I'm still getting the same stalls.

Here's the relevant section of our httpd.conf.

RewriteEngine on

# Redmine is the default site if you just go to http://acme/.
RedirectMatch ^/$ /redmine/

#RewriteLog logs/rewrite.log
#RewriteLogLevel 10

# Handle Redmine static files which exist on the filesystem directly.
# Although this is working, it sometimes delays responses up to 5s.
<Directory "D:/Redmine/trunk-acme/public/">
    AllowOverride None

    Order allow,deny
    Allow from all
</Directory>

RewriteCond D:/Redmine/trunk-acme/public%{REQUEST_URI} -f
RewriteRule (.*) D:/Redmine/trunk-acme/public$1

# Proxy all other page requests to the Redmine server.
<Location "/redmine/">
    ProxyPass http://localhost:3000/redmine/
    ProxyPassReverse http://localhost:3000/redmine/
</Location>

If you read it carefully, the RewriteCond looks for files in D:/Redmine/trunk-acme/public</redmine/staticfile.png>. I set up a NTFS junction locally so that these paths work and checked the results via rewrite.log.

Anyway, I noticed that redmine.org uses Apache. It usually loads faster than our intranet app too. Can you tell me how you guys have it configured?

Thanks!

Wade

RE: Where to start with performance analysis? - Added by Eric Davis over 10 years ago

Wade Brainerd wrote:

Hi Eric,

How exactly is this static file serving supposed to be set up? I tried to do it with mod_rewrite, and it appears to be working, but I'm still getting the same stalls.

If you set Apache's DocumentRoot to the public/ directory, Apache should check there for static assets and serve them up directly.

Anyway, I noticed that redmine.org uses Apache. It usually loads faster than our intranet app too. Can you tell me how you guys have it configured?

I don't know the specifics of Redmine.org (I'm not running it) but I think it's using Apache and FCGI.

I'm including my old mongrel cluster config. It's old so it might not work right away but I think the ProxyPass options are what you might be missing:

NameVirtualHost *:443
<VirtualHost *:443>
  SSLEngine on
#  Self-signed
#  SSLCertificateFile /etc/apache2/apache.pem

  #SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
  SSLCertificateFile /etc/apache2/ssl/projects.littlestreamsoftware.com.crt
  SSLCertificateKeyFile /etc/apache2/ssl/projects.littlestreamsoftware.com.key
  # this is only needed for GoDaddy certificates
  SSLCertificateChainFile /etc/apache2/ssl/gd_bundle.crt

  RequestHeader set X_FORWARDED_PROTO 'https'
  ServerAdmin webmaster@localhost
  RewriteEngine On

  PassengerEnabled off

  DocumentRoot /home/websites/projects.littlestreamsoftware.com/current/public
  <Directory "/home/websites/projects.littlestreamsoftware.com/current/public">
    Options FollowSymLinks
    AllowOverride All
    Order allow,deny
    Allow from all
  </Directory>

 <Proxy balancer://redmine_cluster>
   BalancerMember http://127.0.0.1:12000
   BalancerMember http://127.0.0.1:12001
 </Proxy>

 # These directories should always be served up by Apache, since they contain static content. Or just let rails do it. 
 ProxyPass /images !
 ProxyPass /stylesheets !
 ProxyPass /javascripts !
 ProxyPass /favicon.ico !

 ProxyPass / balancer://redmine_cluster/ 
 ProxyPassReverse / balancer://redmine_cluster/

  # Uncomment for rewrite debugging
#  RewriteLog /tmp/myapp_rewrite_log
#  RewriteLogLevel 9 

  # Check for maintenance file and redirect all requests
  RewriteCond %{DOCUMENT_ROOT}/system/maintenance.html -f
  RewriteCond %{SCRIPT_FILENAME} !maintenance.html
  RewriteRule ^.*$ /system/maintenance.html [L]

  # Rewrite index to check for static
  RewriteRule ^/$ /cache/index.html [QSA] 

  # Rewrite to check for Rails cached page
  RewriteRule ^([^.]+)$ /cache/$1.html [QSA]

  RewriteRule ^/(.*)$ balancer://redmine_cluster%{REQUEST_URI} [P,QSA,L]

  ErrorLog /var/log/apache2/error.log

  CustomLog /var/log/apache2/admin-access.log combined
  ServerSignature Off

  Include /etc/apache2/common/hide-svn
  Include /etc/apache2/common/deflate

</VirtualHost

RE: Where to start with performance analysis? - Added by Michael Vance about 10 years ago

Thread resurrection. We had started using the Wiki module more extensively and were embedding images and noticed this caused our page serve times to increase to the order of 10-20s, with each embedded image fetch taking approximately 3.7s (but pipelined). This was completely bizarre to us, so we decided to just punt entirely and clone our mysql over to a fresh Ubuntu install and mimic the Apache setup as closely as possible. This machine is several years old but we figured it would be an interesting comparison nonetheless.

The result? No page load that takes more then 700ms or so, most in the 550ms range. That's approximately 20-40x faster than on Windows, for a box that is probably 8-10x slower in overall hardware.

Now I'm prepared to accept that Ruby on Windows might be 50% slower, but this is several orders of magnitude. What could possibly be going on?

RE: Where to start with performance analysis? - Added by Michael Vance about 10 years ago

Here's the Firebug timing results for the same page on the Ubuntu box.

RE: Where to start with performance analysis? - Added by Eric Davis about 10 years ago

Michael Vance wrote:

The result? No page load that takes more then 700ms or so, most in the 550ms range. That's approximately 20-40x faster than on Windows, for a box that is probably 8-10x slower in overall hardware.

Now I'm prepared to accept that Ruby on Windows might be 50% slower, but this is several orders of magnitude. What could possibly be going on?

I have no idea why each request takes a few seconds on Windows. I'm seeing an average of 700ms time on demo.redmine.org with over 733,000 requests over the past week. My own Redmine is around 1000ms per request but I'm running a lot more code (e.g. plugins).

Eric Davis

RE: Where to start with performance analysis? - Added by Michael Vance about 10 years ago

It has to be something in our local configuration, but I'm really at pains to know what. This is what the production.log shows for a refresh of the wiki page in question:

Processing AttachmentsController#download (for 127.0.0.1 at 2009-12-10 11:21:22) [GET]
Parameters: {"action"=>"download", "id"=>"46", "controller"=>"attachments"}
Completed in 742ms (View: 0, DB: 8) | 200 OK [http://localhost/redmine/attachments/download/46]

With 0ms view and 8ms db, I'm left wondering how can I figure out where the other 742ms is going? A similar page on my local development box:

Processing AttachmentsController#download (for 127.0.0.1 at 2009-12-10 11:26:57) [GET]
Parameters: {"action"=>"download", "id"=>"17", "controller"=>"attachments"}
Completed in 13ms (View: 0, DB: 4) | 200 OK [http://localhost/attachments/download/17]

RE: Where to start with performance analysis? - Added by Michael Vance about 10 years ago

I just resolved this so I'm posting here for posterity. Profiling ruby.exe showed it was spending 95% of its time in garbage collection, which made no sense that there would be a difference in GC behavior between our server and my workstation. Eventually I ended up looking at a disparity in the ruby.exe version running on each machine and noted that the slower server was running 1.8.6 patchlevel 111 while my local machine was running 1.8.6 patchlevel 287. I downloaded and installed 1.8.6 patchlevel 383, nuked my ruby folder, and reinstalled, and now it runs as expected.

Whew!

(1-12/12)