Having a dynamic sitemaps file for search robots scanning
Such a file would help the major search bots index projects, wikis, forums, and any other relevant pages.
The main benefit would be the ability to tell these bots when a page was last updated (especially for wiki pages).
It would work similarly to what has recently been done for the robots.txt file: #2491 and r2319.
In a more developed version, it could allow administrators (and maybe managers too) to set the scan frequency for specific pages (news pages for instance, or, again, wiki pages), or to give some pages a higher priority for indexing.
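To make the idea concrete, here is a minimal plain-Ruby sketch of what such a file could contain. It relies on the lastmod, changefreq and priority elements of the sitemaps.org protocol. The SitemapEntry struct, the render_sitemap helper and the example URLs are assumptions made for illustration; they are not part of Redmine or of the attached patch.

```ruby
require 'time'

# Hypothetical record for one sitemap entry; not a Redmine class.
SitemapEntry = Struct.new(:loc, :lastmod, :changefreq, :priority)

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                '"' => '&quot;', "'" => '&apos;' }.freeze

# Escape the characters that are not allowed verbatim in XML text.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Render entries as a sitemaps.org <urlset>; optional fields are skipped when nil.
def render_sitemap(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << '  <url>'
    lines << "    <loc>#{xml_escape(entry.loc)}</loc>"
    lines << "    <lastmod>#{entry.lastmod.utc.iso8601}</lastmod>" if entry.lastmod
    lines << "    <changefreq>#{entry.changefreq}</changefreq>" if entry.changefreq
    lines << "    <priority>#{format('%.1f', entry.priority)}</priority>" if entry.priority
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end

# Example data (made up): a wiki page with a known update time and a news page.
entries = [
  SitemapEntry.new('http://redmine.example.org/projects/demo/wiki/Start?a=1&b=2',
                   Time.utc(2009, 1, 20, 12, 0, 0), 'weekly', 0.8),
  SitemapEntry.new('http://redmine.example.org/projects/demo/news', nil, 'daily', nil)
]
puts render_sitemap(entries)
```

In a real implementation the entries would presumably come from the public projects and their wiki pages, and the per-page frequency and priority from whatever settings administrators are given.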
One could also imagine a "robots scan configuration tool" that merges the settings for robots.txt and sitemaps.xml. Such a tool would make it possible to configure which pages should be scanned by which robots.
In the end: fine control over what search robots can see.
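As a rough sketch of the robots.txt side of such a tool, the following plain-Ruby helper turns a per-robot rule table into a robots.txt body and appends a Sitemap line pointing at the sitemaps file. The render_robots_txt name, the rule table layout and the paths are hypothetical; only the User-agent, Disallow and Sitemap directives are standard.

```ruby
# Hypothetical settings format: each user agent maps to the path prefixes
# it must not crawl. An empty list means "nothing is disallowed".
def render_robots_txt(rules, sitemap_url = nil)
  lines = []
  rules.each do |user_agent, disallowed_paths|
    lines << "User-agent: #{user_agent}"
    if disallowed_paths.empty?
      lines << 'Disallow:'
    else
      disallowed_paths.each { |path| lines << "Disallow: #{path}" }
    end
    lines << '' # blank line separates the groups
  end
  lines << "Sitemap: #{sitemap_url}" if sitemap_url
  lines.join("\n") + "\n"
end

# Example rules (made up for illustration).
rules = {
  '*' => ['/projects/demo/repository', '/projects/demo/issues'],
  'ExampleBot' => []
}
puts render_robots_txt(rules, 'http://redmine.example.org/sitemaps.xml')
```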
#1 Updated by Axel Voitier over 9 years ago
- File sitemaps.01.patch added
Here is a "kick off" patch for this feature.
It is not intended to be useful yet. It just starts the work.
- Creation of a new controller and view named RobotsController and robots/sitemaps.rhtml
- List only wiki pages with their last update date in the sitemaps.xml file
- Add a route for that file
I guess the robots.txt generation could be added to this controller instead of Welcome.
This patch needs the modifications made in r2319! (mainly the Project.public.active method).
- Check whether "http" or "https" has to be used, according to the Redmine configuration
- Add more pages (from projects, forums, etc.). All publicly accessible pages in Redmine should be included!
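For the first of these two points, a possible approach is to build every absolute URL from the configured protocol and host name. The sketch below is plain Ruby under stated assumptions: SiteConfig is a hypothetical stand-in for the Redmine settings, and absolute_url is an illustrative helper, not an existing Redmine method.

```ruby
# Hypothetical stand-in for the application settings (protocol and host name).
SiteConfig = Struct.new(:protocol, :host_name)

# Build an absolute URL, falling back to "http" unless "https" is configured.
def absolute_url(config, path)
  protocol = config.protocol.to_s.downcase == 'https' ? 'https' : 'http'
  host = config.host_name.to_s.sub(%r{/+\z}, '') # drop trailing slashes
  path = "/#{path}" unless path.start_with?('/')
  "#{protocol}://#{host}#{path}"
end

config = SiteConfig.new('https', 'redmine.example.org/')
puts absolute_url(config, 'projects/demo/wiki')
```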
#2 Updated by Axel Voitier over 9 years ago
Does it need some more pages? Or maybe fewer?
By the way, this makes me think that the robots.txt file, which disallows pages for robots, could be much more complete and finely detailed, in line with this list of relevant pages.