Wordpress 2.0.3 ‘Bug Fix & Security Release’

Matt announced a release for Wordpress today on the Wordpress Development Blog. This release addresses several bugs and a security issue raised on Bugtraq.

Files changed in this release:

wp-admin/admin-db.php
wp-admin/admin-functions.php
wp-admin/admin.php
wp-admin/categories.php
wp-admin/cat-js.php
wp-admin/edit-comments.php
wp-admin/edit-form-advanced.php
wp-admin/edit-form-ajax-cat.php
wp-admin/edit-form-comment.php
wp-admin/edit-link-form.php
wp-admin/edit-page-form.php
wp-admin/edit-pages.php
wp-admin/edit.php
wp-admin/import/mt.php
wp-admin/inline-uploading.php
wp-admin/link-categories.php
wp-admin/link-import.php
wp-admin/link-manager.php
wp-admin/list-manipulation.js
wp-admin/list-manipulation.php
wp-admin/moderation.php
wp-admin/options-discussion.php
wp-admin/options-general.php
wp-admin/options-misc.php
wp-admin/options-permalink.php
wp-admin/options.php
wp-admin/options-reading.php
wp-admin/options-writing.php
wp-admin/page-new.php
wp-admin/plugin-editor.php
wp-admin/plugins.php
wp-admin/post.php
wp-admin/profile.php
wp-admin/profile-update.php
wp-admin/templates.php
wp-admin/theme-editor.php
wp-admin/themes.php
wp-admin/upgrade.php
wp-admin/upgrade-schema.php
wp-admin/user-edit.php
wp-admin/users.php
wp-comments-post.php
wp-content/plugins/akismet/akismet.php
wp-content/plugins/wp-db-backup.php
wp-includes/cache.php
wp-includes/capabilities.php
wp-includes/classes.php
wp-includes/comment-functions.php
wp-includes/default-filters.php
wp-includes/functions-compat.php
wp-includes/functions-formatting.php
wp-includes/functions.php
wp-includes/functions-post.php
wp-includes/kses.php
wp-includes/links.php
wp-includes/pluggable-functions.php
wp-includes/registration-functions.php
wp-includes/template-functions-general.php
wp-includes/template-functions-links.php
wp-includes/vars.php
wp-includes/version.php
wp-login.php

Unfortunately I haven’t had time to look into the security issue itself or to detail its effects and how it has been patched, but the post by Matt covers the changes pretty comprehensively. I have, however, created a diff/patch from 2.0.2 to 2.0.3 and checked it into my SVN repository:

http://svn.lobstertechnology.com/wordpress-patches/wordpress-2.0.2-2.0.3.patch

You can apply this patch from the top directory of your Wordpress installation using the ‘patch’ program from a UNIX shell.

patch -p1 < wordpress-2.0.2-2.0.3.patch

However, I haven’t personally tested patching up to 2.0.3 yet, so I would suggest taking a backup first.

Wordpress 2.0.2 ‘Security Release’

Matt announced a security release for Wordpress today on the Wordpress Development Blog. This release addresses unannounced XSS problems, apparently in comment posting & registration. The files affected by this release are:

wp-admin/admin-functions.php
wp-admin/admin-header.php
wp-admin/admin.php
wp-admin/edit-pages.php
wp-admin/import/blogger.php
wp-admin/list-manipulation.php
wp-admin/menu-header.php
wp-admin/post.php
wp-admin/user-edit.php
wp-comments-post.php
wp-includes/classes.php
wp-includes/comment-functions.php
wp-includes/functions.php
wp-includes/js/tinymce/langs/en.js
wp-includes/js/tinymce/plugins/wordpress/langs/en.js
wp-includes/js/tinymce/tiny_mce_gzip.php
wp-includes/template-functions-general.php
wp-includes/template-functions-links.php
wp-includes/version.php
wp-register.php
wp-settings.php

Here is a short summary of some of the notable changes:

wp-admin/admin-functions.php

- Forced default values of $_POST['comment_status'] = 'closed' & $_POST['ping_status'] = 'closed' when they are not set.
- Added escaping of attachment data-objects.
- Added escaping of posts data-objects.

wp-admin/admin-header.php

- Added check for ‘manage_categories’ privileges before showing the “Add” option to the category list while writing a post.

wp-admin/list-manipulation.php

- Abstracted deletion of links from direct SQL to a wp_delete_link method.

wp-admin/menu-header.php

- New ‘admin_notices’ Action allowing plugins to insert HTML immediately after the ‘adminmenu’ and ‘submenu’ <ul> elements. I think I’ll be using that for my “New version of SpamKit available” messages.
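
As a rough idea of how a plugin might use this hook, here is a minimal sketch. The callback name, the message and the 'updated' CSS class are my own placeholders, not code from SpamKit; add_action is the normal Wordpress hook registration function, guarded here so the snippet also loads outside Wordpress.

```php
<?php
// Hypothetical callback: prints a notice below the admin menus.
function spamkit_update_notice() {
    // 'updated' is an assumed CSS class; adjust to your admin styles.
    echo '<div class="updated"><p>New version of SpamKit available.</p></div>';
}

// Register the callback only when running inside Wordpress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'admin_notices', 'spamkit_update_notice' );
}
```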

wp-admin/post.php

- Additional HTTP Referrer checks using the ‘check_admin_referer’ method when submitting a new post, editing an attachment and editing a post.

wp-admin/user-edit.php

- Additional HTTP Referrer checks using the ‘check_admin_referer’ method when updating a User.

wp-includes/comment-functions.php

- Sanitising of user-submitted Name, Email & URL from cookies.

wp-register.php

- Forced blank default value of user-submitted email address & login name.
- Sanitising of the display of user-submitted email address & login.

I have created a patch to take 2.0.1 installations of Wordpress up to version 2.0.2 without having to reinstall and possibly lose customisations.

http://svn.lobstertechnology.com/wordpress-patches/wordpress-2.0.1-2.0.2.patch

You can apply this patch from the top directory of your Wordpress installation using the ‘patch’ program from a UNIX shell.

patch -p1 < wordpress-2.0.1-2.0.2.patch

Full Example Usage:

[michael@lobstertechnology ~] $ cd blog.lobstertechnology.com
[michael@lobstertechnology blog.lob...] $ patch -p1 < wordpress-2.0.1-2.0.2.patch
patching file wp-admin/admin-functions.php
patching file wp-admin/admin-header.php
patching file wp-admin/admin.php
patching file wp-admin/edit-pages.php
patching file wp-admin/import/blogger.php
patching file wp-admin/list-manipulation.php
patching file wp-admin/menu-header.php
patching file wp-admin/post.php
patching file wp-admin/user-edit.php
patching file wp-comments-post.php
patching file wp-includes/classes.php
...
[michael@lobstertechnology blog.lob...] $

Alternatively, you can simply replace only the files which have changed – listed above.

;)

WP Plugin » SpamKit Plugin 0.4 – Time-Based-Tokens to Fight Spam

This is a pretty significant release of SpamKit Plugin which provides some cool new features. This is checked into Subversion over at WP-Plugins.org and you can download the new version here spamkit-plugin.php.

Released as version 0.4:
* Added an options page. This required sanity checks to prevent double definition of functions, implemented in a C-style #ifdef / #define pattern.
* Added full configuration functionality. This is done using built-in defaults overridden by saved options, making it upgrade-proof.
* Added new EXPERIMENTAL check: comments posted by clients with no User-Agent string are auto-spammed and don’t make it to the moderation page.
* Added new EXPERIMENTAL check: the submitted email address is subject to format validation & a DNS check for a mail exchanger.
* Updated to use Gerry’s new OO-based TBT code, removing the dependency on MCRYPT.
* Removed any path-dependent problems, making it compatible with all WP installs *I hope*.
* Added option to place trackbacks & pingbacks in the moderation queue; disabling this option causes them to be auto-approved.
* Added option to moderate comments which fail TBT checks; disabling this option means the comments are automatically marked as spam and will never be seen.
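
To illustrate the first two items, here is a minimal sketch of the double-definition guard combined with defaults overridden by saved options. The function name and the option keys are made up for the example and are not the plugin's real ones.

```php
<?php
// Guard against double definition, much like #ifndef / #define in C.
if ( ! function_exists( 'spamkit_get_options' ) ) {
    // Hypothetical helper: built-in defaults overridden by saved options.
    function spamkit_get_options( array $saved ) {
        $defaults = array(
            'moderate_trackbacks' => true,  // assumed option names
            'moderate_failed_tbt' => true,
            'check_user_agent'    => false,
        );
        // Saved values win; keys missing from $saved (for example options
        // added by a newer version) fall back to the defaults.
        return array_merge( $defaults, $saved );
    }
}
```

Because missing keys always fall back to the defaults, options saved by an older version keep working after an upgrade.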

Known Issues:
* Because direct calls to this script (for the badge) cannot access WP or any options, there is no easy way to provide a configurable /tmp directory. There is however a configuration option to disable this functionality if it causes problems.

Analysis of Spamming Zombie Botnets

Since writing my SpamKit Plugin I have been keeping a keen eye on the comment/trackback spam subject and have guinea pig’d my ideas on my own blog. Recently I noticed a distinct change in the sophistication of comment-spammers.

The early comment-spammers were using very basic HTTP clients, mostly without thinking about what’s going on ‘under the hood’. As such their spam-messages would come through with easily filtered HTTP “User-Agent” headers like “PEAR HTTP_Request class ( http://pear.php.net/ )” and “libwww-perl/5.803”. Over a period of a few months these – what I call 1st generation – bots began to dwindle in numbers, replaced by slightly more sophisticated clients which loosely emulated real browsers.
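
Filtering these 1st generation bots can be as simple as the sketch below. The function name is hypothetical, the signature list only contains the two examples quoted above, and treating an empty header as suspicious is an extra assumption on my part.

```php
<?php
// Returns true when the User-Agent is empty or looks like a scripting library.
function looks_like_basic_bot( $user_agent ) {
    $user_agent = trim( (string) $user_agent );
    if ( $user_agent === '' ) {
        return true; // no User-Agent at all
    }
    // Substrings taken from the examples above; extend as needed.
    $signatures = array( 'PEAR HTTP_Request', 'libwww-perl' );
    foreach ( $signatures as $signature ) {
        if ( stripos( $user_agent, $signature ) !== false ) {
            return true;
        }
    }
    return false;
}
```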

These 2nd generation bots were still very primitive. Apart from changing the “User-Agent” and adding a few other headers they were still pretty basic, and would repeatedly attempt to post comments on a number of posts over a period of a few seconds. This activity is also easily filtered, since not even a superhuman Blog-fiend could comment on your top ten posts in less than 10 seconds.
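
A filter for this kind of burst could look something like the following sketch. The function name and the thresholds (more than 3 comments inside 10 seconds) are assumptions for illustration only.

```php
<?php
// Returns true if more than $max_posts timestamps fall inside any window
// of $window seconds. $timestamps are Unix times of comments from one IP.
function is_comment_flood( array $timestamps, $max_posts = 3, $window = 10 ) {
    sort( $timestamps );
    $count = count( $timestamps );
    for ( $i = 0; $i + $max_posts < $count; $i++ ) {
        // Compare each comment with the one $max_posts places later.
        if ( $timestamps[ $i + $max_posts ] - $timestamps[ $i ] <= $window ) {
            return true;
        }
    }
    return false;
}
```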

All the attempts so far have been very basic, beginners in Perl / PHP could probably pull it off easily, and they are just as easily filtered out.

Over the Christmas period I observed some very unusual activity: a ‘spam attack’ coming from dozens of source IP addresses, coordinated within a few minutes. I initially spotted it because the “User-Agent” header was completely empty, which stands out a bit. After some investigation and further attacks I became pretty confident this wasn’t a fluke or a coincidence of independent spammers.

I knocked up a quick Wordpress plug-in to capture as much info about these suspicious requests as possible. Here is one of the first attacks.

03/02/2006 20:37:44 212.0.XXX.XXX GET /
03/02/2006 20:38:14 201.242.XXX.XXX GET /category/wordpress/plugins/
03/02/2006 20:39:54 210.183.XXX.XXX GET /2006/02/02/search-term-highlighter-plugin-0-0/
03/02/2006 20:40:25 200.122.XXX.XXX GET /category/java/jakarta-velocity/
03/02/2006 20:40:37 62.23.XXX.XXX GET /2006/02/02/sitecom-cn-502-usb-bluetooth-dongle-works-on-linux/
03/02/2006 20:40:55 68.96.XXX.XXX GET /2006/02/02/search-term-highlighter-plugin-0-0/
03/02/2006 20:41:18 70.88.XXX.XXX POST /wp-comments-post.php
03/02/2006 20:41:20 70.88.XXX.XXX GET /category/thoughts/
03/02/2006 20:41:44 200.21.XXX.XXX POST /wp-comments-post.php
03/02/2006 20:41:48 200.21.XXX.XXX GET /2006/01/25/ti-7x21-flashmedia-sd-host-controller-104c-8033/
03/02/2006 20:42:16 61.145.XXX.XXX GET /category/wordpress/plugins/search-term-highlighter/
03/02/2006 20:42:24 217.113.XXX.XXX GET /category/flash/
03/02/2006 20:42:48 212.251.XXX.XXX GET /category/internet/
03/02/2006 20:43:04 205.180.XXX.XXX POST /wp-comments-post.php
03/02/2006 20:43:22 82.76.XXX.XXX GET /keywords/
03/02/2006 20:43:56 218.248.XXX.XXX GET /2006/02/02/search-term-highlighter-plugin-0-0/#postcomment
03/02/2006 20:44:13 206.191.XXX.XXX GET /2006/02/02/search-term-highlighter-plugin-0-0/%23postcomment
03/02/2006 20:44:14 206.191.XXX.XXX GET /category/tools/
03/02/2006 20:44:15 206.191.XXX.XXX GET /category/wordpress/plugins/search-term-highlighter/
03/02/2006 20:44:38 62.23.XXX.XXX GET /category/wordpress/plugins/search-term-highlighter/
03/02/2006 20:45:33 82.76.XXX.XXX POST /wp-comments-post.php
03/02/2006 20:45:34 82.76.XXX.XXX GET /category/tools/
03/02/2006 20:45:35 82.76.XXX.XXX POST /wp-comments-post.php
03/02/2006 20:45:48 203.162.XXX.XXX POST /wp-comments-post.php

In this particular instance, the attack ran over a ten minute period. The first request was an HTTP GET on the root of my Blog “/”, almost definitely used to feed the other bots with URLs. Next, other clients in the Botnet continued to spider my Blog in parallel, building a list of URLs to try later, and lastly came the first attempts to post a comment.

If you examine the sequence of requests, the bots are posting a comment, then coming back to check if it was successful. Analysis of later attacks even found other bots in the group checking if the comment posted by a peer bot was successful. The participating hosts are located all over the world but the majority are in North America and Asia.
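
The post-then-check pattern can be pulled out of a log like the one above with a few lines of PHP. This is only a sketch: the function name and the 10 second gap are assumptions, it expects the "date time ip method path" layout shown above, and because the addresses are masked here two different hosts could look identical.

```php
<?php
// Finds clients that POST a comment and then issue a GET shortly after.
// Each log line is assumed to be "date time ip method path".
function find_post_then_get( array $lines, $max_gap = 10 ) {
    $last_post = array(); // ip => time of most recent POST, in seconds
    $hits      = array(); // ip => path fetched right after posting
    foreach ( $lines as $line ) {
        $parts = preg_split( '/\s+/', trim( $line ) );
        if ( count( $parts ) < 5 ) {
            continue; // skip malformed lines
        }
        list( $date, $time, $ip, $method, $path ) = $parts;
        // Only the time of day is compared; assumes a single-day log.
        list( $h, $m, $s ) = array_map( 'intval', explode( ':', $time ) + array( 0, 0, 0 ) );
        $seconds = $h * 3600 + $m * 60 + $s;
        if ( $method === 'POST' ) {
            $last_post[ $ip ] = $seconds;
        } elseif ( $method === 'GET' && isset( $last_post[ $ip ] )
            && $seconds - $last_post[ $ip ] <= $max_gap ) {
            $hits[ $ip ] = $path;
        }
    }
    return $hits;
}
```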

This obviously demonstrates a very high level of sophistication. Initially I presumed that there was a single client application running requests in parallel over a group of HTTP proxies. After tracing down the locations & owners of each of the participants in the attacks I concluded it was infeasible that they all happened to have open proxies being abused in this way. A large proportion of the machines being used are actually web servers which have probably been exploited and are running IRC-controlled Trojan software.

Backing this up is the pace at which these attacks are evolving. The first few were very primitive, without even an HTTP “User-Agent” header; however this was very quickly amended. The most recent attack I observed (1st March 2006) showed even more improvements: each client was almost indistinguishable from a normal visitor, providing full ‘Internet Explorer’-like headers of accepted MIME types, charsets and languages, and even including valid HTTP referrer headers and cookies.
Thankfully, all their time seems to be invested in improving the client software; the actual content of the comment was practically identical.

My SpamKit Plugin has so far easily handled each of these situations. It uses Gerry’s “Time Based Tokens”, which are auto-generated and written into a hidden form field. Any incoming comment without a token or with an invalid token can be held for moderation, with zero impact on real visitors writing comments. Unlike other solutions it does not require the user to type in a random key from an image, as the ‘captcha’ technique does, nor does it rely on JavaScript support in the browser. Until these spam bots reach a level of sophistication where they parse out HTML forms, including hidden values, and post them back, the current version of SpamKit will remain an effective solution.
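
To give a feel for the idea, here is a sketch of a time-based token. This is not Gerry's TBT code: the function names, the HMAC construction and the age limits are all my assumptions, though binding the token to the IP address and rejecting tokens used within 5 seconds both mirror features described elsewhere on this blog.

```php
<?php
// Illustrative time-based token, not Gerry's actual TBT code.
// $secret is a per-site secret; $ip ties the token to the client address.
function tbt_create( $secret, $ip, $now ) {
    $signature = hash_hmac( 'sha256', $now . '|' . $ip, $secret );
    return $now . ':' . $signature;
}

// Valid only if the signature matches and the token's age lies between
// $min_age and $max_age seconds (both limits are assumed values).
function tbt_validate( $token, $secret, $ip, $now, $min_age = 5, $max_age = 3600 ) {
    $parts = explode( ':', (string) $token, 2 );
    if ( count( $parts ) !== 2 || ! ctype_digit( $parts[0] ) ) {
        return false; // missing or malformed token
    }
    $issued   = (int) $parts[0];
    $expected = hash_hmac( 'sha256', $issued . '|' . $ip, $secret );
    if ( ! hash_equals( $expected, $parts[1] ) ) {
        return false; // forged, or submitted from a different IP
    }
    $age = $now - $issued;
    return $age >= $min_age && $age <= $max_age;
}
```

The token from tbt_create( $secret, $_SERVER['REMOTE_ADDR'], time() ) would go into a hidden form field, and the comment handler would call tbt_validate on the value that comes back.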

However there is one major drawback with SpamKit: pingbacks/trackbacks are machine-generated, so they will not have a “Time Based Token” and will be held for moderation as if they were spam. The problem with this is that spammers are also increasingly using the pingback/trackback mechanism to get their comments through the net. A lot of thought and discussion on this subject with Gerry led to one potential solution: scoring & validation of the URL the pingback/trackback is supposedly from.

In early examples of trackback spam the URL given pointed straight to some advertising-based web page. Something like this lends itself to easy detection and filtering, as the content when examined would score highly for spam key words like ‘Viagra’ etc. However these attacks have also evolved: the most recent point to real web pages or Blogs that contain obfuscated JavaScript redirection code, redirecting real visitors’ browsers but avoiding any page content detection techniques. In some cases the code has been inserted into Bulletin Boards or Guestbooks which allow unfiltered HTML.

An example page with obfuscated JavaScript redirection (warning, this will redirect you to mp3search.ru)

http://zigfrid.blog.kataweb.it/il_mio_weblog/

So, what measures can be taken to stop spam?

Personally I don’t think you will ever get rid of spam. You have a pretty good chance of eradicating all but the most sophisticated of spammers, but you’ll never stop 100% of spam. The best methodology is to constantly evolve your defences at the same rate as, or faster than, the opposition. For starters Gerry & I are constantly dreaming up new ways we can enhance SpamKit… Recent updates include encoding the original source IP address in the “Time Based Token”, which becomes invalid if submitted from a different address. Other works in progress include hardcore validation of the email address submitted (does the domain exist? does it have a mail exchanger MX record? etc.), content validation, key word searching and probabilities of the content being spam. Progress will be reported here and on Gerry’s site.
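
The email validation idea might be sketched like this. The function name is hypothetical and this is not the code in SpamKit; filter_var and checkdnsrr are standard PHP functions, and the lookup is passed in as a parameter so the check can be exercised without touching DNS.

```php
<?php
// Checks the address format, then asks whether the domain has an MX record.
// $mx_lookup is injectable so the check can be tested without DNS access;
// by default it uses PHP's checkdnsrr().
function email_looks_deliverable( $email, $mx_lookup = null ) {
    if ( filter_var( $email, FILTER_VALIDATE_EMAIL ) === false ) {
        return false; // malformed address
    }
    // Everything after the last "@" is the domain part.
    $domain = substr( strrchr( $email, '@' ), 1 );
    if ( $mx_lookup === null ) {
        $mx_lookup = function ( $host ) {
            return checkdnsrr( $host, 'MX' );
        };
    }
    return (bool) call_user_func( $mx_lookup, $domain );
}
```

Bear in mind that a passing check only says the domain accepts mail, not that the mailbox exists.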

In the long term spammers are going to have clients that pretty much replicate real users down to the delays & randomness between requests. Countermeasures are going to have to be just as sophisticated, evaluating content and even executing JavaScript as if they were also real clients.

GoogleBot Experiment Success!

A month has passed since I made a change to my Wordpress templates to experiment with Google bot (see previous post) and I can proudly report that it works like a charm.

My original problem was that Google was returning search results pointing to index-style pages on my Blog instead of the posts themselves. These index pages like Categories & Archives would quickly update and the majority of visitors coming from Google search results were having a poor experience – the post that drew them in wasn’t obviously visible.

I knew I could use robots meta tag directives to control the INDEX-ing and FOLLOW-ing of my site, but I was hesitant about applying experimental rules to all Search Engine robots. Thankfully GoogleBot looks for a meta tag specific to itself, which let me apply custom rules to Google only very easily.

Using my Wordpress templates I added the following meta tag on all index-style pages except the home page:

<meta name="GOOGLEBOT" content="NOINDEX,FOLLOW"/>

Basically this is instructing GoogleBot to follow links on this page, but not to index the page itself. The end result is that search results pointing to my blog are using the ‘permalink’ URL, not the index page it is listed on.

;)

WP Plugin » SpamKit Plugin 0.3 – Time-Based-Tokens to Fight Spam

This is a(nother) minor release of SpamKit Plugin which provides some cool new features. This is checked into Subversion over at WP-Plugins.org and you can download the new version here spamkit-plugin.php.

New Features:

* Minor improvements to the use of TBTs: any token used within 5 seconds of being generated will be declared invalid. This is to stop the majority of automated clients that parse and send the TBT token.

* Added a ‘web badge’ for display on your blog pages, it shows the number of spam comments caught with SpamKit. To use it simply add the following where you want the badge to appear:

PHP:
  <?php
     if ( function_exists("spamkit_badge") ) {
        spamkit_badge();
     }
  ?>

Alternatively, you can have this spamkit_badge method return you the HTML markup by calling spamkit_badge( true ), for example:

PHP:
  <?php
     if ( function_exists("spamkit_badge") ) {
        $html = spamkit_badge(true);
        echo $html;
     }
  ?>

And it looks like this: SpamKit Plugin for Wordpress: Caught 25 Spam Comments!

* Added a custom pingback to my own blog triggered when the plugin is installed and activated on your own blog. This is used for installation counting and version tracking. Future versions will have this as configurable and optional.

Known Issues:

* The HTML generated by spamkit_badge links back to the plugin using an absolute URL (/wp-content/plugins/spamkit-plugin.php) which may not suit everyone’s Wordpress installation. This will be addressed in the next release.

* SpamKit uses temporary files to store the count, saving the image-generating part of the script from having to make SQL calls. To do this it is assumed that all systems have a “/tmp” directory which is writable by the user the WWW server is running as. Temporary file names are fairly unique; they are generated by taking the crc32 value of $_SERVER['SERVER_NAME'].
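
For the curious, the file naming could look roughly like the sketch below. The function name, the "spamkit-" prefix and the ".count" suffix are invented for the example; only the use of crc32 on the server name comes from the plugin.

```php
<?php
// Builds the counter file path from the server name.
// The "spamkit-" prefix and ".count" suffix are assumed, not the real names.
function spamkit_counter_path( $server_name, $tmp_dir = '/tmp' ) {
    // sprintf('%u') keeps the checksum unsigned on 32-bit PHP builds.
    $checksum = sprintf( '%u', crc32( $server_name ) );
    return rtrim( $tmp_dir, '/' ) . '/spamkit-' . $checksum . '.count';
}
```

A 32-bit checksum can collide, which is why the names are only "fairly" unique.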

WP Plugin » SpamKit Plugin 0.2 – Time-Based-Tokens to Fight Spam

This is a minor release of SpamKit Plugin to update Gerry's TBT code which now incorporates the IP address from $_SERVER['REMOTE_ADDR'] into the validation. This is checked into Subversion over at WP-Plugins.org and you can download the new version here spamkit-plugin.php.

Changelog:

version 0.2 - updated TBT code with improvements from Gerard Calderhead. TBT now includes the IP address from $_SERVER['REMOTE_ADDR'] in the check and fails if the IP is different during validation.

WP Plugin » Search Terms Highlighter Plugin 0.0

This is the first release of a prototype plugin I wrote to detect incoming requests from search engines and highlight the words in your posts that match the search terms. It also detects local searches with Wordpress and highlights those terms too.

Installation is simple: copy search-terms-highlighter-plugin.php into your Wordpress plugins directory and Activate the plugin from your Admin screen.

FEATURES:

- Automatically highlights search terms used from Google, MSN & Yahoo
- Automatically highlights search terms used in local Wordpress searches

LIMITATIONS:

- The Highlight colour is hard coded to Yellow
- Only 'detects' search query strings from known search engines

TODO:

- Add options page to allow configuration of highlight colours
- Support different highlight colours for each search term
- Support DHTML/JS 'switching off' of highlighting
- More generic query string parser to support unknown search engines
- The preg_replace in sth_plugin_the_content can be optimised to replace all keywords in one operation
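
As a sketch of where the last TODO item could go, the snippet below pulls the terms out of a referrer URL and highlights them all with one preg_replace call. The function names and the query parameter names ("q" and "p") are assumptions for illustration, not the plugin's actual code.

```php
<?php
// Extracts search terms from a referrer URL. The parameter names
// ("q" and "p") are assumptions for this sketch.
function sth_terms_from_referrer( $referrer ) {
    $query = parse_url( $referrer, PHP_URL_QUERY );
    if ( ! is_string( $query ) ) {
        return array(); // no query string, or an unparseable URL
    }
    parse_str( $query, $params );
    foreach ( array( 'q', 'p' ) as $key ) {
        if ( ! empty( $params[ $key ] ) && is_string( $params[ $key ] ) ) {
            return preg_split( '/\s+/', trim( $params[ $key ] ), -1, PREG_SPLIT_NO_EMPTY );
        }
    }
    return array();
}

// Highlights every term with a single preg_replace call.
function sth_highlight( $text, array $terms ) {
    if ( empty( $terms ) ) {
        return $text;
    }
    // Escape each term so it is matched literally inside the pattern.
    $quoted  = array_map( function ( $t ) { return preg_quote( $t, '/' ); }, $terms );
    $pattern = '/\b(' . implode( '|', $quoted ) . ')\b/i';
    return preg_replace( $pattern, '<span style="background: yellow">$1</span>', $text );
}
```

Note that this naive version would also match terms inside HTML tags and attributes, so real post content needs more care.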

Download Plugin

Wordpress Hack » Reading MySQL username & password from wp-config.php

A few months ago I was working on a prototype Wordpress plug-in that generated graphs using GNUPlot. To do this it had another script embedded at the end of the file which dumped data from the Wordpress database and produced graphs as images for the output.

The problem is that the GNUPlot section of the script is referred to directly by an "img" XHTML tag and is not loaded by Wordpress itself; it does not have access to the Wordpress database configuration.

To work around this I wrote the following section of code. It opens the "wp-config.php" file and parses out anything matching the pattern "define( name, value )" into a $config variable. Immediately afterwards it attempts to connect to MySQL using these details.

Bit of a hack, but it was the only nice way I could see to do it and keep the plug-in simple & configuration free.

PHP:
  $config = array();

  // HACK: read wp-config.php for the database username/password; including it wouldn't work
  $content = @file_get_contents( "../../wp-config.php" );
  if ( $content !== false ) {
     // Collect every define( 'NAME', 'value' ); pair. Only single-quoted names and values match.
     if ( preg_match_all( "/define\s*\(\s*'(.*?)'\s*,\s*'(.*?)'\s*\);/", $content, $matches, PREG_SET_ORDER ) ) {
        foreach ( $matches as $match ) {
           $config[ $match[1] ] = $match[2];
        }
     }
  }

  // Connect using the parsed details. The mysql_* functions were removed in PHP 7;
  // mysqli_connect() is the replacement there.
  $link = mysql_connect( $config['DB_HOST'], $config['DB_USER'], $config['DB_PASSWORD'] ) or die( "Can't connect!" );
  mysql_select_db( $config['DB_NAME'] ) or die( "Can't select database!" );

Experimenting with Googlebot

In my previous post 'Blogs are fundamentally flawed…' I noted an observation that more often than not search results would direct a user to an index-style page containing the post instead of directly to the 'permalink' location of the post. This leads to a poor user-experience from the visitor’s point of view; on busy blogs the post has almost certainly moved since the page was spider'd. Google in particular appeared to be the worst for it.

Discussions on the subject with Gerry determined that this is most likely down to Google's PageRank technology, where index-style pages have a higher value than the post pages themselves. To get around this he suggested adding robots directives (the meta tag counterpart of 'robots.txt' rules) to the index-style pages.

On Google's "Information for Webmasters" help page I found that, when spidering, they look for 'robots.txt' directives and meta tags in documents that are specific to Googlebot only. This meant I could single out Googlebot for these directives and not affect other search engines (which don’t exhibit the problem so much).

I basically want Google to 'FOLLOW' links on all pages, but not to 'INDEX' the index-style pages like categories & archives by date. The desired effect being that Google can find all posts as before but simply ignore the index-style pages themselves. Implementing this is quite simple; I modified my theme's "header.php" file inserting the following code in the "head" section:

PHP:
  <?php
      if ( !is_single() && !is_page() && !is_home() )
          echo "  <meta name=\"GOOGLEBOT\" content=\"NOINDEX,FOLLOW\" />\n";
  ?>

This reads almost literally: if this is not a single post view, not a page view and not the home page, add the following "meta..." tag. Although the home page is an index-style page I am reluctant to add 'NOINDEX' because I don't want it disappearing from search results. ;)

Now the long wait for the changes to reflect in Google's results.

Updated 24th January 2006 - Gerry pointed out this can be simplified using De Morgan's Law :P

PHP:
  <?php
      if ( ! (is_single() || is_page() || is_home()) )
          echo "  <meta name=\"GOOGLEBOT\" content=\"NOINDEX,FOLLOW\" />\n";
  ?>
  4. ?>