RC4/ARCFOUR Implementation in PHP

I wrote this RC4/ARCFOUR implementation in PHP, based on the original C source code posted on Usenet in 1994. The rc4() call itself is completely self-contained; two other functions, rc4_tests() and rc4_benchmark(), are provided for testing and are optional.

My motivation for writing it was to replace the dependency on MCrypt in my SpamKit plugin for Wordpress - see Gerry's site for the updated TBT code I will wrap in the next SpamKit Plugin release.

This software is completely public domain; all I ask for is a simple credit for my work if you find it useful.

View Source: rc4.php
View Source: rc4tests.php

Examples:

1. Simple encryption & decryption

PHP:
  <?php
  require_once( "rc4.php" );

  $key = "0123456789abcdef";
  $plaintext = "Hello World!";

  // Encrypt: rc4() takes the key first, then the data
  $ciphertext = rc4( $key, $plaintext );

  // Decrypt: the same call with the same key reverses it
  $decrypted = rc4( $key, $ciphertext );

  // Both sides of the output should match
  echo $decrypted . " - " . $plaintext . "\n";

  ?>
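
Example 1 works because RC4 is a stream cipher: the key seeds a keystream that is XORed with the data, so running the same call over the ciphertext restores the plaintext. The block below is a minimal sketch of the standard algorithm to show that idea. It is not the contents of rc4.php, and the name rc4_sketch() is made up here so it cannot clash with the real rc4().

```php
<?php
// Illustrative sketch only; NOT the contents of rc4.php.
// rc4_sketch() is a hypothetical name chosen to avoid clashing with rc4().
function rc4_sketch(string $key, string $data): string
{
    $keyLength = strlen($key);
    if ($keyLength === 0) {
        throw new InvalidArgumentException('Key must not be empty.');
    }

    // Key scheduling: start with 0..255 and shuffle it using the key bytes.
    $s = range(0, 255);
    $j = 0;
    for ($i = 0; $i < 256; $i++) {
        $j = ($j + $s[$i] + ord($key[$i % $keyLength])) & 0xFF;
        $tmp = $s[$i];
        $s[$i] = $s[$j];
        $s[$j] = $tmp;
    }

    // Keystream generation: one keystream byte per input byte, XORed together.
    $i = 0;
    $j = 0;
    $out = '';
    $dataLength = strlen($data);
    for ($n = 0; $n < $dataLength; $n++) {
        $i = ($i + 1) & 0xFF;
        $j = ($j + $s[$i]) & 0xFF;
        $tmp = $s[$i];
        $s[$i] = $s[$j];
        $s[$j] = $tmp;
        $k = $s[($s[$i] + $s[$j]) & 0xFF];
        $out .= chr(ord($data[$n]) ^ $k);
    }

    return $out;
}

// Same key, same call: the second pass undoes the first.
$ciphertext = rc4_sketch('0123456789abcdef', 'Hello World!');
$decrypted  = rc4_sketch('0123456789abcdef', $ciphertext);
```

Because encryption and decryption are the same operation, there is no separate decrypt function to write or test.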

2. Execute the tests and display the results

PHP:
  <?php
  require_once( "rc4tests.php" ); // Auto includes rc4.php

  // Run the tests and print the results
  echo rc4_tests();

  ?>

3. Execute the tests as benchmarks and display the results

PHP:
  <?php
  require_once( "rc4tests.php" ); // Auto includes rc4.php

  // Run the tests as benchmarks and print the results
  echo rc4_benchmark();

  ?>

Blogs are fundamentally flawed for the typical Grandma-User

It may seem a little sad, but I can honestly say that reading my access_log is far more interesting than any soap opera on TV; it is filled with exotic foreigners, futuristic robots, drama, intrigue and personal tragedy. The best thing about it is that it’s all real; these are (mostly) real people who stumble across your humble Blog in the hope of finding the solution to their problems.

Over the Christmas period I have observed more people visiting an ancient post of mine than in the previous six months. The post is about my experiences with an external hard drive enclosure; more accurately, the chip / controller that a great many hard drive enclosures use. Based on this I would guess that a considerable number of people got hard drive enclosures for Christmas and ran into the same problems I had. Anyway, I am wandering a little.

It was reading my access_log that made me realise that Blogs are actually a really bad format for the Grandma user...

Picture this: your Grandma is Googling and happens to get a result that points to your Blog. She sees a teaser in the search results that shows you've written something about what she is looking for. Grandma clicks your link and is presented with your last 10 posts about God knows what, with no sign of the post she saw the snippet of. Grandma goes back to Google thoroughly disappointed, never to return...

I encounter this phenomenon frequently, but because I am familiar with the Blog format I think nothing of drilling down to the relevant category to find the post I wanted or, if I am lazy, clicking Google’s cached copy of the page. However, for the average internet surfer it presents a fundamental flaw in the usability of the Blog format.

The problem is quite simple: search engines can never be up to date with your content all the time. The more frequently you post, the more often the problem will occur and the harder the post will be to find. The way I see it there are two possible solutions.

Smarter search engines

Search engines could be enhanced to distinguish between an index page of posts and individual posts. This could be done by identifying sections of text within a page as an extract from another URL using something like RDF [http://www.w3.org/RDF/] which can already be embedded within XHTML [http://internetalchemy.org/2005/10/introducing-embedded-rdf]. Enclosing the section of text between ‘<rdf:RDF>’ tags would do the trick.

In the Blog format the index pages and category pages would all contain embedded RDF indicating that the enclosed section of text is actually from another URL: its permalink. However, this idea is not limited to the Blog format; it has huge potential for most modern website formats.

This wouldn’t be a trivial change for search engines to make; it would be time-consuming and therefore costly, but I believe it would be worthwhile for the future of internet content.

Smarter websites

A more short-term solution I am looking at is improving my website [i.e. Wordpress] to detect that the visitor has come from a search engine, try to determine the query they used from the ‘Referer’ HTTP header, then find and present the best matches to that query before any other posts are displayed.

Obviously this method has quite a few shortfalls:

* The ‘Referer’ header may not be there (some people disable it within the browser or through third-party software)
* Although handling the query formats of the main players is quite easy, not all search engines can be catered for
* It requires an intensive search of all the site’s posts; the standard Wordpress search won’t cut it
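
The first step of the approach above, pulling the query out of the ‘Referer’ header, can be sketched roughly as follows. This is a minimal illustration, not finished plugin code: the function name search_query_from_referer() is made up, and the host fragments and query parameter names in the $engines map are assumptions about how the main players format their result URLs.

```php
<?php
// Hypothetical helper: returns the visitor's search query, or null if it
// cannot be determined (no Referer, unknown engine, empty query).
function search_query_from_referer(?string $referer): ?string
{
    if ($referer === null || $referer === '') {
        return null; // header missing or stripped by the browser
    }

    $parts = parse_url($referer);
    if ($parts === false || empty($parts['host']) || empty($parts['query'])) {
        return null;
    }

    // ASSUMPTION: host fragment => name of the parameter holding the query.
    $engines = [
        'google.'       => 'q',
        'search.yahoo.' => 'p',
        'search.msn.'   => 'q',
    ];

    $host = strtolower($parts['host']);
    parse_str($parts['query'], $params); // also URL-decodes the values

    foreach ($engines as $fragment => $param) {
        if (strpos($host, $fragment) !== false
            && isset($params[$param])
            && is_string($params[$param])
        ) {
            $query = trim($params[$param]);
            return $query === '' ? null : $query;
        }
    }

    return null; // not a search engine we cater for
}

// Usage: null simply means "show the normal index page".
$query = search_query_from_referer($_SERVER['HTTP_REFERER'] ?? null);
```

The substring match on the host is deliberately loose and could be fooled by a look-alike domain, which is acceptable here because the result only changes which posts are shown first.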

I contemplated getting the site to pull a copy of the URL given in the ‘Referer’ header, scan for the result that led the visitor to your site then locate the correct post given the snippet text… Then I decided that was a reeeeeeaaally bad idea.

In the long run I believe the content, and therefore the search engines that index it, have to improve to cater for the format of internet content today, and I think embedded RDF might be the key; unfortunately this cannot happen overnight.

In the meantime, making smarter websites will help the situation until the content and the search engines catch up.

Upgrading to Wordpress 2.0

Keen to try out my own plugins in Wordpress 2.0, and swayed by this post, I took the plunge and installed the nightly Wordpress 2.0 [20051219] build.

I manage the entire installation of my Wordpress blog in CVS; although I have several customisations to the core Wordpress code, I merged them seamlessly into the 2.0 code without problems. During the upgrade process I did get some warnings that the 'wp_usermeta' table didn't exist, because I hadn't yet run /wp-admin/upgrade.php, but that was no big deal. Apart from that the whole process was very straightforward!

All the plugins I use appear to work perfectly: Weighted Words, iG:Syntax Hiliter, Spelling Checker & my own.

My first opinions...

Although the new fancy rich-text editor looks great, I am still going to use the original code version as I am so used to it and prefer writing my own XHTML. I am not sure whether I like the new admin colour scheme yet either; I have grown quite attached to the previous grey one.

I am extremely impressed that Wordpress has changed hugely 'under-the-hood' seemingly without breaking any of the existing interfaces and therefore plugins.

Version 2.0 boasts a lot of new features and the Trac Timeline is a hive of activity. I will probably keep running on the latest nightly build until the final release.

Horde 3.0.8 appears to be broken

Horde is the application framework behind IMP, the web-based email client I use to read my email. From the Horde site [www.horde.org]:

The Horde Project is about creating high quality Open Source applications, based on PHP and the Horde Framework.

The guiding principles of the Horde Project are to create solid standards-based applications using intelligent object oriented design that, wherever possible, are designed to run on a wide range of platforms and backends.
There is great emphasis on making Horde as friendly to non-English speakers as possible. The Horde Framework currently supports many localization features such as Unicode and right-to-left text and generous users have contributed many translations for the framework and applications.

Today I downloaded and attempted to install Horde 3.0.8, released on Sunday 11th December 2005. Something appears to be wrong, as I didn't get very far. I followed all the given instructions; my server is configured correctly and all the dependencies are installed and working. I got as far as the web-based setup / configuration screen, but it didn't allow me to save any settings or complete the setup process.

Following the instructions to the letter, I went to the 'Authentication' tab and selected 'IMAP Authentication'. The page reloaded but didn't reflect my choice from the 'authentication backend' drop-down list. Instead it wouldn't display anything other than 'Let a Horde application handle authentication', and without the additional drop-down to select the application to use.

I initially suspected some JavaScript incompatibility, as I normally use Firefox and sooo many applications are written against Internet Explorer. But after several attempts from different browsers & platforms I gave up on the authentication tab, opting to try at least the 'Database' tab and configure MySQL. I could easily fill out all the details, but when I tried to 'Generate Horde Configuration' it threw me back to the 'General' tab, highlighting that I had not completed required fields to do with error reporting & URL generation, even though both were set to valid values.

I re-read the documentation and re-did the whole installation... just in case I missed something or was too eager to lock down permissions. Again, exactly the same problem. Next, I relaxed ALL the permissions possible: I basically chmod'd the whole thing to 777, in case the setup wasn't able to write to the config directory, but this didn't help either.

The FAQ didn't provide much help so I went to the IRC channel #horde @ irc.freenode.net and found others with exactly the same problem *holds back the tears of frustration* ... But unfortunately no-one seemed to have any immediate answers.

On a hunch I grabbed Horde 3.0.7 from the FTP site and went through the whole setup process again. However this time it worked as expected and was running within ten minutes!!

Argh... Next step is to diff the code and see where it went wrong... (stay tuned)

Update - This issue was fixed in version 3.0.9 which is now available from www.horde.org

‘NASA Search 1.0’ ??? Something Google should worry about ???

Having written my own Wordpress logging / statistics plug-in over the weekend (which is still in prototype, consider it a ‘coming soon’), I have started to notice more and more peculiar User-Agents visiting my blog.

I quite like to keep an eye on what spiders / bots visit my sites, how often they return and try to infer something about how they were designed by watching them visit.

I was surprised recently to see that the big three ( Yahoo!, MSN & Google ) actually pull RSS feeds as well as HTML pages. Of course this makes sense from an efficiency & bandwidth point of view: the RSS feed is the interesting stuff with everything else already stripped out.

Today’s one is a real winner though, coming from the following net block and advertising itself as “NASA Search 1.0”.

CODE:
  Comcast Cable Communications, Inc. NJ-SOUTH-4 (NET-68-46-128-0-1)
                                    68.46.128.0 - 68.46.191.255

The bot / spider crawled my entire site within a few minutes, starting from my ‘changes-in-wordpress-152’ post and was completely oblivious to my robots.txt (it didn’t even request it).

Also, it appeared to be quite a primitive HTTP client, providing no referrer information or any of the usual headers such as “Connection: close” or “Accept: */*”, even though it was sending an “HTTP/1.1” request. Surprisingly though, it did persist a session cookie for the duration of its visit.
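
Spotting a client like this by its missing headers can be sketched as a rough check. This is only a heuristic under stated assumptions: the function name missing_usual_headers() is made up, and the list of "usual" headers is simply the three mentioned above, read from the standard $_SERVER keys PHP exposes for request headers.

```php
<?php
// Hypothetical heuristic: returns the names of commonly sent request headers
// that are absent. Pass in $_SERVER (or a similar array) so it is testable.
function missing_usual_headers(array $server): array
{
    // ASSUMPTION: these three headers are what a "normal" browser sends.
    $expected = [
        'HTTP_ACCEPT'     => 'Accept',
        'HTTP_CONNECTION' => 'Connection',
        'HTTP_REFERER'    => 'Referer',
    ];

    $missing = [];
    foreach ($expected as $key => $name) {
        if (!isset($server[$key]) || $server[$key] === '') {
            $missing[] = $name;
        }
    }

    return $missing;
}
```

Calling missing_usual_headers($_SERVER) on each request and logging the User-Agent whenever the list is non-empty would be one way to feed a stats plug-in. A missing ‘Referer’ alone proves little, since plenty of real visitors arrive without one, so the result is best treated as a hint rather than a verdict.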

I Google’d for the phrase “NASA Search 1.0” and only seemed to find results where auto-generated-stats pages list visiting User-Agents.

It would be quite interesting (and maybe even fun – in a very geeky way) to write a Wordpress plug-in that watches for these peculiar bots and pings their details to a centralised stats database – forming a sort of spider-spider.

Anyway, I will be keeping a keen eye out for the return of “NASA Search 1.0” … Could it be the next greatest NASA funded project? Or is it just some smart a** who has figured out how to change the User-Agent string in his favourite spider/bot?

Stay tuned!