
Supercolliding a PHP array

Did you know that inserting 2^16 = 65536 specially crafted values into a normal PHP array can take 30 seconds? Normally this would take only 0.01 seconds.

This is the code to reproduce it:

$size = pow(2, 16); // 16 is just an example, could also be 15 or 17

$startTime = microtime(true);

$array = array();
for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $maxKey; $key += $size) {
    $array[$key] = 0;
}

$endTime = microtime(true);

echo 'Inserting ', $size, ' evil elements took ', $endTime - $startTime, ' seconds', "\n";

$startTime = microtime(true);

$array = array();
for ($key = 0, $maxKey = $size - 1; $key <= $maxKey; ++$key) {
    $array[$key] = 0;
}

$endTime = microtime(true);

echo 'Inserting ', $size, ' good elements took ', $endTime - $startTime, ' seconds', "\n";

Try it yourself! You may need to adjust the number 16 in the $size = pow(2, 16); line based on what hardware you have. I would start with 14 and increase it one at a time. By the way, I intentionally don’t provide an online demo at the amazing Viper-7 codepad this time so that it doesn’t get overloaded.

Here is a sample output from my machine:

Inserting 65536 evil elements took 32.726480007172 seconds
Inserting 65536 good elements took 0.014460802078247 seconds

How does this work?

PHP internally uses hashtables to store arrays. The above creates a hashtable with 100% collisions (i.e. all keys will have the same hash).

For those who don’t know how hashtables work: When you write $array[$key] in PHP, the $key is run through a fast hash function that yields an integer. This integer is then used as an offset into a “real” C array (here “array” means “chunk of memory”).

Because every hash function has collisions, this C array doesn’t actually store the value we want, but a linked list of possible values. PHP then walks this list element by element and does a full key comparison on each one until it finds the element with our $key.

Normally there will be only a small number of collisions, so in most cases the linked list will only have one value.
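
If you prefer code over prose, here is a rough PHP sketch of that lookup scheme. It is purely my own illustration of the model described above (the real implementation is C code inside the Zend Engine and uses actual linked lists rather than PHP arrays):

function toyInsert(array &$buckets, $tableMask, $key, $value) {
    $hash  = $key;               // for integer keys the "hash" is the key itself
    $index = $hash & $tableMask; // map the hash into the bounds of the C array

    // Each slot holds a list of colliding entries; walk it looking for our key.
    foreach ($buckets[$index] as $i => $entry) {
        if ($entry['key'] === $key) { // full key comparison
            $buckets[$index][$i]['value'] = $value;
            return;
        }
    }

    // The key is not in the collision list yet, so append a new entry.
    $buckets[$index][] = array('key' => $key, 'value' => $value);
}

$tableSize = 8;
$buckets   = array_fill(0, $tableSize, array());
toyInsert($buckets, $tableSize - 1, 3, 'a');
toyInsert($buckets, $tableSize - 1, 11, 'b'); // 11 & 7 == 3, so this collides with key 3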

But the script at the beginning of this post creates a hashtable in which all elements collide.

How can you make all of them collide?

In PHP it’s very simple. If the key is an integer, the hash function is just a no-op: the hash is the integer itself. All PHP does is apply a table mask on top of it: hash & tableMask.

This table mask ensures that the resulting hash stays within the bounds of the underlying real C array. This underlying C array always has a size that is a power of two (this way the hashtable is always reasonably efficient in both space and time). So if you store 10 elements the real size will be 16. If you store 33 it will be 64. If you store 63 it will also be 64. The table mask is the size minus one. So if the size is 64, i.e. 1000000 in binary, the table mask will be 63, i.e. 0111111 in binary.
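
To make those numbers a bit more tangible, here is a tiny PHP sketch of that sizing rule. It is only an illustration of the behavior described above (the minimum size of 8 is an assumption for the example, not something you should rely on):

// Round the element count up to the next power of two and derive the mask.
function tableSizeFor($elements) {
    $size = 8; // assumed minimum size for this illustration
    while ($size < $elements) {
        $size *= 2;
    }
    return $size;
}

foreach (array(10, 33, 63) as $n) {
    $size = tableSizeFor($n);
    $mask = $size - 1;
    printf("%d elements => size %d, mask %d (binary %b)\n", $n, $size, $mask, $mask);
}
// 10 elements => size 16, mask 15 (binary 1111)
// 33 elements => size 64, mask 63 (binary 111111)
// 63 elements => size 64, mask 63 (binary 111111)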

So basically the table mask throws away all the bits of the hash that lie above the table size. And this is what we are exploiting: 0 & 0111111 = 0, but 1000000 & 0111111 = 0 too, and so are 10000000 & 0111111 and 11000000 & 0111111. As long as we keep those lower bits the same, the masked hash will also stay the same.

So if we insert a total of 64 elements, the first one 0, the second one 64, the third one 128, the fourth one 192, etc., all of those elements will have the same hash (namely 0) and all will be put into the same linked list. And that’s what the code does. Only not with 64 elements, but with a few more.
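
Here is that arithmetic spelled out as a few lines of PHP (again just an illustration of the masking, not engine code):

// With a table size of 64 the mask is 63 (0111111 in binary).
// Keys that only differ in the bits above the mask all land in bucket 0.
$tableSize = 64;
$tableMask = $tableSize - 1;

foreach (array(0, 64, 128, 192) as $key) {
    printf("key %3d => bucket %d\n", $key, $key & $tableMask);
}
// key   0 => bucket 0
// key  64 => bucket 0
// key 128 => bucket 0
// key 192 => bucket 0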

Why is that so abysmally slow?

Well, for every insertion PHP has to traverse the whole linked list, element by element. On the first insertion it needs to traverse 0 elements (there is nothing there yet). On the second one it traverses 1 element. On the third one 2, on the fourth 3 and on the 64th one 63. Those who know a little bit of math probably know that 0+1+2+3+...+(n-1) = n*(n-1)/2. So the number of elements to traverse grows quadratically. For 64 elements it’s 63*64/2 = 2016 traversals. For 2^16 = 65536 it’s 65535*65536/2 = 2147450880. As you see, the numbers grow fast, and with them the execution time.
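
If you want to check those numbers yourself, a couple of lines of PHP evaluate the formula for the sizes used above:

// Total linked list traversals for n colliding insertions: 0 + 1 + ... + (n - 1)
foreach (array(64, pow(2, 16)) as $n) {
    printf("%d colliding inserts => %.0f traversals\n", $n, $n * ($n - 1) / 2);
}
// 64 colliding inserts => 2016 traversals
// 65536 colliding inserts => 2147450880 traversals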

Hashtable collisions as DoS attack

At this point you may wonder what the above is actually useful for. For the casual user: not useful at all. But the “bad guys” can easily exploit behavior like the above to perform a DoS (Denial of Service) attack on a server. Remember that $_GET, $_POST and $_REQUEST are just normal arrays and suffer from the same problem. So by sending a specially crafted POST request you can easily take a server down.
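
To make the attack shape a bit more concrete, here is a rough sketch of how such a request body could be built. This only assembles the string and sends nothing; it uses the same colliding integer keys as the benchmark at the top (numeric string keys like “65536” become integer array keys when PHP parses the request):

// Build a query string whose keys all collide in the $_POST hashtable.
$size  = pow(2, 16);
$pairs = array();
for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $maxKey; $key += $size) {
    $pairs[] = $key . '=0';
}
$body = implode('&', $pairs);
// $body now starts with "0=0&65536=0&131072=0&..." and could be used as the body
// of an application/x-www-form-urlencoded POST request.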

PHP is not the only language vulnerable to this. Actually pretty much all other languages used for creating websites have similar problems, as was presented at the 28C3 conference.

But there is hope! PHP has already landed a change (which will ship with PHP 5.3.9) that adds a max_input_vars ini setting defaulting to 1000. This setting determines the maximum number of POST/GET variables that are accepted, so at most 1000 collisions can be created. If you run the above script with 2^10 = 1024 elements you will get runtimes in the order of 0.003 seconds, which obviously is far less critical than 30 seconds. (Note though that above I am demonstrating an integer key collision. You can also collide string keys, in which case the traversal will be a good bit slower.)
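
If you want to check what limit a given installation applies, something like the following should do (ini_get returns false on versions where the setting does not exist yet):

// max_input_vars only exists as of PHP 5.3.9; older versions report false here.
$limit = ini_get('max_input_vars');
echo 'max_input_vars: ', ($limit === false ? 'not available' : $limit), "\n";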