Tuesday, June 10, 2014

Perl URL Router Benchmarks - The URL Router for C100K

To improve the overhead of URL routing, I developed a tiny URL routing library in C - R3, which is designed for high performance and low memory usage.

The implementation uses the trie data structure. Trie is an efficient information retrieval data structure, which uses a prefix tree to search inserted strings by their common prefix, thus it's pretty fast for matching routes.

The tricky part of R3 is that we need to support regular expression patterns in our routing definition to dispatch path dynamically, thus R3 implemented a variant of Trie by mixing some concepts of DFA, not just simply inserting plain string paths into the prefix tree.

There are several comparison types in R3: plain string, opcode or regular expression pattern. Each node have its own comparison type to dispatch the path to their own children.

For those simple regular expression patterns, R3 compiles the regular expressions into opcodes to enhance the comparison speed and it's twice faster than using PCRE library to match the path with the patterns.

And since there are a lot of things to do before the matching operation, the whole tree structure needs to be compiled before dispatching paths.

Here is the continuous benchmark result of R3 itself, The C version R3 can dispatch over 11 million plain string paths per second.

A talented Perl hacker - Cindy Wang developed a CPAN module based on R3 library - Router::R3, which has become the fastest routing module on CPAN. It's so fast that you just can't deny.

The Perl version R3 (Router::R3) can dispatch nearly 1 million static paths per second while Journey (The Rails router) dispatches 9.9 thousand static paths per second. and it's 466% faster than the second fastest module (Router::Room) in static path dispatching, 751% faster in first character matching.

We also used the routing path generator from the rails' pull request stevegraham/rails/pull/1 to benchmark the Perl URL routers on CPAN, including HTTP::Router, Router::Simple, Router::Boom and Router::R3

There are over 3 hundred generated routing paths in the benchmark script. we tested the "static path matching" with the route in the middle of the list, and also the first route matching, regular expression matching.

Here comes the dispatching speed result:

Benchmarking 'plain string matching' by path '/corge/quux/bar'
===============================================================
                Rate  HTTP::Router Router::Simple  Router::Boom    Router::R3
HTTP::Router      203/s            --           -89%         -100%         -100%
Router::Simple   1782/s          779%             --          -99%         -100%
Router::Boom   168658/s        83094%          9365%            --          -82%
Router::R3     954407/s       470684%         53461%          466%            --
Benchmarking 'regexp string matching' by path '/post/2012/03'
===============================================================
                Rate  HTTP::Router Router::Simple  Router::Boom    Router::R3
HTTP::Router     1076/s            --           -88%          -99%         -100%
Router::Simple   9309/s          765%             --          -91%          -97%
Router::Boom   104387/s         9602%          1021%            --          -66%
Router::R3     306925/s        28426%          3197%          194%            --
Benchmarking 'first character matching' by path '/'
===============================================================
                    Rate  HTTP::Router Router::Simple Router::Boom    Router::R3
HTTP::Router      3839/s            --           -87%         -98%         -100%
Router::Simple   30545/s          696%             --         -83%          -98%
Router::Boom    180555/s         4603%           491%           --          -88%
Router::R3     1535999/s        39910%          4929%         751%            --