Kaloyan K. Tsvetkov


“Ertuo”, allegedly the fastest PHP routing library

Let's use a different approach to routing then regular expressions and make it several times faster and more flexible

December 22, 2021

I know, the title seems like click-bait, but it is not, so please bear with me. And it is going to be a long ride, so please be patient.

This was part of a bigger article, but it was way too big to read and understand in one sitting. I got some excellent advice that it will be easier to read if it is split into two parts, one on the more theoretical part and another for the actual Ertuo implementation. You can read the theoretical part at "Limits of patterns based routing using regular expressions", where I go into a deep dive into routing as a concept, pattern based routing with regular expressions and the new "step by step" approach that should produce better results.

The article about the actual implementation of the "step by step" tree routing is this one.

Quick Recap

It is better if you read "Limits of patterns based routing using regular expressions", but if you have not, here are important parts:

Ertuo

Again, you should read "Limits of patterns based routing using regular expressions" to get yourself familiar with the theory behind step by step routing. It's a lot of talk, it sounds nice, but does it really work?

Here is my attempt at creating a step by step router. Everybody, say "Hello!" to Ertuo.

Ertuo (anagram of "Route"), is a small PHP library that does routing better and faster than conventional regular expression based routers.

https://github.com/ertuo-php/ertuo

Yesterday I've spent a huge chunk of the day writing the README for this project describing what it does and how to use it. Let's not repeat all of that here today, but instead try to get into more details.

The Basics

Few things are very different with Ertuo, so let's cover the basics first.

URI Format

Conventional routing works with the request URI as a string.

The step by step routing works with ... well, steps. The steps are not a string, but an array instead: an array with all of the parts of the request URI when split at the folder separators.

This array of exploded URL chunks is called source array. Here are few examples:

"/blog/post/hello-world" -> array("blog", "post", "hello-world")
"/en/about-us//" -> array("en", "about-us")
"/" -> array()

You can see how any issues with slashes become moot here as they are all stripped and used to split the request URI.

The term step will be used for each individual element of the source array.

Result

The product of the step by step routing are Result objects. These are very simple objects with three public properties, all of them arrays.

// here is what print_r($result) looks like after routing "/blog/page/4"
Ertuo\Result Object
(
    [path] => Array
        (
            [0] => blog
            [1] => page
            [2] => 4
        )

    [attributes] => Array
        (
            [_locale] => en
            [_controller] => blog
            [_route] => blog_page
            [_page] => 4
        )

    [junk] => Array
        (
        )
)

First property is $result->path, and this is where all of the recognized steps from the source array are collected. You can say that these are the "steps taken" when walking the source array.

"Every step you take, I'll be ..." putting it in the result path.

Comparing the result path and the source array you can see if the request URI was consumed entirely, or somewhere along the routing the journey stopped.

Second property is $result->attributes. This is where all of the variables extracted from the routing process are stored. This will include both URI parameters, as well as attributes found along the way.

Attributes in Ertuo is what is called "parameters" in other routing projects. This is everything extracted by the routing process. It will surely include some or all of the steps from the source array, but also it will include other attributes obtained either as default values, or as associated extra parameters.

Ertuo\Result Object
(
    ...
    [attributes] => Array
        (
            [_locale] => en
            [_controller] => blog
            [_route] => blog_page
            [_page] => 4
        )

In the example above you can see that the attributes property features some of the steps ("blog", "4"), but there are also other attributes obtained during the routing process, such as "en" (as the locale) and "blog_page" (as "_route").

Last property is $result->junk. This array will contain any steps left from the source array after the routing is done. This will occur when the routing stops before all of the steps are processed. You can use the result junk to identify bad URIs, and decide what to do with them: either 404 or other more graceful approach using the steps already extracted in the result path.

Remember the incomplete match case? The combination of result path + result junk will help you load that incomplete match.

Routes Declaration

The routes are created as a tree. Each node in that tree is a route, and each route declares the other routes nested inside it.

The nested routes are declared using a callback that returns something iterable. The keys of the iterable callback output are used to find the next route in the step by step process. The values of the iterable callback are the nested routes themselves.

yield 'blog' => Route::add()
->group(function()
{
    yield 'page' => Route::add('_page') ...

    yield 'post' => Route::add('_slug') ...
});

In this example, we have a route that is going to be addressed when the routing process reaches its parent and "blog" is the next step. That route has two nested routes on its own, identified by "page" and "post" keys.

I'll go into more depth in the "Routes Declaration" section below.

Route Rules

Each route has a "rule" associated with it that determines if the value of the current step can be accepted or not. They server the same role as the constraints and conditions in conventional routing.

// Using "enum" rule where the allowed steps are either "en" or "de"
yield 'locale' => Route::add('_locale')->rule('enum', ['en', 'de'])

There are few built-in rules, but you can extend that and creating your own custom rules. You can also alias rules if you want to re-use them, you can combine rules, etc.

The rules are a wider topic, and I'll highlight more of them in the "Route Rules" section below.

Attributes

I already addressed this topic when explaining what $result->attributes is. Attributes are collected as routing is performed. They are extracted at each step and collected in the result object.

There are several places where attributes are used:

As I said earlier, attributes in Ertuo are what "parameters" are in conventional routing. They are everything extracted by the routing process.

Dispatching

This is the actual routing. It's very simple: you have to split the request URI into a source array, and then feed it to the Dispatcher object; the dispatcher itself will return a result object.

// Turn the request URI into array
$src = Ertuo\Kit::quickExplode('/blog/page/4/');

// Parse the $src array into a routing result
$result = (new Ertuo\Dispatcher($routes))->dispatch($src);
print_r($result);

And that's it. You take what is inside the $result object and use it to call whatever controller or handler is there in it, and pass all of the collected attributes.

Routes Declaration

Now after the basics are covered, let's get a little bit deeper into some of the more advanced topics. First one is routes declaration.

Nested Routes

In the step by step routing the routes are created as a tree. Inside that tree each node s a route, and each route declares the other routes nested inside it.

The declaration of the nested routes is done using a callback. The callback must return something iterable, like an array, an iterator or a generator.

The keys of the iterable callback output are used to find the next route in the step by step process. The values of the iterable callback are the nested routes themselves:

yield 'blog' => Route::add()
->group(function()
{
    yield 'page' => Route::add('_page') ...

    yield 'post' => Route::add('_slug') ...
});

In this example, we have a route that is going to be addressed when the routing process reaches its parent route and "blog" is the value of the next step. That route has two nested routes on its own, identified by "page" and "post" keys.

This callback will be executed ONLY when you need to go deeper into that particular route. That saves A LOT of time declaring routes that are simply going to be ignored in any given routing process. Another oversimplification, but think of it as a chdir command that lands you inside one of the sub-directories.

Default Route

Remember when I said the keys are going to be used to find the next route in the process? There is not always going to be a match for the current step. This is when you can add so-called default route using an empty string as its key. This route, the one with the empty key, it will be used when there is no route found for the current step. It will also be used if the current step is empty.

yield 'blog' => Route::add()
->group(function()
{
    yield '' => Route::add() ... // this is the default route

    yield 'page' => Route::add('_page') ...

    yield 'post' => Route::add('_slug') ...
});

Route Rules

I hope I haven't lost you so far. Next advanced topic is about Ertuo route rules.

Conventional routing uses constraints and conditions on the parameters in its routes. Their role is to allow accepting only certain values and formats. Most of them are regular expressions, or some exotic stuff like expression language in Symfony.

In Ertuo, the constraints and conditions are just PHP code. Here's how it works.

Each route has a "rule" associated with it. The rules are simple checks against the value of the current step. They determine if the step is indeed from this route or not.

// Using "enum" rule where the allowed steps are either "en" or "de"
yield 'locale' => Route::add('_locale')->rule('enum', ['en', 'de'])

Another oversimplification: think of the rules as different types of file_exists() checks. This is what their role is - to determine if the value inside the current step belongs to this route or not.

There are few built-in rules, but you can extend that and creating your own custom rules. You can also alias rules if you want to re-use them, you can combine rules, etc.

Built-in Rules

The built-in rules at the moment are:

The "any" rule works just like the ordinary parameters in conventional routing. Its only requirement is that the step is not empty.

Direct Rule

There is also the "direct" rule. This happens when you leave the rule empty. In that case the routing process will look for a direct match in all of the nested routes. In other words, this will make it behave like a traditional folder.

Default Values

Optionally, routes have default values which are used as fallback if nothing is matched by the route rule.

// use "en" as the default locale
yield 'locale' => Route::add('_locale')->default('en')

This feature allows making optional steps in the routes. The above example illustrates this well: you can "force" a default locale value even if it is not present in the current source array.

Rule Flexibility

The route rules address the flexibility issue I wrote about earlier. You can really do anything with them as they are PHP code, and you are not restricted by the limits of regular expressions.

And there are things that are faster than regular expressions, for real. I have added an "id" rule in the project that validates positive integer values used traditionally as ids. This rule does it by casting and comparison, which is indeed quicker than executing a regular expression.

The rules are also able to look at the current progress of the routing. They get a copy of the result accumulated at the moment, so you can make rules with logic that depends on previously collected details in the result.

Benchmarks

Well, how does it perform this step by step approach and how fast Ertuo is?

The Bitbucket API Benchmark

Earlier in 2021 I created a benchmark project to measure performance of two of the popular routing libraries, Symfony Routing and FastRoute. It uses the Bitbucket API as the basis for a more real-world example.

Only the paths are used, and the HTTP verbs/methods are ignored. This is a sample of what the input for the benchmark cases looks like

/addon
/addon/linkers
/addon/linkers/{linker_key}
/addon/linkers/{linker_key}/values
/addon/linkers/{linker_key}/values/{value_id}
/hook_events
/hook_events/{subject_type}
/pullrequests/{selected_user}
/repositories
/repositories/{workspace}
/repositories/{workspace}/{repo_slug}
...

There are three different scenarios which I wanted to benchmark:

You can read more about the result from that benchmark here and here.

Benchmarking Ertuo

It was my goal to have a good benchmark case first before doing more development work on Ertuo and the step by step routing. I've already shared my frustration with lack of good benchmarks, and this was really important to me if I wanted to point out the benefits of the step by step routing.

I've used the same Bitbucket API setup to create benchmarks in the Ertuo project. I used the Symfony Routing component to compare the performance results of both routing techniques: conventional regular expression based routing vs. step by step routing.

Step by step is almost 8 to 10 times faster.

Edit January 3, 2022: Well, it is not. It is as fast as Symfony CompiledUrlMatcher and FastRoute cachedDispatcher. The benchmarks were running without opcache being enabled, which is a big gaffe as nobody will ever run a website in production without opcache. The screenshot below is from running phpbench without opcache, and that's why Ertuo has such a big advantage.

Here is a screenshot from running the benchmarks locally.

Results from Ertuo Benchmarks on PHP7.4

You can see results from the Github action that runs the benchmarks as well.

The Symfony Compiled benchmark is using CompiledUrlMatcher that provides one of the fastest regular expression routing for several years now.

The Ertuo Array benchmark is using arrays to declare its nested routes. That performs slightly better than generators as it does not have the overhead of creating the generators themselves.

Almost Over

Thank you for sticking up till the end of this.

I know the "fastest router" title is arrogant and loaded, but it is meant to serve a purpose.

I want to use it to draw some critical thinking into what I have described here. I also want to invite some peer review into what I've written to demonstrate the step by step routing. I consider the step by step routing a very natural and obvious way of doing routing. I am sure there are nuances and arguments for which I haven't thought about yet, and inviting a healthy discussion about this is the best way to learn more.


Edits