Improving WooCommerce Order Search with Elasticsearch

WooCommerce is often a serious concern when it comes to speed, with many products and (hopefully) thousands of orders, the database becomes quite bulky. Yet, worse than a bulky database are underoptimized SQL queries that slow down the WordPress site to a standstill.

A WooCommerce feature in particular that is always slow is the Orders screen (when trying to search for an order) as viewed by an administrator in the WordPress Dashboard. Why is it so slow and how did we manage to optimize it?

The Problem

Our journey begins with two queries which are devastating in terms of query performance, found in class-wc-admin-post-types.php. There are several issues with the query:

  1. LIKE clauses with a leading wildcard are always slow
  2. meta_value columns are not indexed in WordPress, but even if they were, a leading wildcard query (starting with %) can’t use it
  3. CONCAT() in the WHERE clause is almost always a full table scan

A WooCommerce store with a mere 50,000 orders has been profiled and found to munch up to 80 seconds in the query on appropriate hardware and optimized backend configurations. This is devastating, tying up a PHP worker for such a long time and completely rendering the database unresponsive is obviously insane, not to mention having to wait so long on customers when they merely ask for their order number.

The Solution

Since adding indexing or FULLTEXT capabilities does not really help the cause we decided to look into several different ways we can improve performance, from “helper” tables with MySQL triggers, to a WordPress plugin that would maintain a separate index, but ultimately we resorted to an Elasticsearch cluster.

The ElasticPress Plugin for WordPress
The ElasticPress Plugin for WordPress

The ElasticPress plugin by 10up provides a fairly solid foundation, it helps greatly with the indexing part of WordPress objects in the database, keeps them in sync, and provides a good API for querying Elasticsearch.

Ready for the nitty-gritty of it all? Let’s get to the code bits.

Indexing the Data

First of all we setup filtering for indexed post statuses and types like so:

add_filter( 'ep_indexable_post_types', function( $post_types ) {
    $post_types[] = 'shop_order';
    return $post_types;
});

add_filter( 'ep_indexable_post_status', function( $post_status ) {
    return array(
        'publish',
        'wc-cancelled',
        'wc-completed',
        'wp-failed',
        'wc-on-hold',
        'wc-pending',
        'wc-processing',
        'wc-refunded',
    );
});

This tells ElasticPress that indexing shop_orders post types (and their custom statuses), among anything else it indexes by default, should be performed.

We also create additional keys for the concatenation of billing and shipping first and last names, so that searches like “John Doe” would be possible across the source two fields (see the CONCAT parts of the original query in the WooCommerce source code):

add_filter( 'ep_post_sync_args', function( $args, $post_id ) {
    $post = get_post( $post_id );
    if ( $post->post_type != 'shop_order' )
        return $args;

    $billing_name = array();
    $billing_name[] = (string) get_post_meta( $post_id, '_billing_first_name', true );
    $billing_name[] = (string) get_post_meta( $post_id, '_billing_last_name', true );
    $billing_name = implode( ' ', $billing_name );

    $shipping_name = array();
    $shipping_name[] = (string) get_post_meta( $post_id, '_shipping_first_name', true );
    $shipping_name[] = (string) get_post_meta( $post_id, '_shipping_last_name', true );
    $shipping_name = implode( ' ', $shipping_name );

    if ( ! empty( $billing_name ) )
        $args['post_meta']['__pj_billing_full_name'] = $billing_name;

    if ( ! empty( $shipping_name ) )
        $args['post_meta']['__pj_shipping_full_name'] = $shipping_name;

    return $args;
}, 10, 2 );

That’s pretty much the meat of the indexing code, and using WP-CLI we can create our initial index:

wp elasticpress index --setup --network-wide

Querying WooCommerce Orders

Next is the function that accepts a search string argument and the list of additional meta fields to search:

function pj_ep_search_orders( $query, $extra_fields = array() ) {
    $fields = array(
        'post_meta.__pj_billing_full_name',
        'post_meta.__pj_shipping_full_name',
    );

    foreach ( $extra_fields as $field ) {
        $fields[] = 'post_meta.' . $field;
    }

    $query = sprintf( '"*%s*"', pj_ep_escape_query( $query ) );

    $posts = ep_search( array(
        'size' => 500,
        'fields' => array(),
        'query' => array( 'bool' => array( 'must' => array(
            array( 'query_string' => array(
                'fields' => $fields,
                'query' => $query,
            ) ),
        ) ) ),
        'filter' => array(
            'and' => array( array( 'terms' => array( 'post_type.raw' => array( 'shop_order' ) ) ) ),
        ),
    ) );

    $post_ids = wp_list_pluck( $posts['posts'], '_id' );
    return $post_ids;
}

// Make sure the _id is set from the returned data.
add_filter( 'ep_retrieve_the_post', function( $post, $hit ) {
    if ( ! empty( $hit['_id'] ) )
        $post['_id'] = $hit['_id'];

    return $post;
}, 10, 2 );

The ep_search() function queries our configured Elasticsearch cluster and the argument gets converted into a JSON object which gets POST-ed to Elasticsearch.

We use an empty fields parameter because we only care about retrieving the post IDs, and not the full post data, which we’ll later get from MySQL — hence the second filter makes sure the _id field is set, because by default ElasticPress will expect a full post array.

We also escape the search query so no other Elasticsearch operators could be used while conducting this search. Remember — our goal was to get the exact same resultset WooCommerce gets originally, but much, much faster.

So using the function above retrieves the list of all post/order IDs that were found in the search. For more information on Elasticsearch’s query language, please refer to the documentation.

Hijacking WooCommerce’s Order Search

Now, how do we hijack the original WooCommerce shop_order_search_custom_fields function to use our search instead? Unfortunately the developers seem to make it pretty difficult for us, forcing us to resort to a couple of hacks.

First of all the original code use $wpdb directly, and not WP_Query-based calls, so ElasticPress can’t even begin to help. Second, there’s no straightforward way to remove the parse_query filter that is put on, because we have no access to the WC_Admin_Post_Types instance as its gobbled up by a black hole (i.e. not bound to any variable at all, let alone one we could access).

So here’s what we did:

add_filter( 'parse_query', function( $wp ) {
    global $pagenow, $wpdb;

    if ( 'edit.php' != $pagenow || empty( $wp->query_vars['s'] ) || $wp->query_vars['post_type'] != 'shop_order' ) {
        return;
    }

    $search_fields = array_map( 'wc_clean', apply_filters( 'woocommerce_shop_order_search_fields', array(
        '_order_key',
        '_billing_company',
        '_billing_address_1',
        // ...
    ) ) );

    $search_order_id = str_replace( 'Order #', '', $_GET['s'] );
    if ( ! is_numeric( $search_order_id ) ) {
        $search_order_id = 0;
    }

    // Search orders
    $post_ids = array_unique( array_merge(
        pj_ep_search_orders( $_GET['s'], $search_fields ),
        array( $search_order_id )
    ) );

    // Remove s - we don't want to search order name
    unset( $wp->query_vars['s'] );

    // so we know we're doing this
    $wp->query_vars['shop_order_search'] = true;

    // Search by found posts
    $wp->query_vars['post__in'] = $post_ids;

    // Remove the original filter
    $break = false;
    foreach ( $GLOBALS['wp_filter']['parse_query'] as $priority => $filters ) {
        foreach ( $filters as $key => $filter ) {
            $break = strpos( $key, 'shop_order_search_custom_fields' ) !== false;
            if ( $break ) break;
        }
        if ( $break ) break;
    }

    if ( $break ) unset( $GLOBALS['wp_filter']['parse_query'][ $priority ][ $key ] );
}, 0, 1 );

Having pretty much copied the WooCommerce function as is, but removing the two queries for our Elasticsearch one. Note that we didn’t even leave the item name search query in there either, so searching by item names involves a bit of further improvement on the above.

Then we remove the original filter by looking through all the filter names and forcefully plucking it out from the list. Thus, effectively, as our filter runs before WooCommerce’s one, the latter simply never runs. Mission accomplished.

The Results

The results were, unsurprisingly, staggering. An orders list page that took as much as 80 seconds to load loaded in under 4 seconds. The actual Elasticsearch query now takes only 30 milliseconds including cross-datacenter network latency and is ready for searches across literally millions of orders.

Obviously there’s much more that can be done here, including sorting the results by relevance versus order date, decoupling the orders index from the posts index, etc.

So there you go folks, this is actually how far we go to optimize projects hosted on our platform. Stay tuned for more case studies, performance tips and tricks by liking us on Facebook or following us on Twitter.

Cheers.