Building a better autocomplete

I recently tackled ways to make a search experience more responsive. Autocomplete suggestions are a small but high-impact area to optimize. Digging into it, I found a rabbit hole of subtle optimizations that make the search experience feel first-class.

The setup

Let's start with a dead-simple implementation.

<input type="search" placeholder="Type here" />
<pre><!-- we'll put autocomplete suggestions here --></pre>

I'm setting up a basic event listener on the input field which calls a function each time the input changes. This function is where we'll do the majority of our work.

const input = document.querySelector('input');
const results = document.querySelector('pre');

async function handleNewInput(query) {
  // TODO: add our logic here
  results.innerText = 'set suggestions here';
}

// listen for typing
input.addEventListener('input', e => handleNewInput(e.target.value));

Then we'll simulate an AJAX request which retrieves the suggestions for a given query.

// simulate an AJAX call to fetch results
function getSuggestions(query) {
  return new Promise(resolve => {
    // simulate a small network delay (in milliseconds)
    const delay = 200;
    window.setTimeout(() => {
      const suggestions = [
        `${query} one`,
        `${query} two`,
        `${query} three`,
      ].join("\n");
      resolve(suggestions);
    }, delay);
  });
}

In reality, this would probably return a decoded JSON object, have more details, etc., but it's sufficient for this demo.

A naive implementation

Let's try the most obvious approach:

async function handleNewInput(query) {
  results.innerText = await getSuggestions(query);
}

Seems to work pretty well!

example1.html

But wait--our getSuggestions() function doesn't simulate network conditions very well. It returns a response in exactly 200ms. In reality, requests will have varying response times, and may even arrive out of order. So let's update our getSuggestions() simulation with something more realistic:

function getSuggestions(query) {
  return new Promise(resolve => {
    const delay = 50 + (Math.random() * 350); // 50ms to 400ms
    window.setTimeout(() => {
      // ...
    }, delay);
  });
}

A response delay between 50ms and 400ms simulates more turbulent network conditions. Our autocomplete needs to handle conditions like that, so let's use this for testing.

Here's how that works once we randomize the delay time. Try typing a few search terms in that field (type rapidly).

example2.html

This implementation doesn't handle out-of-order responses very well. If I quickly type a search term like "beach ball", I might end up with suggestions for "beach bal", or whichever HTTP request finished last. Not great.
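To see the race in isolation, here's a minimal sketch that swaps the random delay for fixed, made-up delays so the out-of-order result is reproducible. getSuggestionsFixed() and the displayed variable are hypothetical stand-ins for the demo's getSuggestions() and results.innerText; nothing here touches the DOM.

```javascript
// Hypothetical fixed delays (in milliseconds) so the race happens every time:
// the request for the shorter query is the slow one.
const delays = { 'beach bal': 80, 'beach ball': 20 };

// Stand-in for getSuggestions() with a per-query delay instead of a random one.
function getSuggestionsFixed(query) {
  return new Promise(resolve => {
    setTimeout(() => resolve(`suggestions for "${query}"`), delays[query]);
  });
}

let displayed = ''; // stands in for results.innerText

// Same shape as the naive handler: whatever resolves last wins.
async function handleNewInput(query) {
  displayed = await getSuggestionsFixed(query);
}

// The user types "beach bal", then "beach ball" right after.
const done = Promise.all([
  handleNewInput('beach bal'),
  handleNewInput('beach ball'),
]);

// logs: suggestions for "beach bal" (the stale response arrived last and won)
done.then(() => console.log(displayed));
```

The response for "beach ball" arrives first and is displayed, then the slower "beach bal" response overwrites it.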

Enforcing strict ordering

We can solve this problem by giving each request a sequential order number, then remembering the number of the last request we've received a response from. Any response belonging to a request issued before that one is silently dropped.

So say we issue requests #1, #2, and #3.

We start with a lastReqSeen variable initialized to 0.
Then we see a response from #2 come back first. We see that 2 is greater than our lastReqSeen of 0, so we set lastReqSeen to 2 and display the results.
Then we get a response from #1. Since 1 is less than our lastReqSeen of 2, we silently drop it and do nothing with the results.
Then we get #3's response back. We see that it's greater than our lastReqSeen of 2, so we update lastReqSeen to 3 and display the results.
And so on and so forth.

Here's how we might implement that. We'll need to modify getSuggestions() so that it takes a request number argument along with the query, and we'll need two persistent variables which track 1) the last request number that's completed, and 2) the order number that we want to assign to the next request we fire. Note that the code numbers requests starting from 0 rather than 1, so lastReqSeen starts at -1.

let lastReqSeen = -1; // remember the last request that we've seen
let nextReqNumber = 0; // the order number that we'll give the next request

async function handleNewInput(query) {
  // send off the request and wait for a response
  const {suggestions, reqNum} = await getSuggestions(query, nextReqNumber++);

  // if we saw another later request already, drop this result
  if (reqNum <= lastReqSeen) {
    return;
  }

  // otherwise, use it
  results.innerText = suggestions;
  lastReqSeen = reqNum; // and remember this request as the last one seen
}

function getSuggestions(query, reqNum) { // capture the request number
  return new Promise(resolve => {
    // ...
    window.setTimeout(() => {
      // ...
      resolve({ suggestions, reqNum }); // include the request number
    }, delay);
  });
}

That gives us this:

example3.html

That's much better! Results no longer appear out of order, and assuming that all of the autocomplete HTTP requests return successfully, I always see suggestions for my full search term eventually (no "beach bal" suggestions when I type "beach ball").

But there are still some problems here. The suggestions change too fast to read, because they update with every single character. Worse, this generates an HTTP request for each and every character typed, which is unnecessary additional load on our search backend.

Slow things down

So let's limit how often we send requests to the server. One common and easy way to do this is debouncing: wait until the input has been quiet for a set time before acting on it. We'll use Lodash's _.debounce() function for this. We'll pick a noticeable but short delay, like 300ms. Let's wrap the input event handler with _.debounce():

input.addEventListener(
  'input',
  _.debounce(e => handleNewInput(e.target.value), 300)
);

Try typing something quickly and then pausing for a moment. You'll see results show up shortly after you pause.

example4.html

Our HTTP requests are now issued much less frequently, which reduces server load. But it's not great; users who type fast will end up typing their whole query before they realize that we have suggestions for them.
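If you're curious what debouncing boils down to, the core idea fits in a few lines. This is a simplified sketch, not Lodash's actual implementation (which also supports leading/trailing options, a maximum wait, and cancellation); simpleDebounce is a made-up name for illustration.

```javascript
// Simplified debounce: run fn only after `wait` ms have passed
// without any new call. Every new call restarts the countdown.
function simpleDebounce(fn, wait) {
  let timer = null; // handle of the pending countdown, if any

  return function debounced(...args) {
    clearTimeout(timer); // a new trigger cancels the previous countdown
    timer = setTimeout(() => {
      timer = null;
      fn(...args); // runs with the arguments of the most recent call
    }, wait);
  };
}

// Hypothetical wiring, mirroring the Lodash version:
// input.addEventListener(
//   'input',
//   simpleDebounce(e => handleNewInput(e.target.value), 300)
// );
```

Because every keystroke restarts the countdown, nothing fires until the typing pauses, which is exactly the behavior fast typists run into.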

Keep the flow going

A better solution to this would be throttling. Instead of waiting for a pause in the input, we can continually send autocomplete queries at a regular pace while the user is typing, while still limiting the overall rate of requests. This minimizes backend load and keeps the suggestions from changing too quickly.

[Figure: illustration of throttling, from RxJS]

Notice that this is how Google's auto-suggest works (as of right now); if you type fast, you only get updated suggestions every so often.

Throttling is similar to debouncing, except:

It sends the first request immediately when the first trigger is received.
New triggers don't reset the timer; each one just replaces the pending call queued by the trigger before it, so when the interval elapses, the handler runs with the latest input.
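To make those two differences concrete, here's a minimal hand-rolled sketch of a throttle with both a leading and a trailing call. It is not how Lodash implements _.throttle(), just an illustration of the behavior; simpleThrottle is a made-up name, and the sketch ignores details like preserving `this` or cancellation.

```javascript
// Simplified throttle: run fn immediately on the first call, then at most
// once per `wait` ms, always using the most recent arguments.
function simpleThrottle(fn, wait) {
  let timer = null;       // non-null while a throttle window is open
  let pendingArgs = null; // latest call received during the open window

  function openWindow() {
    timer = setTimeout(() => {
      timer = null;
      if (pendingArgs !== null) {
        const args = pendingArgs;
        pendingArgs = null;
        fn(...args);  // trailing call with the latest input
        openWindow(); // the trailing call starts a new window
      }
    }, wait);
  }

  return function throttled(...args) {
    if (timer === null) {
      fn(...args); // leading call: no window open, run right away
      openWindow();
    } else {
      pendingArgs = args; // replace the earlier pending call; timer keeps running
    }
  };
}
```

If three triggers arrive in quick succession, the first runs immediately, the second is replaced by the third, and the third runs when the window elapses.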

Lodash has a convenient _.throttle() function that we can use. Let's replace _.debounce() with _.throttle() and see how that works:

input.addEventListener(
  'input',
  _.throttle(e => handleNewInput(e.target.value), 300)
);

Now requests are sent to the server at most once every 300ms, even if the user is typing quickly.

example5.html

Beautiful! We regularly get updated results, those results stay around long enough to read at a glance, and we're minimizing our server load.

Is there a better way?

But wait. Our code here is messy and a little hard to follow. We've got global variables, we're relying on a specific function instance to stick around (the result of _.throttle()), and if we needed to add any other adjustments in the future, we'd have to increasingly junk things up.

There are definitely ways to manage this by encapsulating our vars, etc. But we can simplify a lot of this manual work using an event-based library like RxJS.
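For comparison, here's a rough sketch of what the encapsulation route could look like: the two counters move into a closure, so nothing leaks into the global scope. createLatestOnlyHandler and its two parameters are hypothetical names, and the sketch assumes the fetch function resolves with the suggestions directly (like the first version of getSuggestions()), since the request number is now tracked by the wrapper.

```javascript
// Wraps an async fetch function so that stale responses are dropped.
// fetchSuggestions: (query) => Promise of suggestions
// render: called with each result that is still current
function createLatestOnlyHandler(fetchSuggestions, render) {
  let lastReqSeen = -1;  // last request number we've rendered
  let nextReqNumber = 0; // number to assign to the next request

  return async function handleNewInput(query) {
    const reqNum = nextReqNumber++;
    const suggestions = await fetchSuggestions(query);

    // a later request has already been rendered: drop this result
    if (reqNum <= lastReqSeen) {
      return;
    }

    lastReqSeen = reqNum;
    render(suggestions);
  };
}

// Hypothetical wiring with the pieces from earlier:
// const handleNewInput = createLatestOnlyHandler(
//   getSuggestions,
//   text => { results.innerText = text; }
// );
// input.addEventListener(
//   'input',
//   _.throttle(e => handleNewInput(e.target.value), 300)
// );
```

This tidies up the state, but the throttling and ordering logic still live in separate places that we have to wire together by hand.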

Enter Observables

An observable is effectively an object which emits values synchronously or asynchronously. It's a similar abstraction to Promises, which let us reason more simply about one-time asynchronous operations. But it generalizes that concept and extends it to cover multiple separate emissions.

Observables are nothing without operators. Operators allow us to manipulate, transform, combine, and do all manner of operations on the emissions of observables.

Let's look at how we would implement the same search behavior using RxJS principles. We can drop Lodash, and just use RxJS:

const { fromEvent, from, asyncScheduler } = rxjs;
const { map, throttleTime, switchAll } = rxjs.operators;

const input = document.querySelector('input');
const results = document.querySelector('pre');

fromEvent(input, 'input') // #1
  .pipe( // #2
    throttleTime(300, asyncScheduler, { leading: true, trailing: true }), // #3
    map(e => e.target.value), // #4
    map(query => from(getSuggestions(query))), // #5
    switchAll() // #6
  ).subscribe(suggestions => results.innerText = suggestions); // #7

Let's step through this:

1. Create an Observable instance which emits events for each input event.
2. Pass the emitted events through a chain of operations. This will return a new observable at the end.
3. As the first operation, throttle events to every 300ms, and make sure to include both leading and trailing events. Ignore asyncScheduler--it's the default scheduler, passed here only so that we can reach the third argument, and it's out of scope for this article.
4. Pull out just the search query from the input event.
5. Fetch the suggestions from the server, and use from() to transform the request promise to an Observable. This is a little hard to follow; at this point, our pipeline is an observable, emitting observables, each of which emits 1 suggestion result. RxJS calls this a "higher-order observable".
6. As the final operator, flatten the higher-order observable back into a simple observable of suggestion results. The magic here is that switchAll() only listens to the most recent inner observable: as soon as a newer request is emitted, it unsubscribes from the previous one. If the second request finishes before the first, the first request's result is ignored when it eventually arrives.
7. Use the output of the final observable to set the contents of the suggestions element.

The downside here is that you need to understand the paradigms that RxJS uses. But once you've got that, the resulting code is far easier to reason about. We don't have any variables tracking intermediate state, and the code is simple enough that we don't even need to decompose it into smaller functions.

The result behaves essentially the same as our homegrown version:

example6.html

Taking it further

There are several things that can be done to further optimize the user experience and performance of suggestions in the search bar.

Specially handle empty string queries by immediately returning a "no suggestions" response. This prevents useless HTTP requests from being sent to the backend.
Once you fetch suggestions for a query, cache those results client-side. If the user backspaces their entire query, you can display cached suggestions instead of re-sending those HTTP requests.
Slap a cache layer or CDN in front of the autocomplete results endpoint if the results aren't personalized to the user.
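The first two ideas can be sketched as a small wrapper around the fetch function. This is only an illustration under a few assumptions: withClientCache is a made-up name, an empty string stands in for the "no suggestions" response, and the cache is an unbounded in-memory Map (a real implementation would probably want a size limit or expiry).

```javascript
// Wraps a fetch function with an empty-query shortcut and a client-side cache.
function withClientCache(fetchSuggestions) {
  const cache = new Map(); // query -> suggestions fetched earlier

  return async function cachedSuggestions(query) {
    // nothing typed: answer immediately without hitting the backend
    if (query === '') {
      return '';
    }

    // seen this query before (e.g. the user backspaced): reuse the result
    if (cache.has(query)) {
      return cache.get(query);
    }

    const suggestions = await fetchSuggestions(query);
    cache.set(query, suggestions);
    return suggestions;
  };
}

// Hypothetical usage: wrap once, then call it wherever getSuggestions() was used.
// const getCachedSuggestions = withClientCache(getSuggestions);
```

Only successful results are stored, so a failed request is retried the next time the same query comes up.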