Search this blog

I think I heard somewhere that you should always lead with the headline. Well, you can now search this blog. Why? Well, I was interested to implementing a client-side search feature. Ah, you meant why would you search my blog? Can’t give you that reason, but now you can. This is the search page.

Search function

Search is not always an easy topic. Especially when you are running, as I have previously written, a static website. A static website, has by its very definition no server rendering parts. And, of course, I don’t really want to outsource this to a big company like Google, with their Programmable Search Engine product.

I did find a few JavaScript search libraries. Most of them employ some sort of fuzzy pattern matching and require an array of documents to search through. There is Lunr which I have seen before, but then there is also the new kid on the block: Fuse.js. Since this is just a fun little project for me, I decided to try out the new one. I built a JSON file with all my data and then searched through Fuse.js. And I’m delighted with how it turned out.

I did also run some tests, how well the search works. For example, this is a search for ESPHome, about which I have written quite a bit.

The results when searching for *ESPHome*

Overall, the results are wonderful. It shows the most relevant first (the tag that has all the ESP) and also all three articles that are about ESP32 devices or ESPHome. Somehow the article about RSS also ended up in the list. I’m not quite sure as to why this is, especially since the shown text snippet doesn’t contain anything about ESPHome, while the result below¹ contains the literal word I’m searching for.

I did also search for Home Assistant. It does give relevant articles, that are all about Smart home (and also includes some ESPHome articles). Overall, I’m thrilled with the results it provides, but I do also think that there is some improvement for the results².

Technical

The technical part is actually super easy. I use Hugo to generate a big json file with all the pages to search through.

I then have the following JavaScript that does all the loading and searching:

const searchCutoff = 0.9;

async function fetchSearchIndex() {
    try {
        const response = await fetch('/search.json');
        const data = await response.json();
        return data;
    } catch (error) {
        console.error('Error fetching JSON data:', error);
    }
}

function findLongestIndexPair(indices) {
    let longestPair = [];
    let maxLength = 0;

    indices.forEach(pair => {
        let length = pair[1] - pair[0] + 1;
        if (length > maxLength) {
            maxLength = length;
            longestPair = pair;
        }
    });
    return longestPair;
}

function findWordIndexAtCharIndex(text, charIndex) {
    const words = text.split(/\s+/);
    let index = 0;
    let currentCharCount = 0;

    for (let i = 0; i < words.length; i++) {
        let wordLength = words[i].length + 1;
        if (currentCharCount + wordLength > charIndex) {
            index = i;
            break;
        }
        currentCharCount += wordLength;
    }

    return index;
}

function extractSnippet(text, indices, wordLimit) {
    const words = text.split(/\s+/);
    const longestIndices = findLongestIndexPair(indices);

    let startCharIndex = longestIndices[0];
    let endCharIndex = longestIndices[1];
    
    let startWordIndex = findWordIndexAtCharIndex(text, startCharIndex);
    let endWordIndex = findWordIndexAtCharIndex(text, endCharIndex);

    startWordIndex = Math.max(startWordIndex - wordLimit, 0);
    endWordIndex = Math.min(endWordIndex + wordLimit, words.length - 1);

    let snippetWords = words.slice(startWordIndex, endWordIndex + 1).join(' ');

    return (startCharIndex > 0 ? '... ' : '') + snippetWords + (endCharIndex < text.length ? ' ...' : '');
}

function updateUrlQuery(query) {
    // Replace the current URL's hash without adding a new entry to the browser history
    window.history.replaceState(null, '', `#q=${encodeURIComponent(query)}`);
}

async function initializeFuse() {
    const data = await fetchSearchIndex();

    if (!data) {
        console.error('No data found.');
        return;
    }

    const options = {
        keys: ['title', 'content'],
        shouldSort: true,
        includeScore: true,
        includeMatches: true,
        ignoreLocation: true,
        minMatchCharLength: 4,
        distance: 10000,
        threshold: 0.4
    };

    const fuse = new Fuse(data, options);

    const searchReady = document.querySelector('.search-ready');
    const searchLoading = document.querySelector('.search-loading');
    searchReady.classList.remove('hidden');
    searchLoading.classList.add('hidden');

    const searchInput = document.getElementById('search-input');
    const searchResultsContainer = document.getElementById('search-results');

    function performSearch(query) {
        if (query.length === 0) {
            searchResultsContainer.innerHTML = '';
            return;
        }

        const result = fuse.search(query);

        searchResultsContainer.innerHTML = '';

        result.filter(result => result.score < searchCutoff).forEach(({ item, matches }) => {
            const resultElement = document.createElement('a');
            resultElement.classList.add('search-result');
            resultElement.href = item.path;

            const titleElement = document.createElement('h2');
            titleElement.textContent = item.title;

            const metadataElement = document.createElement('div');
            metadataElement.classList.add('metadata');

            const timeElement = document.createElement('time');
            timeElement.setAttribute('datetime', item.dateISO);
            timeElement.textContent = item.dateString;

            const urlElement = document.createElement('span');
            urlElement.classList.add('url');
            urlElement.textContent = item.path;

            const contentElement = document.createElement('p');
            contentElement.classList.add('content');

            const contentMatch = matches.find(match => match.key === 'content');
            if (contentMatch) {
                const indices = contentMatch.indices;
                contentElement.innerHTML = extractSnippet(item.content, indices, 35);
            } else {
                contentElement.innerHTML = item.summary;
            }

            metadataElement.appendChild(timeElement);
            metadataElement.appendChild(document.createTextNode(' • '));
            metadataElement.appendChild(urlElement);

            resultElement.appendChild(titleElement);
            resultElement.appendChild(metadataElement);
            resultElement.appendChild(contentElement);

            searchResultsContainer.appendChild(resultElement);
        });
    }

    searchInput.addEventListener('input', (event) => {
        const query = event.target.value;
        updateUrlQuery(query);  // Update the URL hash without adding to history
        performSearch(query);
    });

    // Initialize search from URL query if present
    const initialQuery = new URLSearchParams(window.location.hash.slice(1)).get('q');
    if (initialQuery) {
        searchInput.value = decodeURIComponent(initialQuery);
        performSearch(initialQuery);
    }
}

initializeFuse();

The working of this script can basically describe as the following:

Load the json file and build the search index from there.
If the user types search for the search term
Sort the results by relevancy
Extract a small snippet to display on the search page.

There is also a bit of code that handles reading or updating the search parameter in the URL. But that is basically all the magic.

Post Script: Quiet launch

The search has been around since sometime around July. I was trying it out a bit for myself first, and I was also waiting to see if it works. Currently, the file that is loaded as the search index, is about 300 KB heavy. This is still okay for loading when a user opens up the search page, but this is not gonna scale forever. So I will have to come up with a better solution at some point. I do have two ideas to address that, though.

I did also see in my analytics, that there have been other visitors that somehow ended up on the search page, even without having it linked from anywhere (except from the Sitemap). So, now people can search the blog, without interacting with anything on the server. This will all run locally. And I, personally, think this is quite spectacular.

The search results are supposed to be sorted by relevancy. So I’m not quite sure how the RSS feed article goes higher than a search snippet that literally contains the exact words, but I guess good to know that there is still some room for improvement. ↩︎
Also, this current search function is not gonna scale forever, but I’ll figure that out when I get to it. ↩︎

Tags: Blog, Hugo, Technical