How to Parse an HTML Response Without Loading Any Images

Or: How I stopped worrying and learned to createHTMLDocument.

---

TL;DR

If you want to parse an HTML response without loading unnecessary resources such as the images or scripts inside it, use DOMImplementation’s createHTMLDocument()
to create a new document that is not connected to the one rendered by the browser but otherwise behaves like a normal document.

---

As a frontend developer, you can’t always rely on RESTful APIs that return well-formatted JSON you can do whatever you like with. Sometimes you just have to work with HTML responses, no matter how bad that sounds.

While working on our latest project I came across an interesting case. One of the project’s requirements was to have some of the corresponding pages’ content (basically a hero section) preloaded. The solution was maybe not the best, but it was quite simple:

  1. Send an AJAX request for the desired page.
  2. Get its whole HTML in the response (a partial was not an option here).
  3. Create a new element or jQuery object from it, just to find the needed section.
  4. Append that section to the current document when needed.

Simple, works, we can go home now. Well, nope.

Later on, while testing network usage, I realised that some images were being loaded that should not have been. What the-? They’re not even in the DOM. Actually, they were there: not attached or rendered, but created in the context of the current document.

Creating an element with document.createElement('div') and assigning data to its innerHTML property, or calling $(data) with jQuery, creates nodes owned by the current document. The browser treats them like the rest of the page, which means it starts fetching resources such as images even though the nodes are never attached.
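
To make the problem concrete, here’s a minimal sketch of the naive approach. The function name parseInCurrentDocument and the optional doc parameter are assumptions made for illustration (the parameter only makes the function easy to try outside a page); in a browser you’d pass just the response string.

```javascript
// Naive approach: parse the response inside an element owned by the live document.
// `doc` is a hypothetical optional parameter; it defaults to the page's own document.
function parseInCurrentDocument(html, doc) {
    doc = doc || document;
    var wrapper = doc.createElement('div');
    // innerHTML is a property, not a method. Because `wrapper` belongs to the
    // live document, any <img> parsed here is created in that document too.
    wrapper.innerHTML = html;
    return wrapper;
}
```

In a browser, calling parseInCurrentDocument('<img src="hero.jpg">') (hero.jpg being a made-up file name) is typically enough to see a request for that image in the network panel, even though the returned element is never attached to the page.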

So, what’s the solution to that?

I’ve read whole stories about replacing src attributes with dummy data and restoring them later, removing <img> elements with RegEx (sic!), storing them somewhere and recreating them when needed, even using web workers for better performance. None of this crap.

A better and easier solution

Create an entirely new document that is not connected to the current one. A little research reveals that this is possible and super easy to use in our case.
The document object provides the document.implementation.createHTMLDocument() method, which is intended to do just that. Best of all, elements created inside our new, detached document don’t cause the browser to load anything extra, and we can traverse the document with our favourite methods and then attach the part we need to our current document.
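
If you want to convince yourself that the new document really is detached, a small sketch like the one below can help. The function name describeDetachedDocument and the optional doc parameter are assumptions for illustration; only createHTMLDocument(), defaultView and title are standard DOM.

```javascript
// Creates a detached document and reports how it relates to the page.
// `doc` is a hypothetical optional parameter; it defaults to the global document.
function describeDetachedDocument(doc) {
    doc = doc || document;
    var detached = doc.implementation.createHTMLDocument('preview');
    return {
        // A separate Document object, not the one the browser is rendering.
        isSameDocument: detached === doc,
        // A document created this way has no window attached (defaultView is null).
        hasWindow: detached.defaultView !== null,
        // The string passed to createHTMLDocument() becomes the document title.
        title: detached.title
    };
}
```

In a browser, both isSameDocument and hasWindow should come back false. That missing window is the reason images parsed inside the detached document are not fetched.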

Here’s a basic code snippet showing how easy it is to work with the new document, where data is the HTML response string. It just works:

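function insertPreview (data) {
    // Create a detached document; elements it owns do not trigger resource loads.
    var newHTMLDocument = document.implementation.createHTMLDocument('preview');
    // Parse the response inside an element owned by the detached document.
    var el = newHTMLDocument.createElement('div');
    el.innerHTML = data;
    // Wrap the element, not the raw string, so jQuery does not re-parse it in the current document.
    var $pageHero = $(el).find('.page-section--hero');
    // Appending moves the hero section into the current document.
    $('.page--next').append($pageHero);
}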

Cool, huh? Even cooler, it’s supported by all modern browsers, and even by IE9+.
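
If jQuery isn’t available, roughly the same thing can be sketched with plain DOM methods. This is a variant under assumptions, not the code from our project: the name insertPreviewVanilla, the optional doc parameter and the boolean return value are made up for illustration, while the selectors are the ones used above.

```javascript
// jQuery-free sketch: parse in a detached document, then copy the hero section over.
// `doc` is a hypothetical optional parameter; it defaults to the global document.
function insertPreviewVanilla(html, doc) {
    doc = doc || document;
    var detached = doc.implementation.createHTMLDocument('preview');
    // Parsing happens inside the detached document, so nothing is fetched here.
    detached.body.innerHTML = html;
    var hero = detached.querySelector('.page-section--hero');
    var target = doc.querySelector('.page--next');
    if (!hero || !target) {
        return false; // nothing to insert, or nowhere to insert it
    }
    // importNode makes a deep copy owned by the current document.
    target.appendChild(doc.importNode(hero, true));
    return true;
}
```

Only the hero section is copied into the page, so only the resources it references are loaded once it is attached.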
