Headless Firefox in Node.js with selenium-webdriver

As of version 56 (currently in Beta), Firefox supports running headlessly on Windows, macOS, and Linux. Brendan Dahl has previously described how to use SlimerJS to drive headless Firefox. You can also drive it via the W3C WebDriver API, and this blog post explains how to do that in Node.js with the selenium-webdriver package.

(For a similar introduction using Python on Windows, see Andre Perunicic’s Using Selenium with Headless Firefox.)

First, ensure you have a version of Firefox that supports headless. On Linux, the current release version (55) is sufficient. On Windows and macOS, however, you’ll need at least version 56, which is currently in Beta (scheduled for release next month). You can also use Developer Edition (based on Beta) or a Nightly build; any pre-release build will do.

Next, install geckodriver (and ensure it’s on your PATH). You can download and install it manually from the geckodriver releases page, or you can install it using NPM via npm install -g geckodriver or yarn global add geckodriver (mind node-geckodriver #30 on Windows). On macOS, you can also use Homebrew to install it via brew install geckodriver.
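If selenium-webdriver later reports that it cannot find geckodriver, the usual cause is that the binary isn't on your PATH. Below is a minimal sketch of a pre-flight check that uses only Node's standard library. The `findOnPath` helper is hypothetical and is not part of selenium-webdriver. It only checks that a file with the right name exists. It does not check execute permissions.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the full path of `command` if a file with that
// name exists in one of the directories listed in PATH, otherwise null.
function findOnPath(command, env = process.env, platform = process.platform) {
  const dirs = (env.PATH || '').split(path.delimiter).filter(Boolean);
  // On Windows, executables carry an extension such as .exe (see PATHEXT).
  const exts = platform === 'win32'
    ? (env.PATHEXT || '.EXE;.CMD;.BAT').split(';')
    : [''];
  for (const dir of dirs) {
    for (const ext of exts) {
      const candidate = path.join(dir, command + ext);
      if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
        return candidate;
      }
    }
  }
  return null;
}
```

Calling `findOnPath('geckodriver')` from a script returns the binary's location, or `null` if you still need to install it.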

Finally, create a Node project, initializing it with your favorite package management tool and installing the selenium-webdriver package:

mkdir project-dir
cd project-dir
npm --yes init # yarn --yes init
npm install selenium-webdriver # yarn add selenium-webdriver

Now you’re ready to drive headless Firefox from Node scripts in your project.

For example, here’s how to create a script that searches for “testing” on the Mozilla Developer Network and takes a screenshot of the result. It requires Node 8 or later (for util.promisify and async/await), but scroll to the bottom for a link to an equivalent script for Node 6.

First, import some useful core Node methods:

const { writeFile } = require('fs');
const { promisify } = require('util');
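util.promisify was added in Node 8. On Node 6 you can get the same effect by wrapping the callback-style fs.writeFile in a Promise by hand. Below is a minimal sketch, where the `writeFileAsync` name is our own choice:

```javascript
const fs = require('fs');

// Hand-rolled equivalent of promisify(fs.writeFile) for Node versions that
// lack util.promisify: resolve on success, reject with the fs error otherwise.
function writeFileAsync(file, data, encoding) {
  return new Promise((resolve, reject) => {
    fs.writeFile(file, data, encoding, err => {
      if (err) {
        reject(err);
      } else {
        resolve();
      }
    });
  });
}
```

Node 6 also lacks async/await, so you would chain the calls instead, for example `driver.takeScreenshot().then(data => writeFileAsync('screenshot.png', data, 'base64'))`.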

Then import APIs from selenium-webdriver and selenium-webdriver/firefox:

const { Builder, By, Key, promise, until } = require('selenium-webdriver');
const firefox = require('selenium-webdriver/firefox');

Tell selenium-webdriver to disable its “promise manager” so we can use Node’s native async/await (which will become unnecessary when the promise manager is removed in selenium-webdriver #2969):

promise.USE_PROMISE_MANAGER = false;

Then create a Binary instance:

const binary = new firefox.Binary();

On Windows and macOS, if you have multiple versions of Firefox installed, configure it with the distribution channel (NIGHTLY, AURORA, BETA) to ensure you get the correct one:

const binary = new firefox.Binary(firefox.Channel.NIGHTLY);

On Linux, if you’d like to use a different version of Firefox than the one on your PATH, specify the path to the executable:

const binary = new firefox.Binary('/path/to/firefox');

Add the --headless argument to the binary:

binary.addArguments('--headless');

(Eventually selenium-webdriver #4591 will make this a driver configuration option.)
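While you are developing a script, it can help to watch the browser window. The sketch below shows one way to make headless mode the default while allowing you to turn it off. It relies on a hypothetical `--show` flag that you pass to your own script. Neither Firefox nor selenium-webdriver defines this flag.

```javascript
// Hypothetical convention: run `node script.js --show` to see the Firefox
// window while debugging; otherwise return the argument for headless mode.
function firefoxArgs(argv = process.argv.slice(2)) {
  return argv.includes('--show') ? [] : ['--headless'];
}
```

You would then call `binary.addArguments(...firefoxArgs())` instead of passing '--headless' directly.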

Then start Firefox with the Binary you previously created:

const driver = new Builder()
  .forBrowser('firefox')
  .setFirefoxOptions(new firefox.Options().setBinary(binary))
  .build();

Finally, tell Firefox to load the Mozilla Developer Network home page, enter “testing” into its search form, hit the RETURN key to submit the form, await loading of the search results page, take a screenshot of the page, and save the screenshot data to a screenshot.png file in your current directory:

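async function main() {
  try {
    // Load MDN, then type "testing" into the search box and press RETURN.
    await driver.get('https://developer.mozilla.org/');
    await driver.findElement(By.id('home-q')).sendKeys('testing', Key.RETURN);
    // Wait for the results page's title, then for the page to finish loading.
    await driver.wait(until.titleIs('Search Results for "testing" | MDN'));
    await driver.wait(async () => {
      const readyState = await driver.executeScript('return document.readyState');
      return readyState === 'complete';
    });
    // takeScreenshot() resolves to base64-encoded PNG data.
    const data = await driver.takeScreenshot();
    await promisify(writeFile)('screenshot.png', data, 'base64');
  } finally {
    // Quit Firefox even if one of the steps above throws.
    await driver.quit();
  }
}

main().catch(error => {
  console.error(error);
  process.exitCode = 1;
});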
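When driver.wait() receives a function as its condition, it calls that function repeatedly until the function returns a truthy value or the wait times out. The sketch below reimplements the basic idea with plain promises to show how that polling works. It is not selenium-webdriver's actual implementation, and the `waitFor` name and its timeout and interval defaults are hypothetical.

```javascript
// Hypothetical helper: call `condition` (sync or async) every `intervalMs`
// until it returns a truthy value, or reject once `timeoutMs` has elapsed.
async function waitFor(condition, timeoutMs = 10000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await condition();
    if (result) {
      return result;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Using this helper, the readyState check in main() could be written as `await waitFor(async () => (await driver.executeScript('return document.readyState')) === 'complete')`. In practice, driver.wait() remains the natural choice.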

That’s it!

For the complete script, along with a version that works on Node 6, see the headless-examples repository on GitHub. And for additional information on using selenium-webdriver, see the selenium-webdriver README, the API documentation, and this directory of example scripts.

Note: Updated on 2017 September 1 to specify headless mode using the --headless command-line argument rather than the MOZ_HEADLESS=1 environment variable.

 

Headless Firefox

Over in Headless SlimerJS with Firefox, fellow Mozillian Brendan Dahl writes about the work he’s been doing to support running Firefox headlessly. A headless mode for Firefox makes it easier to test websites with the browser, especially in continuous integration, to ensure Firefox remains compatible with the Web. It also enables a variety of other interesting use cases.

Brendan started with Linux, the most popular platform for CI services like Travis, and focused first on SlimerJS, a popular tool for testing websites with Firefox (and scripting the browser more generally) that uses Firefox to run a different XUL application (rather than running Firefox itself). Now he’s working on support for running Firefox itself headlessly, as well as on support for Windows and Mac.

Check out his blog post for more details and to tell him how you’d use the feature!

 

Introducing qbrt

I recently blogged about discontinuing Positron. I’m trying a different tack with a new experiment, codenamed qbrt, that reuses an existing Gecko runtime (and its existing APIs) while simplifying the process of developing and packaging a desktop app using web technologies.

qbrt is a command-line interface written in Node.js and available via NPM:

npm install -g qbrt

Installing it also installs a Gecko runtime (currently a nightly build of Firefox, but in the future it could be a stable build of Firefox or a custom Gecko runtime). Its simplest use is then to invoke the ‘run’ command with a URL:

qbrt run https://eggtimer.org/

This starts a process and loads the URL into a native window.

URLs loaded in this way don’t have privileged access to the system. They’re treated as web content, not application chrome.

To load a desktop application with system privileges, point qbrt at a local directory containing a package.json file and main entry script:

qbrt run path/to/my/app/
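Before pointing qbrt at a directory, you can check that the directory has the pieces it expects. Below is a minimal pre-flight sketch. It assumes that qbrt finds the main entry script through the standard npm "main" field in package.json, which defaults to index.js. That is our assumption, and qbrt's actual lookup may differ. The `findAppEntry` helper is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical check before `qbrt run path/to/my/app/`: confirm package.json
// exists and that the entry script it names (npm "main", default index.js,
// assumed to be what qbrt uses) exists too. Returns the entry script's path.
function findAppEntry(appDir) {
  const manifestPath = path.join(appDir, 'package.json');
  if (!fs.existsSync(manifestPath)) {
    throw new Error(`No package.json in ${appDir}`);
  }
  const manifest = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
  const entry = path.join(appDir, manifest.main || 'index.js');
  if (!fs.existsSync(entry)) {
    throw new Error(`Entry script ${entry} does not exist`);
  }
  return entry;
}
```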

For example, clone qbrt’s repo and try its example/ app:

git clone https://github.com/mozilla/qbrt.git
qbrt run qbrt/example/

This will start a process and load the app into a privileged context, giving it access to Gecko’s APIs for opening windows and loading web content along with system integration APIs for file manipulation, networking, process management, etc.

(Another good example is the “shell” app that qbrt uses to load URLs.)

To package an app for distribution, invoke the ‘package’ command, which creates a platform-specific package containing both the app’s resources and the Gecko runtime:

qbrt package path/to/my/app/

Note that while qbrt is written in Node.js, it doesn’t provide Node.js APIs to apps. It might be useful to do so, using SpiderNode, as we did with Positron, although Gecko’s existing APIs expose equivalent functionality.

Also, qbrt doesn’t yet support runtime version management (i.e. being able to specify which version of Gecko to use, and to switch between them). At the time you install it, it downloads the latest nightly build of Firefox. (You can update that nightly build by reinstalling qbrt.)

And the packaging support is primitive. qbrt creates a shell script (batch script on Windows) to launch your app, and it packages your app using a platform-specific format (ZIP on Windows, DMG on Mac, and tar/gzip on Linux). But it doesn’t set icons or most other package metadata, and it doesn’t create auto-installers or support signing the package.
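The per-platform formats listed above correspond to Node's `process.platform` values as shown below. This is only an illustration of that mapping. It is not qbrt's actual packaging code, and the `packageFormatFor` name is hypothetical.

```javascript
// Illustrative mapping from Node's process.platform to the package formats
// described above; qbrt's real packaging logic may be organized differently.
function packageFormatFor(platform = process.platform) {
  switch (platform) {
    case 'win32':
      return 'zip';
    case 'darwin':
      return 'dmg';
    case 'linux':
      return 'tar.gz';
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```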

In general, qbrt is immature and unstable! It’s appropriate for testing, but it isn’t yet ready for you to ship apps with it.

Nevertheless, I’m keen to hear how it works for you, and whether it supports your use cases. What would you want to do with it, and what additional features would you need from it?

 

Positron Discontinued

After some consideration, I’ve decided to discontinue development of Positron.

Positron was an experimental runtime for creating desktop apps using web technologies. It was based on Firefox, and its principal feature was that it was Electron-compatible. I started working on it—in conjunction with several colleagues—to enable Tofino to run on Gecko.

But Tofino is dead (long live the Browser Futures Group!), and Electron compatibility isn’t essential for a viable Gecko runtime. It’s also hard, since Electron has a large API surface area, is a moving target, requires Node.js integration (itself a moving target), and is designed for Chromium’s process architecture, which is substantially different from Firefox’s.

I’ve previously written about the utility of desktop runtimes (among other embedding projects). I still think they’re valuable for a variety of use cases, and Gecko can provide unique value to desktop application development. I’ll continue to pursue the realization of that value. I just won’t do it through Positron.

 

Embedding Projects

Last month I blogged about Why Embedding Matters, and then I described a variety of Embedding Use Cases. Here are some projects that would address those cases. If you could choose, which would you do first?

Embedding Framework for Headless Browsing

An Embedding Framework for Headless Browsing would enable internal Gecko testing frameworks, scriptable browsers like SlimerJS, and other tools that automate web page interactions to do so without having to display them in a visible window (or jump through hoops to avoid doing so).

It would be available for all shipping versions of Gecko in Firefox (Nightly, Aurora/DevEdition, Beta, Release). It would support all three major desktop OSes (starting with Linux, which is popular for testing websites in continuous integration). It might provide a command-line flag for running Firefox headlessly (as Chrome is planning to do).

Desktop Browser Runtime

A Desktop Browser Runtime is a specialized application runtime that would address the Hybrid Desktop Web Browser use case. It could also be used to build Site-Specific Browsers.

It would provide core OS APIs (file I/O, networking, process management, etc.) along with desktop integration (windows, menus, application lifecycle, etc.). It would also include an API to load and manage untrusted web content in isolated frames.

It might integrate Node.js using SpiderNode, both to provide core APIs and to enable access to an ecosystem of third-party modules. And it might support WebExtensions by default and provide a service (like the User-Agent Service in Tofino) for storing, querying, and retrieving browsing data.

WebView for Android and iOS

A WebView for Android would provide a Gecko equivalent to Android’s WebView class. It would enable Android app developers to build Hybrid Mobile Apps for Android as well as Native Android Apps that incorporate web content. A WebView for iOS would do the same thing as the WebView for Android project, but for iOS, to provide the equivalent of its WKWebView class.

Embedding Framework for Desktop

An Embedding Framework for Desktop would support web content rendering in native desktop apps, including application frameworks like the Desktop Browser Runtime. It might provide a measure of compatibility with existing APIs for embedding rendering engines, like the Chromium Embedded Framework. Alternatively, it might support embedding of both Gecko and Servo.

 

Mozilla and Node.js

Recently the Node.js Foundation announced that Mozilla is joining forces with IBM, Intel, Microsoft, and NodeSource on the Node.js API. So what’s Mozilla doing with Node? Actually, a few things…

You may already know about SpiderNode, a Node.js implementation on SpiderMonkey, which Ehsan Akhgari announced in April. Ehsan, Trevor Saunders, Brendan Dahl, and other contributors have since made a bunch of progress on it, and it now builds successfully on Mac and Linux and runs some Node.js programs.

Brendan additionally did the heavy lifting to build SpiderNode as a static library, link it with Positron, and integrate it with Positron’s main process, improving that framework’s support for running Electron apps. He’s now looking at opportunities to expose SpiderNode to WebExtensions and to chrome code in Firefox.

Meanwhile, I’ve been analyzing the Node.js API being developed by the API Working Group, and I’ve also been considering opportunities to productize SpiderNode for Node developers who want to use emerging JavaScript features in SpiderMonkey, such as WebAssembly and Shared Memory.

If you’re a WebExtension developer or Firefox engineer, would you use Node APIs if they were available to you? If you’re a Node programmer, would you use a Node implementation running on SpiderMonkey? And if so, would you require Node.js Addons (i.e. native modules) to do so?

 

Embedding Use Cases

A couple weeks ago, I blogged about Why Embedding Matters. A rendering engine can be put to a wide variety of uses. Here are a few of them. Which would you prioritize?

Headless Browser

A headless browser is an app that renders a web page (and executes its script) without displaying the page to a user. Headless browsers themselves have multiple uses, including automated testing of websites, web crawling/scraping, and rendering engine comparisons.

Longstanding Mozilla bug 446591 tracks the implementation of headless rendering in Gecko, and SlimerJS is a prime example of a headless browser that would benefit from it. It’s a “scriptable browser for Web developers” that integrates with CasperJS and is compatible with the WebKit-based PhantomJS headless browser. It currently uses Firefox to “embed” Gecko, which means it doesn’t run headlessly (SlimerJS issue #80 requests embedding Gecko as a headless browser).

Hybrid Desktop App

A Hybrid Desktop App is a desktop app that is implemented primarily with web technologies but packaged, distributed, and installed as a native app. It enables developers to leverage web development skills to write an app that runs on multiple desktop platforms (typically Windows, Mac, Linux) with minimal platform-specific development.

Generally, such apps are implemented using an application framework, and Electron is the one with momentum and mindshare; but there are others available. While frameworks can support deep integration with the native platform, the apps themselves are often shallower, limiting themselves to a small subset of platform APIs (window management, menus, etc.). Some are little more than a local web app loaded in a native window.

Hybrid Desktop Web Browser

A specialization of the Hybrid Desktop App, the Hybrid Desktop Web Browser is notable not only because Mozilla’s core product offering is a web browser but also because the category is seeing a wave of innovation, both within and outside of Mozilla.

Besides Mozilla’s Tofino and Browser.html projects, there are open-source startups like Brave; open-source hobbyist projects like Min, Alloy, electron-browser, miserve, and elector; and proprietary browsers like Blisk and Vivaldi. Those products aren’t all Hybrid Apps, but many of them are (and they all need to embed a rendering engine, one way or another).

Hybrid Mobile App

A Hybrid Mobile App is like a Hybrid Desktop App, but for mobile platforms (primarily iOS and Android). As with their desktop counterparts, they’re usually implemented using an application framework (like Cordova). And some use the system’s web rendering component (WebView), while others ship their own via frameworks (like Crosswalk).

Basecamp notably implemented a hybrid mobile app, which they described in Hybrid sweet spot: Native navigation, web content.

(There’s also a category of apps that are implemented with some web technologies but “compile to native,” such that they render their interface using native components rather than a WebView. React Native is the most notable such framework, and James Long has some observations about it in Radical Statements about the Mobile Web and First Impressions using React Native.)

Mobile App With WebView

A Mobile App With WebView is a native app that incorporates web content using a WebView. In some cases, a significant portion of the app’s interface displays web content. But these apps are distinct from Hybrid Mobile Apps not only in degree but in kind, as the choice to develop a native app with web content (as opposed to packaging a web app in a native format using a hybrid app framework) entrains different skillsets and toolchains.

Facebook (which famously abandoned hybrid app development in 2012) is an example of such an app.

Site-Specific Browser (SSB)

A Site-Specific Browser (SSB) is a native desktop app (or simulation thereof) that loads a single web app in a discrete native window. SSBs typically install launcher icons in OS app launchers, remove or minimize browser chrome in app windows, and may include native menus and other features typical of desktop apps.

Chrome’s --app mode allows it to simulate an SSB, and recent Mozilla bug 1283670 requests a similar feature for Firefox.

SSBs differ from hybrid desktop apps because they wrap regular web apps (i.e. apps that are hosted on a web server and also available via a standard web browser). They’re also typically created by users using utilities, browser features, or browser extensions rather than by developers. Examples of such tools include Prism, Standalone, and Fluid. However, hybrid app frameworks like Electron can also be used (by both users and developers) to create SSBs.

Linux Embedded Device

A variety of embedded devices include a graphical user interface (GUI), including human-machine interface (HMI) devices and Point of Interest (POI) kiosks. Embedded devices with such interfaces often implement them using web technologies, for which they need to integrate a rendering engine.

The embedded device space is complex, with multiple solutions at every layer of the technology stack, from hardware chipset through OS (and OS distribution) to application framework. But Linux is a popular choice at the operating system layer, and projects like OpenEmbedded/Yocto Project and Buildroot specialize in embedded Linux distributions.

Embedded devices with GUIs also come in all shapes and sizes. However, it’s possible to identify a few broad categories. The ones for which an embedded rendering engine seems most useful include industrial and home automation (which use HMI screens to control machines), POI/POS kiosks, and smart TVs. There may also be some IoT devices with GUIs.