Sometimes you need to get stuff from the web without that pesky browser getting in the way. Screenshots, testing, archiving, and scraping often call for fetching a page from the web and doing something with it. Automating this with a conventional browser is no fun. Linux folks have the advantage of wget and cURL running from the command line. Programmers and developers will use the language and libraries of their choice. But it is still a chore. Well, there may be an easier way.
I’ve found a couple of JavaScript libraries, PhantomJS and SlimerJS (yes, very ghosty!), that provide tools for building JavaScript that can manipulate web pages, effectively browsing the web without the browser. PhantomJS “is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.” SlimerJS “is useful to do functional tests, page automation, network monitoring, screen capture, etc. SlimerJS is similar to PhantomJS, except that it runs Gecko, the browser engine of Mozilla Firefox, instead of WebKit.”
I’ve been looking for a way to generate screenshots of the pages I’ve created shortened URLs for with my shortener, figuring that it would be nice to have a browsable library of those pages. Either of these libraries will do the trick.
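For the screenshot case, here’s a minimal sketch using PhantomJS’s documented webpage module (the URL and output filename are placeholders of my own choosing). SlimerJS supports largely the same API, so much the same script works there too:

```javascript
// screenshot.js — run with: phantomjs screenshot.js
var page = require('webpage').create();

// Set the viewport so the rendered image has a predictable width.
page.viewportSize = { width: 1280, height: 800 };

page.open('https://example.com/', function (status) {
  if (status !== 'success') {
    console.log('Failed to load the page.');
    phantom.exit(1);
  } else {
    // Render the fully loaded page to a PNG file.
    page.render('screenshot.png');
    phantom.exit();
  }
});
```

From there it’s a small step to loop over a list of URLs and build up that browsable library of page images.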
You can find both on GitHub: SlimerJS on GH & PhantomJS on GH.