Google Reminds Us About Open Source OCR Engine

This particular OCR engine, called Tesseract, was in fact not originally developed at Google! It was developed at Hewlett Packard Laboratories between 1985 and 1995. In 1995 it was one of the top 3 performers at the OCR accuracy contest organized by University of Nevada in Las Vegas. However, shortly thereafter, HP decided to get out of the OCR business and Tesseract has been collecting dust in an HP warehouse ever since. Fortunately some of our esteemed HP colleagues realized a year or two ago that rather than sit on this engine, it would be better for the world if they brought it back to life by open sourcing it, with the help of the Information Science Research Institute at UNLV. UNLV was happy to oblige, but they in turn asked for our help in fixing a few bugs that had crept in since 1995 (ever heard of bit rot?)… We tracked down the most obvious ones and decided a couple of months ago that Tesseract OCR was stable enough to be re-released as open source.

Google Code – Updates: Announcing Tesseract OCR

Blogged with Flock

HyperScope Arrives, Serves Up OPML

The HyperScope is a high-performance thought processor that enables you to navigate, view, and link to documents in sophisticated ways. It’s the brainchild of Doug Engelbart, the inventor of hypertext and the mouse, and is the first step towards his larger vision for an Open Hyperdocument System.

HyperScope

This is great stuff. It extends OPML in a number of useful ways and provides a mechanism for viewing OPML files in a browser, which will be useful as I work on eLangdell.
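For anyone who hasn't looked inside an OPML file: it is just XML with nested outline elements, which is what makes it easy to view and address in a browser. Here is a minimal Python sketch of walking that structure. It is not HyperScope's code, and the sample document and the function names (walk_outline, outline_rows) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample outline, invented for this example.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Sample outline</title></head>
  <body>
    <outline text="Chapter 1">
      <outline text="Section 1.1"/>
      <outline text="Section 1.2"/>
    </outline>
    <outline text="Chapter 2"/>
  </body>
</opml>
"""


def walk_outline(node, depth=0):
    """Yield (depth, text) for each nested <outline> element, in document order."""
    for child in node.findall("outline"):
        yield depth, child.get("text", "")
        yield from walk_outline(child, depth + 1)


def outline_rows(opml_text):
    """Parse an OPML string and return its outline as a flat list of (depth, text)."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    if body is None:
        return []
    return list(walk_outline(body))


if __name__ == "__main__":
    # Print the outline indented by nesting depth.
    for depth, text in outline_rows(SAMPLE_OPML):
        print("  " * depth + text)
```

Flattening to (depth, text) pairs keeps the nesting information while giving something simple to render or index.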


Blogs Added to LexisNexis Index

LexisNexis has added blog content to its Newstex database. They’ve picked an interesting and eclectic mix of blogs covering business, computers & technology, financial services, government & politics, marketing, medical & health, and media. Check out the full list of blogs on Newstex (pdf).

BoleyBlogs! » LexisNexis Adds Blog Content- Paul L. Boley Law Library

Wow. LN licenses a blog indexing service from a closed-source blog aggregator. Nice. According to the press releases (here and here), Newstex “licenses influential blog content directly from independent bloggers and then takes in each carefully selected blog feed in text format and uses its proprietary NewsRouter technology to scan it in real-time”. In turn, Newstex sells this tagged info to corporate rubes too slow to have figured out that they can get pretty much the same thing for free from Google, or have their IT department whip something together over lattes at Starbucks. Sometimes I do believe I’m in the wrong end of the business :)
