Free Law Reporter – My Roadmap

Now that I have had a few weeks to settle the Free Law Reporter (FLR) into processing Public.Resource.Org’s Report of Current Opinions (RECOP) XML feeds into valid HTML and to make sure the .epub ebooks are working (I even released the code on GitHub), I thought it might be time to lay out where I see FLR heading in the coming months. Right now (5/17/11) you can visit FLR, search through slip opinions from over 60 state and federal jurisdictions, view the documents in HTML, download complete FLR volumes by jurisdiction as ebooks, and download all documents in a search result as an ebook.

Using this as a foundation, I will be adding several additional features to FLR over the coming months:

  • Advanced search and analysis features to tap the power of Solr
  • Create a library to provide a single point of access to the FLR volumes
  • Select specific documents from search results or browsing to add to custom ebooks
  • Increase the size of the corpus that makes up the Free Law Reporter
  • Edit selected documents from search results or browsing to create truly custom ebooks
  • Provide tools for the community to add value to the Free Law Reporter
  • Citations

The milestones that follow will occur in roughly the order they are listed, but there is no set timetable for implementing these features. This is due primarily to the other development projects I have on my plate, including the main CALI website, Classcaster, eLangdell, Legal Education Commons, and the CALIcon website.

As you will note, a number of the features I have in mind for the FLR will require significant community involvement to really materialize. In this context, I see the community as law librarians, law faculty, and law school technologists with an interest in seeing open and unencumbered access to legal resources for everyone.

 

Advanced search and analysis features to tap the power of Solr – The index, analysis, and search features of the FLR are powered by Apache Solr. Right now I am exposing only a minimum of its potential to provide very basic searching of FLR documents. An advanced search option will provide Boolean operators, phrase and term proximity queries, sub queries, and date range queries. The facet search and “More like this” features of Solr will be exposed to provide drill-down capabilities and access to related documents. All of these will provide a much richer and more robust search environment for locating documents.
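To make the advanced search options above a little more concrete, here is a minimal sketch of how such a request might be assembled for Solr. The parameters (`q`, `fq`, `facet`, `facet.field`, `mlt`, `mlt.fl`, `wt`) are standard Solr query parameters, but the endpoint URL and the field names (`decision_date`, `jurisdiction`, `text`) are assumptions made up for illustration; this post does not describe the actual FLR schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real FLR Solr URL is not given in this post.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_advanced_query(boolean_terms, phrase=None, slop=0,
                         date_from=None, date_to="NOW",
                         facet_field=None, mlt_field=None):
    """Build a Solr select URL combining Boolean, proximity, date range,
    facet, and "More like this" options."""
    query = boolean_terms
    if phrase:
        # "some phrase"~N asks Solr for the words within N positions of each other.
        proximity = f'"{phrase}"~{slop}' if slop else f'"{phrase}"'
        query = f"({query}) AND {proximity}"

    params = [("q", query), ("wt", "json")]
    if date_from:
        # decision_date is an assumed field name for the opinion date.
        params.append(("fq", f"decision_date:[{date_from} TO {date_to}]"))
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    if mlt_field:
        params += [("mlt", "true"), ("mlt.fl", mlt_field)]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_advanced_query(
    "negligence AND (duty OR breach)",
    phrase="summary judgment",
    slop=5,
    date_from="2011-01-01T00:00:00Z",
    facet_field="jurisdiction",
    mlt_field="text",
)
print(url)
```

The date restriction goes in a filter query (`fq`) rather than in `q` so that it narrows the result set without affecting relevance scoring.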

All of this development will be done in the open with the hope that the community will get involved in shaping how documents are indexed and analyzed, search is done, and results are displayed. Because Solr is an open source project, we have access to the complete inner workings of the engine. Imagine being able to tune Google specifically for legal resources or adjust WestLawNext to work better for law students and faculty. Those are the sorts of things that can be done with FLR because we have control over the system.

Create a library to provide a single point of access to the FLR volumes – With dozens of volumes being added to the FLR every week, finding things becomes an issue. Right now a search of the corpus returns links to the volumes of the FLR that contain specific opinions, allowing for the download of ebooks. That isn’t really helpful if all you want is to download the latest Iowa volume to your ereader. I will add a central library mechanism to track all of those volumes as they are created weekly. This work will be done using the Open Publication Distribution System (OPDS) Catalog specification, which will generate feeds that can be consumed by various ereaders and will help locate and track FLR volumes. This OPDS feed will act as an interface that will allow community access to the FLR library. Using the OPDS feed, law libraries could add the Free Law Reporter to their local collections.
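For readers who have not seen one, an OPDS catalog is an Atom feed whose entries carry acquisition links pointing at the ebook files. The sketch below shows roughly what generating such a feed could look like. The Atom namespace, the `http://opds-spec.org/acquisition` link relation, and the `application/epub+zip` media type come from the relevant specifications; the identifiers, titles, and URLs are placeholders, and this is not a complete catalog (a real one also needs things like author information and navigation links).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
OPDS_ACQUISITION_REL = "http://opds-spec.org/acquisition"
EPUB_MIME = "application/epub+zip"


def build_opds_feed(feed_id, title, updated, volumes):
    """Return an Atom/OPDS acquisition feed as a string.

    `volumes` is a list of dicts with the keys id, title, updated, and
    epub_url (a hypothetical record shape chosen for this sketch).
    """
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    for tag, text in (("id", feed_id), ("title", title), ("updated", updated)):
        ET.SubElement(feed, f"{{{ATOM_NS}}}{tag}").text = text

    for volume in volumes:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        for tag in ("id", "title", "updated"):
            ET.SubElement(entry, f"{{{ATOM_NS}}}{tag}").text = volume[tag]
        # The acquisition link is what lets an ereader download the volume.
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", {
            "rel": OPDS_ACQUISITION_REL,
            "type": EPUB_MIME,
            "href": volume["epub_url"],
        })
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_opds_feed(
    "urn:example:flr:catalog",
    "Free Law Reporter volumes (example)",
    "2011-05-17T00:00:00Z",
    [{
        "id": "urn:example:flr:iowa:1",
        "title": "Iowa, Volume 1 (example)",
        "updated": "2011-05-16T00:00:00Z",
        "epub_url": "http://example.org/flr/iowa-1.epub",
    }],
)
print(feed_xml)
```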

Select specific documents from search results or browsing to add to custom ebooks – Right now you can save the complete results from an FLR search as an ebook. While useful, this approach has drawbacks, including the fact that not all of the documents returned by your search may ultimately be relevant to what you are looking for. I will add the ability to review documents returned in a search and select which documents get included in the ebook. That means that the custom FLR volumes you create will be more relevant to your needs. These custom volumes will be assigned a URL and saved in the FLR library so that they can be shared and downloaded again in the future.

This custom ebook feature will provide a way for faculty and law librarians to assemble custom volumes of FLR documents that can be shared with students or added to a law library’s local collection. With some work, the community can create custom law reporters that are focused on a single topic.

Increase the size of the corpus that makes up the Free Law Reporter – Right now the FLR contains just the slip opinions from Carl Malamud’s RECOP feeds. That means it covers documents issued by over 60 state and federal jurisdictions since about January 1, 2011. This is a very limited scope for a project with this much potential. To expand the scope of the corpus, I plan on adding the approximately 1,000,000 other federal court opinions available on the Public.Resource.Org website. This will push the depth of the FLR collection to include many of the opinions in the Federal Reporter series. I will also add various other sets of documents that are available as HTML (or in XML that can be transformed) such as the U.S. Code to the collection. The addition of these documents will provide greater context for results found through the FLR search interface and more material that can be used to create custom ebooks.

While U.S. Federal material is relatively easy to obtain and incorporate into the FLR, state-level material is more difficult to locate and add to the FLR. Certainly the RECOP feeds provide good access to state appellate court material from January 1, 2011 forward, but the backfile of state court opinions is harder to come by. Likewise state codes and statutes are often difficult to locate and are usually not available in a downloadable format. Community involvement will be the key to building out the state collections in the FLR. Law librarians are an excellent resource for locating state legal materials and I would encourage them to work with state courts and governments to obtain access to downloadable opinions and codes that can be incorporated into the FLR.

Edit selected documents from search results or browsing to create truly custom ebooks – It follows that once you can select specific documents for inclusion in custom FLR volumes, you will want to be able to edit those documents to highlight specific points and/or add commentary. Because the source documents for FLR volumes are HTML, I will be able to provide this feature as part of a process that will allow you to search or browse for documents, select those documents for inclusion in a custom ebook, edit those documents as you see fit, and add your own chapters to the ebook. Once the selection and editing are complete, you will be able to save the volume and you will be provided with a URL for the volume that you can share or use to download the ebook.

As with the simple selection and publishing features, this feature will provide a way for faculty and law librarians to assemble custom volumes of FLR documents that can be shared with students or added to a law library’s local collection. With some work, the community can create things like annotated law reporters and statute books. Law faculty can create customized course materials for their students.

Provide tools for the community to add value to the Free Law Reporter – One of the major feature sets I plan to add to the FLR is a set of tools that will allow the community to add value to the collections, for example, tools for adding head notes to a document, tagging a document, and adding commentary to a document. These will provide the community with the capability to enhance and extend the value of the FLR. We all need to get involved in making the Free Law Reporter into a resource that is of great value to students, researchers, and the public, a resource that provides free and unencumbered access to legal materials to those who need to learn about the law.

Citations – I have already been asked several times about how one would cite to the Free Law Reporter. My answer has been that right now I would not cite to the Free Law Reporter. The FLR currently only contains slip opinions that are available more easily elsewhere and any citation should be to the more easily available and recognizable source. I do realize that this is not a satisfactory answer. As the FLR grows it will need to be citable and that is a very complicated problem. I have included unique identifiers and lots of metadata in the documents added to the FLR so far. What I would like to see happen is that we talk about this and take the opportunity presented by a new law reporter published in a new medium to figure out the best way to create citations for the FLR. I would suggest using the FLR discussion forum for this.

 

This is where I see the Free Law Reporter headed over the coming months. The FLR project is important because it is intended to create a resource that provides free and unencumbered access to legal materials to those who need to learn about the law. It is important because it will provide a way for a community of law librarians and faculty to come together to create this valuable resource.

Disclaimer – The Free Law Reporter is a CALI project. This roadmap is where I would like to see the FLR go and it is not intended to commit CALI to any particular direction on the project.

 

CourtListener.com – US Fed Appellate Court Alerts and Yet Another Legal Search Engine

A mention in the BeSpecific blog tipped me off to an interesting project called CourtListener.com. From the about page:

The goal of the site is to create a free and competitive real time alert tool for the U.S. judicial system.

At present, the site has daily information regarding all precedential opinions issued by the 13 federal circuit courts and the Supreme Court of the United States. Each day, we also have the non-precedential opinions from all of the Circuit courts except the D.C. Circuit. This means that by 5:10pm PST, the database will be updated with the opinions of the day, with custom alerts going out shortly thereafter.

The site was created by Michael Lissner as a master’s thesis project at the UC Berkeley School of Information.

A quick perusal of the site and its associated documents tells us that Michael is using a scraping technique to visit court websites looking for recently released opinions. Once found, the opinions are retrieved, converted from PDF to text, indexed, and stored. Atom RSS feeds are then generated to provide current alerts.
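The first step of that pipeline, spotting newly posted opinions on a court web page, can be sketched in a few lines. To be clear, this is not CourtListener’s actual code, just an illustration of the general scraping idea using only the Python standard library; the class and function names, the sample page, and the URLs are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of links that point at PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page they were found on.
            self.pdf_urls.append(urljoin(self.base_url, href))


def find_new_opinions(page_html, base_url, seen_urls):
    """Return PDF links on the page that are not already in seen_urls."""
    collector = PdfLinkCollector(base_url)
    collector.feed(page_html)
    return [url for url in collector.pdf_urls if url not in seen_urls]


sample_page = """
<ul>
  <li><a href="2011/10-1234.pdf">Smith v. Jones</a></li>
  <li><a href="/files/10-5678.PDF">Doe v. Roe</a></li>
  <li><a href="about.html">About the court</a></li>
</ul>
"""
already_seen = {"http://court.example/opinions/2011/10-1234.pdf"}
new_urls = find_new_opinions(
    sample_page, "http://court.example/opinions/", already_seen
)
print(new_urls)
```

Each new URL would then be handed to the later stages (download, PDF-to-text conversion, indexing, and feed generation), which are left out of this sketch.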

The site is powered by Python using the Django web framework and is open source, so you can download the code. The backend database is MySQL and search is handled by Sphinx. The conversion from PDF appears to produce plain text. If you register on the site you can create custom alerts based on saved searches.

All in all, CourtListener.com provides another good source for current Federal appellate court opinions. Be sure to check the coverage page to see how far back the site goes for each court. Perhaps the future will bring an expansion to more courts and jurisdictions.

Is The Great Amazon EBS Failure the Beginning of the End For Disk Abstraction?

The promise of network block storage is wonderful: Take a familiar abstraction (the disk), sprinkle on some magic cloud pixie dust so that it’s completely reliable, available over the same cheap network you’re using for app traffic, map it to any instance in a datacenter regardless of network topology, make it so cheap it’s practically free, and voila, we can have our cake and eat it too! It’s the holy grail many a storage vendor, most of whom with decades experience in storage systems and engineering teams thousands strong have chased for a long, long time. The disk that never dies. The disk that’s not a disk.

The reality, however, is that the disk has never been a great abstraction, and the long history of crappy implementations has meant that many behavioral workarounds have found their way far up the stack. The best case scenario is that a disk device breaks and it’s immediately catastrophic taking your entire operating system with it. Failure modes go downhill from there. Networks have their own set of special failure modes too. When you combine the two, and that disk you depend on is sitting on the far side of the network from where your operating system is, you get a combinatorial explosion of complexity.

Magical Block Store: When Abstractions Fail Us « Joyeur.

Fascinating piece on the perils of disk abstraction. Raises a very good question: Why do we worry about disks at all in the cloud? I wonder how many folks would just be tossing data into the cloud without the comfy metaphor of disk and machine to lean on?

Best Description of the Likely Cascading Failure That Took Out EC2

Let’s think of a failure mode here: Network congestion starts making your block storage environment think that it has lost mirrors, you begin to have resilvering happen, you begin to have file systems that don’t even know what they’re actually on start to groan in pain, your systems start thinking that you’ve lost drives so at every level from the infrastructure service all the way to “automated provisioning-burning-in-tossing-out” scripts start ramping up, programs start rebooting instances to fix the “problems” but they boot off of the same block storage environment.

You have a run on the bank. You have panic. Of kernels. Or language VMs. You have a loss of trust so you check and check and check and check but the checking causes more problems.

via On Cascading Failures and Amazon’s Elastic Block Store « Joyeur.

Closing in on 36 hours since this meltdown began, Amazon has still not been able to restore all of the EC2 instances and EBS volumes that were knocked offline in the #SkynetMassacre. This article is the best explanation of what most likely happened. And the scary part is that it will happen again. And again.

Sadly, there is not a lot to do but try to build enough redundancy into your systems to survive this sort of thing. But it is likely that building that redundancy is going to bring about another meltdown at some point. Guess I’ll just need to keep thinking about how to deal with this sort of thing.

The Report of Current Opinions: Santa Comes Early to the Open Law Movement

Public.Resource.Org will begin providing in 2011 a weekly release of the Report of Current Opinions (RECOP). The Report will initially consist of HTML of all slip and final opinions of the appellate and supreme courts of the 50 states and the federal government. The feed will be available for reuse without restriction under the Creative Commons CC-Zero License and will include full star pagination. This data is being obtained through an agreement with Fastcase, one of the leading legal information publishers. Fastcase will be providing us all opinions in a given week by the end of the following week. We will work with our partners in Law.Gov to perform initial post-processing of the raw HTML data, including such tasks as privacy audits, conversion to XHTML, and tagging for style, content, and metadata.

via The Report of Current Opinions – O’Reilly Radar.

On Sunday Dec. 19 Carl Malamud made the startling announcement quoted above. And you did read it correctly: “The Report will initially consist of HTML of all slip and final opinions of the appellate and supreme courts of the 50 states and the federal government.” To say that this is huge would be the understatement of the year.

From personal experience I can tell you that the “slip and final opinions of the appellate and supreme courts of the 50 states and the federal government” have never all been freely available in HTML before. Not even close. At best you could probably wrangle 75% of these opinions in PDF using a mountain of code to scrape sites and parse feeds. To have all this available as a single feed is a game changer.

As a researcher and builder of tools for legal research and education, having access to a single feed that contains all of this data is just the thing I’ve been looking for (and occasionally trying to build) for the past 15 or so years. I have no doubt that the availability of this feed will spark a flurry of development to use the data in new and interesting ways. I will certainly be incorporating it in the CALI tools I’m currently working on.

Of course there are a couple of caveats here. First, we haven’t seen the feed yet. It won’t be available for a few weeks, so right now I’m still just waiting to see what it will look like. Second, there are 2 “timeouts” built into this service, direct government involvement by July 1, 2011 and a general sunset of private sector activity in creating the feed at the end of 2012. The timeouts underscore the belief that providing free and open access to primary legal materials is a duty of the government, plain and simple. As citizens we are bound to follow the law and our government should be obligated to provide us with free and open access to that law.

I know I’m certainly looking forward to a new year that brings greater free and open access to the law. Thanks, Carl.

Finding Spam on Amazon’s Mechanical Turk

At this point, Amazon Mechanical Turk has reached the mainstream. Pretty much everyone knows about the concept. Post small tasks online, pay people cents, and get thousands of micro-tasks completed.
Unfortunately, this resulted in some unfortunate trends. Anyone who frequents just a little bit the market will notice the tremendous number of spammy HITs. (HIT = a task posted for completion in the market; stands for Human Intelligence Task.) “Test if the ads in my website work”. “Create a Twitter account and follow me”. “Like my YouTube video”. “Download this app”. “Write a positive review on Yelp”. A seemingly endless amount of spam HITs come to the market, mainly with the purpose of spamming “social media” metrics.

via Mechanical Turk: Now with 40.92% spam. – A Computer Scientist in a Business School.

Article points out that spammers tend to pay too much and only assign one HIT per request. Comments reveal that workers on MT can be relatively sophisticated in detecting spam, often wary of requests that seem too good to be true. So, if you’re thinking about using Mechanical Turk to get some work done, keep in mind that the request should offer a reasonable fee and include multiple HITs.
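The two warning signs mentioned above (overpaying and posting only a single HIT) are simple enough to express as a rule of thumb. The sketch below is only an illustration of that heuristic, not the method used in the article; the `HitRequest` record and the 50-cent threshold are assumptions I picked for the example.

```python
from dataclasses import dataclass


@dataclass
class HitRequest:
    title: str
    reward_usd: float      # payment offered per HIT
    hits_available: int    # number of HITs posted in the request


def looks_like_spam(request, max_typical_reward_usd=0.50):
    """Flag requests that both overpay and offer only a single HIT.

    The reward threshold is an arbitrary assumption; tune it to the market.
    """
    overpaying = request.reward_usd > max_typical_reward_usd
    single_hit = request.hits_available == 1
    return overpaying and single_hit


suspicious = HitRequest("Like my YouTube video", reward_usd=1.00, hits_available=1)
ordinary = HitRequest("Transcribe a receipt", reward_usd=0.05, hits_available=200)
print(looks_like_spam(suspicious), looks_like_spam(ordinary))
```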

Robert Douglass on Solr and other Search Back Ends For Drupal

Apache Solr is a powerful and flexible mechanism for performing site search on a Drupal site. Join us as we talk with Robert Douglass about all things Solr in Drupal, including new features and functionality and future development plans. Also, as a bonus, you will hear Robert use the word “de-baconify” in the context of Solr and Drupal.

Acquia Podcast 16: Robert Douglass on Apache Solr and other Search Back Ends Acquia.

This podcast covers much that is going on with Drupal, Solr, and search in general. Lots of good, current information.

WordPress 3.0 Released, Includes Merge of MU Features Into WordPress Core

WordPress 3.0, the thirteenth major release of WordPress and the culmination of half a year of work by 218 contributors, is now available for download (or upgrade within your dashboard). Major new features in this release include a sexy new default theme called Twenty Ten. Theme developers have new APIs that allow them to easily implement custom backgrounds, headers, shortlinks, menus (no more file editing), post types, and taxonomies. (Twenty Ten theme shows all of that off.) Developers and network admins will appreciate the long-awaited merge of MU and WordPress, creating the new multi-site functionality which makes it possible to run one blog or ten million from the same installation.

WordPress › Blog » WordPress 3.0 “Thelonious”.

So, now the big question is how it changes WordPressMU. As those of you who follow along at home know, I’ve been busy of late with moving CALI’s Classcaster podcasting and blogging network over to WordPressMU. I think I will wait a bit for the reviews to come in on how the upgrade from MU to WP3 goes before I jump off that cliff. I’ve included a lot of plugins in Classcaster, so how all of them react to WP3 is going to be an issue.

Most likely I’ll take a stab at upgrading this blog first, after CALIcon, and then think about Classcaster. Of course any additional change to Classcaster will need to be done by mid-July so it is all ready for the Fall 2010 semester.

Update on the National Inventory of Legal Materials

Now there are 195 volunteers across the country working on federal and state level inventory projects, as it is now a full-fledged activity of the American Association of Law Libraries. This project marries very nicely with AALL’s continued leadership and advocacy on topics ranging from permanent public access to authentication to official status of online legal materials. Much of this work draws and builds upon the fine work of the AALL Electronic Legal Information Access and Citation Committee.

National Inventory of Legal Materials – Bits and Pieces « Legal Research Plus.

Article provides updates on what is going on with the NILM. There is a round table discussion of NILM scheduled for 6/25/10 at CALIcon.

Unboxing the iPad

Here are some pics I took while unboxing the shiny and new iPad.

Brand new, still shrink wrapped

P0022931.png

This is the iPad, NIB, along with the official Apple iPad Holder and the always necessary AppleCare extended service warranty.

Box open, first view

P0022932.png

Once opened, the iPad was just sitting right on top of the box.

The new iPad, back view

P0022933.png

Lifting out the iPad, the first impression is the weight of it. It has some heft to it. Here we see the back, still swaddled in its plastic protector.

Free of packaging and plugged in

P0022934.png

With all of the plastic and box out of the way, I plugged the iPad into the MacBook Pro. That launched iTunes and the game was afoot.

The official Apple holder for the iPad

P0022935.png

Once I made sure it was alive, I disconnected the iPad and slid it into the holder. A snug fit, but fashionably black.

Booting up!

P0022939.png

This is the first sync. I thought it went well, but it turned out that iTunes on the MBP decided it was a 2GB iPod, not a 32GB iPad. Irritating, but not fatal. Just started over.

And away we go…

P0022945.png

So, this is the very first screen of the iPad. App icons are added to subsequent screens.