CourtListener.com – US Fed Appellate Court Alerts and Yet Another Legal Search Engine

A mention in the beSpacific blog tipped me off to an interesting project called CourtListener.com. From the about page:

The goal of the site is to create a free and competitive real time alert tool for the U.S. judicial system.

At present, the site has daily information regarding all precedential opinions issued by the 13 federal circuit courts and the Supreme Court of the United States. Each day, we also have the non-precedential opinions from all of the Circuit courts except the D.C. Circuit. This means that by 5:10pm PST, the database will be updated with the opinions of the day, with custom alerts going out shortly thereafter.

The site was created by Michael Lissner as a Master's thesis project at the UC Berkeley School of Information.

A quick perusal of the site and its associated documents tells us that Michael is using a scraping technique to visit court websites looking for recently released opinions. Once found, the opinions are retrieved, converted from PDF to text, indexed, and stored. Atom feeds are then generated to provide current alerts.
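For the curious, here is a minimal sketch of what that kind of scrape-convert-store pipeline can look like in Python. The court URL, file paths, and final indexing step are hypothetical placeholders of my own; this is not CourtListener's actual scraper code.

    # Hypothetical sketch of the scrape-convert-store loop described above --
    # not CourtListener's code. The court URL and file names are made up.
    import subprocess
    import urllib2

    COURT_URL = 'http://www.example-circuit.gov/opinions/recent.pdf'  # placeholder

    def fetch_pdf(url, local_path):
        """Download one opinion PDF to disk."""
        data = urllib2.urlopen(url).read()
        with open(local_path, 'wb') as f:
            f.write(data)

    def pdf_to_text(local_path):
        """Convert the PDF to plain text with the pdftotext utility."""
        txt_path = local_path.replace('.pdf', '.txt')
        subprocess.check_call(['pdftotext', local_path, txt_path])
        return open(txt_path).read()

    fetch_pdf(COURT_URL, 'opinion.pdf')
    text = pdf_to_text('opinion.pdf')
    # A real scraper would parse each court's listing page for new opinion
    # links, then index and store the converted text before sending alerts.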

The site is powered by Python using the Django web framework and is open source, so you can download the code. The backend database is MySQL and search is handled by Sphinx. The conversion from PDF appears to produce plain text. If you register on the site, you can create custom alerts based on saved searches.
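To make the saved-search alert idea concrete, here is a rough Django model sketch of how such an alert could be stored. The field names, choices, and model name are my own guesses for illustration, not CourtListener's actual schema.

    # Illustrative only -- these fields are guesses, not CourtListener's schema.
    from django.contrib.auth.models import User
    from django.db import models

    class SearchAlert(models.Model):
        FREQUENCIES = (
            ('dly', 'Daily'),
            ('wly', 'Weekly'),
        )
        user = models.ForeignKey(User)               # who receives the alert
        query = models.CharField(max_length=300)     # the saved search terms
        frequency = models.CharField(max_length=3, choices=FREQUENCIES)
        last_sent = models.DateTimeField(null=True, blank=True)

        def __unicode__(self):
            return u'%s: %s' % (self.user.username, self.query)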

All in all, CourtListener.com provides another good source for current federal appellate court opinions. Be sure to check the coverage page to see how far back the site goes for each court. Perhaps the future will bring an expansion to more courts and jurisdictions.

Beware of Openwashing as “Open” Becomes the New Black

The old “open vs. proprietary” debate is over and open won. As IT infrastructure moves to the cloud, openness is not just a priority for source code but for standards and APIs as well. Almost every vendor in the IT market now wants to position its products as “open.” Vendors that don’t have an open source product instead emphasize having a product that uses “open standards” or has an “open API.”

“Openwashing” is a term derived from “greenwashing” to refer to dubious vendor claims about openness. Openwashing brings the old “open vs. proprietary” debate back into play – not as “which one is better” but as “which one is which?”

What does it mean to be open? And how can you tell if a product is really “open”?

via How to Spot Openwashing.

The article goes on to recommend paying close attention to licensing, the community, and a vendor's proprietary products to see if their software and APIs are truly open source or just wrapped in an open blanket to take advantage of the latest buzzwords.

Over the years I’ve seen a number of instances of openwashing, most notably with companies that built commercial products around a core of open source projects. These companies would make a lot of noise about being open source, but their community releases were just mash-ups of other open source projects, with the glue and features that made up the real product they wanted to sell held back as proprietary.

So, buyer/developer beware. That open source based product that looks so cool may really just be a mirage.

Is The Great Amazon EBS Failure the Beginning of the End For Disk Abstraction?

The promise of network block storage is wonderful: Take a familiar abstraction (the disk), sprinkle on some magic cloud pixie dust so that it’s completely reliable, available over the same cheap network you’re using for app traffic, map it to any instance in a datacenter regardless of network topology, make it so cheap it’s practically free, and voila, we can have our cake and eat it too! It’s the holy grail many a storage vendor, most of whom with decades experience in storage systems and engineering teams thousands strong have chased for a long, long time. The disk that never dies. The disk that’s not a disk.

The reality, however, is that the disk has never been a great abstraction, and the long history of crappy implementations has meant that many behavioral workarounds have found their way far up the stack. The best case scenario is that a disk device breaks and it’s immediately catastrophic taking your entire operating system with it. Failure modes go downhill from there. Networks have their own set of special failure modes too. When you combine the two, and that disk you depend on is sitting on the far side of the network from where your operating system is, you get a combinatorial explosion of complexity.

via Magical Block Store: When Abstractions Fail Us « Joyeur.

Fascinating piece on the perils of disk abstraction. It raises a very good question: why do we worry about disks at all in the cloud? I wonder how many folks would just toss data into the cloud without the comfy metaphor of a disk and a machine to lean on.
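As a thought experiment on skipping the disk metaphor entirely, here is what "just tossing data into the cloud" can look like against an object store, using the boto library and a made-up bucket name. This is my own illustration, not anything from the Joyent post.

    # Illustration only: write straight to S3 object storage, no block device.
    # The bucket and key names are hypothetical.
    import boto
    from boto.s3.key import Key

    conn = boto.connect_s3()                  # credentials from env/boto config
    bucket = conn.create_bucket('example-opinion-archive')
    k = Key(bucket)
    k.key = 'notes/2011-04-22.txt'
    k.set_contents_from_string('Data in the cloud with no disk abstraction.')
    print(k.get_contents_as_string())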

Best Description of the Likely Cascading Failure That Took Out EC2

Let’s think of a failure mode here: Network congestion starts making your block storage environment think that it has lost mirrors, you begin to have resilvering happen, you begin to have file systems that don’t even know what they’re actually on start to groan in pain, your systems start thinking that you’ve lost drives so at every level from the infrastructure service all the way to “automated provisioning-burning-in-tossing-out” scripts start ramping up, programs start rebooting instances to fix the “problems” but they boot off of the same block storage environment.

You have a run on the bank. You have panic. Of kernels. Or language VMs. You have a loss of trust so you check and check and check and check but the checking causes more problems.

via On Cascading Failures and Amazon’s Elastic Block Store « Joyeur.

Closing in on 36 hours since this meltdown began, Amazon has still not been able to restore all of the EC2 instances and EBS volumes that were knocked offline in the #SkynetMassacre. This article is the best explanation of what most likely happened. And the scary part is that it will happen again. And again.

Sadly, there is not a lot to do but try to build enough redundancy into your systems to survive this sort of thing. But it is likely that the very act of building that redundancy will bring about another meltdown at some point. Guess I’ll just need to keep thinking about how to deal with it.
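One small, concrete habit that keeps your own recovery logic from feeding that kind of run on the bank is retrying with exponential backoff and jitter instead of checking in a tight loop. The sketch below is my own illustration of that idea, not something from the Joyent article.

    # Illustration: exponential backoff with jitter, so retries spread out
    # instead of hammering an already-struggling service.
    import random
    import time

    def call_with_backoff(operation, max_attempts=6, base_delay=1.0, max_delay=60.0):
        """Retry operation(), sleeping longer (plus jitter) after each failure."""
        for attempt in range(max_attempts):
            try:
                return operation()
            except Exception:
                if attempt == max_attempts - 1:
                    raise                                 # out of attempts, give up
                delay = min(max_delay, base_delay * (2 ** attempt))
                time.sleep(random.uniform(0, delay))      # jitter de-synchronizes clients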