An Experiment in Document Conversion and Generation

This is the README file for the GitHub repository that holds the files used and created in this experiment. I’m including the README in its entirety since it kills two birds with one stone.


1. Introduction

This repo holds a set of files that I created as an experiment in getting old work out of proprietary formats. The idea is to take a MSFT Word file and convert it into something that is human readable, open formatted, and convertible.

To do this I settled on AsciiDoc to mark up the text of the paper. I chose AsciiDoc over Markdown because of its depth of features and the availability of conversion tools.


2. The Process

I decided to use a local install of Etherpad Lite (EL) as my primary text editor for this project. I did this because of a few features including autosave, versioning, and the potential for real time collaboration. I hoped that these features would provide me with a useful editing tool.

Once EL was set up and configured I was faced with the problem of how to get the text of the paper into the editor in the first place. My initial inclination was to retype the document, formatting and editing as I went along. Faced with a 10,000 word doc and no appreciable typing skills, I was not happy with this option. After a bit of poking around in EL I found its import features. Getting MSFT Word files imported required a bit more configuring, but it worked. I then imported the Word file into EL.

The import process added the text of the document to the editor. It stripped all of the formatting from the text and inserted the 112 footnotes in-line into the text. All of this was actually a good thing, making the process of marking up the doc with AsciiDoc easier. Using the original word processing file as a guide, I worked through the document adding the necessary AsciiDoc markup to format the paper. The most tedious part was the 112 footnotes, but since AsciiDoc handles footnotes with in-line markup it moved along as fast as could be expected.

In total I spent about 6 hours working on the AsciiDoc version of the document. Most of the time was spent tagging footnotes and figuring out the format for the bibliography.
[I am still not really pleased with the way the biblio looks. I think I can fix it, though, in a later iteration.]
The rest of the formatting, such as section titles, quotes, emphasis, and lists, was straightforward, though I did keep a copy of the AsciiDoc User Guide open in another tab to help out.

I found the Etherpad Lite interface easy to work with and really appreciated the autosave and versioning features. EL doesn’t know about AsciiDoc markup, though, so that presented a challenge. In order to preview the work I had to export the file as text and then run the basic AsciiDoc-to-HTML conversion, opening the resulting file in another browser tab to see what was going on. As I became more confident of my work, I checked less often, so this was not much of an issue. I marked major revisions as saved revisions at the end of each section of the document to give me a nice clean revision history.

Once I had a nice clean version that produced good HTML, I exported a final copy to my local computer and set about using the AsciiDoc utility a2x to generate the document in various formats. For this particular experiment I went with XHTML, PDF, and EPUB. The generation/conversion process was marred only by my problems with understanding the format for the bibliography at the end of the document. Once I figured out just how to mark up the bibliography, the process was flawless. a2x first converts the AsciiDoc marked document into a DocBook XML file and then converts the DocBook file into other formats. The process uses the standard set of XML processing tools as well as CSS to generate the files. By using custom CSS files, the layout and formatting of the various output files can be changed as needed.
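The three conversions can be scripted so they run together. The following is only a minimal sketch, assuming a2x and its DocBook toolchain are installed and on the PATH. It uses the same -v and -f options shown in the commands in this README; the function names and the dry-run default are my own illustration, not part of a2x.

```python
import shlex
import subprocess

# Output formats used in this experiment, passed to a2x's -f option.
FORMATS = ("xhtml", "pdf", "epub")


def a2x_command(source, fmt):
    """Build the a2x command line for one output format."""
    return ["a2x", "-v", "-f", fmt, source]


def build_all(source, dry_run=True):
    """Return one a2x command string per format; run them unless dry_run."""
    commands = []
    for fmt in FORMATS:
        cmd = a2x_command(source, fmt)
        commands.append(shlex.join(cmd))
        if not dry_run:
            # Assumption: a2x is installed; check=True stops on the first failure.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for line in build_all("KelsoPaper.txt"):
        print(line)
```

With the default dry_run=True the script only prints the commands, which is a safe way to check them before letting it generate files.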


3. The Files

The files included in this repo are the ones used and generated as part of the process described above.

KELSOFIN20130111.docx: The MSFT Word file that was used as the starting point. This document began as a WordPerfect file in 1992 and was moved to Word in the mid-90s.
KelsoPaper.txt: The AsciiDoc version of the file as created and edited in Etherpad Lite. This is the file used to generate the other formats.
KelsoPaper.pdf: PDF file generated from KelsoPaper.txt using the command a2x -v -f pdf KelsoPaper.txt
KelsoPaper.html: XHTML file generated from KelsoPaper.txt using the command a2x -v -f xhtml KelsoPaper.txt
docbook-xsl.css: CSS file used to style KelsoPaper.html
KelsoPaper.epub: EPUB file generated from KelsoPaper.txt using the command a2x -v -f epub KelsoPaper.txt

4. Conclusion

I am happy with the results of this experiment and hope to be able to further explore the use of Etherpad Lite and AsciiDoc as a tool set for creating free and open documents.

Notes from Drupalcamp Atlanta 10/27/12

These are my notes from dcATL.
  • Josh Clark @globalmoxie
  • The mobile future
  • Mobile is a new platform. What do we do with the new platform?
  • How do we do more with mobile?
  • Sensors give us super powers.
  • Mobile provides the opportunity to interpret the environment, think of augmented reality.
    • Think of ways to use camera and audio in the classroom, like the prof mentions a case and it pops up on the device.
  • Table Drum app uses augmented audio.
  • AnyTouch turns everyday objects into interface objects.
  • Leap Motion moves touch interface into 3D space, natural gestures.
  • Natural gestures are the next breakthrough in interfaces.
  • We need to design for natural gestures.
  • Windows 8 is intended to work with any input interface. Hugely challenging.
  • Medical field is using all sorts of special sensors with mobile devices to drive data collection.
  • Personal sensors make sense of our environment.
  • But we don’t need more operating systems, interfaces.
  • Remote control is an answer.
  • Ambiguous control among devices is coming, think of phones in cars. Your car rings. When you park the car, the interface follows you. Migrating interface.
  • http://bit.ly/day-glass – A Day Made of Glass from Corning.
    • One smart device somewhere that is driven by ambiguous interfaces
  • Wii U
  • Grab Magic http://bit.ly/grab-magic
  • http://bitly.com/proto-gestures
  • Sifteo cubes are social toys.
    • Download software as it needs it.
  • Web is just in case, everything is loaded in case we need it. Needs to move to just in time, software loaded when we need it.
  • Passive interfaces just work on their own, doing the things they need to do to perform the functions they are designed to do.
  • Devices will get both dumber and smarter.
  • Metadata is the new art direction – Ethan Resnick @studip101
  • A cloud of social devices
  • Look beyond the interface, beyond the device, the presentation to the content and the services.
  • Push sensors
  • Think social not FB
  • Your ecosystem
  • We’re all cloud developers
  • Mind your metadata
  • New input methods
  • The future is here
  • Erik Webb @erikwebb
  • See slideshare
  • Evaluating modules
    • Supported version, maintainer rep, usage, # of open issues, usage over time.
    • Record before and after install using Devel module
    • Search for tag “performance” to weed out general issues.
    • What to look at
      • When does it run?
      • How does it scale?
      • What if it fails?
      • Does my site care?
      • Do I need this module?
    • ID the problem
    • Where problems occur
      • Page building like views and panels
      • External web services
      • Overall complexity
        • Views in panels in panels….
      • Misconfigured components
    • Keep records, establish a metric, adopt a definition of done, don’t hide behind infrastructure
  • Types of caching
    • App level caching is not really configurable. Things like menus, forms
    • Component level caching, user facing stuff like blocks, views, panels
      • Best to speed up for authenticated users
    • Page level caching is important mostly for anon users
  • Configuring Drupal

  • Randall Kent @randallkent rkent@sevaa.com
  • http://bit.ly/dcatl-services
  • Web services as the tip of the iceberg.
  • REST is the key to getting at the stuff in Drupal. REST is one way to create an API on Drupal.
  • REST
    • built on http
      • GET, POST, PUT, DELETE
    • Client/Server
      • Separates ui from data storage
    • Stateless
      • All info necessary to process request must be included in the request itself
    • Cacheable
    • Layered
    • Uniform interface
  • /myapi/node – gets XML
  • /myapi/node.json – get JSON
  • REST console for Chrome
  • http://github.com/randallkent
    • DrupalREST.php
    • DrupalREST.net
  • See http://drupanium.org
  • David Bassendine @dbassendine
  • Open data, social, business tools
  • Few modules for consuming services
  • Always start by looking online for a module
  • REST vs SOAP
  • Get to know the API you are working with
    • URL and path structure
    • Testing in browser for GET, POST requires extension/plugin
  • Services client for D7 will consume Services from another Drupal instance
  • REST API and Query API handle some RESTful APIs that serve json
    • See Redmine module for example
  • Core HTTP API for other services
    • drupal_http_request($url, $options), where $options carries headers, method, data
    • Slightly diff D6 & D7
  • Last 2 require custom modules to do the work
  • Krumo – http://krumo.sourceforge.net/
  • Talking to Web Services – Resources
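
The REST notes above (verbs built on HTTP, stateless requests, /myapi/node for XML and /myapi/node.json for JSON) can be sketched from the client side. This is only an illustration: the base URL is a placeholder, the helper name is hypothetical, and nothing here is taken from the session's sample code. It builds a request object without sending it.

```python
from urllib.request import Request

# Placeholder host; the /myapi endpoint path comes from the notes above.
BASE_URL = "http://example.com/myapi"


def build_request(resource, fmt=None, method="GET", data=None):
    """Build (but do not send) a request for a REST resource.

    Statelessness means everything the server needs travels in this one
    request: the verb, the resource path, the format, and any body.
    """
    url = f"{BASE_URL}/{resource}"
    if fmt:
        # e.g. "node" + "json" -> /myapi/node.json
        url += f".{fmt}"
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    req = build_request("node", fmt="json")
    print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would actually send it; that step is left out here because the endpoint is a placeholder.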

  • Matthew Connerton @connerton
  • AJAX allows for the refresh of data in the browser page without refreshing the whole page.

    Sample code for AJAX in Drupal7
  • Replaces AHAH, which is a good thing. Pulls lots in crooks stuff
  • “use-ajax” class
    • drupal_add_library('system', 'drupal.ajax') to get Ajax in.
    • Pulls jquery in
  • $form['#ajax']
    • drupal_add_library('system', 'drupal.ajax') to get Ajax in.
    • Blur is the default trigger.
  • It may ease the pain of the auth code stuff.
  • Check Drupal API for AJAX Framework docs.
    • includes/ajax.inc
  • Using #states in Form API
  • Ctools modal to open modal boxes for editing and such.
    • “ctools-use-modal” class
  • Doug Vann dougvann.com
  • Module filter is cool
  • DraggableViews
    • Makes rows of views draggable
    • Can be rearranged by drag and drop
    • Has AJAX
    • No relationship required
    • Could use this to provide a sort on Lesson topics based on order in the topic grid
    • Use this to rearrange stuff on the topic list view itself on the home page
    • No subsets or at least not easily handled
  • Nodequeue
    • Collect nodes in an arbitrary order
    • Requires relationship in order to bring stuff into proper scope


U of Minnesota Releases “Cultivating Change in the Academy”, Highlights Future of the Book

This collection of 50+ chapters showcases a sampling of academic technology projects underway across the University of Minnesota, projects that we hope inspire other faculty and staff to consider, utilize, or perhaps even develop new solutions that have the potential to make their efforts more responsive, nimble, efficient, effective, and far-reaching. Our hope is to stimulate discussion about what’s possible as well as generate new vision and academic technology direction. The work underway is most certainly innovative, imaginative, creative, collaborative, and dynamic. This collection of innovative stories is a reminder that we are a collection of living people whose Land Grant values and ideas shape who we serve, what we do, and how we do it. Many of these projects engage others in discourse with the academy: obtaining opinion or feedback, taking the community pulse, allowing for an extended discourse, and engaging citizens in important issues. What better time to share 50+ stories about cultivating change than in 2012 – the 150th anniversary of the founding of the Land Grant Mission!

via University of Minnesota Digital Conservancy: Cultivating Change in the Academy: 50+ Stories from the Digital Frontlines at the University of Minnesota in 2012.

Produced in just 10 weeks, this book is a snapshot of academic technology projects and research underway at the University of Minnesota. Of more interest to me than the speed with which it was produced or the subject matter are the formats in which the book was released. First, it is a blog and a website. Each chapter is a post with the text of the chapter embedded as a PDF file. The blog has commenting enabled, RSS feeds, and its own Twitter hashtag, #CC50, so that readers may engage the authors in ongoing discussion. Second, the work is available in EPUB, .mobi, and PDF formats so you can read it on the platform of your choice. The work carries a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

As I’ve stated in a prior post, I think the future of books, especially textbooks and other educational materials, lies on the web, not locked into some closed or crippled format. This book serves as an excellent example of the future of the book.

Media Commons white paper examines future of transparency in peer review

The always-insightful Alex Reid has penned an essay “on the question of open peer review,” which examines a draft white paper posted to Media Commons last week. The paper—Open Review: A Study of Contexts and Practices—struggles, Reid argues, to address a critical question: “What is the problem with existing scholarly review procedures that the open review process seeks to solve?”

via New Media Commons white paper examines future of transparency in peer review | opensource.com.

 

CALI’s Looking For a Sys Admin, Here’s A Brief History

Usually I might not be too keen to lose some of my job responsibilities, but in this case I couldn’t be happier. CALI is adding a systems administrator to wrangle all our servers, more than 20 at last count, on a full-time basis. Since I started working at CALI 9 years ago my time has been split between web/database/cool project development and administering CALI’s servers and systems.

Back in 2003 that meant riding herd on an aging Windows NT server, a Win2K server handling some video streaming, and a couple of dark servers whose futures were not yet set. Of course the servers were in Chicago and I was in Atlanta. Things changed rapidly. The dark servers were brought online running Linux and our production web and storage systems were built out on the LAMP stack. Within a couple of years I added 3 servers at Emory in Atlanta to handle the increased demand for CALI services and resources online.

It wasn’t very long before we were struggling with large spikes in demand that were taxing our servers and we needed a better solution. Simply increasing the amount of hardware we owned wasn’t really an option since we were borrowing space and bandwidth from the law schools at Kent and Emory. At just the right time, Amazon Web Services came along and CALI jumped into the cloud.

Moving our web infrastructure to the AWS cloud gave us tremendous flexibility at a reasonable cost. After some trial and error I was able to configure a load-balanced web cluster that could be scaled up and down as demand for CALI resources and services flowed over the course of an academic year. Using the cloud meant that I could provision some services on their own servers so that things like Apache Solr and Asterisk could stand alone. As a result of the move to the cloud, by the beginning of 2011 I found myself administering 15 to 20 servers in the cloud alone (exact numbers depended on the time of year) plus another half dozen physical servers in 2 geographically dispersed locations.

All that sounds like a full-time job in itself, but that was only half the job. While all that infrastructure was being built out I was also developing 3 different versions of the CALI website, the Classcaster phone-to-blog system, a couple of iterations of eLangdell, the Free Law Reporter, and dealing with various other projects. Working on these development projects is what I really enjoy, but they often get pushed aside since I need to keep the servers running as a priority.

Now CALI is hiring a systems administrator to take over (or clean up) the running of our infrastructure. I’m looking forward to handing the keys of the cloud over to someone else so I can focus on all of the great projects that are in the pipeline. When can you start?

Details on the CALI sys admin job, which is located in our Chicago office, are at http://cca.li/6J.

Tricking out the iPad

So, I’ve added a little Bluetooth keyboard to the iPad with the idea that I will use it more if I can actually type on it. One of the first things I noticed after getting it all paired up is that the chunky keyboard that fills the bottom of the screen is pleasingly gone. I like that.

I wonder if I can get a mouse?
The keyboard certainly seems like it will be a useful addition. I still need to reach up and touch the screen to navigate, but typing is a lot more enjoyable.

In case anyone is wondering I went with a separate keyboard and a small carry case for the iPad rather than one of those portfolio style keyboard+case things. After looking at a few of those I just didn’t think they would work so well when using the iPad mainly as a reader, which is what I do. After a few minutes, that seems like the right choice.

I’m wondering how running the Bluetooth radio is going to affect battery life on the iPad. I’ve been paired and typing for about 15 minutes so far and the battery indicator says I’ve run off 5% of the charge. I plan on turning off Bluetooth and the keyboard when I’m not writing, so that should help.

Well, I’ll update this later after I’ve had more time with the keyboard.

New Version of Sigil EPUB Editor To Have WYSIWYG Editor

The forthcoming 0.6.0 version of Sigil, my favorite desktop EPUB editor, is going to have a WYSIWYG HTML editor in the BookView. This is a much-needed addition to a great tool that will allow for greater control over the editing and creation of EPUBs.

From Making epub happen:

The next release of Sigil is shaping up nicely. There is so much going into it that the next release will be 0.6.0. Unfortunately, EPUB 3 will not be one of the features making it into 0.6.0. One major change coming will be a new BookView (BV) editor. Here is an unfinished preview of what it might look like.

This is only a concept preview of the new editor. One issue that needs to be resolved is the double tool bar. I haven’t decided yet if I’m going to use the one in the BV pane or the global one in the window itself

 

The Future of The (Case)Book Is The Web

Recently there has been an explosion of advances in the ebook arena. New tools, new standards and formats, and new platforms seem to be coming out every day. The rush to get books into an “e” format is on, but does it make a real difference? The “e” versions of books offer little in the way of improvement over the print version of the same book. Sure, these new formats provide a certain increase in accessibility over print by running on devices that are lighter than print books and allow for things like increasing font size, but there is little else. It is, after all, just a matter of reading the same text on a screen of some sort instead of paper.

Marketers will tell you that the Kindle, Nook, iPad, and various software readers are the future of the book, an evolutionary, if not revolutionary, step in reading and learning. But that does not ring true. These platforms are really just another form of print. So now, besides hardcover and paperback, you can get the same content on any number of electronic platforms. Is that so revolutionary? Things like highlighting and note taking are just replications of the analog versions. Like their analog counterparts, notes and highlights on these platforms are typically locked to the hardware or software reader, no better than the highlights and margin notes of print books. These are just closed platforms, “e” or print, just silos of information.

Unlocking the potential of a book that is locked to a specific platform requires moving the book to an open platform with no real limits, like the web. On the web the book is suddenly expansive. Anything that you can do on the web, you can do with a book. As an author, reader, student, teacher, or scholar, anything is possible with a book that is on the open web. The potential for linking, including external material, using media, note taking, editing, markup, and remixing opens up without the bounds of a specific reader platform. A book as a website provides the potential for unlimited customization that will work across any hardware platform used.

Turning a book into a website is not all that difficult. The EPUB standard is widely used for ebooks and is essentially a website in a box. EPUB files are basically ZIP files: a zipped collection of XML and HTML files. Typically the XML describes the book and its contents and the HTML holds the content of the book. Unzipping an EPUB file provides a predictable set of files and folders that can be processed into a static website. Once that website is created, the entire realm of possibilities of the web is available.
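
To make the "website in a box" idea concrete, here is a rough sketch of the unzipping step using Python's standard zipfile module. The function name and the choice of file extensions to treat as pages are my own assumptions; a fuller tool would read the book's XML to find the reading order rather than just listing files.

```python
import zipfile
from pathlib import Path


def expand_epub(epub_path, site_dir):
    """Unzip an EPUB into site_dir and return the HTML/XHTML files found."""
    out = Path(site_dir)
    with zipfile.ZipFile(epub_path) as book:
        # An EPUB is a ZIP archive, so the standard extractor is enough.
        book.extractall(out)
        names = book.namelist()
    # Assumption: content documents end in .html, .xhtml, or .htm.
    pages = [n for n in names if n.lower().endswith((".html", ".xhtml", ".htm"))]
    return sorted(pages)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        for page in expand_epub(sys.argv[1], sys.argv[2]):
            print(page)
```

The returned list is the raw material for the static site: each entry is a page that can be opened in a browser straight from the extracted folder.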

Law professors could start with an eLangdell casebook, expanding the EPUB version into a website and then using a straightforward set of tools to edit that website. They could rearrange the text, add or remove cases or commentary, include a syllabus, link to additional materials like journal articles or websites, and more. Then they could save the website as an EPUB file that can be distributed to students, replacing the costly and limited traditional casebook.

Let’s say you are a law student. Your professor assigns an eLangdell casebook, which means you could download a free, Creative Commons licensed EPUB version of the book, possibly customized just for your class. You could use that book on any number of devices or software programs. Any notes or highlights would be locked to that device or software program. Imagine if you could take the copy of the book, which you own, and expand it into a website. With some simple editing tools you could edit the book. Then you would be free to rearrange sections to match your syllabus, add notes, highlight text, add your class notes, link to recorded lectures, link to important cases, or share your work with classmates. You could even print it all out. When you are done, you would save your personal copy of the book as an EPUB file. Since the EPUB format is a container, it would make sense to use it to store both the plain content of a book and the personalized version of a book that you own. Because it is on the web you could access it from any web browser on any device that you happen to be using.
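
Saving an edited folder back into an EPUB container is mostly a matter of zipping it up again. One detail I am sure of from the EPUB container rules is that the mimetype file must be the first entry in the archive and must be stored uncompressed. The sketch below follows that rule; the function name is illustrative, it assumes the folder already contains a valid mimetype file and package XML, and it does not validate the result.

```python
import zipfile
from pathlib import Path


def pack_epub(site_dir, epub_path):
    """Zip an expanded book folder back into an EPUB file.

    Assumption: epub_path is outside site_dir, so the archive does not
    try to include itself.
    """
    root = Path(site_dir)
    with zipfile.ZipFile(epub_path, "w") as book:
        # The mimetype entry goes first and is stored without compression.
        book.write(root / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root).as_posix()
            if path.is_file() and rel != "mimetype":
                book.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Running the output through an EPUB checker afterwards would be a sensible extra step, since edits to the HTML can easily leave the package XML out of date.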

The future of the book is the open web, not some platform silo. Only putting books on the web will unlock the potential of books and it is easy enough to do.

Blitz – Recommended for Drupal Testing

In the spirit of continuous integration and making load testing a fun sport, our paid plans have just two simple dimensions: How much you want to scale out and how long you want to rush. We do not impose any limits on how many times you run the load tests. You can easily increase your concurrency from 250 to 750 users on the free plan simply by inviting your friends to blitz. We’ll add +25 referral credits to both of you when your friend signs up!

via Blitz – Making load and performance testing a fun sport.

Give it a try at http://blitz.io/gcb7uX6lS5oXx.