Data Migration

I have been working with storage systems for different vertical markets for over two decades.  I developed the first commercial storage system in which data objects are distinct and referenced by a globally unique identifier.  In a nutshell, client applications / servers submit an object to the storage server.  The storage server determines whether the object is unique by using a cryptographic hashing function (e.g., MD5).  If the object is already in the storage system, the server returns the globally unique identifier (GUID) of the matching stored object.  If it is not, the server stores the object and returns a new GUID.  Client applications / servers associate the GUID with the object in a database.  When an object needs to be retrieved, client applications / servers present the GUID to the storage server, which returns a copy of the stored object.  The big advantages of such a system are that objects are not stored under paths that may change with time, and that objects can be replicated so that multiple copies are kept.
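The submit and retrieve flow described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the design of the commercial system: the class name ObjectStore and its methods are hypothetical, the store lives in memory, the hex digest itself serves as the GUID, and SHA-256 is swapped in for the MD5 mentioned above.

```python
import hashlib
from typing import Dict


class ObjectStore:
    """Hypothetical in-memory content-addressed store (illustration only)."""

    def __init__(self) -> None:
        # Maps GUID (here: the hex digest of the content) to the object bytes.
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store the object if it is new; return its GUID either way."""
        # Assumption: the digest doubles as the GUID. SHA-256 replaces MD5.
        guid = hashlib.sha256(data).hexdigest()
        if guid not in self._objects:
            self._objects[guid] = data
        return guid

    def get(self, guid: str) -> bytes:
        """Return a copy of the stored object for the given GUID."""
        return self._objects[guid]

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put(b"chapter one")
second = store.put(b"chapter one")  # same content, so no second copy is kept
```

Because the identifier is derived from the content rather than from a path, submitting the same bytes twice yields the same GUID and the store holds a single copy.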

In the past few months I have read a few articles referring to migration.  The need for migration becomes obvious when a digital object is retrieved many years after it was stored.  The storage system might be able to return the original object bit for bit, but the object may still be impossible to view or execute because the software and / or hardware platform it needs has become obsolete, rendering the object useless.

I first read about this topic in a relatively recent issue of Communications of the ACM.  At the time I understood the issue, but I did not fully appreciate it until a few weeks ago.  I am interested in reviving and publishing a book about a software development methodology that I created several decades ago.  At that time I started writing the book on a Macintosh computer.  I used MacWrite, MacDraw and MacPaint.  For what I was doing, such tools were more than adequate and simple to use.

On a separate but interesting point, the Preface section titled “How we produced this book” contains a couple of paragraphs that mention how some of the same Macintosh tools were used by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein to write early editions of their textbook “Introduction to Algorithms”.  Apparently they kept updating the contents of their book, so they did not run into the issue of trying to access a manuscript written with early versions of Macintosh software after a couple of decades had gone by.

I am quite passionate about distributed storage systems.  This is not a fad that came up with the newly coined term “big data”, which appears to mean something different to every person you talk to.  When storing small, medium or large objects, most of the issues tend to be similar.  It is when storing and retrieving very large objects from remote locations that the issues specific to “big data” have to be contended with.

I have given the migration issue some thought.  I have not come up with a perfect solution, but given the state of the art, it might be relatively simple to associate metadata with every object so that its software platform requirements are listed.  The storage server could periodically retrieve objects that were created with older software and update them to newer software versions.  For example, if a storage system kept track of the fact that a Word document was generated with Word 2007, then at some point it could refresh the document (read it and write it back) using Word 2010 (if such a thing exists).  The original object might be replaced, or a newer version might coexist with it in case a discrepancy is noted in the future.  This operation could occur transparently as the system becomes aware of newer versions of the word processor in question.
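The metadata idea above could look roughly like the following Python sketch.  Everything in it is an assumption made for illustration: the StoredObject record, the refresh function and the converter registry are hypothetical names, and the Word 2007 to Word 2010 converter is a placeholder that returns the bytes unchanged, since a real conversion would have to call the actual word processor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StoredObject:
    data: bytes
    fmt: str      # software that produced the object, e.g. "Word"
    version: int  # version of that software, e.g. 2007


# Hypothetical registry: (format, old version) -> (new version, converter).
Converter = Callable[[bytes], bytes]
Registry = Dict[Tuple[str, int], Tuple[int, Converter]]


def refresh(obj: StoredObject, converters: Registry) -> List[StoredObject]:
    """Return every version of the object, oldest first.

    The original is kept so that it can coexist with the newer versions
    in case a discrepancy is noticed later.
    """
    versions = [obj]
    current = obj
    while (current.fmt, current.version) in converters:
        new_version, convert = converters[(current.fmt, current.version)]
        if new_version <= current.version:
            # Guard against a registry entry that would loop forever.
            raise ValueError("converter must move to a newer version")
        current = StoredObject(convert(current.data), current.fmt, new_version)
        versions.append(current)
    return versions


# Placeholder converter: a real one would rewrite the document format.
converters: Registry = {("Word", 2007): (2010, lambda data: data)}
history = refresh(StoredObject(b"manuscript", "Word", 2007), converters)
```

Keeping the whole history instead of overwriting the original is one possible design choice; replacing the original, as also mentioned above, would simply mean storing only the last element of the list.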

At this time, my immediate interest is to be able to extract the text (and if possible the graphics) of the manuscript I wrote a few decades ago.  I was lucky to find a brief document by:

Kate MacGregor and Douglas Young

Decentralized Computer Services

401 Uris Hall

Cornell University

Ithaca, NY  14853

(607)256-4981

In the next few days I will attempt to write a software utility to retrieve at least the text from the CDP manuscript.  I will update this blog entry as I deal with the issue.

