Content Addressable Storage or Fixed Content Storage

This morning while browsing the news I ran into the article “Who The Hell Writes Wikipedia, Anyway?” by Henry Blodget in the “Silicon Alley Insider” www.alleyinsider.com.  It describes how the information one finds in Wikipedia is entered and edited.  Wikipedia is presented to the web community as a repository of information, similar to an encyclopedia, in which you can read about any (well, most) topic you might be interested in.  If you believe the data is incorrect, you can edit it.  Once the edit has been moderated, the flaw is corrected.  With time, this simple process should address all erroneous data.  In other words, if you read about a topic in Wikipedia that has not been edited in a while, you should assume it is correct.

Two items to consider with the Wikipedia approach are:  [1] the knowledge of the masses is not always correct, and  [2] as the article points out, many individuals might contribute some information, but very, very few constantly maintain it.  As the referenced article describes it, “… The bulk of the changes to the original text, then, are made by a core group of heavy editors who make thousands of tiny edits (the 1400 freaks). …”

Consider CAS / FCS as an example of incorrect information in Wikipedia.  In the period between 1990 and 1992 John Canessa, while working with different Hierarchical Storage Management (HSM) software packages, came up with (invented) a different metaphor for storing data.  The logic behind the technology is as follows.  Humans need cues and mechanisms to help them locate data on the disk(s) attached to computers.  File systems allow users to organize data in an inverted tree structure starting from the root.  The leaves in the tree are the actual data files.  Each file in a file system has a unique file ID and occupies a well-defined location on disk.  For ease of use, humans limit the number of files in a folder.  The names of folders and files, including their extensions, are usually mnemonics for the type of data they hold.  All of this is done in order to help us humans locate data files.

Computers are not bound by the same rules.  As far as a computer is concerned, all files in a file system could be in a single container / folder and given ascending numbers (e.g., block numbers).  The idea behind CAS / FCS is to have, at the application level, a unique name associated with the contents of the file.  The CAS / FCS could then implement a coat-check metaphor in which the client presents a Globally Unique Identifier (GUID) and the CAS / FCS system returns the associated data as a file or a stream.
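The coat-check metaphor can be sketched in a few lines of Python.  This is only an illustration and not the actual DSM or BAM code: the class name CoatCheckStore, its put / get methods, the in-memory dictionary, and the use of a random UUID as the ticket are all assumptions made for the example.

```python
import uuid


class CoatCheckStore:
    """Toy in-memory store (hypothetical): hand over data, get a ticket back."""

    def __init__(self):
        self._objects = {}  # ticket (GUID string) -> stored bytes

    def put(self, data: bytes) -> str:
        """Store a copy of the data and return the GUID that identifies it."""
        guid = str(uuid.uuid4())          # the "coat check ticket"
        self._objects[guid] = bytes(data)
        return guid

    def get(self, guid: str) -> bytes:
        """Return the data for a ticket; raises KeyError for an unknown ticket."""
        return self._objects[guid]


store = CoatCheckStore()
ticket = store.put(b"radiology image bytes")
retrieved = store.get(ticket)
```

The client never learns (or needs to learn) where the data lives.  It only keeps the ticket.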

Independent of storage, application layers already make use of database engines to keep track of metadata.  They have done this for decades, since the invention of database engines.  The only change needed to support a CAS / FCS is to replace the field that holds the path to a file (e.g., a deed, invoice, radiology image, car loan application, or the 05:00 PM news on May 22, 2008, among others) with a GUID.  The path could change over time (e.g., the disk was replaced by a NAS and given a different network path); the GUID does not.  A GUID needs to contain more than a sequential number in order to be unique across multiple distributed Direct Attached Storage (DAS), Network Attached Storage (NAS) or Storage Area Network (SAN) systems.
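As a rough sketch of that database change, the following Python example keeps a GUID column where an application would otherwise keep a file path.  The table name documents, the column names, and the helper functions register and lookup are hypothetical, and a random UUID stands in for whatever GUID scheme a real CAS / FCS would issue.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
# The "guid" column takes the place of a "path" column; it stays valid
# even if the underlying storage is moved or replaced.
db.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " description TEXT NOT NULL,"
    " guid TEXT UNIQUE NOT NULL)"
)


def register(description: str) -> str:
    """Record a document's metadata and return the GUID stored for it."""
    guid = str(uuid.uuid4())  # not sequential, so independent systems will not clash
    db.execute(
        "INSERT INTO documents (description, guid) VALUES (?, ?)",
        (description, guid),
    )
    return guid


def lookup(description: str):
    """Return the GUID recorded for a description, or None if absent."""
    row = db.execute(
        "SELECT guid FROM documents WHERE description = ?", (description,)
    ).fetchone()
    return row[0] if row else None


guid = register("invoice 2008-05-22")
found = lookup("invoice 2008-05-22")
```

The application would then hand the GUID it finds to the CAS / FCS to get the data back, instead of opening a path.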

The reader should note that Content Addressable Storage (CAS) and Fixed Content Storage (FCS) are different names for the same type of technology.  The CAS / FCS technology is intended to store data that does not change over time (fixed content).

In April 1992 Ron Rivest, then a professor at the Laboratory for Computer Science at the Massachusetts Institute of Technology (MIT), published “The MD5 Message-Digest Algorithm” as a Request for Comments (RFC 1321).  In a nutshell, an MD5 digest is a sequence of 16 bytes (128 bits), usually written as 32 hexadecimal characters, generated by a mathematical procedure performed on the actual contents of a data file or stream.  The MD5 algorithm, when applied to the same data, always returns the same digest.
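A short Python example, using the standard hashlib module, shows both properties: the size of the digest and the fact that the same input always produces the same digest.  The sample text and variable names are chosen only for illustration.

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

raw_digest = hashlib.md5(data).digest()      # 16 raw bytes (128 bits)
hex_digest = hashlib.md5(data).hexdigest()   # the same value as 32 hex characters

# Hashing the same bytes again gives the same digest.
same_again = hashlib.md5(data).hexdigest()

# Changing the input (here, one added character) gives a different digest.
changed = hashlib.md5(data + b".").hexdigest()

print(len(raw_digest), len(hex_digest), hex_digest)
```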

The first commercial application of a CAS / FCS was named Diverse Storage Manager (DSM) by the inventor and Directed Storage Manager (also DSM) by his management.  It supported on-line and near-line media in the form of Magneto Optical Discs (MOD) in a Hewlett-Packard (HP) automated library.  The CAS / FCS supported a Digital Imaging and Communications in Medicine (DICOM) medical archive at the Radiological Society of North America (RSNA) 1993 show in Chicago, Illinois.

In 1994 John Canessa founded Software Engineering Corporation (SENCOR) www.sencor.com.

In 1995 John Canessa called Leonard M. Adleman a few times to get his insights on the uniqueness of the MD5 digest.  Dr. Adleman was at the time a professor at the University of Southern California, one of the founders of RSA and contributor to the MD5 algorithm.  He responded as a true scientist by stating that, for all practical purposes, generating the same MD5 digest for two (2) different documents was mathematically highly improbable, but possible.  Because of this, the CAS / FCS software products developed by John Canessa since then have always had a unique identifier in addition to a hash value (i.e., MD5 digest).
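The idea of carrying a unique identifier alongside the hash can be sketched as follows.  This is a guess at the general shape and not the actual SENCOR design: the StoredObject record, the make_object and verify helpers, and the use of a random UUID are all assumptions for the example.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    guid: str      # identifies this stored object, independent of its content
    md5_hex: str   # digest of the content, kept for integrity checks
    data: bytes


def make_object(data: bytes) -> StoredObject:
    """Pair the content with a fresh GUID and its MD5 digest."""
    return StoredObject(str(uuid.uuid4()), hashlib.md5(data).hexdigest(), data)


def verify(obj: StoredObject) -> bool:
    """Recompute the digest and compare it with the one recorded at store time."""
    return hashlib.md5(obj.data).hexdigest() == obj.md5_hex


first = make_object(b"car loan application")
second = make_object(b"car loan application")
```

Here the two objects share a digest because their content is identical, yet each has its own GUID.  In the same way, two different documents that happened to collide on the MD5 digest would still be told apart by their identifiers, while the digest remains useful for checking that the content has not changed.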

In 1996 SENCOR spawned a marketing company named FileLink with the intent of selling products based on the SENCOR CAS / FCS technology.  FileLink, and later SENCOR, showed DICOM archives at RSNA featuring the second generation of CAS / FCS technology.

In March 1998 Ron Anderson wrote an article titled “Storing Smart Saves Space” in the now defunct Byte Magazine (www.byte.com).  The article described the operation and technical advantages of a CAS / FCS system.

In the late 1990s FileLink was approached by EMC (www.emc.com) to license / purchase the rights to the CAS / FCS software Backup and Archive Manager (BAM).  Management was not able to reach a mutually satisfactory agreement.

In the late 1990s Paul Carpentier and Jan van Riel, while working at the Belgian startup FilePool, coined the term Content Addressable Storage (CAS).

By the year 2000, SENCOR had implemented a few commercial products using CAS / FCS technology to process car loans, manage video for broadcasting, and manage video clips for investing purposes.

In 2001 EMC acquired FilePool, whose main product became the Centera platform.

In May 2003 John Canessa submitted to the Network Working Group an Informational draft titled “Fixed Content Storage (FCS) Application Programming Interface (API)” (http://www.watersprings.org/pub/id/draft-canessa-fcs-api-00.txt).  At the time it represented the third generation of the CAS / FCS technology.

The Storage Networking Industry Association (SNIA) www.snia.org finalized the first pass on a C programming API for CAS / FCS using XML, titled “Information Management – extensible Access Method (XAM) – Part 2: C API Version 1.0 TECHNICAL POSITION July 9, 2008”.

John Canessa is about to start work on the fourth generation of the CAS / FCS technology.

The information presented here differs from what is currently posted on Wikipedia when one searches for “content addressable storage” (do not include the double quotes).  John Canessa will be updating Wikipedia in the next few days so readers can use this example to give objective weight to what was stated in the article referenced in this blog.

For some reason the Naive American tends to believe anything and everything that is printed.  When reading something, always check the source.  Never trust a single resource.  The driving force behind these blogs is to always be objective.  This of course is more easily said than done.  The Naive American is a subset of the human race.  For some reason humans tend to be easily persuaded by politicians and sales people.  Perhaps we act like that because we want to believe in what is being sold to us.  In the case of Wikipedia, we could always use it as the single source for all references.  This makes it easy on us because we can then avoid the time and effort it takes to fully research a topic.  The drawback is that it might not have the correct information.
