Subject: Re: Structured Storage [was: Re: Inline Images??? ]
From: Leonard Rosenthol (leonardr@lazerware.com)
Date: Sat Apr 21 2001 - 17:01:02 CDT
At 06:13 PM 4/20/2001 +0200, Hubert Figuiere wrote:
>gzip is single-file compression. This means that you have to uncompress the 
>whole file before using it. Tar takes a linear approach to archiving files 
>together. While it is simple and appropriate for some uses (tar means Tape 
>ARchive, remember), it is almost unusable for a structured storage model.
>As a comparison, consider using Linux with a tape as main storage (instead 
>of a hard drive with random access).
         Hub hit it right on the nose about the problems with using 
tgz. I can't believe anyone with two brain cells to rub together would 
even consider it a viable file format (not to mention other issues, 
like incompatible tar implementations: POSIX vs. GNU, etc.).
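To make the tape-versus-disk point concrete, here is a minimal Python sketch using the standard tarfile and zipfile modules (the member names and the read helpers are hypothetical, chosen only for illustration). Reading a .tar.gz as a stream forces every preceding header to be decompressed and skipped, while a ZIP reader uses the central directory at the end of the archive to jump straight to one member.

```python
import io
import tarfile
import zipfile

def build_archives(members):
    """Build an in-memory .tar.gz and .zip holding the same members."""
    tgz_buf = io.BytesIO()
    with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return tgz_buf.getvalue(), zip_buf.getvalue()

def read_from_tgz(blob, wanted):
    """Streaming mode ("r|gz") cannot seek backwards or skip ahead: every
    header before `wanted` is decompressed and examined in order.
    Returns (data, number_of_headers_seen)."""
    seen = 0
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|gz") as tf:
        for info in tf:
            seen += 1
            if info.name == wanted:
                return tf.extractfile(info).read(), seen
    raise KeyError(wanted)

def read_from_zip(blob, wanted):
    """ZipFile parses the central directory once, then seeks directly to
    the requested member's local header and decompresses only that member."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(wanted)
```

With 50 members, fetching the last one from the tgz touches all 50 headers (and decompresses everything in between), whereas the ZIP read touches only the directory and the one member.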
>ZIP is better than tar as it provides individual compression inside the 
>archive, but its archive file format is kind of old and not always the best 
>for this purpose. You have to scan through the file to find things. I think 
>Leonard has a better overview of the problem :-)
         Because it's a "real" archive format, ZIP can indeed be used for 
this purpose PROVIDED that you either ignore or find workarounds for 
some or all of its limitations.  For example, ZIP isn't truly hierarchical 
(hierarchy is faked via path strings, so you need to keep the entire 
catalog in memory and use fancy hashing), ZIP only supports ISO-Latin-1 
filenames, ZIP has extremely limited metadata support, ZIP can't 
address >2Gig files, etc.
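As an illustration of the "faked hierarchy" problem, here is a minimal Python sketch (the helper names are hypothetical) that rebuilds a directory tree from ZIP's flat, "/"-separated entry names. Note that the whole name list from the central directory has to be loaded and walked before any folder can be listed; the nested dicts stand in for the "fancy hashing" mentioned above.

```python
import io
import zipfile

def build_tree(names):
    """Turn ZIP's flat, '/'-separated entry names into nested dicts.
    Directories become dicts; files map to None. (Sketch only: a file and
    a directory with the same name are not handled.)"""
    root = {}
    for name in names:
        parts = [p for p in name.split("/") if p]
        node = root
        for i, part in enumerate(parts):
            is_last = i == len(parts) - 1
            if is_last and not name.endswith("/"):
                node.setdefault(part, None)  # leaf file entry
            else:
                child = node.get(part)
                if child is None:
                    child = node[part] = {}  # implied or explicit directory
                node = child
    return root

def zip_tree(blob):
    """Load the complete central directory (namelist) and build the tree."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return build_tree(zf.namelist())
```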
         The most common solution to these problems, one that I pioneered 
while I was at Adobe, is to use a "manifest" file (à la Java's JAR files, but 
in XML) to maintain the complete metadata catalog of the archive and 
basically ignore ZIP's metadata system and filename limitations.  I believe 
that OO is taking a similar approach, but I haven't actually looked at 
their stuff.
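A minimal Python sketch of the manifest idea, assuming a made-up layout: the entry name "manifest.xml", the opaque "data/NNNNNN" storage names, and the element and attribute names are all hypothetical, not Adobe's, Sun's, or OpenOffice's actual format. Payloads are stored under short ASCII names, and the real (Unicode, hierarchical) paths plus arbitrary metadata live only in the XML manifest, so ZIP's own filename and metadata fields are effectively ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST = "manifest.xml"  # hypothetical manifest entry name

def write_package(entries):
    """entries: list of (logical_path, data, metadata_dict).
    Each payload is stored under an opaque ASCII name; the real path and
    metadata are recorded in the XML manifest."""
    root = ET.Element("manifest")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, (path, data, meta) in enumerate(entries):
            stored = f"data/{i:06d}"  # ASCII-only name inside the ZIP
            zf.writestr(stored, data)
            el = ET.SubElement(root, "entry", {"path": path, "stored": stored})
            for key, value in meta.items():
                ET.SubElement(el, "meta", {"name": key}).text = value
        zf.writestr(MANIFEST, ET.tostring(root, encoding="utf-8"))
    return buf.getvalue()

def read_package(blob):
    """Return {logical_path: (data, metadata)}, driven solely by the manifest."""
    out = {}
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        root = ET.fromstring(zf.read(MANIFEST))
        for el in root.iter("entry"):
            meta = {m.get("name"): m.text for m in el.iter("meta")}
            out[el.get("path")] = (zf.read(el.get("stored")), meta)
    return out
```

Because readers consult only the manifest, the catalog can carry whatever metadata and naming rules the application needs, while the ZIP container just supplies compression and random access.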
Leonard
This archive was generated by hypermail 2b25 : Sat Apr 21 2001 - 19:48:07 CDT