Subject: File types (was Re: msword doc bug)
From: Paul Rohr (paul@abisource.com)
Date: Wed Jan 19 2000 - 12:25:40 CST
[ This has segued into a design discussion, so I've pruned abiword-user from 
the cc list. ] 
There are a total of five discrete issues to be addressed here:
1.  What happens when saving a Word document  
--------------------------------------------
This is an issue for existing documents, where the user has two choices 
under the File menu:
  Save     -- try to resave in whatever format we imported it as
  Save As  -- choose a format, then save (defaults to import format)
For any format which we know how to import, but not export -- this is 
currently true only for Word, but the problem is more general -- our current 
behavior is user-hostile.  What we *should* do is notify the user that we 
can't save in that format, and either:
  - warn them we're switching formats (and let them cancel), or 
  - send them to the Save As dialog so they can pick a format themselves
Instead, what we currently do is *change* formats to our .abw default (which 
is defensible), but without telling them (ugh) and without changing the file 
type (double-ugh).  Which leads us to #2 and #4...
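A minimal sketch of the decision described above, assuming a hypothetical FormatInfo record and SaveAction enum (neither is taken from the AbiWord sources). It only shows the intended policy: resave in place when an exporter exists, otherwise tell the user instead of switching formats silently.

```cpp
#include <string>

// Hypothetical description of a file format; not an actual AbiWord type.
struct FormatInfo {
    std::string suffix;  // e.g. ".doc"
    bool canImport;      // we have an importer for this format
    bool canExport;      // we have an exporter for this format
};

// What File > Save should do for the current document.
enum class SaveAction {
    SaveInPlace,       // exporter exists: resave in the import format
    WarnThenSwitch,    // warn that we are switching formats, allow cancel
    ShowSaveAsDialog   // let the user pick a format themselves
};

// Decide how to handle Save for a document that was imported as `fmt`.
// `preferDialog` selects between the two acceptable fallbacks.
SaveAction decideSaveAction(const FormatInfo& fmt, bool preferDialog) {
    if (fmt.canExport) {
        return SaveAction::SaveInPlace;
    }
    return preferDialog ? SaveAction::ShowSaveAsDialog
                        : SaveAction::WarnThenSwitch;
}
```

Either fallback is fine; the point is that no code path changes the format without the user knowing about it.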
2.  Setting file types
----------------------
Any modern GUI OS has realized that it's a lot easier to generate a 
user-friendly desktop experience if you can double-click on files to open 
them in the right application.  To do this, though, the OS needs to know how 
to automatically associate each individual file with a particular 
application and/or file type. 
There are currently at least four different ways this is done, depending on 
the OS:
  MacOS     resource fork has magic cookies for filetype & creating app 
  Windows   each filename has a suffix, and the OS binds suffixes to apps
  BeOS      uses MIME types for this purpose, but I'm fuzzy on the details
  Unix      (none of the above)
Without devolving into a flamewar about which alternative is "better", the 
point is that it's important for us to reliably set any such 
platform-specific indication of file type in the appropriate way for that 
OS.  (On Windows, for example, that means adding the right suffix by 
default.)  
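For the suffix case, a rough sketch of "adding the right suffix by default" might look like the following. The helper name withSuffix is hypothetical, and the other platforms' mechanisms (resource forks, MIME types) are not covered here.

```cpp
#include <cstddef>
#include <string>

// Return `path` with its file-name suffix replaced by `suffix` (e.g. ".abw").
// If the file name has no suffix, `suffix` is simply appended.
std::string withSuffix(const std::string& path, const std::string& suffix) {
    // Only a dot inside the last path component counts as a suffix separator.
    std::size_t slash = path.find_last_of("/\\");
    std::size_t nameStart = (slash == std::string::npos) ? 0 : slash + 1;
    std::size_t dot = path.find_last_of('.');

    // No dot in the file name, or a leading dot (hidden file): append.
    if (dot == std::string::npos || dot <= nameStart) {
        return path + suffix;
    }
    return path.substr(0, dot) + suffix;
}
```

With this, a Word document that gets resaved in the native format would end up as letter.abw rather than keeping its .doc name.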
3.  Making AbiWord files double-clickable
-----------------------------------------
In addition, we also need to do some platform-specific work to register 
ourselves with the OS as being capable of handling double-clicks on those 
files.  For example, Jeff implemented this functionality for Windows in the 
following file:
  abi/src/af/xap/win/xap_Win32Slurp.cpp
I suspect further work will need to be done to generate similar 
functionality on our other platforms. 
4.  Sniffing file types
-----------------------
My claim is that it's not only OK, but actually preferable, to use some sort 
of native file type indicator (NFTI) to figure out how to interpret a file 
when opening it.  If that's what the OS told users, that's what we should 
try first, too.  
However, regardless of the OS, at some point the NFTI (suffix, resource 
fork, MIME type, or whatever) is either missing or wrong.  This problem is 
most pervasive on Unix, where no convention for NFTIs has really taken 
hold. 
In this case, we need a fallback strategy for figuring out what's in the 
document so we can open it properly.  Those of us who used to write Web 
browsers for a living called this process "sniffing".  You actually open the 
file, look at the first 100 bytes or so at the head of the file, and then 
guess which importer to use.  
Since our import/export architecture allows an arbitrary number of 
importers, any patches to implement sniffing should distribute that logic 
among each importer (instead of doing it all in one place).  For example, 
see how the current suffix-guessing logic gets implemented in:
  abi/src/wp/impexp/xp/ie_imp.c
An obvious way to implement sniffing is to open the file once and pass a 
copy of the first 10 or 100 bytes or so to each importer in turn via an API 
like the existing fpRecognizeSuffix().  Which leads us to #5...
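One way this could be sketched, assuming a hypothetical per-importer hook called recognizeContents (the name, the ImporterSniffer struct, and the helper functions are illustrative, not existing AbiWord APIs). The file is opened once, and each importer gets a look at the same prefix. The two example checks use the well-known RTF prefix and the OLE2 compound-document signature used by Word 97 files.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical hook each importer would supply, by analogy with
// suffix recognition.
struct ImporterSniffer {
    const char* name;
    bool (*recognizeContents)(const char* buf, std::size_t len);
};

static bool startsWith(const char* buf, std::size_t len,
                       const void* magic, std::size_t n) {
    return len >= n && std::memcmp(buf, magic, n) == 0;
}

// RTF files begin with "{\rtf".
static bool sniffRTF(const char* buf, std::size_t len) {
    return startsWith(buf, len, "{\\rtf", 5);
}

// Word 97 .doc files are OLE2 compound documents with this signature.
static bool sniffWord(const char* buf, std::size_t len) {
    static const unsigned char magic[] =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    return startsWith(buf, len, magic, sizeof magic);
}

// Ask each importer in turn; return the first match, or nullptr.
const char* sniffBuffer(const std::vector<ImporterSniffer>& importers,
                        const char* buf, std::size_t len) {
    for (const ImporterSniffer& imp : importers) {
        if (imp.recognizeContents(buf, len)) return imp.name;
    }
    return nullptr;
}

// Open the file once, read a short prefix, and hand it to the importers.
const char* sniffFile(const std::vector<ImporterSniffer>& importers,
                      const std::string& path, std::size_t prefixLen = 100) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return nullptr;
    std::vector<char> buf(prefixLen);
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    std::size_t got = static_cast<std::size_t>(in.gcount());
    return sniffBuffer(importers, buf.data(), got);
}
```

Keeping each check next to its importer means adding a new importer does not require touching a central sniffing routine.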
5.  Make our format more sniffable
----------------------------------
Our native file format puts XML-style comments *before* the <abiword> tag 
which contains the contents of the document (which means it's quite a ways 
into the file). 
Since most simple sniffers don't look very far into a file, it'd probably be 
a Good Thing if we changed our exporter to put those comments immediately 
*after* the <abiword> tag instead. 
If we're using expat properly, I doubt that this should break file format 
compatibility in any serious way for existing users.  In any event, since 
those comments get dropped by the importer and readded by the exporter, the 
workaround is trivial -- open the document in AbiWord and resave. 
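To illustrate why the comment placement matters, here is a small sketch of a fixed-window check. The function looksLikeAbiWord and the 100-byte window are assumptions for illustration; real sniffers may look at more or less of the file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical content check: does "<abiword" occur within the first
// `window` bytes of the file contents?
bool looksLikeAbiWord(const std::string& contents, std::size_t window = 100) {
    return contents.substr(0, window).find("<abiword") != std::string::npos;
}
```

A file whose leading comments push the <abiword> tag past the window is not recognized, while the same content with the comments moved inside the tag is recognized immediately.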
bottom line
-----------
We'd welcome patches which address any of these five issues, but I'd like to 
suggest that people focus first on #1, which should be enough to fix 
Elizabeth's problem.  
Paul