Subject: fields design -- FIELDs vs. CHUNKs (LONG)
From: Paul Rohr (paul@abisource.com)
Date: Fri Sep 22 2000 - 19:43:26 CDT
The hardest design issue I faced when thinking through a potential fields 
implementation was the content model:
  How complicated can/should the contents of fields get? 
As Justin pointed out early on, if we radically restrict the content model 
for fields, that makes certain implementation strategies a lot easier.  For 
example, consider the following restrictions on the contents of a given field:
  (a).  just inline text -- no breaks or formatting
  (b).  inline text and formatting -- i.e., add the C tag
  (c).  same as (b), plus some breaks (line, page, column)
  (d).  full generality -- also can have section or paragraph breaks
Likewise, you can complicate that enumeration by also allowing other, 
non-textual content, such as:
  - images
  - other fields
  - etc.
So, what should we do?  
do what RTF does?
-----------------
According to assumption #V, our mechanism will eventually need to be able to 
handle whatever RTF throws at us.  And that's where the problem comes in.  
As it turns out, the biggest problem is the difference between (c) and (d) 
above.  
Conceptually, the Word & RTF formats use a delimited stream approach where 
documents are a stream of characters which get "broken" by inserting various 
inline formatting directives, including section and paragraph breaks.  
(Indeed the UI behaves that way -- just look at the interaction between the 
Insert Break dialog and Show Paragraphs mode).
By contrast, the AbiWord file format is XML-based, so it more directly 
represents the "fact" that sections contain paragraphs which contain text 
(with optional formatting at each of the three levels).
Note that I'm not arguing which approach is better, just highlighting a 
difference.  (No flamewars, please.)  Usually, this difference is 
meaningless, but now it's not. 
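
To make the difference concrete, here is a minimal sketch of the two models.  
The break markers and the sample text are stand-ins chosen for illustration; 
they are not what RTF or AbiWord actually store.  In the flat stream a field 
is just a range of offsets, so nothing stops it from covering a break.  In 
the nested form the same range straddles two containers.

```python
# Assumed stand-in markers for illustration only.
PARA_BREAK = "\n"
SECT_BREAK = "\f"


def nest(stream):
    """Turn a flat character stream into sections -> paragraphs -> text."""
    return [section.split(PARA_BREAK) for section in stream.split(SECT_BREAK)]


flat = "one\ntwo\fthree"

# Nested (XML-like) view: a list of sections, each a list of paragraphs.
tree = nest(flat)

# Stream view: a hypothetical field covering offsets 4..13 is one slice,
# even though it contains a section break.
field_start, field_end = 4, 13
field_text = flat[field_start:field_end]
crosses_section = SECT_BREAK in field_text
```

Here the slice is contiguous in the stream, but in the nested view it begins 
in the last paragraph of one section and ends in the first paragraph of the 
next.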
why is this a problem?
----------------------
Fields, conceptually, are containers of auto-generated text.  The simple 
markup we've been planning to use for fields works just fine, so long as all 
the generated content for that field is formatted text.  For example:
  <p>
     ...
     <field type=... other=... args=...>
        ... <c> ... </c> ...
     </field>
     ...
  </p>
The type attribute says what kind of field it is.  The contents can be 
regenerated and replaced entirely, in a type-specific way, from the 
information in the other attributes.  This corresponds quite nicely with the 
church secretary's notion of simple fields like page number or the date and 
time the document was last printed or saved. 
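
As a rough sketch of that update step, assuming Python's standard 
xml.etree.ElementTree: the "page_number" type name, the sample markup, and 
both helper functions are hypothetical, invented here for illustration.  
They are not the AbiWord file format or its code.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; attribute names and values are assumptions.
SAMPLE = '<p>Page <field type="page_number">1 <c>of</c> 9</field> follows.</p>'


def replace_field_contents(field, new_text):
    """Discard the old generated content and store freshly generated text."""
    for child in list(field):
        field.remove(child)
    field.text = new_text


def update_fields(para, generators):
    """Regenerate every field whose type has a registered generator."""
    # Snapshot the fields first so the tree is not modified while iterating.
    for field in list(para.iter("field")):
        generate = generators.get(field.get("type"))
        if generate is not None:
            # The generator sees the field's attributes (its "args").
            replace_field_contents(field, generate(field.attrib))
    return para


para = ET.fromstring(SAMPLE)
update_fields(para, {"page_number": lambda attrs: "7"})
result = ET.tostring(para, encoding="unicode")
```

The text around the field is untouched; only what sits between the field's 
start and end tags is thrown away and regenerated.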
If you squint hard enough, the RTF mechanism looks quite similar to this.  
However, Word/RTF has radically expanded the notion of fields so they can 
use this same mechanism to *also* generate large chunks of the document, 
which may include paragraph and section breaks.  For them, that's easy, 
because those breaks are just another character among the contents of that 
field.  
For us, it's not that easy.   If you look at our file format, a paragraph 
break looks something like this:
  ... </p><p> ...
Likewise, this is a section break:
  ... </p></section><section><p> ...
XML's nesting rules explicitly forbid content which looks like this:
  <p>
     ...
     <field>
        ... </p></section><section><p> ...
     </field>
     ...
  </p>
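
This is easy to confirm with any XML parser.  A small check, assuming 
Python's standard xml.etree.ElementTree (the sample strings and the doc 
wrapper element are made up for the test):

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# A field holding only inline text and formatting nests cleanly.
INLINE_FIELD = "<section><p>a <field>b <c>c</c> d</field> e</p></section>"

# A field whose contents include a section break closes <p> while
# <field> is still open, so the tags are mismatched.
BREAK_IN_FIELD = (
    "<doc><section><p>a <field>b </p></section>"
    "<section><p> c</field> d</p></section></doc>"
)

inline_ok = is_well_formed(INLINE_FIELD)
break_ok = is_well_formed(BREAK_IN_FIELD)
```

The first string parses; the second is rejected with a mismatched-tag error 
before the parser gets past the field's first paragraph.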
There are a variety of ways to deal with this issue, but since none of the 
simple fields we need now require this level of complexity, let's punt the 
issue as follows...
solution:  some RTF "fields" are really chunks instead
------------------------------------------------------
For our file format, let's define *two* tags, one for each concept.  (Yep, 
this breaks my design assumption #I, but I'm convinced it's the right way to 
handle it.)
1.  Define and implement the FIELD tag for all fields which can use a simple 
"inline" content model, such as DATE, TIME, and PAGE.  The necessary 
mechanism is block-relative, and should be pretty simple.  A document is 
likely to have lots of FIELDs.
  1a.  I expect (a), (b), and (c) are all implementable using this tag.  
  1b.  I don't know whether we need to allow FIELDs to also contain IMAGEs 
  or other FIELDs, but I'm sure we can figure that out as we go along. 
2.  Use a separate set of CHUNK tags (to be defined later) for more complex 
chunks of generated content, such as TOC or INCLUDE.  The necessary 
mechanism would be document-relative.  Most documents should not have very 
many CHUNKs (if any). 
  2a.  This is most useful for category (d).  
  2b.  I'm sure we'll want to allow CHUNKs to contain FIELDs, and perhaps to 
  allow CHUNKs to contain CHUNKs.  Again, implementation experience is the 
  key here.  
Note that both of these can still have a common UI, but under the hood we'll 
handle them differently. 
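
One way to picture the split, as a sketch only: the enum, the function, and 
the table below are hypothetical names.  Assigning DATE, TIME, and PAGE to 
category (a) is an assumption made for the example; the proposal only says 
they fit the inline model.

```python
from enum import Enum


class ContentModel(Enum):
    """The four restriction levels (a) through (d) listed earlier."""
    TEXT = "a"          # just inline text
    FORMATTED = "b"     # inline text and formatting
    SOFT_BREAKS = "c"   # plus line, page, and column breaks
    BLOCK_BREAKS = "d"  # plus paragraph and section breaks


def tag_for(model):
    """Pick the file-format tag: only category (d) needs a CHUNK."""
    return "chunk" if model is ContentModel.BLOCK_BREAKS else "field"


# Assumed categories for the field types named in this proposal.
MODELS = {
    "DATE": ContentModel.TEXT,
    "TIME": ContentModel.TEXT,
    "PAGE": ContentModel.TEXT,
    "TOC": ContentModel.BLOCK_BREAKS,
    "INCLUDE": ContentModel.BLOCK_BREAKS,
}

tags = {name: tag_for(model) for name, model in MODELS.items()}
```

A common UI could consult a table like this and dispatch to the 
block-relative or document-relative mechanism without the user ever seeing 
the distinction.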
Paul