Subject: word importer (was Re: commit -- Patch for HTML export (bug#461?))
From: Paul Rohr (paul@abisource.com)
Date: Sat Dec 04 1999 - 16:44:38 CST
At 01:32 PM 12/4/99 -0600, Justin Bradford wrote:
>On the Doc importer front:
>1. text-position will be coming very soon.
Cool.  Aside from the lack of toolbar icons, this is the only thing needed 
to make that whole row green.  (Kudos again to Luke for getting everything 
else in his initial patch.)
>2. I'm not sure what orphans and widows refers to, exactly.
Dumb formatting algorithms break a paragraph at the last line that happens 
to fit on the page (or in the column).  However, this can leave only a line 
or two of the paragraph on one side of the page break.  A paragraph's last 
line stranded at the top of the next page is called a widow; its first line 
stranded at the bottom of a page is an orphan.  Here are examples of the two 
cases:
      xxxxxxx xxxxx x xxxxx xxxx
  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx 
  
  -- calculated page break --
  xxxxxx xx
      xxxxxxx xxxxx x xxxxx xxxx
  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx 
  xxxxxx xx
      xxxxxxx xxxxx x xxxxx xxxx
  -- calculated page break -- 
  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx 
  xxxxxx xx
These block-level properties tell the formatter *not* to leave widows or 
orphans when breaking a specific block.  (In this example, the first page 
would be broken a line earlier, and the second would be broken either a 
line earlier or a line later.)
Ask Eric for more details on the specific semantics, or just grep through 
the code to see how they're used.  
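To make the break rule concrete, here's a minimal sketch of widow/orphan 
control for a single paragraph.  It is not AbiWord's formatter: the function 
name is hypothetical, lines are assumed to have equal height, and it assumes 
a minimum of two lines on each side of the break.

```python
def lines_on_current_page(available: int, para_lines: int,
                          min_orphan: int = 2, min_widow: int = 2) -> int:
    """Hypothetical helper: how many lines of a paragraph stay on this page.

    available  -- lines still free on the current page (or column)
    para_lines -- total lines in the paragraph
    min_orphan -- fewest lines allowed at the bottom of the current page
    min_widow  -- fewest lines allowed at the top of the next page
    A return value of 0 means "move the whole paragraph to the next page".
    """
    fit = min(available, para_lines)
    if fit == para_lines:
        return para_lines              # the whole paragraph fits; no break
    if fit < min_orphan:
        return 0                       # would leave an orphan: push it all down
    if para_lines - fit < min_widow:
        fit = para_lines - min_widow   # break earlier so the widow gets company
        if fit < min_orphan:
            return 0                   # can't satisfy both rules: move it all
    return fit
```

With the two cases drawn above, a 5-line paragraph with 4 free lines keeps 3 
on the first page (breaking a line earlier), and a paragraph that starts on 
the last free line moves entirely to the next page: 
`lines_on_current_page(4, 5)` returns 3 and `lines_on_current_page(1, 5)` 
returns 0.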
>3. tabstops means custom tab settings, right? If so, that's actually a
>"bug" as I have code to generate the tabs in the ruler.
Precisely.  As the Tabs POW mentioned, we've already specified the syntax 
for left, center, right, bar, and decimal tabs, and all but bar tabs work 
properly on the ruler.  This should be enough for you to confirm whether 
you've imported them properly.  
After that, the primary remaining bugs are:
  - syntax for tab leaders, and   
  - a bunch of formatter support.
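When eyeballing imported tabs against the ruler, it may help to recall where 
each stop type puts the text that follows it.  This is a rough sketch only: 
the function name is hypothetical, positions are counted in monospaced 
character cells rather than real glyph widths, and bar tabs are left out 
because they draw a vertical rule instead of aligning text.

```python
def tab_text_start(kind: str, stop: int, text: str) -> int:
    """Hypothetical helper: column where `text` begins at a tab stop.

    Uses character cells (a monospaced simplification); a real
    formatter would measure glyph widths instead.
    """
    width = len(text)
    if kind == "left":
        return stop                    # text starts at the stop
    if kind == "right":
        return stop - width            # text ends at the stop
    if kind == "center":
        return stop - width // 2       # text is centered on the stop
    if kind == "decimal":
        point = text.find(".")
        # No decimal point: fall back to aligning like a right tab.
        before = point if point >= 0 else width
        return stop - before           # the decimal point sits at the stop
    # Bar tabs draw a rule at the stop rather than positioning text.
    raise ValueError(f"unsupported tab kind: {kind!r}")
```

For example, "123.45" and "7.5" at a decimal stop in column 20 start at 
columns 17 and 19, so both decimal points line up on column 20.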
>5. columns is just support for multicolumn sections, right? I believe that
>works. 
That's what it should be.  Argue with Bob about whether it works or not.  I 
haven't seen a test case either way.  :-)
>6. I'm not quite sure what section-space-after is.
You can currently insert a "continuous" section break, which allows the next 
section to be on the same page.  This is useful if you want to change the 
number of columns on the same page.  For example, you could have a page that 
looks like this:
      xxxxxxx xxxxx x xxxxx xxxx
  xxxxxx xxxxxx xxxxxxxx xxxxxx
  xxx xxxxxx xxxxxxxx xxxxxxx xxx
  xxxx xxxxxxx xxx xxxx xxxxxxx 
  xxxxxx xx
  
  -- explicit section break --
    xx xxx xxx    xxxx xxx xxx 
  xxx xxx xxxxx   xxx xxxx xxxxx
  xxxxx xxx xxx   xxxxxx xxx xx
  xxx xxx xxxx
  xxxx xxx xxxxx     xxx xxx xxx
  xxxxx xxx xxx   xxxx xxxx xxx
The section-space-after property controls how much vertical white space goes 
between those two sections on the same page. 
>Images, breaks, and fields are pretty easy (well, once you get an image
>buffer from wv). Mostly just requires implementing the special character
>handler.
Yep.  To do a good job of importing fields, you'll need to add more field 
types, though.  Our current set is quite anemic. 
>Styles are straight-forward, but require a bunch of annoyingly mundane
>code changes. Although, I guess I'm not sure what to do with styles from
>Word which do not have an AbiWord equivalent (ie. custom user styles). I
>can create new styles as I'm importing, right?
Exactly.  Just mimic what happens when abi/test/wp/Styles.abw gets imported, 
and you should be fine.  I'm pretty sure the existing APIs should be wide 
enough for you, but if you've got more info to pass, let me know.
The one caveat is that style lookups will fail if a style is referenced 
before it's defined.  Just to be safe, the .abw format is laid out so we can 
load all user-defined styles before any document content.  
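One way an importer could honor that ordering is to define every style, 
including custom Word user styles, in a first pass and only then walk the 
content.  This is a hypothetical sketch: `StyleTable` and `import_document` 
are illustrative names, not AbiWord's actual importer API, and it ignores 
styles that are based on other styles.

```python
class StyleTable:
    """Hypothetical style registry whose lookups fail for undefined names."""

    def __init__(self) -> None:
        self._styles: dict[str, dict[str, str]] = {}

    def define(self, name: str, props: dict[str, str]) -> None:
        self._styles[name] = props

    def lookup(self, name: str) -> dict[str, str]:
        if name not in self._styles:
            raise KeyError(f"style {name!r} referenced before it was defined")
        return self._styles[name]


def import_document(style_defs, paragraphs):
    """Hypothetical two-pass import: all styles first, then content."""
    table = StyleTable()
    # Pass 1: register every style, built-in or user-defined.
    for name, props in style_defs:
        table.define(name, props)
    # Pass 2: content can now safely reference any style by name.
    return [(text, table.lookup(style)) for style, text in paragraphs]
```

Because every `define` happens before the first `lookup`, a paragraph that 
uses a custom style never hits the "referenced before defined" failure.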
Paul
This archive was generated by hypermail 2b25 : Sat Dec 04 1999 - 16:39:35 CST