From: Nadav Rotem (nadavrotem@mail.ru)
Date: Tue Jun 03 2003 - 01:46:39 EDT
Using OTS while keeping the style information of the document
At present, when OTS is used, the text is parsed into the internal
OTS structure otsArticle, which contains otsSentences; each sentence
is a list of words (char *). As a result, fonts, sizes, titles and
footnotes are lost, since everything is stored as plain text.
With the newly proposed structure, a pointer to the original styled
data structure will be kept within every sentence.
INPUT
The plug-in will have to re-implement "ots_parse_file ()" with a
few minor changes.
When loading a sentence, the parser will add to it a pointer to the
original sentence structure that AbiWord uses. OTS will ignore this
pointer in its internal processing algorithm and will not change
the data.
The input module may also set hints for OTS to use in structure grading,
such as "is it a title?". This may be implemented with a pointer to a
structure that holds information such as "new paragraph?", "title?",
"footnote?". Armed with this info, OTS will make better decisions about
how to summarize the text. It is best that AbiWord detects this, since
OTS has no styling info.
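As a rough illustration of the hints idea, here is a minimal sketch in
plain C. It is not OTS or AbiWord code: SentenceHints, Sentence,
attach_host_data and hinted_score are hypothetical names, the word list
is a plain array standing in for the GList, and the weights applied to
titles and footnotes are arbitrary assumptions chosen only to show how a
hint could influence grading.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hint record filled in by the host (e.g. the plug-in). */
typedef struct {
    bool new_paragraph;
    bool title;
    bool footnote;
} SentenceHints;

/* Simplified stand-in for the proposed sentence structure. */
typedef struct {
    const char **words;  /* stand-in for the GList of words */
    size_t wc;           /* number of words */
    long score;
    bool selected;
    void *style;         /* opaque host data; never dereferenced here */
    void *structure;     /* points to a SentenceHints, or NULL */
} Sentence;

/* Called by the input module while loading a sentence. */
void attach_host_data(Sentence *s, void *host_sentence, SentenceHints *hints)
{
    s->style = host_sentence;
    s->structure = hints;
}

/* One way grading might use the hints (the weights are assumptions). */
long hinted_score(const Sentence *s)
{
    const SentenceHints *h = s->structure;

    if (h == NULL)
        return s->score;      /* no hints: score is unchanged */
    if (h->title)
        return s->score * 2;  /* favour titles */
    if (h->footnote)
        return s->score / 2;  /* play down footnotes */
    return s->score;
}
```

The style pointer is only stored and handed back, which matches the
requirement that OTS ignores it and does not change the host's data.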
 
OUTPUT
The export module will have to be rewritten by AbiWord.
It is as easy as HTML.c or TEXT.c: just loop through the list of
sentences, and if OTS has set the "selected" flag on a sentence, that
sentence should be returned to the program (or simply not deleted).
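The export loop described above could look roughly like the following
sketch. It is self-contained plain C rather than real OTS code:
SentenceNode is a hand-rolled linked list standing in for the GList of
sentences, and KeepFn, Kept, collect_kept and export_selected are
hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for one sentence in the article's sentence list. */
typedef struct SentenceNode {
    bool selected;              /* set by the summarizer */
    void *style;                /* host's original sentence structure */
    struct SentenceNode *next;
} SentenceNode;

/* Host callback invoked for every sentence that should be kept. */
typedef void (*KeepFn)(void *host_sentence, void *ctx);

/* Walk the list and hand each selected sentence back to the host.
   Returns the number of sentences kept. */
size_t export_selected(const SentenceNode *head, KeepFn keep, void *ctx)
{
    size_t kept = 0;

    for (const SentenceNode *n = head; n != NULL; n = n->next) {
        if (n->selected) {
            keep(n->style, ctx);
            kept++;
        }
    }
    return kept;
}

/* Example callback: collect the kept host pointers in a small buffer. */
typedef struct {
    void *items[16];
    size_t n;
} Kept;

void collect_kept(void *host_sentence, void *ctx)
{
    Kept *k = ctx;

    if (k->n < 16)
        k->items[k->n++] = host_sentence;
}
```

A host that prefers the "simply not deleted" variant could pass a
callback that marks the styled sentence as kept and delete the unmarked
ones afterwards.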
typedef struct
{
  GList *words;                 /* a GList of words (char *) */
  glong score;
  gboolean selected;
  gint wc;

  void *style;                  /* pointer to style information or to
                                   a sentence structure */
  void *structure;              /* pointer to info about this line,
                                   such as "is it a title?" */
} OtsSentence;
An updated version of this document should be available here:
http://nadav.homelinux.org/data/ots_style.txt