From: Daniel Glassey (danglassey-abi_at_ntlworld.com)
Date: Mon Nov 10 2003 - 15:57:38 EST
Hi Tomas, working on a reply, but just forwarding this to the list for
now because you sent it from an address you aren't subscribed from.
Thanks,
Daniel
-----Forwarded Message-----
From: Tomas Frydrych 
Cc: abiword-dev_at_abisource.com
Subject: rendering API
Date: Sun, 09 Nov 2003 10:50:08 +0000
Hi Daniel,
> Define a 'decent' API ;)
Indeed, that has to be the first step before we start talking about 
using Pango, Graphite, Uniscribe etc. 
We need an abstract XP rendering API such that any communication 
between our layout classes and a rendering engine (RE) will be 
carried through this API, without exception. The big question is how 
best to encapsulate the RE in light of how we process text. This 
is an outline of the AW processing that falls into the rendering 
category:
A. Current processing overview
=======================
I. Piece Table: contains the raw Unicode text. The important aspect 
of the PT for rendering is that the text is stored in a 
non-sequential manner. The base class for the RE should provide an 
API for PT access; i.e., a shaper will typically require access to a 
string, so we have to create the string from the PT data, something 
like
    RE::getUCS4StringFromDocPos(ucs4 * str, uint length, ....);
    RE::getUtf8StringFromDocPos(utf8*, ...); etc.
However, converting the PT contents to sequential strings is 
time-consuming and needs to be kept to a minimum. It would be ideal 
for us if the external RE did not expect input in a sequential 
string, but rather allowed the user to provide a text iterator 
hook instead, something like
    ucs4 getNthUCS4Char(uint n);
which the external RE would use to iterate over our PT.
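
To make the iterator-hook idea concrete, here is a minimal sketch. The 
PieceTable and Piece types below are hypothetical, heavily simplified 
stand-ins (not the actual AbiWord classes), and countSpaces() merely 
plays the role of an external RE routine that pulls characters through 
the hook instead of requiring a flat string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ucs4 = std::uint32_t;

// Hypothetical, simplified piece: a span inside one of two text buffers.
struct Piece {
    bool inAddBuffer;    // true: span lives in the "added" buffer
    std::size_t offset;  // start offset within that buffer
    std::size_t length;  // number of characters in the span
};

// Hypothetical piece table: document order is given by the piece list,
// so the text is not contiguous in memory.
class PieceTable {
public:
    PieceTable(std::u32string original, std::u32string added,
               std::vector<Piece> pieces)
        : m_original(std::move(original)),
          m_added(std::move(added)),
          m_pieces(std::move(pieces)) {}

    // Total number of characters in the document.
    std::size_t length() const {
        std::size_t total = 0;
        for (const Piece& p : m_pieces) total += p.length;
        return total;
    }

    // Iterator hook: n-th character in document order, 0 if out of range.
    ucs4 getNthUCS4Char(std::size_t n) const {
        for (const Piece& p : m_pieces) {
            if (n < p.length) {
                const std::u32string& buf = p.inAddBuffer ? m_added : m_original;
                return static_cast<ucs4>(buf[p.offset + n]);
            }
            n -= p.length;
        }
        return 0;
    }

private:
    std::u32string m_original;
    std::u32string m_added;
    std::vector<Piece> m_pieces;
};

// Stand-in for an RE routine that reads text only through the hook.
std::size_t countSpaces(const std::function<ucs4(std::size_t)>& nthChar,
                        std::size_t length) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i)
        if (nthChar(i) == U' ') ++count;
    return count;
}
```

Note that this naive hook walks the piece list on every call; a real 
implementation would presumably remember the current piece between 
calls so that sequential access stays cheap.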
II. Block Class: the block class splits text into runs and breaks it 
into lines. As long as we are talking about the RE only doing shaping 
and providing line-breaking info, the block as it stands does not 
need access to the RE (the line-breaking info is passed down to the 
block from the text runs).
III. Line class: from the rendering point of view, the line class is 
responsible for BIDI reordering of its runs. For now, we do this by 
directly accessing FriBidi from within the line class; this too 
should be encapsulated into the RE API. However, we do not reorder 
the actual text, only the sequence of text runs, keeping runs 
unidirectional. We want something like
    RE::bidiReorder(const fp_Line * pLine);
Again, it would be ideal if the external RE did not require a text 
string, but would allow us to provide an iterator over character 
types, something like
    FribidiCharType getNthCharType(uint n);
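
As an illustration of reordering runs rather than text, the sketch below 
applies rule L2 of the Unicode Bidirectional Algorithm to whole runs. It 
assumes each run already carries a resolved embedding level; RunInfo and 
reorderRuns() are hypothetical names, not existing AbiWord or FriBidi 
API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a unidirectional text run on a line.
struct RunInfo {
    int id;     // logical position of the run on the line
    int level;  // resolved embedding level (even = LTR, odd = RTL)
};

// Return the runs in visual order. From the highest level down to the
// lowest odd level, every maximal sequence of adjacent runs at that level
// or higher is reversed (rule L2, applied to runs instead of characters).
std::vector<RunInfo> reorderRuns(std::vector<RunInfo> runs) {
    int maxLevel = 0;
    int minOddLevel = -1;
    for (const RunInfo& r : runs) {
        maxLevel = std::max(maxLevel, r.level);
        if (r.level % 2 == 1 && (minOddLevel < 0 || r.level < minOddLevel))
            minOddLevel = r.level;
    }
    if (minOddLevel < 0) return runs;  // no RTL runs: nothing to reorder

    for (int level = maxLevel; level >= minOddLevel; --level) {
        std::size_t i = 0;
        while (i < runs.size()) {
            if (runs[i].level < level) {
                ++i;
                continue;
            }
            std::size_t j = i;
            while (j < runs.size() && runs[j].level >= level) ++j;
            std::reverse(runs.begin() + i, runs.begin() + j);
            i = j;
        }
    }
    return runs;
}
```

Because each run is unidirectional, only the order of the runs changes 
here; laying out the characters inside an RTL run is left to the run 
itself.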
IV. Text Run class
    a. finds suitable line break points within itself
    b. is responsible for shaping (glyph replacement, ligatures ...)
    c. is responsible for drawing
Regarding (b), the Text Run instance caches the shaped string for 
future use (i.e., to avoid unnecessary shaping).
So the run class would need something like
    RE::findBreakPoint(const ucs4 *);
Again, we might find it useful if, instead of a ucs4 string, we could 
provide an iterator.
    RE::shape(...)
    RE::draw()
This is how it currently works; we may want to / have to adjust the 
above processing sequence to make using an external renderer easier.
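
Pulling the calls above together, one possible shape for the abstract 
interface is sketched below. Everything in it is an assumption made for 
illustration: the class names, the CharSource hook type and the exact 
signatures are invented, and ToyEngine is a deliberately trivial 
implementation (break after the last space, one glyph per character) 
that exists only to show the interface being exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using ucs4 = std::uint32_t;
using GlyphId = std::uint32_t;
// Iterator hook: returns the n-th character of the text being processed.
using CharSource = std::function<ucs4(std::size_t)>;

// Hypothetical abstract RE interface; layout code would talk only to this.
class RenderingEngine {
public:
    virtual ~RenderingEngine() = default;
    // Index just past the last break opportunity, or length if there is none.
    virtual std::size_t findBreakPoint(const CharSource& text,
                                       std::size_t length) const = 0;
    // Convert characters into glyph identifiers.
    virtual std::vector<GlyphId> shape(const CharSource& text,
                                       std::size_t length) const = 0;
    // Draw previously shaped glyphs at the given position.
    virtual void draw(const std::vector<GlyphId>& glyphs, int x, int y) = 0;
};

// Toy implementation used only to demonstrate the interface.
class ToyEngine : public RenderingEngine {
public:
    std::size_t findBreakPoint(const CharSource& text,
                               std::size_t length) const override {
        for (std::size_t i = length; i > 0; --i)
            if (text(i - 1) == U' ') return i;  // break after the last space
        return length;
    }

    std::vector<GlyphId> shape(const CharSource& text,
                               std::size_t length) const override {
        std::vector<GlyphId> glyphs;
        glyphs.reserve(length);
        for (std::size_t i = 0; i < length; ++i)
            glyphs.push_back(text(i));  // identity mapping, no real shaping
        return glyphs;
    }

    void draw(const std::vector<GlyphId>& glyphs, int, int) override {
        m_drawn += glyphs.size();  // record the call instead of painting
    }

    std::size_t glyphsDrawn() const { return m_drawn; }

private:
    std::size_t m_drawn = 0;
};
```

A real backend (Pango, Graphite, Uniscribe) would sit behind the same 
interface; whether each of them can implement the three calls 
independently is exactly the open question discussed below.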
B. Fallouts of the current processing
==============================
I. Unicode compliance.
------------------------------
a. There is a Unicode-compliance issue with doing BIDI reordering on 
lines: the Unicode algorithm does reordering on paragraphs. 
Paragraph-level reordering is fine for static text, but adds 
substantial overhead in a word processor, particularly when the text 
is stored in a non-sequential PT. However, if we continue doing 
reordering of runs rather than text, it would be possible to move the 
BIDI processing from fp_Line into fl_BlockLayout. Because of the 
processing overhead I would only do this if it is either necessary to 
interface with external REs, or if it can be shown that reordering 
lines produces a different sequence than reordering blocks. 
b. The Unicode shaping algorithm assumes line breaking is done 
_before_ shaping, but this creates real problems in a WYSIWYG 
application because the unshaped text cannot be measured for width; 
we currently break after shaping.
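
To illustrate why breaking after shaping is convenient, the sketch below 
picks a break using the advance widths of the already shaped glyphs. 
fitLine() and its parameters are hypothetical, and the break 
opportunities are assumed to have been computed elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// advances[i]      width of shaped glyph i, in layout units
// canBreakAfter[i] true if a line break is allowed after glyph i
// Returns how many leading glyphs go on the line: the longest prefix that
// fits into maxWidth and ends at a break opportunity or at the end of the
// text. Returns 0 if no such prefix fits.
std::size_t fitLine(const std::vector<int>& advances,
                    const std::vector<bool>& canBreakAfter,
                    int maxWidth) {
    std::size_t best = 0;
    int width = 0;
    for (std::size_t i = 0; i < advances.size(); ++i) {
        width += advances[i];
        if (width > maxWidth) break;  // glyph i no longer fits
        if (i + 1 == advances.size() || canBreakAfter[i]) best = i + 1;
    }
    return best;
}
```

With unshaped text the advances would not be known yet, since ligatures 
and contextual forms change the widths.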
II. Shaping limitations
-------------------------------
The shape & cache approach we currently use in fp_TextRun means that 
we can only shape using glyphs with explicit Unicode code points. 
There are languages, such as Syriac, where there is only one code 
point per letter and the shaping depends on font technology 
(OpenType, etc.); for these our approach does not work. A partial 
solution would be to cache glyph indices instead of chars and have 
graphics methods for drawing with indices, but it would only make 
sense if we interface with a shaper that can handle such advanced 
font technologies.
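
A minimal sketch of caching glyph indices instead of characters follows. 
CachedRun, Shaper and toyShaper() are hypothetical; in particular 
toyShaper() just invents glyph indices, where a real shaper would obtain 
them from the font.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using GlyphIndex = std::uint32_t;
// Assumed shaper signature: text in, font-specific glyph indices out.
using Shaper = std::function<std::vector<GlyphIndex>(const std::u32string&)>;

// Toy shaper: invents one glyph index per character.
std::vector<GlyphIndex> toyShaper(const std::u32string& text) {
    std::vector<GlyphIndex> glyphs;
    glyphs.reserve(text.size());
    for (char32_t c : text) glyphs.push_back(1000u + static_cast<GlyphIndex>(c));
    return glyphs;
}

// Hypothetical text run that caches glyph indices and reshapes lazily,
// only after its text has changed.
class CachedRun {
public:
    CachedRun(std::u32string text, Shaper shaper)
        : m_text(std::move(text)), m_shaper(std::move(shaper)) {}

    void setText(std::u32string text) {
        m_text = std::move(text);
        m_dirty = true;  // cached glyphs no longer match the text
    }

    const std::vector<GlyphIndex>& glyphs() {
        if (m_dirty) {
            m_glyphs = m_shaper(m_text);
            m_dirty = false;
            ++m_shapeCount;
        }
        return m_glyphs;
    }

    // Number of times the shaper has actually been invoked.
    std::size_t shapeCount() const { return m_shapeCount; }

private:
    std::u32string m_text;
    Shaper m_shaper;
    std::vector<GlyphIndex> m_glyphs;
    bool m_dirty = true;
    std::size_t m_shapeCount = 0;
};
```

The graphics class would then need a drawing method that accepts glyph 
indices rather than characters, as noted above.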
III. Interfacing with a RE
--------------------------------
As should be clear from the above, a RE provides some 
functionality encapsulated in our graphics classes (drawing), and 
some in our layout classes (breaking, shaping). If we are to retain 
the current processing sequence, we need to be able to tap into the 
break, itemize, shape and render components of the RE individually 
from our XP code. While I reckon it would be easy to write a suitable 
abstract encapsulation of the required API, it might be difficult to 
actually implement it efficiently with a given RE (i.e., if the 
shaper assumes intermediate data in a complex proprietary form, we will 
need to continually convert our internal data into its format and 
back). Also, the RE might not allow us to do only shaping, etc, 
because its internal algorithm is such that the various stages cannot 
be separated (I got the impression that this could be the case with 
the Graphite engine's little-by-little processing).
The other option is to change the internal processing to make it 
easier to interface with a RE (and preferably get the RE designers to 
meet us half way, i.e., with the iterator hooks suggested above); 
after trying the other approach with Pango, I think this is the only 
realistic way. The real question is to what extent this could be done 
without a specific shaper in mind; this takes us back to the 
portability question.
Tomas