abiword-dev Mailing List Archive: Justification of Chinese (bug #8750)

From: Roland Kay <roland.kay_at_ox.compsoc.net>
Date: Sat Apr 09 2005 - 09:41:48 CEST

Hi,

I'm trying to put together a general solution for bug #8750,
which arises because Western rules are currently used to
justify Chinese text.

http://bugzilla.abisource.com/show_bug.cgi?id=8750

In fact, the rules for Chinese are very simple: you can, more
or less, add space to every character. However, I want to
implement a patch that solves the problem for Chinese users
without breaking things for other Asian languages. That
leads to a complication in handling punctuation.
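To make "add space to every character" concrete, here is a
minimal sketch of how a line's leftover width (its slack)
might be spread over the characters allowed to grow. The
function name distributeSlack and the one-slot-per-character
model are hypothetical, not AbiWord's justification code; the
caller is assumed to decide which characters (e.g. not the
last one on the line) are expandable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: spread `slack` extra pixels across the characters whose
// `expandable` flag is set. Under the simplified Chinese rule almost every
// character is expandable, so the space ends up spread evenly along the line.
std::vector<int> distributeSlack(int slack, const std::vector<bool>& expandable)
{
    assert(slack >= 0);
    std::vector<int> extra(expandable.size(), 0);

    std::size_t count = 0;
    for (bool e : expandable)
        if (e) ++count;
    if (count == 0)
        return extra;  // nothing may grow; the caller must handle this case

    const int base = slack / static_cast<int>(count);
    int remainder = slack % static_cast<int>(count);

    for (std::size_t i = 0; i < expandable.size(); ++i) {
        if (!expandable[i]) continue;
        extra[i] = base;
        if (remainder > 0) {  // hand out leftover pixels from left to right
            ++extra[i];
            --remainder;
        }
    }
    return extra;
}
```

With ten pixels of slack and four expandable characters, the
result is {3, 3, 2, 2}. Only the expandable flags would need
to change per language, which is where punctuation gets
tricky.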

Punctuation characters should get extra space in a Chinese
context, but not in an English or Korean one. This means
that for "following punctuation" (e.g. a full stop or a
closing bracket), whether we can add extra space depends on
whether the nearest non-punctuation character to its left is
Chinese, English or Korean. "Preceding punctuation" (e.g. an
opening bracket) depends in the same way on the characters
to its right.
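A sketch of that context rule, assuming a heavily simplified
classifier: the Script enum, classify(), isPrecedingPunct()
and mayExpand() are all hypothetical names, and the Unicode
ranges used are only a stand-in for real script detection
(Han ideographs are treated as "Chinese" even though Japanese
and Korean also use them). Following punctuation scans left
and preceding punctuation scans right, skipping over any run
of punctuation, which is why the distance examined has no
fixed bound.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, heavily simplified script classes; a real implementation would
// rely on Unicode properties and language tags rather than a few code ranges.
enum class Script { Chinese, Korean, Latin, Punct, Other };

Script classify(char32_t c)
{
    if (c >= 0x4E00 && c <= 0x9FFF) return Script::Chinese;  // CJK Unified Ideographs
    if (c >= 0xAC00 && c <= 0xD7AF) return Script::Korean;   // Hangul Syllables
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return Script::Latin;
    if (c == U'.' || c == U',' || c == U'(' || c == U')' ||
        (c >= 0x3001 && c <= 0x303F) ||                       // CJK Symbols and Punctuation
        (c >= 0xFF01 && c <= 0xFF0F))                         // part of Fullwidth Forms
        return Script::Punct;
    return Script::Other;
}

// "Preceding" punctuation (opening brackets) binds to the character on its right.
bool isPrecedingPunct(char32_t c)
{
    return c == U'(' || c == 0xFF08 /* fullwidth ( */ || c == 0x300C /* left corner bracket */;
}

// Script of the nearest non-punctuation character in the relevant direction.
Script contextScript(const std::u32string& text, std::size_t i)
{
    assert(i < text.size());
    if (isPrecedingPunct(text[i])) {
        for (std::size_t j = i + 1; j < text.size(); ++j)
            if (classify(text[j]) != Script::Punct) return classify(text[j]);
    } else {
        for (std::size_t j = i; j-- > 0;)
            if (classify(text[j]) != Script::Punct) return classify(text[j]);
    }
    return Script::Other;  // no non-punctuation neighbour in that direction
}

// Under this sketch only Chinese characters, and punctuation in a Chinese
// context, may receive extra justification space.
bool mayExpand(const std::u32string& text, std::size_t i)
{
    Script s = classify(text[i]);
    if (s != Script::Punct) return s == Script::Chinese;
    return contextScript(text, i) == Script::Chinese;
}
```

For example, in a Chinese character followed by ".)", the
closing bracket has to skip the full stop before it reaches
the Chinese character that decides its behaviour.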

In general, we should only have to look one or two
characters away from the character being considered.
However, there is no fixed upper limit, so it seems very
inefficient to repeat these calculations every time the
justification has to be recalculated.

For this reason, I'm trying to implement a system in which
the calculation is done when the characters are input, and
the result is stored for later use by the justification
algorithm.

Following Tomas' suggestion in

http://www.abisource.org/mailinglists/abiword-dev/2005/Mar/0355.html

I'm trying to add this functionality to the RenderInfo
structure. Later, this would also allow each character's
line-breaking properties to be cached, which would speed up
the line-breaking algorithm as well.
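One way such a per-character cache could look is sketched
below. CharProps, updateCache() and countExpandable() are
hypothetical names, and the bit widths are illustrative; this
is not the actual RenderInfo layout. The "looked up" bit
follows the workaround described later in this message, and
the context rule is passed in as a callback so that the cache
is independent of any particular language rule. The sketch
only handles text that grows by appending, which the assert
documents.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-character cache entry stored alongside the render data.
struct CharProps {
    unsigned lookedUp  : 1;  // set once the properties below have been computed
    unsigned canExpand : 1;  // may receive extra space during justification
    unsigned breakType : 6;  // room for a cached line-breaking class (unused here)
};

// Compute properties only for entries not yet looked up. `decide` stands in
// for the real context rule (e.g. a punctuation scan).
void updateCache(std::vector<CharProps>& cache, const std::u32string& text,
                 const std::function<bool(const std::u32string&, std::size_t)>& decide)
{
    assert(cache.size() <= text.size());  // this sketch only supports appends
    cache.resize(text.size(), CharProps{0, 0, 0});
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (cache[i].lookedUp) continue;
        cache[i].canExpand = decide(text, i) ? 1u : 0u;
        cache[i].lookedUp = 1;
    }
}

// The justification pass then only reads the cached bits.
std::size_t countExpandable(const std::vector<CharProps>& cache)
{
    std::size_t n = 0;
    for (const CharProps& p : cache)
        if (p.canExpand) ++n;
    return n;
}
```

Typing "h", "he", "hel" in turn calls the rule once per new
character rather than once per character per update.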

The problem is that, as far as I can tell, the
GR_Graphics::shape() function just gets passed a string of
characters. There seems to be no way to know which ones are
new and which were already there the last time shape() was
called.

For example, if you start AbiWord and type "hello", then on
the first call to shape() the ShapeInfo structure (si)
contains "h", on the second "he", and so on. At the moment
I'm getting around this by keeping a bit in the bit field
for each character that records whether its properties have
been looked up yet. If this bit is not set, we know the
character is new. However, this doesn't solve the problem of
deletion. If, in "AB,", A is a Korean character and B is
Chinese, then the properties of the comma should change when
B is deleted. I don't see any way in shape() to detect that
this has happened.
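The deletion problem can be reproduced in a small model,
assuming the "looked up" bit scheme and a cache whose entries
are inserted and erased together with their characters. Run,
Cached, isHan() and commaMayExpand() are hypothetical names,
and the rule is cut down to "a comma may expand if the
nearest non-comma character to its left is Han".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified rule (assumption): Han characters may expand, and so may a comma
// whose nearest non-comma neighbour on the left is Han.
bool isHan(char32_t c) { return c >= 0x4E00 && c <= 0x9FFF; }

bool commaMayExpand(const std::u32string& text, std::size_t i)
{
    for (std::size_t j = i; j-- > 0;)
        if (text[j] != U',') return isHan(text[j]);
    return false;
}

struct Cached { bool lookedUp = false; bool canExpand = false; };

// Cache entries move with their characters, but survivors are never
// re-examined because their lookedUp bit is still set.
struct Run {
    std::u32string text;
    std::vector<Cached> cache;

    void refresh()
    {
        for (std::size_t i = 0; i < text.size(); ++i) {
            if (cache[i].lookedUp) continue;
            cache[i].canExpand = text[i] == U',' ? commaMayExpand(text, i) : isHan(text[i]);
            cache[i].lookedUp = true;
        }
    }
    void insert(std::size_t pos, char32_t c)
    {
        assert(pos <= text.size());
        text.insert(pos, 1, c);
        cache.insert(cache.begin() + static_cast<std::ptrdiff_t>(pos), Cached{});
        refresh();
    }
    void erase(std::size_t pos)
    {
        assert(pos < text.size());
        text.erase(pos, 1);
        cache.erase(cache.begin() + static_cast<std::ptrdiff_t>(pos));
        refresh();
    }
};
```

After typing a Hangul syllable, a Han character and a comma,
then deleting the Han character, the comma's cached flag still
says it may expand, while a fresh lookup says it may not.
Nothing in the new-character bit alone reveals that a
neighbour of the comma has gone.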

Does anyone have any idea how this could be detected, or
whether there's a more appropriate place where this
calculation could be made?

Is justification also a problem that will ultimately be
resolved by the Pango graphics class?

Best wishes,

R.
Received on Sat Apr 9 09:44:43 2005

