Subject: Re: i18n of abiword -- combining characters
From: Paul Rohr (paul@abisource.com)
Date: Sat Jan 15 2000 - 13:57:10 CST
At 03:43 PM 1/14/00 -0800, Leonard Rosenthol wrote:
>At 1:14 PM -0800 1/14/00, Paul Rohr wrote:
>>1.  Character sequence normalization.  (reasonable)
>>---------------------------------------------------
>>Thus, there needs to be work done (probably at input time) to normalize
>>those sequences of combining characters, and perhaps ignore invalid ones.
>
>	If you use the standard OS input methods, they will handle 
>all this for you - in fact, they will also handle a number of other 
>input issues that are pretty complex for some languages (especially 
>CJK).
Of course, we'd love to take advantage of OS-level input methods wherever
possible, but I'm less confident than you are that these will be sufficient
in all cases.

I'm eager to see the code that proves me wrong.  :-)
>>(Otherwise, the variant sequences will make features like spell-check
>>prohibitively unreliable.)
>
>	And also search & replace.   The whole "combined characters" 
>in Unicode issue is an interesting one, especially when doing things 
>like regular expression searches.
Yep.  That's another good reason for normalization.  
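To make the spell-check and search concern concrete, here is a minimal Python sketch. Python's standard `unicodedata` module is only my choice for illustration, not anything in the AbiWord tree, and `same_text` is a hypothetical helper name. It shows two variant sequences for the same accented letter comparing unequal until both are normalized.

```python
import unicodedata

# Two encodings of the same word: precomposed U+00E9 versus
# "e" followed by the combining acute accent U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# A raw comparison sees different code point sequences.
raw_equal = precomposed == decomposed

# After normalizing both sides to the same form, they match.
normalized_equal = same_text(precomposed, decomposed)

print(raw_equal, normalized_equal)
```

Whether normalization happens at input time or just before each comparison is a separate design question; the sketch only shows why some normalization step is needed before spell-check or search can match reliably.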
>>2.  Combining characters -- position.  (???)
>>--------------------------------------------
>>The current code assumes that every Unicode character will occupy one cell
>>of display space of a known width.  However, languages like Thai render
>>sequences of several characters into the same display cell.
>
>	Since Unicode only has a single code point for any valid 
>glyph, your input handler should be converting the multiple 
>characters into the new composite glyph value and then you only have 
>one character to display.
Some languages may indeed have code points for all the composite glyphs 
needed.  However, as far as I can tell, this is *not* true for Thai. 
  http://charts.unicode.org/Unicode.charts/normal/U0E00.html
Going by that chart, the following combining characters need to be 
composited with one or more other characters at rendering time:
  U+0E31
  U+0E34 - U+0E3A
  U+0E47 - U+0E4E
Am I missing something here?  
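One way to check the claim, sketched in Python under the assumption that the Unicode data bundled with the interpreter is representative: pair each listed mark with a base consonant and see whether NFC normalization collapses the pair into a single code point. The names `THAI_COMBINING_RANGES`, `KO_KAI`, and `uncomposed_marks` are hypothetical, and the choice of base consonant is arbitrary.

```python
import unicodedata

# Inclusive code point ranges taken from the list in this message.
THAI_COMBINING_RANGES = [(0x0E31, 0x0E31), (0x0E34, 0x0E3A), (0x0E47, 0x0E4E)]

# THAI CHARACTER KO KAI, used here as an arbitrary base consonant.
KO_KAI = "\u0e01"

def uncomposed_marks():
    """Return the marks for which NFC leaves base + mark as two code points."""
    marks = []
    for first, last in THAI_COMBINING_RANGES:
        for cp in range(first, last + 1):
            pair = KO_KAI + chr(cp)
            if len(unicodedata.normalize("NFC", pair)) == 2:
                marks.append(cp)
    return marks

marks = uncomposed_marks()

# Every mark in the list should be a nonspacing mark (general category Mn).
all_nonspacing = all(unicodedata.category(chr(cp)) == "Mn" for cp in marks)

print(len(marks), all_nonspacing)
```

If a mark had a precomposed form with that base, normalization would return one code point and the mark would drop out of the result. This only tests one base consonant, so it is a spot check rather than a proof.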
>>4.  Combining characters -- rendering.  (???, platform-specific)
>>----------------------------------------------------------------
>>On each platform, someone will need to investigate whether the
>>text-rendering primitives know how to properly combine a character sequence
>>into a single glyph.  If so, drawing should be pretty easy.  If not, adding
>>logic to do all that rendering from the constituent glyphs in the font may
>>be difficult. 
>>
>	Again, if you use the single combined glyph code point, it 
>should work just fine when rendered.
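Again, this sounds wonderfully convenient, but I'm not sure it's always
true.  Where a script such as Thai has no single combined code point for a
sequence, the renderer still has to composite the pieces itself.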
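The gap between code points and display cells can be sketched as follows. This is a simplified Python illustration, not AbiWord code: `display_clusters` is a hypothetical function, and it treats only general category Mn as combining, whereas real text segmentation rules cover more cases.

```python
import unicodedata

def display_clusters(text):
    """Group each base character with the nonspacing marks that follow it.

    Simplification: only characters of category Mn are attached to the
    preceding base; everything else starts a new cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch      # mark shares the previous display cell
        else:
            clusters.append(ch)     # base character starts a new cell
    return clusters

# KO KAI + MAI HAN-AKAT + MAI THO, then NO NU.
sample = "\u0e01\u0e31\u0e49\u0e19"
cells = display_clusters(sample)

print(len(sample), len(cells))
```

Code that assumes one cell per character would measure this sample as four cells wide, while the grouping above yields two. Whether the platform's text primitives then draw each cluster correctly is the per-platform question raised in item 4.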
Paul