Subject: Re: i18n of abiword -- charset mapping
From: Paul Rohr (paul@abisource.com)
Date: Fri Jan 14 2000 - 15:13:35 CST
These problems will need to be addressed for any language which doesn't use 
the Latin-1 charset.  
1.  Charset mapping on input. (easy)
------------------------------------
As you say, doing the math to map Thai character codes (TIS-620) to their Unicode 
equivalents should be quite easy.  If you find that the current mechanism 
for this purpose is insufficient, let us know.  We're always interested in 
better approaches.  
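
For illustration only, here is a minimal sketch of that math.  It assumes the 
standard TIS-620 layout, where the assigned bytes 0xA1-0xDA and 0xDF-0xFB sit 
at a fixed offset from U+0E01-U+0E3A and U+0E3F-U+0E5B.  The function name is 
hypothetical and is not part of the current tree.

```cpp
#include <cstdint>

// Hypothetical helper, not an existing AbiWord function.
// Maps one TIS-620 byte to its Unicode code point.
// Returns U+FFFD (REPLACEMENT CHARACTER) for bytes TIS-620 leaves unassigned.
inline std::uint32_t tis620ToUnicode(std::uint8_t b)
{
    if (b < 0x80)
        return b;                    // ASCII range is identical in both

    // The assigned Thai ranges map by a constant offset:
    // 0xA1 + 0x0D60 == 0x0E01, 0xFB + 0x0D60 == 0x0E5B.
    if ((b >= 0xA1 && b <= 0xDA) || (b >= 0xDF && b <= 0xFB))
        return b + 0x0D60u;

    return 0xFFFD;                   // 0x80-0xA0, 0xDB-0xDE, 0xFC-0xFF
}
```

Other charsets are rarely this regular, which is why a table-driven approach 
is likely to be needed in general.  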
2.  Charset mapping at rendering time.  (reasonable)  
----------------------------------------------------
Depending on the font being used for rendering, there may need to be a 
mapping back from Unicode characters to another charset.  We currently don't 
have a general mechanism in the code for this, and one will be needed for 
most non-Latin languages.  
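
As a rough sketch of what such a reverse mapping might look like, assume a 
font whose glyphs are laid out in TIS-620 order.  The function name and the 
bool-plus-out-parameter shape are assumptions made for the example, not a 
proposed API.

```cpp
#include <cstdint>

// Hypothetical helper: converts a Unicode code point to the byte that a
// TIS-620-encoded font expects.  Returns false when the font's charset has
// no slot for the character, so the caller can pick a fallback glyph.
inline bool unicodeToTis620(std::uint32_t ucs, std::uint8_t& out)
{
    if (ucs < 0x80) {
        out = static_cast<std::uint8_t>(ucs);           // ASCII passes through
        return true;
    }
    if ((ucs >= 0x0E01 && ucs <= 0x0E3A) || (ucs >= 0x0E3F && ucs <= 0x0E5B)) {
        out = static_cast<std::uint8_t>(ucs - 0x0D60);  // U+0E01 -> 0xA1
        return true;
    }
    return false;                                       // not representable
}
```

The renderer would have to decide what to draw when this returns false.  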
The easiest way for users to work around this problem is to locate a font 
which stores the characters for their language of choice at the appropriate 
Unicode positions.  Note that this does *not* need to be a complete Unicode 
font (as those are still fairly rare).  
This suggests that it might be fairly simple to take existing *fonts* which 
are commonly used for a given language, and run them through a conversion 
process which re-encodes just those characters as a Unicode font instead.  I 
don't know if there are any existing font-conversion utilities for this 
purpose, but if not, it'd make a great project for someone.  :-)
General hint
------------
Insofar as we're likely to need a bunch of to/from conversions between 
Unicode and various charsets or codepages, it seems like we may need a new 
set of efficient mapping classes which can be used for both purposes.  As a 
quick-and-easy XP solution, the usual table-driven macro magic with header 
files ought to work nicely here, so long as we're willing to compile the 
tables into the code.  
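
To make the idea concrete, here is one possible shape for the macro approach.  
Everything named here is hypothetical.  In a real tree the entry list would 
live in its own header (generated from the raw mapping files) rather than in 
a #define, and only the first four Thai rows are shown.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a generated header such as "tis620_map.h" (name assumed).
// Each row is (charset byte, Unicode code point).
#define CHARSET_TIS620_ENTRIES(X) \
    X(0xA1, 0x0E01) /* THAI CHARACTER KO KAI    */ \
    X(0xA2, 0x0E02) /* THAI CHARACTER KHO KHAI  */ \
    X(0xA3, 0x0E03) /* THAI CHARACTER KHO KHUAT */ \
    X(0xA4, 0x0E04) /* THAI CHARACTER KHO KHWAI */

struct CharsetMapEntry {
    std::uint8_t  local;    // code in the legacy charset
    std::uint16_t unicode;  // corresponding Unicode code point
};

// Expand the entry list into a static table compiled into the binary.
#define AS_ENTRY(local, ucs) { local, ucs },
static const CharsetMapEntry kTis620Map[] = { CHARSET_TIS620_ENTRIES(AS_ENTRY) };
#undef AS_ENTRY

static const std::size_t kTis620MapSize =
    sizeof(kTis620Map) / sizeof(kTis620Map[0]);

// One table serves both directions.  A linear scan keeps the sketch short;
// a sorted table with binary search would be the obvious refinement.
inline bool mapToUnicode(std::uint8_t local, std::uint16_t& ucs)
{
    for (std::size_t i = 0; i < kTis620MapSize; ++i)
        if (kTis620Map[i].local == local) { ucs = kTis620Map[i].unicode; return true; }
    return false;
}

inline bool mapFromUnicode(std::uint16_t ucs, std::uint8_t& local)
{
    for (std::size_t i = 0; i < kTis620MapSize; ++i)
        if (kTis620Map[i].unicode == ucs) { local = kTis620Map[i].local; return true; }
    return false;
}
```

The same entry list can be expanded with a different macro to build a second, 
differently sorted table if lookup speed in the reverse direction matters.  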
Since these tables can be quite sizeable, it's probably worth debating the 
proposed approach and APIs on this list before implementing them.  
In any event, the following seems to be *the* definitive source of raw 
material for this purpose:
  ftp://ftp.unicode.org/Public/MAPPINGS/
For extra credit, it might be interesting to figure out how to change the 
mapping classes to demand-load the table contents at runtime from resources. 
Then all we'd need to do is figure out how to build the necessary 
platform-specific resources from these raw tables at build time.  
Paul