Subject: Re: AbiWord Chinese version of Linux
From: Paul Rohr (paul@abisource.com)
Date: Thu Mar 30 2000 - 19:46:44 CST
At 10:30 AM 3/27/00 +0800, hj wrote:
>    The top-level window does not support XIM. But s_ic and s_ic_attr must be
>static members; changing them to non-static causes a segmentation fault. I
>don't know why.
>    All Chinese and English characters are encoded in Unicode in .abw files.
>European languages are not yet encoded in Unicode. In the future we will
>display different languages in one document, so Unicode encoding is needed.
>If you replace fonts.hj with European-language fonts, characters will be
>Unicode in .abw files.
>    Chinese font files are too large to ship, so I don't distribute Chinese
>fonts. Instead, I created a file "fonts.hj" alongside AbiWord's font files
>that lists each Chinese printing font's name, XLFD, ascent, descent, and
>width.
>    All Unix fonts are created as fontsets, not fonts, so they can display
>both English and Chinese characters. The printing code can also print both
>English and Chinese characters.
>    We must handle the fact that keyval is 0xffffff when I input Chinese
>with XIM: the Chinese text is stored in the string field, not in keyval.
Thanks for the patch.  I'm very very impressed at how you've tackled issues 
throughout the tree to get Chinese working for you on Linux.  My goal now is 
to figure out how to integrate the work you've done with the work that will 
be needed to add true Unicode support for other languages and/or platforms.  
At this point, I'd like feedback from other developers in the following two 
areas:
  - people working on related i18n issues (Henrik Berg, Vadim Frolov)
  - a random GTK expert or two
As soon as we've got some consensus that you all are heading in the same 
direction, we can start getting some or all of this code checked in. 
To get the discussion rolling, here are some observations (in no particular 
order):
0.  do you have a screen shot?
------------------------------
I'd totally love to *see* your version running.  
1.  UI translation
------------------
It's really cool to see that you've already translated most of the UI.  I'm 
presuming that the hex-encoded characters map directly to the appropriate 
Unicode characters, and not some other charset, right?  
  src/wp/ap/xp/ap_Menu_LabelSet_Languages.h 
  src/wp/ap/xp/ap_Menu_LabelSet_ZhCN.h 
  src/wp/ap/xp/ap_TB_LabelSet_Languages.h 
  src/wp/ap/xp/ap_TB_LabelSet_ZhCN.h 
  user/wp/strings/ZhCN.strings 
How bad was it to do all the editing to generate an 8859-1 encoding of the 
strings file?  Would it have been easier for you to use one of expat's other 
supported encodings instead?  
  http://www.jclark.com/xml/expatfaq.html
For example, you can directly export UTF8 files from AbiWord.  :-)
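For comparison, here is a minimal sketch of the two ways a character such as U+4E2D can reach an XML file that expat parses: as a numeric character reference, which is plain ASCII and therefore survives in a file declared as ISO-8859-1, or as raw UTF-8 bytes in a file declared with encoding="UTF-8". Both function names are hypothetical, and this is not necessarily how the strings in the patch were produced.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: write a code point as an XML numeric character
// reference, e.g. U+4E2D -> "&#x4E2D;". The result is pure ASCII, so it
// can appear in a file declared as ISO-8859-1.
std::string toCharRef(std::uint32_t cp)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(cp));
    return std::string(buf);
}

// Hypothetical helper: encode a code point as UTF-8 bytes, for a file
// declared with encoding="UTF-8". Returns an empty string for surrogates
// and for values above U+10FFFF, which are not valid Unicode scalar values.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        // UTF-16 surrogate range: not encodable on its own
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With UTF-8, a Chinese label takes three bytes per character and can be edited directly in any UTF-8-aware editor; with character references, every character must be looked up and escaped by hand or by a script.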
2.  XIM on frame
----------------
Thanks for digging out the GTK APIs for XIM support.  Is there anything we'd 
need to know to make these changes work for other languages besides Chinese?  
  src/af/xap/unix/xap_UnixFrame.cpp 
  src/af/xap/unix/xap_UnixFrame.h 
Also, could you elaborate on what problems you were seeing with non-static
ICs?  
Perhaps someone else on the list might be able to help. 
3.  coding style
----------------
It looks like there are a number of places where you added files and/or 
functions, all of which had your initials as a prefix.  Do you want your 
code to stand out like this, or was that just to make it easier to read the 
patch?  
(We generally tend to try to write code so it all blends in together.  That 
way, you have to use Bonsai's cvsblame tool to see who was responsible for a 
given line of code.) 
4.  files to ignore
-------------------
I noticed that there were a bunch of files in your patch which included 
changes which probably shouldn't be checked in.  For example, 
  src/af/xap/Makefile 
  src/af/xap/unix/xap_UnixDlg_About.cpp 
In addition, a bunch of spurious diffs were generated by RCS_ID variations.  
(Does anyone know of an option to suppress these?)
5.  some languages don't ever get spell-checked
-----------------------------------------------
I also noticed that you've implemented quick hacks to avoid spell-checking 
Chinese content. 
  src/text/fmt/xp/fl_BlockLayout.cpp 
  src/wp/ap/xp/ap_Dialog_Spell.cpp 
Is there a more general way to do this check?  Do we want to explicitly tag 
content by language (via the lang attribute), or will it be enough to just 
ignore certain Unicode ranges?  
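To make the "ignore certain Unicode ranges" option concrete, here is a minimal sketch. The helper names are hypothetical, and the range list is illustrative rather than complete: it covers a few CJK blocks but not, for example, Hiragana, Katakana, Hangul, or the supplementary ideograph planes.

```cpp
#include <string>

// Returns true for code points in a few common CJK blocks.
bool isCJK(char32_t c)
{
    return (c >= 0x3000 && c <= 0x303F)   // CJK Symbols and Punctuation
        || (c >= 0x3400 && c <= 0x4DBF)   // CJK Unified Ideographs Extension A
        || (c >= 0x4E00 && c <= 0x9FFF)   // CJK Unified Ideographs
        || (c >= 0xFF00 && c <= 0xFFEF);  // Halfwidth and Fullwidth Forms
}

// Hypothetical policy: skip the spell checker for empty words and for
// any word containing at least one CJK code point.
bool shouldSpellCheck(const std::u32string& word)
{
    if (word.empty())
        return false;
    for (char32_t c : word)
        if (isCJK(c))
            return false;
    return true;
}
```

A range check like this needs no changes to the document model, but it can't tell, say, Chinese from Japanese kanji; tagging content via the lang attribute would allow that distinction at the cost of requiring every importer and input path to set it.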
6.  pairing unrelated fonts
---------------------------
This one's going to sound pretty ignorant, so please forgive me.  
I'm not sure I completely understand why you've implemented the logic to 
pair up English and Chinese fonts as if they were the same font (as far as 
the UI is concerned). 
  src/af/xap/unix/xap_UnixFont.cpp 
  src/af/xap/unix/xap_UnixFont.h 
  src/af/xap/unix/xap_UnixFontManager.cpp 
  src/af/xap/unix/xap_UnixFontManager.h 
  src/af/xap/unix/xap_UnixPSGraphics.cpp 
  src/af/xap/unix/xap_UnixPSGraphics.h 
I'm used to using WYSIWYG editors, where users choose to use one font at a 
time, switching to others as needed.  Any time you use a character which 
isn't provided in that font, you get a slug character.  
From what little I know of fontsets, the idea is that you explicitly 
assemble a collection of overlapping fonts and give that *set* of fonts a 
name.  IIRC, GTK has mechanisms to do this, but I'm not sure whether that 
helps you much, since you have to generate PS output, too.  
(It's bad enough to do a 1-to-1 WYSIWYG mapping between screen fonts and 
printer fonts.  Mapping collections of fontsets sounds like a nightmare.)
Again, my goal here is to understand how to take what you've done and use it 
to solve similar problems for other languages.  
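As a way to frame the question, here is a minimal sketch of what pairing two fonts under one UI name implies at layout time: each string gets split into runs, and each run is drawn (or emitted as PostScript) with whichever member of the pair covers it. All names are hypothetical, and the coverage rule (Latin-1 goes to the English font, everything else to the Chinese font) is a deliberate simplification, not the logic in xap_UnixFont.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Which member of a hypothetical English/Chinese font pair draws a run.
enum class FontSlot { Latin, CJK };

struct FontRun {
    FontSlot slot;
    std::size_t start;   // index of the first character in the run
    std::size_t length;  // number of characters in the run
};

// Simplifying assumption: the Latin font covers exactly U+0000..U+00FF,
// and every other code point is sent to the CJK font.
FontSlot slotFor(char32_t c)
{
    return c <= 0xFF ? FontSlot::Latin : FontSlot::CJK;
}

// Split text into maximal runs that can each use a single font.
std::vector<FontRun> splitIntoFontRuns(const std::u32string& text)
{
    std::vector<FontRun> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        FontSlot s = slotFor(text[i]);
        if (!runs.empty() && runs.back().slot == s)
            ++runs.back().length;          // extend the current run
        else
            runs.push_back({s, i, 1});     // start a new run
    }
    return runs;
}
```

The same run boundaries would have to be honored by both the screen and PS graphics classes, which is where the 1-to-1 screen/printer font mapping gets harder.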
7.  multibyte / wide character conversions
------------------------------------------
I suspect that this stuff is likely to be the most controversial.  There are 
a number of places in the code where you've introduced locale-specific 
variants of UCS <--> char conversions via mbtowc() and wctomb(). 
  mbtowc
  ------
  src/af/ev/unix/ev_UnixKeyboard.cpp 
  wctomb
  ------
  src/af/gr/unix/gr_UnixGraphics.cpp 
  UCS <--> char (via wc/mb) 
  -------------
  src/af/util/Makefile 
  src/af/util/xp/Makefile 
  src/af/util/xp/hj.cpp 
  src/af/util/xp/hj.h 
  src/af/util/xp/hj_mbtowc.cpp 
  src/af/util/xp/hj_mbtowc.h 
  src/af/util/xp/hj_wctomb.cpp 
  src/af/util/xp/hj_wctomb.h
  src/text/fmt/xp/fp_TextRun.cpp 
  src/wp/ap/unix/ap_UnixDialog_Replace.cpp 
  src/wp/ap/xp/ap_EditMethods.cpp 
To be honest, I'm not sure how this approach compares to the iconv-oriented 
stuff which Henrik and Vadim have been working on.  I'm sure you're each 
working on real problems, but I frankly don't understand enough about what 
any of you are doing to be able to judge the merits of each approach.  
Could the three of you start a discussion to help get ignorant Americans 
like me up to speed?  ;-)
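Since mbtowc() is where the locale dependence comes in, here is a minimal sketch (a hypothetical function, not code from the patch) of the kind of loop involved. An example is decoding the multibyte text that, as hj describes above, arrives in the key event's string field rather than in keyval. The input bytes are interpreted according to whatever LC_CTYPE locale the process selected with setlocale(), so the same bytes can decode differently on different machines. Also, although glibc's wchar_t values are ISO 10646 code points, the C standard doesn't guarantee that.

```cpp
#include <clocale>   // callers pick the locale with std::setlocale()
#include <cstdlib>
#include <string>

// Convert len bytes of multibyte text into wide characters using the
// current LC_CTYPE locale. Returns false on a sequence that is invalid
// or incomplete in that locale.
bool multibyteToWide(const char* s, std::size_t len, std::wstring& out)
{
    out.clear();
    std::mbtowc(nullptr, nullptr, 0);   // reset any shift state
    while (len > 0) {
        wchar_t wc;
        int n = std::mbtowc(&wc, s, len);
        if (n < 0)
            return false;               // invalid or incomplete sequence
        if (n == 0)
            break;                      // embedded NUL ends the text
        out += wc;
        s += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}
```

Nothing in the function itself says which charset it expects; that is decided entirely by the runtime locale, which is the crux of the comparison with an iconv-style approach that names the source encoding explicitly.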
8.  should plain text be anything other than ASCII?
---------------------------------------------------
On a similar note, it looks like you've extended a bunch of logic which 
currently reads Latin-1 files to also handle other encodings, albeit in a 
locale-specific way.  
  src/af/xap/xp/xap_Strings.cpp 
  src/wp/ap/xp/ap_Strings.cpp 
  src/wp/impexp/xp/ie_exp_Text.cpp 
  src/wp/impexp/xp/ie_imp_MsWord_97.cpp 
  src/wp/impexp/xp/ie_imp_Text.cpp 
This makes me kind of nervous, because it means that the actual contents of 
the files being read and written are interpreted as being in different 
charsets, depending on your locale settings at runtime.  
Up until now, we've been striving to create totally-portable files, which 
are always in the same encoding no matter where you read or write them.  
(Thus, for example, note how we've differentiated 7-bit text files from UTF8 
text files.)
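By contrast, a reader pinned to one declared encoding behaves identically everywhere. Here is a minimal sketch of a locale-independent UTF-8 decoder (the function name is hypothetical) of the sort a UTF8 text importer could rely on instead of mbtowc().

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into Unicode code points. The result never depends on the
// runtime locale. Returns false on malformed input: bad lead or
// continuation bytes, truncation, overlong forms, surrogates, or values
// above U+10FFFF.
bool decodeUtf8(const std::string& in, std::u32string& out)
{
    // Smallest code point each sequence length may encode (rejects overlongs).
    static const char32_t kMinForExtra[4] = {0x0, 0x80, 0x800, 0x10000};

    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;              // number of continuation bytes
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return false;              // stray continuation or invalid lead byte

        if (in.size() - i <= extra)
            return false;               // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;           // expected a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < kMinForExtra[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;               // overlong, out of range, or surrogate
        out += cp;
        i += extra + 1;
    }
    return true;
}
```

A file that decodes cleanly here decodes to the same characters in Beijing and in Boston, which is the portability property described above.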
bottom line
-----------
You've obviously put a lot of hard work into this patch, and I really really 
want to be able to start bragging about the fact that we support Chinese on 
at least one platform.  That's *so* cool!
To be honest, I'm not sure that all of the issues I've mentioned above are 
actually real.  However, at the moment, I don't know enough to be able to 
decide how much of this patch to integrate into the tree.  
Could the various folks working on i18n issues help clear up some of my 
confusion here?  
Thanks,
Paul
This archive was generated by hypermail 2b25 : Thu Mar 30 2000 - 19:41:13 CST