INTRODUCTION
------------

Kterm is a modified version of xterm that can display text from character
sets requiring two bytes per character as well as the standard single-byte
character sets. The original kterm was designed to display Japanese text;
this capability has been extended to Chinese and Korean.

CHARACTER SETS AND CODINGS
--------------------------

Version 4.1.2 of kterm can display Chinese, Japanese, and Korean text in a
number of coding systems. With the exception of the Korean N-byte coding,
all of the coding systems described below require two bytes per character.

1. Chinese

   A. GB2312-1980 (GuoBiao)

      PRC standard GB is a seven bit standard that requires two bytes per
      character. It is most often used with the high (most significant) bit
      set on each byte of the character to distinguish the Chinese text from
      other seven bit text. This eight bit usage of GB is also used in
      CCDOS, the Chinese version of MS-DOS.

      NOTE: Perhaps the eight bit usage should be referred to as EUC
            (Extended Unix Code).

      CODE RANGE: 0xA1A1-0xFEFE

   B. Shift-GB

      Shift-GB is a mixed seven and eight bit coding, with the high (most
      significant) bit of the first byte always set to distinguish it from
      other seven bit text. Shift-GB was used by the Chinese Macintosh OS
      until recently.

      NOTE: I'm not sure if it is an official standard.

      CODE RANGE: 0x8140-0xAFFC (excluding 0x7F as a second byte)

   C. Big5

      Big5 is a mixed seven and eight bit coding, with the high (most
      significant) bit of the first byte always set to distinguish it from
      other seven bit text. Big5 is at least a de facto standard in places
      like Hong Kong and Taiwan, where the Traditional Chinese ideographs
      are used.

      NOTE: Rumor has it that it is, or will be, a standard in Taiwan. I
            don't have any facts on this yet.

      CODE RANGE: 0xA140-0xF9FE

2. Japanese

   A.
      JIS (Japanese Industrial Standard X0208-1983)

      JIS is a seven bit standard that is usually distinguished from other
      seven bit text by a starting and an ending escape sequence.

      START ESCAPE SEQUENCE: ESC $ B (NEW-JIS)
                             ESC $ @ (OLD-JIS)
      END ESCAPE SEQUENCE  : ESC ( B
      CODE RANGE           : 0x2121-0x7E7E

   B. Shift-JIS

      Shift-JIS is a mixed seven and eight bit coding, with the high (most
      significant) bit of the first byte set to distinguish it from other
      seven bit text.

      CODE RANGE:
        FIRST BYTE : 0x81-0x9F and 0xE0-0xEF
        SECOND BYTE: 0x40-0xFC (excluding 0x7F)

   C. EUC

      EUC is an eight bit usage of JIS, with the high (most significant) bit
      of each byte set to distinguish it from other seven bit text.

      CODE RANGE: 0xA1A1-0xFEFE

3. Korean

   A. KSC5601-1987 (Jamos and Hangul)

      This version of kterm supports only the Jamos (Hangul elements) and
      Hangul portions of the KSC5601-1987 standard. Support for the Hanja
      portion will come later.

      KS is a seven bit standard that requires two bytes per Hangul
      character. It is most often used with the high (most significant) bit
      set on each byte of the character to distinguish the Korean text from
      other seven bit text.

      NOTE: Perhaps the eight bit usage should be referred to as EUC
            (Extended Unix Code).

      CODE RANGE:
        JAMOS : 0xA4A1-0xA4FE
        HANGUL: 0xB0A1-0xC8FE

   B. N-byte

      N-byte code is a way of representing Hangul text using only ASCII
      characters. It uses a variable number of bytes to select a particular
      Hangul syllable, and it is distinguished from other seven bit text by
      the SO (Shift Out) and SI (Shift In) control characters.

      START ESCAPE SEQUENCE: ^N (0x0E)
      END ESCAPE SEQUENCE  : ^O (0x0F)
      CODE RANGE           : 0x41-0x7C (full range)

      NOTE: The code range actually varies. See the file "hgutil.c" for
            details.

4. X11 Compound Text

   Version 4.1.2 of kterm now recognizes most of the approved standard
   Compound Text encodings. It does not recognize the non-standard character
   set encodings or the directionality indicators.
   Even though the approved standard encodings are recognized, there is no
   guarantee that text in them will be displayed correctly. This applies in
   particular to the right-to-left encodings, which will need additional
   code.

   The 94^N Compound Text sequences for GB 2312-1980, JIS X0208-1983, and
   KS C5601-1987 will be interpreted correctly if the appropriate language
   is chosen when starting kterm, or if it is set in the application
   defaults file, KTerm.ad.

FONTS
-----

There are a number of freely available Chinese, Japanese, and Korean X11
fonts. They can be obtained from the following anonymous ftp sites:

1. HOST: crl.nmsu.edu [128.123.1.14]

   CRL has a relatively complete collection of the freely available
   Chinese, Japanese, and Korean X11 fonts. They are located in the
   subdirectories pub/chinese/fonts, pub/japanese/fonts, and pub/korean/.
   The CRL site also has lists of known anonymous ftp sites for software
   related to each of these languages.

2. HOST: miki.cs.titech.ac.jp [131.112.16.39]
   HOST: utsun.s.u-tokyo.ac.jp [133.11.11.11]

   These ftp sites have large collections of Usenet and JUNET newsgroup
   archives. The fj.sources archives contain many of the Japanese X11 fonts
   that have been posted on JUNET. Most of the directories contain Index
   files describing which archive file holds the sources of each font.

3. HOST: kum.kaist.ac.kr [137.68.1.65]

   This site has a few Korean utilities as well as archives of a number of
   Usenet newsgroups. Most of the Korean-related code and fonts are located
   in pub/hangul/.

AUTHORS AND CONTRIBUTORS
------------------------

The initial conversion work on xterm for displaying Japanese text was done
by kagotani@cs.titech.ac.jp (Hiroto Kagotani). The ANSI color support was
added using the kterm 4.1.0 patches provided by mukawa@tn-sec.ntt.junet
(Susumu Mukawa). The Multi-Byte Character Set Word Select feature was added
using a modified version of Kiyoshi KANAZAWA's 4.1.0 MBCS_WSEL patches.
The Chinese and Korean support was added by mleisher@nmsu.edu (Mark Leisher).

CLOSING NOTES
-------------

The {character set, font set, language, conversion} mechanisms are a little
clumsy and should eventually be modified to be more in line with the XPG3
locale specifications and the upcoming X11 i18n specifications. Hopefully,
this won't be too far away.

BUG REPORTS
-----------

Please send bug reports and/or fixes for kterm 4.1.2 to mleisher@nmsu.edu
or mleisher@nmsu.bitnet.

THANKS
------

I would like to express my thanks to Mr. Kagotani for doing the initial
conversion work. His code made it a lot easier for me to add support for
Chinese and Korean.

Thanks go to Ricky Yeung and F. F. Lee for making their Chinese code
conversion programs freely available.

I would also like to thank ujsung@solgai.kaist.ac.kr (UnJae Sung) for having
the patience to answer my questions about Korean coding.

And last but not least, thanks go to these people for significant bug
reports and fixes:

   John Melby of Fujitsu
   Martin C. Fong of Sybase
   Yang Zhiwei of the German National Research Center for Computer Science
   Alton Harkcom (for help updating the Japanese manual page)

Sat May 4 14:11:37 1991

Internet: mleisher@nmsu.edu
Bitnet  : mleisher@nmsu.bitnet

Mark Leisher
Computing Research Lab
New Mexico State University
Box 3CRL
Las Cruces, NM 88001-0001
+1 505 646-5711