INTRODUCTION
------------
Kterm is a modified version of xterm that can display text from
character sets requiring two bytes per character as well as the
standard single-byte character sets.  The original kterm was designed
to display Japanese text.  This capability has been expanded to
include Chinese and Korean.

CHARACTER SETS AND CODINGS
--------------------------
Version 4.1.2 of kterm can display Chinese, Japanese, and Korean text
in a number of coding systems.  With the exception of the Korean
N-byte coding, all of the coding systems described below require two
bytes per character.

   1. Chinese

      A. GB2312-1980 (GuoBiao) PRC standard
         GB is a seven bit standard that requires two bytes per
         character.  It is most often used with the high (most
         significant) bit set on each byte of the character to
         distinguish the Chinese text from other seven bit text.  The
         eight bit usage of GB is also used in CCDOS, the Chinese
         version of MS-DOS.
         NOTE: Perhaps the eight bit usage should be referred to as
               EUC (Extended Unix Code).
         CODE RANGE: 0xA1A1-0xFEFE
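
         The relationship between the seven bit and eight bit usage
         described above can be sketched as follows.  This is only an
         illustration, not code taken from kterm; the function names
         are made up for the example, and the seven bit range of
         0x21-0x7E per byte is assumed from the eight bit range given
         above.

```python
def to_eight_bit(code: bytes) -> bytes:
    """Set the high bit on both bytes of a seven bit two-byte code."""
    if len(code) != 2 or not all(0x21 <= b <= 0x7E for b in code):
        raise ValueError("expected two bytes in the range 0x21-0x7E")
    return bytes(b | 0x80 for b in code)


def to_seven_bit(code: bytes) -> bytes:
    """Clear the high bit on both bytes of an eight bit two-byte code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("expected two bytes in the range 0xA1-0xFE")
    return bytes(b & 0x7F for b in code)
```

         The same two operations apply to any of the seven bit
         standards in this file that are used with the high bit set.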

      B. Shift-GB
         Shift-GB is a mixed seven and eight bit coding, with the
         first byte always having the high (most significant) bit set
         to distinguish it from other seven bit text.  Shift-GB was
         used by the Chinese Macintosh OS until recently.
         NOTE: I'm not sure if it is an official standard.
         CODE RANGE: 0x8140-0xAFFC (excluding 0x7F as a second byte)

      C. Big5
         Big5 is a mixed seven and eight bit coding, with the first
         byte always having the high (most significant) bit set to
         distinguish it from other seven bit text.  Big5 is at
         least a de facto standard in places like Hong Kong and Taiwan
         where the Traditional Chinese ideographs are used.
         NOTE: Rumor has it that it is, or will be, a standard in
               Taiwan.  I don't have any facts on this yet.
         CODE RANGE: 0xA140-0xF9FE
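
         A minimal sketch of how a byte stream might be split into
         single byte text and Big5 characters.  It is not taken from
         kterm.  The function names are hypothetical, and the second
         byte ranges (0x40-0x7E and 0xA1-0xFE) are an assumption based
         on the commonly described Big5 layout; this file only gives
         the overall code range.

```python
def is_big5_pair(first: int, second: int) -> bool:
    """Check one byte pair against the assumed Big5 layout."""
    lead_ok = 0xA1 <= first <= 0xF9
    # Assumption: second bytes fall in 0x40-0x7E or 0xA1-0xFE.
    trail_ok = 0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE
    return lead_ok and trail_ok


def split_big5(data: bytes) -> list:
    """Split data into one-byte and two-byte chunks."""
    chunks = []
    i = 0
    while i < len(data):
        if i + 1 < len(data) and is_big5_pair(data[i], data[i + 1]):
            chunks.append(data[i:i + 2])
            i += 2
        else:
            chunks.append(data[i:i + 1])
            i += 1
    return chunks
```

         Note that the second byte of a character may look like plain
         seven bit text (0x40 is "@"), which is why the stream has to
         be scanned from the start rather than byte by byte.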

   2. Japanese
      A. JIS (Japanese Industrial Standard X0208-1983)
         JIS is a seven bit standard that is usually distinguished
         from other seven bit text by a starting and ending escape
         sequence.
         START ESCAPE SEQUENCE: <ESC>$B (NEW-JIS) <ESC>$@ (OLD-JIS)
         END ESCAPE SEQUENCE  : <ESC>(B
         CODE RANGE: 0x2121-0x7E7E
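
         The escape sequence bracketing can be sketched as follows.
         This is an illustration only, not kterm's parser: split_jis
         is a hypothetical name, and only the NEW-JIS start sequence
         and the end sequence listed above are handled.

```python
ESC = b"\x1b"
KANJI_IN = ESC + b"$B"   # NEW-JIS start sequence
KANJI_OUT = ESC + b"(B"  # end sequence, back to single byte text


def split_jis(data: bytes) -> list:
    """Return (is_two_byte, chunk) tuples with the escapes removed."""
    runs = []
    two_byte = False
    start = 0
    i = 0
    while i < len(data):
        # Only the sequence that would change the current mode matters.
        marker = KANJI_OUT if two_byte else KANJI_IN
        if data.startswith(marker, i):
            if i > start:
                runs.append((two_byte, data[start:i]))
            two_byte = not two_byte
            i += len(marker)
            start = i
        else:
            i += 1
    if start < len(data):
        runs.append((two_byte, data[start:]))
    return runs
```

         Each chunk flagged as two-byte holds character codes in the
         range 0x2121-0x7E7E, two bytes per character.
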
      B. Shift-JIS
         Shift-JIS is a mixed seven and eight bit coding, with the
         high (most significant) bit of the first byte set to
         distinguish it from other seven bit text.
         CODE RANGE:
           FIRST BYTE : 0x81-0x9F and 0xE0-0xEF
           SECOND BYTE: 0x40-0xFC (excluding 0x7F)
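
         For illustration, the widely used arithmetic for mapping a
         Shift-JIS byte pair back to the seven bit JIS code is sketched
         below.  It is not taken from kterm; sjis_to_jis is a
         hypothetical name, and the input checks use the byte ranges
         listed above.

```python
def sjis_to_jis(first: int, second: int) -> tuple:
    """Convert one Shift-JIS byte pair to the seven bit JIS pair."""
    lead_ok = 0x81 <= first <= 0x9F or 0xE0 <= first <= 0xEF
    trail_ok = 0x40 <= second <= 0xFC and second != 0x7F
    if not (lead_ok and trail_ok):
        raise ValueError("not a Shift-JIS byte pair")
    # Close the gap between the two first byte ranges.
    if first >= 0xE0:
        first -= 0x40
    # Each Shift-JIS first byte covers two consecutive JIS rows.
    row = (first - 0x81) * 2 + 0x21
    if second >= 0x9F:
        # Upper half of the second byte range: the second JIS row.
        return row + 1, second - 0x7E
    # Lower half: step over the unused 0x7F slot.
    if second >= 0x80:
        second -= 1
    return row, second - 0x1F
```

         Setting the high bit on both bytes of the result gives the
         corresponding EUC code.
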
      C. EUC
         EUC is an eight bit usage of JIS, with the high (most
         significant) bit of each byte set to distinguish it from
         other seven bit text.
         CODE RANGE: 0xA1A1-0xFEFE

   3. Korean

      A. KSC5601-1987 (Jamos and Hangul)
         This version of kterm only supports the Jamos (Hangul
         elements) and Hangul portion of the KSC5601-1987 standard.
         The Hanja portion will come later.
         KS is a seven bit standard that requires two bytes per
         Hangul character.  It is most often used with the high (most
         significant) bit set on each byte of the character to
         distinguish the Korean text from other seven bit text.
         NOTE: Perhaps the eight bit usage should be referred to as
               EUC (Extended Unix Code).
         CODE RANGE:
           JAMOS : 0xA4A1-0xA4FE
           HANGUL: 0xB0A1-0xC8FE
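
         A small sketch of how an eight bit KS byte pair could be
         sorted into the two supported ranges above.  The function
         name and the returned labels are made up for this example and
         do not come from kterm.  The check is done per byte, because
         a plain numeric comparison against 0xB0A1-0xC8FE would also
         accept pairs whose second byte is outside 0xA1-0xFE.

```python
def classify_ks(first: int, second: int) -> str:
    """Classify an eight bit KSC5601 byte pair by code range."""
    if not 0xA1 <= second <= 0xFE:
        return "invalid"
    if first == 0xA4:
        return "jamo"          # 0xA4A1-0xA4FE
    if 0xB0 <= first <= 0xC8:
        return "hangul"        # 0xB0A1-0xC8FE
    return "unsupported"       # anything else, e.g. the Hanja portion
```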

      B. N-byte
         N-byte code is a way of representing Hangul text using only
         ASCII characters.  It uses a variable number of bytes to
         select a particular Hangul syllable and is distinguished from
         other seven bit text by the SO (Shift Out) sequence and the SI
         (Shift In) sequence.
         START ESCAPE SEQUENCE: ^N (0x0E)
         END ESCAPE SEQUENCE  : ^O (0x0F)
         CODE RANGE: 0x41-0x7C (full range)
         NOTE: The code range actually varies.  See the file
               "hgutil.c" for details.
         
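         The SO/SI bracketing of N-byte text can be sketched as
         follows.  This only separates the Hangul runs from the
         surrounding seven bit text; it does not decode the N-byte
         syllables themselves (see "hgutil.c" for that), and
         split_nbyte is a hypothetical name, not a kterm function.

```python
SO = 0x0E  # ^N, start of N-byte Hangul text
SI = 0x0F  # ^O, end of N-byte Hangul text


def split_nbyte(data: bytes) -> list:
    """Return (is_hangul, chunk) tuples with SO and SI removed."""
    runs = []
    hangul = False
    current = bytearray()
    for b in data:
        if b == SO or b == SI:
            # A shift character closes the current run, if any.
            if current:
                runs.append((hangul, bytes(current)))
                current = bytearray()
            hangul = (b == SO)
        else:
            current.append(b)
    if current:
        runs.append((hangul, bytes(current)))
    return runs
```
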
   4. X11 Compound Text
      Version 4.1.2 of kterm recognizes most of the approved standard
      encodings of Compound Text.  It does not recognize non-standard
      character set encodings or the directionality indicators.
      Recognizing an encoding does not guarantee that its text is
      displayed correctly; the right-to-left encodings in particular
      need additional code before they can be supported.

      The 94^N Compound Text sequences for GB 2312-1980, JIS
      X0208-1983, and KS C5601-1987 will be interpreted correctly if
      the appropriate language is chosen when starting kterm, or if it
      is set in the application defaults file, KTerm.ad.
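
      For reference, the 94^N designation sequences for these three
      character sets can be sketched as a lookup table.  The byte
      sequences are taken from the X11 Compound Text Encoding
      specification, not from this file, and the table and function
      names are hypothetical; this is not how kterm itself is
      written.

```python
ESC = b"\x1b"

# Designation sequence -> (character set, half of the code table).
CT_94N = {
    ESC + b"$(A": ("GB2312-1980", "GL"),
    ESC + b"$)A": ("GB2312-1980", "GR"),
    ESC + b"$(B": ("JIS X0208-1983", "GL"),
    ESC + b"$)B": ("JIS X0208-1983", "GR"),
    ESC + b"$(C": ("KS C5601-1987", "GL"),
    ESC + b"$)C": ("KS C5601-1987", "GR"),
}


def identify_designator(data: bytes, pos: int = 0):
    """Return the (charset, half) designated at pos, or None."""
    for sequence, info in CT_94N.items():
        if data.startswith(sequence, pos):
            return info
    return None
```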

FONTS
-----
A number of Chinese, Japanese, and Korean X11 fonts are freely
available.  Here are some anonymous ftp sites that carry them:

1. HOST: crl.nmsu.edu [128.123.1.14]
   CRL has a relatively complete collection of the freely available
   Chinese, Japanese, and Korean X11 fonts.  They are located in the
   subdirectories pub/chinese/fonts, pub/japanese/fonts, and
   pub/korean/.  The CRL site also has lists of known anonymous ftp
   sites for software related to the language of interest.

2. HOST: miki.cs.titech.ac.jp [131.112.16.39]
   HOST: utsun.s.u-tokyo.ac.jp [133.11.11.11]
   These ftp sites have large collections of many Usenet and JUNET
   newsgroup archives.  The fj.sources archives contain many of the
   Japanese X11 fonts that have been posted on JUNET.  There are Index
   files in most of the directories describing which archive file has
   the font sources.

3. HOST: kum.kaist.ac.kr [137.68.1.65]
   There are a few Korean utilities available from this site as well
   as archives of a number of Usenet newsgroups.  Most of the
   Korean-related code and fonts are located in pub/hangul/.

AUTHORS AND CONTRIBUTORS
------------------------
The initial conversion work on xterm for displaying Japanese text was
done by kagotani@cs.titech.ac.jp (Hiroto Kagotani).

The ANSI color support was added using the kterm 4.1.0 patches
provided by mukawa@tn-sec.ntt.junet (Susumu Mukawa).

The Multi-Byte Character Set Word Select feature was added using a
modified version of Kiyoshi KANAZAWA's 4.1.0 MBCS_WSEL patches.

The Chinese and Korean support was added by
mleisher@nmsu.edu (Mark Leisher).

CLOSING NOTES
-------------
The {character set,font set,language,conversion} mechanisms are a little
clumsy and should eventually be brought in line with the XPG3 locale
specifications and the upcoming X11 i18n specifications.  Hopefully,
this won't be too far away.

BUG REPORTS
-----------
Please send bug reports and/or fixes for kterm 4.1.2 to
mleisher@nmsu.edu or mleisher@nmsu.bitnet.

THANKS
------
I would like to express my thanks to Mr. Kagotani for doing the
initial conversion work.  His code made it a lot easier for me to add
support for Chinese and Korean.

Thanks go to Ricky Yeung and F. F. Lee for making their Chinese code
conversion programs freely available.

I would also like to thank ujsung@solgai.kaist.ac.kr (UnJae Sung) for
having the patience to answer my questions about Korean coding.

And last but not least, thanks go to these people for significant bug
reports and fixes:

  John Melby of Fujitsu

  Martin C. Fong of Sybase

  Yang Zhiwei of the German National Research Center for Computer
  Science

  Alton Harkcom (for help updating the Japanese manual page)


				Sat May  4 14:11:37 1991

				Internet: mleisher@nmsu.edu
				Bitnet  : mleisher@nmsu.bitnet

				Mark Leisher
				Computing Research Lab
				New Mexico State University
				Box 3CRL
				Las Cruces, NM 88001-0001
				+1 505 646-5711