GB18030 - The New Chinese Encoding Standard

Maintained by Hongbo Ni

Introduction

In 2000, the Government of the People's Republic of China introduced a new standard for the support of its national characters. The standard, known as GB18030, replaces the existing Chinese standard GB2312/GBK. Support for GB18030 is compulsory for all new products released in China after January 1, 2001. GB18030 includes the existing GBK character set plus 6,582 characters from Unicode's CJK Unified Ideographs Extension A and 1,948 additional non-Han characters (Mongolian, Uygur, Tibetan, and Yi).

The internal processing code for the character repertoire can, and should, be Unicode. Whatever internal code is used, the standard stipulates that software providers must guarantee a lossless round trip between GB18030 and that internal processing code.
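As an illustration of the round-trip requirement, the following Python sketch assumes Unicode strings (str) as the internal processing code and relies on Python's built-in gb18030 codec. The round_trips helper and the sample strings are hypothetical choices for illustration, not something defined by the standard.

```python
# A minimal round-trip check, assuming Python str (Unicode) is the internal
# processing code and Python's built-in "gb18030" codec does the conversion.


def round_trips(text: str) -> bool:
    """Return True if text survives Unicode -> GB18030 -> Unicode unchanged."""
    encoded = text.encode("gb18030")          # Unicode to GB18030 bytes
    return encoded.decode("gb18030") == text  # GB18030 bytes back to Unicode


# Hypothetical sample strings covering several parts of the repertoire.
samples = [
    "Hello",        # ASCII: one byte per character
    "中文",         # GBK repertoire: two bytes per character
    "\u3400",       # CJK Unified Ideographs Extension A
    "\u1820",       # Mongolian letter
    "\U00020000",   # supplementary-plane ideograph (Extension B)
]

for s in samples:
    print(repr(s), round_trips(s))
```

A fixed sample list only spot-checks the conversion; a product test would more usefully iterate over every code point the product actually handles.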

All products sold, or to be sold, in China must plan a code page migration to GB18030, without exception. GB18030 is a "mandatory standard", and the Chinese government regulates the certification process to enforce its deployment.

GB18030 is built on the Chinese national standard GB2312 and its extension GBK, so existing GB2312/GBK-encoded text remains valid GB18030. A four-byte form is added to encode all the Unicode code points that are not already covered by GB2312/GBK. A four-byte sequence has the form B1 B2 B3 B4, where B1 and B3 are in the range 0x81-0xFE, and B2 and B4 are in the range 0x30-0x39.
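The byte ranges above are enough to split a GB18030 byte stream into one-, two-, and four-byte character units. The Python sketch below is a hedged illustration rather than a reference implementation, and the helper names split_gb18030, four_byte_index, supplementary_to_gb18030, and gb18030_to_supplementary are hypothetical. It assumes the linear numbering of four-byte sequences in which the supplementary code points U+10000 through U+10FFFF are mapped arithmetically starting at 0x90308130, which matches what Python's built-in gb18030 codec produces.

```python
from __future__ import annotations

# A minimal sketch of GB18030 byte structure. It only classifies byte units
# and handles the algorithmic supplementary-plane range; it is NOT a full
# decoder, since BMP characters still require the official mapping tables.


def split_gb18030(data: bytes) -> list[bytes]:
    """Split GB18030 bytes into 1-, 2-, or 4-byte character units."""
    units: list[bytes] = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 <= 0x7F:
            # One byte: ASCII-compatible.
            units.append(data[i:i + 1])
            i += 1
            continue
        if not 0x81 <= b1 <= 0xFE:
            raise ValueError(f"invalid lead byte 0x{b1:02X} at offset {i}")
        if i + 1 >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        b2 = data[i + 1]
        if 0x30 <= b2 <= 0x39:
            # Four bytes: B1 and B3 in 0x81-0xFE, B2 and B4 in 0x30-0x39.
            if i + 4 > len(data):
                raise ValueError(f"truncated four-byte sequence at offset {i}")
            b3, b4 = data[i + 2], data[i + 3]
            if not (0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
                raise ValueError(f"invalid four-byte sequence at offset {i}")
            units.append(data[i:i + 4])
            i += 4
        elif 0x40 <= b2 <= 0xFE and b2 != 0x7F:
            # Two bytes: the GBK-compatible form.
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid trail byte 0x{b2:02X} at offset {i + 1}")
    return units


def four_byte_index(unit: bytes) -> int:
    """Linear position of a four-byte sequence, counting from 0x81308130."""
    b1, b2, b3, b4 = unit
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# U+10000 is encoded as 0x90308130; later supplementary code points follow linearly.
SUPPLEMENTARY_BASE = four_byte_index(b"\x90\x30\x81\x30")


def supplementary_to_gb18030(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as four GB18030 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    n = cp - 0x10000
    n, b4 = divmod(n, 10)
    n, b3 = divmod(n, 126)
    b1, b2 = divmod(n, 10)
    return bytes([b1 + 0x90, b2 + 0x30, b3 + 0x81, b4 + 0x30])


def gb18030_to_supplementary(unit: bytes) -> int:
    """Decode a four-byte sequence in the supplementary range to a code point."""
    cp = four_byte_index(unit) - SUPPLEMENTARY_BASE + 0x10000
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("sequence is outside the supplementary-plane range")
    return cp
```

The supplementary-plane mapping is purely arithmetic, so it needs no table. Four-byte codes for BMP characters outside GBK, by contrast, are assigned through a range table, which is why a complete converter still depends on a mapping table such as those listed in the references.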

References

PRC National Standard GB18030 - Chinese Ideograms Coded Character Set for Information Interchange - Extension for the Basic Set (PDF)

GB18030 Support in Windows 2000/XP by Microsoft
Exploring the history and structure of the new Chinese Unicode standard by Markus Scherer
Understanding the GB 18030 Standard by Dirk Meyer
How to create GB18030 to Unicode Mapping Table by Markus Scherer
GB18030 to Unicode Mapping (included in the GNU libiconv package)

Products that support GB18030

NJStar Communicator v2.30
Microsoft Windows XP Chinese Version Sold in China
