SIMPLIFIED VS TRADITIONAL CHARACTERS
In the 1950s the government of the PRC embarked on a program to dramatically simplify written Chinese, reducing both the number of characters in use and the complexity of individual characters (though not all characters were simplified).
Traditional Chinese characters are still used in Taiwan, Hong Kong, Macau and some of the older US Chinatowns that were settled before the 1950s. Simplified Chinese characters are used in mainland China and are taught in most Western universities and high schools.
Most digital dictionaries, such as Pleco and mdbg.net (in our view the two best), let you view characters in either simplified or traditional form. A user might infer from this that there is a one-to-one relationship between the two forms, which is not entirely true.
Our purpose here is to provide some insight into the relationship between Simplified and Traditional Chinese, and into how well links between the two can be maintained at the character level.
To understand what happens when Traditional Chinese is “simplified”, it helps to understand the encoding techniques used to electronically store and transmit the letters (characters) and words of any language.
The term “Unicode” refers to a standard that acts as a superset of most of the character sets in use throughout the world; its goal is to encode every letter or character with a minimum of duplication. For example, most Western languages use the letter “a”, but Unicode assigns it a single code point rather than a separate one for each language.
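To make the idea of a code point concrete, here is a small Python sketch using only built-ins (`ord` and `str.encode`); the `describe` helper is a hypothetical name for illustration. The “a” in an English word and in a Spanish word is the same code point, U+0061, and a Han character such as 中 is simply another code point, which UTF-8 stores in three bytes.

```python
# Every character is identified by one Unicode code point, whatever the language.
english_word = "apple"
spanish_word = "manzana"

# The "a" in both words is the same code point, U+0061.
assert ord(english_word[0]) == ord(spanish_word[1]) == 0x61

def describe(ch: str) -> str:
    """Hypothetical helper: return the code point and UTF-8 bytes of one character."""
    return f"{ch}: U+{ord(ch):04X}, UTF-8 bytes {ch.encode('utf-8').hex(' ')}"

print(describe("a"))   # a: U+0061, UTF-8 bytes 61
print(describe("中"))  # 中: U+4E2D, UTF-8 bytes e4 b8 ad
```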
The same principle applies to Han (Chinese) characters. The initial source sets for Han encoding in Unicode comprised 121,000 characters, but after many duplicates were eliminated, the final Unicode count was reduced to 20,902.
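The 20,902 figure corresponds to the original CJK Unified Ideographs block, U+4E00 to U+9FA5. A short Python sketch using the standard `unicodedata` module can confirm both the size of that range and that every code point in it is an assigned Han character. `count_letters` is an illustrative helper of our own, and the character data comes from whichever Unicode version your Python ships with.

```python
import unicodedata

# The original unified Han repertoire occupies U+4E00 to U+9FA5.
FIRST, LAST = 0x4E00, 0x9FA5

def count_letters(first: int, last: int) -> int:
    """Count code points in [first, last] whose general category is 'Lo' (other letter)."""
    return sum(1 for cp in range(first, last + 1)
               if unicodedata.category(chr(cp)) == "Lo")

print(LAST - FIRST + 1)              # 20902 code points in the range
print(count_letters(FIRST, LAST))    # 20902, every one an assigned Han character
print(unicodedata.name(chr(FIRST)))  # CJK UNIFIED IDEOGRAPH-4E00
```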
Chinese speakers are said to use around 3,000 to 4,000 characters in everyday communication, though that number can easily rise to 10,000 for business or scientific use. Unicode now supports over 70,000 Han characters.
The term “unification” refers to this reduction in the number of (largely) redundant Han characters, and “Unihan” refers to the unified set of Han characters as a whole.
Han characters with different meanings or etymologies were not “unified” into a single code point. And although only 5% of Han characters are true pictographs, the unification process also had to account for visual differences: where characters representing the same concept differed significantly in appearance, they were assigned separate Unicode code points.
This was a fairly sophisticated process carried out over a long period by many East Asian experts.
The term “simplification”, on the other hand, refers to reducing the complexity of individual characters. Note that not all characters were simplified. Creating Simplified Chinese involved four possible kinds of change to any given character:
- Reducing the number of components.
- Using the same components but in different positions.
- Using simpler components.
- Using entirely different components.

| TYPE OF CHANGE | TRAD | SIMP |
|---|---|---|
| REDUCE # COMPONENTS | 崖 | 厓 |
We use the term “consolidation” when one Simplified Chinese character replaces two or more Traditional Chinese characters. For example, in the simplified character set 干 stands in for four characters from the traditional set: 干, 幹, 乾 and 榦. Consolidation therefore further reduces the total number of characters in Simplified Chinese.
Consolidation is part of what makes Simplified Chinese, well, simpler than Traditional Chinese, but it also created a “many-to-one” relationship between a non-trivial number of traditional characters and their simplified counterparts. This is one of the main reasons it is difficult for digital dictionaries to simply toggle back and forth between traditional and simplified characters.
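The many-to-one problem is easy to reproduce. In this minimal Python sketch, `TRAD_TO_SIMP` is a hypothetical, drastically reduced conversion table: the 干 entries follow the consolidation example above, and 發 and 髮 both simplifying to 发 is another well-known case. Inverting the table shows why a dictionary cannot always toggle back: one simplified key ends up with several traditional candidates. The helpers `invert` and `to_traditional` are illustrative names, not any dictionary's actual API.

```python
from collections import defaultdict

# Hypothetical, tiny Traditional -> Simplified table. Real tables are far larger,
# and some mappings depend on meaning, which a plain lookup cannot capture.
TRAD_TO_SIMP = {
    "干": "干", "幹": "干", "乾": "干", "榦": "干",  # consolidation into 干
    "發": "发", "髮": "发",                          # two characters become 发
    "時": "时",                                      # a simple one-to-one case
}

def invert(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Build Simplified -> [Traditional candidates]; one key may collect several values."""
    inverse: dict[str, list[str]] = defaultdict(list)
    for trad, simp in mapping.items():
        inverse[simp].append(trad)
    return dict(inverse)

SIMP_TO_TRAD = invert(TRAD_TO_SIMP)

def to_traditional(text: str) -> str:
    """Convert character by character, refusing to guess when a choice is ambiguous."""
    out = []
    for ch in text:
        candidates = SIMP_TO_TRAD.get(ch, [ch])  # unknown characters pass through
        if len(candidates) > 1:
            raise ValueError(f"{ch} is ambiguous: {candidates}")
        out.append(candidates[0])
    return "".join(out)

print(SIMP_TO_TRAD["干"])    # ['干', '幹', '乾', '榦']
print(to_traditional("时"))  # 時
# to_traditional("干") raises ValueError: four candidates, no way to choose.
```

Raising an error rather than silently picking the first candidate is a deliberate choice here: it makes the information lost during consolidation visible instead of hiding it.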
IS THERE A ONE-TO-ONE RELATIONSHIP BETWEEN SIMPLIFIED & TRADITIONAL CHARACTERS?
In a word, no.
The Table of General Standard Chinese Characters (通用规范汉字表) is the standard list of 8,105 Simplified Chinese characters. It was issued by the State Council of the People’s Republic of China in 2013.
The Taiwanese industry standard for Traditional Chinese, called Big5, has approximately 13,000 characters.
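Unlike Unicode, legacy standards such as Big5 are closed repertoires: a character outside the list simply cannot be stored. This Python sketch uses the standard library's built-in `big5` and `gb2312` codecs (GB2312 is an older mainland Simplified Chinese standard, included here only for comparison) to check whether a character can be represented at all. `encodable` is an illustrative helper of our own. Exact results depend on each standard's repertoire, but 时 is representable in GB2312 and 時 in Big5, while UTF-8 can represent both.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if `ch` can be represented in the given encoding."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Compare the simplified and traditional forms of "time" across encodings.
for ch in ("时", "時"):
    print(ch, {codec: encodable(ch, codec) for codec in ("gb2312", "big5", "utf-8")})
```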
Unicode is a superset of both Simplified and Traditional Chinese, with as much duplication as possible removed (within the limits described above). Even so, many concepts are still rendered with enough visual and/or semantic differences that Unicode treats them as separate characters.
- For example, the simplified and traditional characters for “time”, 时 and 時, have two different Unicode values (U+65F6 and U+6642).
- There are also approximately 5,000 traditional characters that have no one-for-one counterpart in Simplified Chinese (see consolidation above).
- Finally, there are terminology differences: “subway”, for example, is 地铁 (literally “ground iron”) in Simplified Chinese but 捷運 (“prompt transport”) in Taiwan.
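Terminology differences like 地铁 and 捷運 are why character-by-character conversion is not enough. One common approach, sketched here under stated assumptions, is to look up whole words in a phrase table first and fall back to single characters only when no phrase matches (a greedy longest-match scan). The `PHRASES` and `CHARS` tables and the `convert` function are hypothetical, with a handful of entries for illustration; a real converter would need thousands of entries and separate tables for each region.

```python
# Hypothetical phrase table (mainland Simplified -> Taiwan usage), checked first.
PHRASES = {"地铁": "捷運"}
# Hypothetical character table, used only when no phrase matches.
CHARS = {"时": "時", "铁": "鐵", "间": "間"}

MAX_PHRASE = max(len(p) for p in PHRASES)

def convert(text: str) -> str:
    """Greedy longest-match conversion: try multi-character phrases, then single characters."""
    out, i = [], 0
    while i < len(text):
        # Try the longest possible phrase at position i, down to two characters.
        for size in range(min(MAX_PHRASE, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += size
                break
        else:
            # No phrase matched: convert one character (or pass it through unchanged).
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(convert("地铁时间"))  # 捷運時間
print(convert("铁"))        # 鐵
```

Note that 铁 on its own becomes 鐵, but inside 地铁 the whole word is replaced by 捷運; a character-level table alone could never produce that result.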
Over the years, translation between Simplified and Traditional Chinese has been further complicated by diverging usage in mainland China on the one hand and Taiwan and Hong Kong on the other.
For these reasons, we think that trying to match each simplified character with exactly one traditional character is an oversimplification. No pun intended. And as the two writing systems continue to evolve, the issue is only going to become more complicated.