A post from a Python forum:

Fri Aug 6 15:32:44 HKT 2004

Dear experts,

RFC 2060 specifies how international (including Chinese) mailbox names are encoded. Here is the relevant excerpt:

5.1.3. Mailbox International Naming Convention
By convention, international mailbox names are specified using a
modified version of the UTF-7 encoding described in [UTF-7]. The
purpose of these modifications is to correct the following problems
with UTF-7:

1) UTF-7 uses the "+" character for shifting; this conflicts with
the common use of "+" in mailbox names, in particular USENET
newsgroup names.

2) UTF-7’s encoding is BASE64 which uses the "/" character; this
conflicts with the use of "/" as a popular hierarchy delimiter.

3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
the use of "\" as a popular hierarchy delimiter.

4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
the use of "~" in some servers as a home directory indicator.

5) UTF-7 permits multiple alternate forms to represent the same
string; in particular, printable US-ASCII characters can be
represented in encoded form.
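Problems 1, 2 and 5 are easy to see with Python 3's built-in `utf-7` codec, which implements plain UTF-7 rather than the IMAP variant. A small illustration; the sample strings are arbitrary choices for the demonstration:

```python
# Standard UTF-7 via Python's built-in "utf-7" codec, NOT the IMAP modified form.

# Problem 1: "+" is the shift character, so each literal "+" becomes "+-".
plus = "alt.c++".encode("utf-7")

# Problem 2: the BASE64 alphabet contains "/", which clashes with "/"
# used as a hierarchy delimiter. "台北" happens to produce one.
slash = "台北".encode("utf-7")

# Problem 5: printable ASCII may also appear in encoded form, so the same
# string has several spellings; "+AGE-" decodes to a plain "a".
alias = b"+AGE-".decode("utf-7")

print(plus, slash, alias)
```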

In modified UTF-7, printable US-ASCII characters except for "&"
represent themselves; that is, characters with octet values 0x20-0x25
and 0x27-0x7e. The character "&" (0x26) is represented by the two-octet
sequence "&-".

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all
Unicode 16-bit octets) are represented in modified BASE64, with a
further modification from [UTF-7] that "," is used instead of "/".
Modified BASE64 MUST NOT be used to represent any printing US-ASCII
character which can represent itself.
"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII.
All names start in US-ASCII, and MUST end in US-ASCII (that
is, a name that ends with a Unicode 16-bit octet MUST end with a "-").
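The rules above can be sketched as a Python 3 encoder. This is a minimal, unvalidated sketch that assumes the name is already a Unicode `str`; `imap_utf7_encode` is a hypothetical helper name, not a standard-library function. Non-ASCII runs are encoded as UTF-16 big-endian, BASE64-encoded without `=` padding, with `/` replaced by `,`:

```python
import base64


def imap_utf7_encode(name: str) -> str:
    """Encode a mailbox name using IMAP modified UTF-7 (RFC 2060, 5.1.3)."""
    out = []      # pieces of the encoded result
    pending = []  # consecutive characters that must go into modified BASE64

    def flush() -> None:
        # Emit pending characters as "&" + modified BASE64 + "-".
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if "\x20" <= ch <= "\x7e":       # printable US-ASCII stands for itself
            flush()
            out.append("&-" if ch == "&" else ch)  # except "&", which is "&-"
        else:
            pending.append(ch)
    flush()  # close any open run so the name ends in US-ASCII
    return "".join(out)


print(imap_utf7_encode("~peter/mail/日本語/台北"))  # ~peter/mail/&ZeVnLIqe-/&U,BTFw-
print(imap_utf7_encode("Tom & Jerry"))              # Tom &- Jerry
```

Consecutive non-ASCII characters are collected into one run, so each `&...-` section is as long as possible; that keeps the output in a single canonical spelling, in the spirit of problem 5.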

For example, here is a mailbox name which mixes English, Japanese,
and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
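Decoding reverses those steps. Another minimal Python 3 sketch (`imap_utf7_decode` is a hypothetical helper), applied to the RFC's example name, which comes out as `~peter/mail/日本語/台北`:

```python
import base64


def imap_utf7_decode(name: str) -> str:
    """Decode an IMAP modified UTF-7 mailbox name (RFC 2060, 5.1.3)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            out.append(ch)                  # printable ASCII stands for itself
            i += 1
            continue
        end = name.index("-", i + 1)        # "-" shifts back to US-ASCII
        chunk = name[i + 1:end]
        if not chunk:
            out.append("&")                 # "&-" is a literal "&"
        else:
            b64 = chunk.replace(",", "/")   # undo the "," for "/" substitution
            b64 += "=" * (-len(b64) % 4)    # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


print(imap_utf7_decode("~peter/mail/&ZeVnLIqe-/&U,BTFw-"))  # ~peter/mail/日本語/台北
```

An unterminated `&` run makes `str.index` raise `ValueError`; a real mail client would probably want friendlier error handling.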

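I've read this over and over but still can't work out how to encode and decode Chinese mailbox names in Python. For example,
according to the rules above:
"草稿箱" (Drafts) encodes to "&g0l6P3ux-", and "发件箱" (Outbox) encodes to "&U9FO9nux-".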
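Because both of these names consist only of Chinese characters, each one is a single shifted run: UTF-16 big-endian, then BASE64 without `=` padding and with `/` replaced by `,`, wrapped in `&` and `-`. A Python 3 sketch that checks the two examples; `encode_run` and `decode_run` are hypothetical helpers that are only valid for names containing no ASCII at all:

```python
import base64


def encode_run(text: str) -> str:
    # Whole name is one run: UTF-16BE -> BASE64, no padding, "/" -> ",".
    b64 = base64.b64encode(text.encode("utf-16-be")).decode("ascii")
    return "&" + b64.rstrip("=").replace("/", ",") + "-"


def decode_run(encoded: str) -> str:
    b64 = encoded[1:-1].replace(",", "/")   # strip "&" and "-", undo ","
    b64 += "=" * (-len(b64) % 4)            # restore BASE64 padding
    return base64.b64decode(b64).decode("utf-16-be")


for name in ("草稿箱", "发件箱"):
    enc = encode_run(name)
    print(name, enc, decode_run(enc))
```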

Experts, how can this encoding and decoding be implemented? Could someone show an example?

One last gripe: Python is nice, but handling Chinese is a real headache!

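According to some references, UTF-8 encoding and decoding can be done like this:
s = u"社会主义中国"
u8 = s.encode("utf-8")    # convert to UTF-8
# After conversion this shows as "脡莽禄谩脰梅脪氓脰脨鹿煤", whereas other applications converting from gb2312 produce "绀句細涓讳箟涓浗"
u8.decode("utf-8")        # convert back to Unicode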

If I read the "绀句細涓讳箟涓浗" (UTF-8) produced by another system's conversion and decode it with the method above, it goes wrong :)
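One plausible explanation, assuming the script was saved as GBK and the `u"..."` literal was interpreted as Latin-1 (roughly what Python 2 did with non-ASCII source lacking a PEP 263 coding declaration), is that the Unicode string was already wrong before `encode("utf-8")` ran; the other system's UTF-8 would then not be the problem. The Python 3 sketch below reproduces the garbled string from the post under that assumption (plus the assumption of a GBK console) and then undoes it:

```python
text = "社会主义中国"

# Correct path: str -> UTF-8 bytes -> str round-trips cleanly.
utf8 = text.encode("utf-8")
print(utf8.decode("utf-8") == text)

# Assumed failure path: GBK source bytes misread as Latin-1 characters,
# then encoded to UTF-8, then displayed on a GBK (cp936) console.
misread = text.encode("gbk").decode("latin-1")
garbled_bytes = misread.encode("utf-8")
print(garbled_bytes.decode("gbk"))  # matches the garbled string quoted in the post

# Reversing each step recovers the original text.
repaired = garbled_bytes.decode("utf-8").encode("latin-1").decode("gbk")
print(repaired == text)
```

In a Python 2 script, the usual remedy would be a declaration such as `# -*- coding: gbk -*-` on the first or second line, matching the encoding the file is actually saved in.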

Sincerely,

Frank Ning

Subject: [python-chinese] How do I decode encoded Chinese mailbox names?