Friday, August 6, 2004, 17:47
Hi all,

RFC 2060 specifies how international (including Chinese) mailbox names are encoded. The relevant section is excerpted below:

5.1.3. Mailbox International Naming Convention

By convention, international mailbox names are specified using a modified version of the UTF-7 encoding described in [UTF-7]. The purpose of these modifications is to correct the following problems with UTF-7:

1) UTF-7 uses the "+" character for shifting; this conflicts with the common use of "+" in mailbox names, in particular USENET newsgroup names.

2) UTF-7's encoding is BASE64 which uses the "/" character; this conflicts with the use of "/" as a popular hierarchy delimiter.

3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with the use of "\" as a popular hierarchy delimiter.

4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with the use of "~" in some servers as a home directory indicator.

5) UTF-7 permits multiple alternate forms to represent the same string; in particular, printable US-ASCII characters can be represented in encoded form.

In modified UTF-7, printable US-ASCII characters except for "&" represent themselves; that is, characters with octet values 0x20-0x25 and 0x27-0x7e. The character "&" (0x26) is represented by the two-octet sequence "&-".

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further modification from [UTF-7] that "," is used instead of "/". Modified BASE64 MUST NOT be used to represent any printing US-ASCII character which can represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that is, a name that ends with a Unicode 16-bit octet MUST end with a "-").

For example, here is a mailbox name which mixes English, Japanese, and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-

I have stared at this for a long time and still cannot work out how to decode and encode Chinese mailbox names in Python. For example, according to the rules above, "草稿箱" (Drafts) encodes to "&g0l6P3ux-" and "发件箱" (Outbox) encodes to "&U9FO9nux-".

How can this encoding and decoding be implemented? Could someone post an example?

Sincerely,
Frank Ning
Friday, August 6, 2004, 18:15
Same as before: convert the name to UTF-7, then substitute a few characters according to these rules:

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further modification from [UTF-7] that "," is used instead of "/". Modified BASE64 MUST NOT be used to represent any printing US-ASCII character which can represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that is, a name that ends with a Unicode 16-bit octet MUST end with a "-").

gavin wrote:
> I have stared at this for a long time and still cannot work out how to
> decode and encode Chinese mailbox names in Python. For example, according
> to the rules above, "草稿箱" encodes to "&g0l6P3ux-" and "发件箱" encodes
> to "&U9FO9nux-".
>
> How can this encoding and decoding be implemented? Could someone post an
> example?
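For anyone finding this thread later, here is a minimal sketch of the rules quoted above, assuming Python 3. The function names (`imap_utf7_encode`, `imap_utf7_decode`) are made up for illustration and are not part of `imaplib` or any other library. Instead of post-processing the output of the built-in `utf-7` codec, the sketch builds each shifted run directly from UTF-16BE plus Base64, so every rule from the RFC text is visible in the code.

```python
import base64


def _encode_run(run: str) -> str:
    """Encode a run of non-printable/non-ASCII characters as '&<modified base64>-'."""
    b64 = base64.b64encode(run.encode("utf-16-be")).decode("ascii")
    # Modified BASE64: no '=' padding, and ',' is used instead of '/'.
    return "&" + b64.rstrip("=").replace("/", ",") + "-"


def imap_utf7_encode(name: str) -> str:
    """Encode a mailbox name with the modified UTF-7 of RFC 2060, section 5.1.3."""
    out = []
    run = []  # characters waiting to be base64-encoded
    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            # Printable US-ASCII represents itself, so flush any pending run first.
            if run:
                out.append(_encode_run("".join(run)))
                run = []
            out.append("&-" if ch == "&" else ch)
        else:
            run.append(ch)
    if run:
        # A name ending in a shifted run must still shift back with '-'.
        out.append(_encode_run("".join(run)))
    return "".join(out)


def imap_utf7_decode(encoded: str) -> str:
    """Decode a modified UTF-7 mailbox name back to a Python str."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "&":
            out.append(ch)
            i += 1
            continue
        # '-' is not in the base64 alphabet, so the next '-' ends the run.
        # str.index raises ValueError if the run is never terminated.
        end = encoded.index("-", i + 1)
        chunk = encoded[i + 1:end]
        if not chunk:
            out.append("&")  # "&-" is a literal ampersand
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


if __name__ == "__main__":
    for name in ("草稿箱", "发件箱", "~peter/mail/日本語/台北"):
        encoded = imap_utf7_encode(name)
        print(name, "->", encoded, "->", imap_utf7_decode(encoded))
```

The decoder is deliberately lenient: it does not reject names that encode printable ASCII inside a shifted run, which the RFC forbids, so treat it as a starting point rather than a validating parser.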