Python Forum - Discussion Board

Subject: [python-chinese] Asking again: decoding and encoding Chinese mailbox names

Friday, August 6, 2004, 17:47

gavin gavin at sz.net.cn
Fri Aug 6 17:47:01 HKT 2004

Hi all,

RFC 2060 specifies how Chinese (international) mailbox names are to be encoded. The relevant section is excerpted here:

5.1.3. Mailbox International Naming Convention
By convention, international mailbox names are specified using a
modified version of the UTF-7 encoding described in [UTF-7]. The
purpose of these modifications is to correct the following problems
with UTF-7:

1) UTF-7 uses the "+" character for shifting; this conflicts with
the common use of "+" in mailbox names, in particular USENET
newsgroup names.

2) UTF-7’s encoding is BASE64 which uses the "/" character; this
conflicts with the use of "/" as a popular hierarchy delimiter.

3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
the use of "\" as a popular hierarchy delimiter.

4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
the use of "~" in some servers as a home directory indicator.

5) UTF-7 permits multiple alternate forms to represent the same
string; in particular, printable US-ASCII characters can be
represented in encoded form.

In modified UTF-7, printable US-ASCII characters except for "&"
represent themselves; that is, characters with octet values 0x20-0x25
and 0x27-0x7e. The character "&" (0x26) is represented by the
two-octet sequence "&-".

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all
Unicode 16-bit octets) are represented in modified BASE64, with a
further modification from [UTF-7] that "," is used instead of "/".
Modified BASE64 MUST NOT be used to represent any printing US-ASCII
character which can represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to
US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that
is, a name that ends with a Unicode 16-bit octet MUST end with a
"-").

For example, here is a mailbox name which mixes English, Japanese,
and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
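
The encoding rules in the excerpt can be sketched in Python roughly as follows. This is a minimal sketch, not an official implementation: it assumes Python 3 (which postdates this thread), and the names `encode_mailbox` and `_modified_b64` are made up for illustration. Each run of characters outside printable US-ASCII is converted to UTF-16BE, BASE64-encoded, stripped of "=" padding, and has "/" replaced by ",".

```python
import base64


def _modified_b64(text):
    """UTF-16BE bytes as BASE64, with ',' instead of '/' and no '=' padding."""
    raw = base64.b64encode(text.encode("utf-16-be")).decode("ascii")
    return raw.rstrip("=").replace("/", ",")


def encode_mailbox(name):
    """Encode a str as a modified UTF-7 mailbox name (hypothetical helper)."""
    out = []
    pending = []  # characters waiting to be emitted as one "&...-" run

    def flush():
        if pending:
            out.append("&" + _modified_b64("".join(pending)) + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")          # literal "&" is written as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)            # printable US-ASCII represents itself
        else:
            pending.append(ch)        # everything else goes through BASE64
    flush()
    return "".join(out)


print(encode_mailbox("~peter/mail/日本語/台北"))  # ~peter/mail/&ZeVnLIqe-/&U,BTFw-
```

Collecting adjacent non-ASCII characters into one run before encoding matters, because encoding each character separately would produce a different (and non-canonical) result.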


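I have been looking at this for a long time and still cannot work out how to decode and encode Chinese mailbox names in Python. For example, under the rules above, "草稿箱" (Drafts) encodes to "&g0l6P3ux-", and "发件箱" (Outbox) encodes to "&U9FO9nux-".

How can this encoding and decoding be implemented? Could someone post an example?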
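
For the decoding direction, one possible sketch is shown here. It assumes Python 3 and well-formed input, and `decode_mailbox` is a hypothetical name, not a standard library function. Each "&...-" section is turned back into ordinary BASE64 (restore "/" and the "=" padding), decoded to bytes, and interpreted as UTF-16BE; an empty section "&-" stands for a literal "&".

```python
import base64


def decode_mailbox(encoded):
    """Decode a modified UTF-7 mailbox name into a str (hypothetical helper)."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "&":
            out.append(ch)            # plain US-ASCII represents itself
            i += 1
            continue
        # "-" is not a BASE64 character, so the first "-" ends the run.
        # str.index raises ValueError if the run is never terminated.
        end = encoded.index("-", i + 1)
        chunk = encoded[i + 1:end]
        if chunk == "":
            out.append("&")           # "&-" is a literal "&"
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)   # restore the dropped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


print(decode_mailbox("&g0l6P3ux-"))   # 草稿箱
print(decode_mailbox("&U9FO9nux-"))   # 发件箱
```

The sketch does no validation beyond what `base64` and the UTF-16 codec do themselves, so malformed names raise whatever exception those calls raise.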



Sincerely,

Frank Ning 

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Friday, August 6, 2004, 18:15

gentoo.cn gentoo.cn at 126.com
Fri Aug 6 18:15:29 HKT 2004

Same as above: convert the text to UTF-7, then replace a few characters according to the rules.
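
A rough sketch of this suggestion, assuming Python 3 and its built-in "utf-7" codec: encode a run of non-ASCII characters with the codec, then swap the leading "+" for "&" and every "/" for ",". The helper names `encode_run` and `decode_run` are hypothetical. The substitution is applied only to a run made up entirely of non-ASCII characters, since standard UTF-7 treats some ASCII characters (for instance "+" and "&") differently from the modified form, so converting a whole mixed name this way would not be safe.

```python
def encode_run(run):
    """Encode a run of non-ASCII characters as one "&...-" section."""
    utf7 = run.encode("utf-7").decode("ascii")      # e.g. "+g0l6P3ux-"
    if not (utf7.startswith("+") and utf7.endswith("-")):
        raise ValueError("expected a single shifted UTF-7 section")
    # Swap the shift character and the BASE64 "/" as the rules require.
    return "&" + utf7[1:-1].replace("/", ",") + "-"


def decode_run(section):
    """Decode one "&...-" section back into text via the utf-7 codec."""
    if not (section.startswith("&") and section.endswith("-")):
        raise ValueError("expected a section of the form '&...-'")
    utf7 = "+" + section[1:-1].replace(",", "/") + "-"
    return utf7.encode("ascii").decode("utf-7")


print(encode_run("草稿箱"))        # &g0l6P3ux-
print(decode_run("&U9FO9nux-"))   # 发件箱
```

A full mailbox name would still need to be split into ASCII and non-ASCII runs first, with a literal "&" written as "&-".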


All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all
Unicode 16-bit octets) are represented in modified BASE64, with a
further modification from [UTF-7] that "," is used instead of "/".
Modified BASE64 MUST NOT be used to represent any printing US-ASCII
character which can represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to
US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that
is, a name that ends with a Unicode 16-bit octet MUST end with a
"-").






