Subject: [python-chinese] An encoding problem

胡志刚

Original poster, Friday, 23 September 2005, 15:10

xlp223 myhat123 at gmail.com
Fri Sep 23 15:10:12 HKT 2005

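Below is a problem I ran into while debugging gmail notifier.py on Ubuntu. Checking the values with isinstance also raises an error. Here is the error output from before I added the isinstance check. Is there a good way to fix this?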

checking for new mail (2005/09/23 14:59:52)
1 new messages
梅劲松
[python-chinese] 围 棋
恩，我代表武汉依赛特支持。
Traceback (most recent call last):
  File "./notifier.py", line 435, in ?
    gmailnotifier = GmailNotify()
  File "./notifier.py", line 149, in __init__
    self.mail_check()
  File "./notifier.py", line 216, in mail_check
    self.default_label=""+self.lang.get_string(17)+sender[0:24]+"\n"+shortenstring(subject,20)+"\n\n"+snippet+"..."
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe3 in position 39: unexpected end of data

The lines printed before the traceback are the contents of sender[0:24], shortenstring(subject,20), and snippet.
--
My blog: http://xlp223.yculblog.com

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

徐继哲

Floor 0, Friday, 23 September 2005, 15:16

limodou limodou at gmail.com
Fri Sep 23 15:16:38 HKT 2005

Try adding the u prefix to every string literal that is not already unicode.

--
I like python!
My Donews Blog: http://www.donews.net/limodou


徐继哲

Floor 0, Friday, 23 September 2005, 15:20

xlp223 myhat123 at gmail.com
Fri Sep 23 15:20:42 HKT 2005

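I have not touched the program; here is the output that followed.
The Chinese text is fine again now, and I don't know why.
Could it be that the mails themselves come in a variety of encodings? I'm not familiar with this.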
checking for new mail (2005/09/23 15:16:06)
2 new messages
曹力
答复: [python-chinese]
应该首先使用单元测试 发件人: python-chinese-bounces at lists.python.cn
generating popup
2 unread messages
----------
checking for new mail (2005/09/23 15:16:48)
2 unread messages
----------
checking for new mail (2005/09/23 15:17:28)
1 new messages
limodou
关于编码问题
在05-9-23，xlp223<myhat123 at gmail.co
generating popup
3 unread messages
----------
checking for new mail (2005/09/23 15:18:08)
2 unread messages



徐继哲

Floor 0, Friday, 23 September 2005, 15:21

Qiangning Hong hongqn at gmail.com
Fri Sep 23 15:21:46 HKT 2005

xlp223 wrote:
[...]
>     self.default_label=""+self.lang.get_string(17)+sender[0:24]+"\n"+shortenstring(subject,20)+"\n\n"+snippet+"..."
> 
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe3 in position
> 39: unexpected end of data
> 
> The lines printed before the traceback are the contents of sender[0:24], shortenstring(subject,20), and snippet.

Could sender[0:24] or shortenstring() be cutting a utf8 string in the middle of a character? I suggest converting to a unicode object first and taking the slice afterwards.
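
A minimal sketch of the truncation suspected here, written for Python 3, where `bytes` plays the role of the Python 2 `str` discussed in this thread and `str` plays the role of `unicode`. The sample text is the snippet from the log in the first post; the slice length of 4 is chosen only for illustration and is not taken from notifier.py.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
text = "恩，我代表武汉依赛特支持。"   # snippet text from the log in the first post
raw = text.encode("utf-8")          # every character here takes 3 bytes in utf-8

# Slicing the byte string can stop in the middle of a character.
cut = raw[:4]                       # one complete character plus one stray lead byte
try:
    cut.decode("utf-8")
    byte_slice_ok = True
except UnicodeDecodeError as exc:
    byte_slice_ok = False
    print("byte slice failed:", exc.reason)

# Decoding first and slicing afterwards always cuts between characters.
safe = raw.decode("utf-8")[:4]
print(safe)
```

Whether a byte slice fails depends on where the cut happens to land, which would also explain why some mails display fine and others do not.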


-- 
Qiangning Hong
http://www.hn.org/hongqn (RSS: http://feeds.feedburner.com/hongqn)

Registered Linux User #396996
Get Firefox! <http://www.spreadfirefox.com/?q=affiliates&id=67907&t=1>
Thunderbird! <http://www.spreadfirefox.com/?q=affiliates&id=67907&t=183>


徐继哲

Floor 0, Friday, 23 September 2005, 15:52

xlp223 myhat123 at gmail.com
Fri Sep 23 15:52:32 HKT 2005

I suspect that is it. However, on Ubuntu the encoding conversion does not even work.
For example:
a='中国'
unicode(a,'gb2312')

>>> a="中国"
>>> unicode(a,"gb2312")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3: illegal multibyte sequence
>>>
－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
My system's default encoding is:
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'utf-8')
>>>




徐继哲

Floor 0, Friday, 23 September 2005, 16:08

limodou limodou at gmail.com
Fri Sep 23 16:08:33 HKT 2005


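That won't work. I think the reason is:

When you type Chinese characters directly, they are encoded in the system default encoding. Your system, for example, uses utf-8, so when you write:

a = '中国'

it is actually utf-8 encoded. You can use:

print repr(a)

to check. If it is 6 bytes long, it is utf-8 encoded. In that case you cannot handle it with gb2312; you have to use utf-8.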
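
As an illustration of the byte-count check described above, here is a small Python 3 sketch. It is an adaptation rather than the thread's own code: in the Python 2 used here, `a = '中国'` is already a byte string, whereas Python 3 needs an explicit `encode`. The variable names are made up for the example.

```python
# Python 3 sketch; in the thread's Python 2, a = '中国' is already a byte string.
text = "中国"

utf8_bytes = text.encode("utf-8")     # 3 bytes per character for these two
gb_bytes = text.encode("gb2312")      # 2 bytes per character

print(repr(utf8_bytes), len(utf8_bytes))
print(repr(gb_bytes), len(gb_bytes))

# Decoding utf-8 bytes with the gb2312 codec is the mismatch from the session above.
try:
    utf8_bytes.decode("gb2312")
    wrong_codec_ok = True
except UnicodeDecodeError:
    wrong_codec_ok = False

# Decoding with the codec that matches the bytes recovers the text.
roundtrip = utf8_bytes.decode("utf-8")
```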

徐继哲

Floor 0, Friday, 23 September 2005, 16:13

Qiangning Hong hongqn at gmail.com
Fri Sep 23 16:13:57 HKT 2005


Your default encoding is utf8, so how can you convert with gb2312? It should be:
a = '中国'  # a is a utf8-encoded string
unicode(a, 'utf8')



徐继哲

Floor 0, Friday, 23 September 2005, 16:17

Qiangning Hong hongqn at gmail.com
Fri Sep 23 16:17:04 HKT 2005

limodou wrote:
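> That won't work. I think the reason is:
>
> When you type Chinese characters directly, they are encoded in the system default encoding. Your system, for example, uses utf-8, so when you write: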

Strictly speaking it is the console's character encoding, which in most cases matches the system default encoding.
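
For reference, the three encodings being distinguished here can be inspected separately. This is a hedged sketch using only standard-library calls; the variable names are made up, and the values printed depend entirely on the machine it runs on.

```python
import locale
import sys

# Encoding Python uses for implicit conversions
# (ASCII by default on Python 2, always utf-8 on Python 3).
default_encoding = sys.getdefaultencoding()

# Encoding suggested by the user's locale settings.
locale_encoding = locale.getpreferredencoding(False)

# Encoding of the attached console; may be None when input is piped or redirected.
console_encoding = getattr(sys.stdin, "encoding", None)

print("default:", default_encoding)
print("locale: ", locale_encoding)
print("console:", console_encoding)
```

If the console and locale values differ, text typed at the interactive prompt may not be in the encoding the locale suggests.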


徐继哲

Floor 0, Friday, 23 September 2005, 16:19

limodou limodou at gmail.com
Fri Sep 23 16:19:34 HKT 2005


Yes, that's right.


徐继哲

Floor 0, Friday, 23 September 2005, 16:48

Protoss Jiang jsonic1106 at gmail.com
Fri Sep 23 16:48:51 HKT 2005

import string

mystr = 'helloworld'
mystr.encode('utf8')


--
Protoss Jiang
mailto: jsonic1106 at gmail.com


徐继哲

Floor 0, Friday, 23 September 2005, 16:54

limodou limodou at gmail.com
Fri Sep 23 16:54:28 HKT 2005


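Importing the string module is unnecessary.
And this code is problematic anyway.

Generally, you call encode on a unicode object to convert it to some other encoding, and you call decode on an ordinary string object to convert it to unicode. Personally I find it more intuitive to use the unicode() function for that conversion. The code above calls encode on a plain string, so the result is still a str object, not unicode. If that is really what you intended, it accomplishes nothing. It should be: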

unicode(mystr, 'utf-8')
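
The two directions described above can be sketched as follows. This is a Python 3 adaptation, not the thread's Python 2 code: `bytes` stands in for the Python 2 `str` and `str` for `unicode`, and the variable names are made up for the example.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
raw = b"helloworld"                # byte string, like mystr in the thread

text = raw.decode("utf-8")         # bytes -> text via the decode method
same_text = str(raw, "utf-8")      # constructor form, closest to unicode(mystr, 'utf-8')

encoded = text.encode("utf-8")     # text -> bytes, the direction encode is meant for

print(type(text).__name__, type(encoded).__name__)
```

Python 3 removes the ambiguity discussed here: its `bytes` type has no `encode` method at all, so calling `encode` on a byte string fails immediately instead of silently returning another byte string.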


徐继哲

Floor 0, Friday, 23 September 2005, 17:15

xlp223 myhat123 at gmail.com
Fri Sep 23 17:15:54 HKT 2005

OK, thanks everyone. I have never understood encodings well, and I've learned a lot here. I'll keep debugging.


徐继哲

Floor 0, Friday, 23 September 2005, 18:01

xlp223 myhat123 at gmail.com
Fri Sep 23 18:01:27 HKT 2005

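I found the problem: it was indeed the encoding truncation. Converting with unicode(s,'utf-8') before truncating fixes it. A gtk warning showed up right afterwards, but it does not affect usage.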
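
A rough sketch of the fix described above, in Python 3 terms (`bytes` for the Python 2 `str`, `str` for `unicode`). `shorten_text` is a hypothetical helper written for this illustration; it is not the shortenstring() from notifier.py, whose source is not shown in this thread, and the assumption that the sender arrives as utf-8 bytes is mine.

```python
def shorten_text(value, limit, encoding="utf-8"):
    """Return at most `limit` characters of `value` as text.

    Hypothetical helper for illustration; not the shortenstring() from notifier.py.
    """
    if isinstance(value, bytes):
        # Decode before slicing so the cut never lands inside a character.
        value = value.decode(encoding)
    return value[:limit]


sender = "梅劲松".encode("utf-8")      # assumed to arrive as utf-8 bytes
subject = "[python-chinese] 围 棋"     # already text

# Limits 24 and 20 mirror the ones used in the traceback line from the first post.
label = shorten_text(sender, 24) + "\n" + shorten_text(subject, 20)
print(label)
```

The isinstance check lets the helper accept either kind of string, so callers do not need to know which one they hold.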
