2006-07-19, Wednesday 21:27
Here is a session transcript.

    $ ipython
    Python 2.3.5 (#2, Jun 13 2006, 23:12:55)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.7.2 -- An enhanced Interactive Python.

    In [1]: import sys
    In [2]: sys.getdefaultencoding()
    Out[2]: 'utf-8'
    In [3]: unicode('钗', 'utf-8')
    ---------------------------------------------------------------------------
    exceptions.UnicodeDecodeError    Traceback (most recent call last)

    /home/rlf/prog/test/python/scripts/

    UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 4: unexpected code byte
    > (1)

    ipdb> quit
    In [4]: unicode('头', 'utf-8')
    Out[4]: u'\u5934'
    In [5]: unicode('凤', 'utf-8')
    Out[5]: u'\u51e4'
    In [6]:

The actual problem I ran into is this:

    $ ls -sh 钗头凤.mp3
    3.4M 钗头凤.mp3
    $ openfile.py 钗头凤.mp3
    IOError: [Errno 2] No such file or directory: ' \x92\x97\xe5\xa4\xb4\xe5\x87\xa4.mp3'
    $ cat openfile.py
    #! /usr/bin/python
    # -*- coding: utf-8; -*-
    fl = open(sys.argv[1])
    $
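When an open() call fails on a non-ASCII file name like this, it can help to see exactly which bytes were requested from the file system. The following is a minimal diagnostic sketch, not the poster's script. It assumes Python 3 (the thread uses Python 2.3), and the helper names `describe_name` and `open_or_report` are hypothetical.

```python
import os


def describe_name(name):
    """Return the file-system bytes of a name as a list of hex strings."""
    raw = os.fsencode(name)  # the bytes the OS is actually asked for
    return ['0x%02x' % b for b in raw]


def open_or_report(name):
    """Try to open a file for reading.

    Returns (file_object, None) on success, or (None, message) on failure,
    where the message includes the exact bytes of the requested name.
    """
    try:
        return open(name, 'rb'), None
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        requested = ' '.join(describe_name(name))
        return None, '%s; requested bytes: %s' % (exc, requested)


# Hypothetical usage with a name that does not exist.
handle, problem = open_or_report('no-such-file-for-demo.mp3')
if problem:
    print(problem)
else:
    handle.close()
```

Comparing the reported bytes with the expected UTF-8 bytes of the name shows whether the name was damaged before it reached open().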
2006-07-20, Thursday 08:16
    Python 2.4.3
    >>> '钗'
    '\xee\xce'
    >>> '钗'.decode( 'gbk' )
    u'\u9497'
    >>> '钗'.decode( 'gbk' ).encode( 'utf-8' )
    '\xe9\x92\x97'
    >>> '钗'.decode( 'gbk' ).encode( 'utf-8' ).decode( 'utf-8' )
    u'\u9497'

I suspect the cjkcodec package you installed has a bug. Is it the latest version?

On 2006-7-19 21:27:36, Ren Lifeng <lfren at cad.zju.edu.cn> wrote:
> In [3]: unicode('钗', 'utf-8')
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 4: unexpected code byte

--
张骏 <zhangj at foreseen-info.com>
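The GBK to UTF-8 round trip in the session above can be sketched as follows. This is an illustration only, assuming Python 3, where byte strings and text are separate types. The helper name `transcode` is hypothetical, and the character is written as the escape `'\u9497'` so the example does not depend on the terminal or editor encoding.

```python
def transcode(data, source='gbk', target='utf-8'):
    """Decode bytes from the source encoding and re-encode them in the target."""
    text = data.decode(source)   # bytes -> text
    return text.encode(target)   # text -> bytes


chai = '\u9497'                  # the character 钗
gbk_bytes = chai.encode('gbk')   # what a GBK terminal would send
utf8_bytes = transcode(gbk_bytes)

print(utf8_bytes)                # b'\xe9\x92\x97'
print(utf8_bytes.decode('utf-8') == chai)
```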
2006-07-20, Thursday 10:12
On 7/19/06, Ren Lifeng <lfren at cad.zju.edu.cn> wrote:
> In [2]: sys.getdefaultencoding()
> Out[2]: 'utf-8'
> In [3]: unicode('钗', 'utf-8')

Is this '钗' actually UTF-8 encoded? Check what your sys.stdin.encoding is. It determines the encoding used for what you type at the command line.

> $ openfile.py 钗头凤.mp3
> IOError: [Errno 2] No such file or directory: ' \x92\x97\xe5\xa4\xb4\xe5\x87\xa4.mp3'

Why does there seem to be a space at the front?

--
I like python!
My Blog: http://www.donews.net/limodou
My Django Site: http://www.djangocn.org
NewEdit Maillist: http://groups.google.com/group/NewEdit
2006-07-20, Thursday 13:21
cjkcodecs: python-cjkcodecs 1.1.1-2

I would rather not install 2.4.

张骏 <zhangj at foreseen-info.com> writes:
> I suspect the cjkcodec package you installed has a bug. Is it the latest version?
2006-07-20, Thursday 13:27
limodou <limodou at gmail.com> writes:
> Is this '钗' actually UTF-8 encoded? Check what your sys.stdin.encoding is.
> It determines the encoding used for what you type at the command line.

    In [1]: import sys
    In [2]: sys.stdin.encoding
    Out[2]: 'UTF-8'
    In [3]: '钗'[0]
    Out[3]: ' '
    In [4]: ord('钗'[0])
    Out[4]: 32
    In [5]: len('钗')
    Out[5]: 6

Such a common character takes 6 bytes to encode, and the first byte is 0x20 of all things.

Also, my guess is that sys.stdin.encoding should agree with sys.getdefaultencoding().
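On the guess above: the two values are set independently. sys.getdefaultencoding() is the codec used for implicit conversions between byte strings and unicode, while sys.stdin.encoding is derived from the terminal and locale, so they may or may not match on a given system. A small sketch for printing the relevant settings side by side, assuming Python 3 (where the default encoding is always 'utf-8'); the function name `encoding_report` is hypothetical.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses in different places."""
    return {
        'default': sys.getdefaultencoding(),             # implicit str/bytes codec
        'stdin': getattr(sys.stdin, 'encoding', None),   # what typed input is decoded with
        'filesystem': sys.getfilesystemencoding(),       # used for file names
        'locale': locale.getpreferredencoding(False),    # taken from the locale settings
    }


for name, value in sorted(encoding_report().items()):
    print('%-10s %s' % (name, value))
```

If 'stdin' reports UTF-8 but the bytes that arrive are not valid UTF-8, the damage happened between the keyboard and the interpreter rather than in the codec.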
2006-07-20, Thursday 13:36
On 7/20/06, Ren Lifeng <lfren at cad.zju.edu.cn> wrote:
> In [4]: ord('钗'[0])
> Out[4]: 32
> In [5]: len('钗')
> Out[5]: 6
>
> Such a common character takes 6 bytes to encode, and the first byte is 0x20 of all things.

I don't know what is going on with your system.

--
I like python!
My Blog: http://www.donews.net/limodou
2006-07-20, Thursday 13:37
The UTF-8 encoding of 钗 should be 0xE9 0x92 0x97.
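The three bytes quoted above follow directly from the code point U+9497. As a worked sketch (assuming Python 3; the function name `utf8_three_bytes` is hypothetical), the three-byte UTF-8 form splits the 16 bits into groups of 4, 6 and 6 bits and prefixes them with 1110, 10 and 10:

```python
def utf8_three_bytes(code_point):
    """Encode a code point in the three-byte UTF-8 range by hand."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError('not a three-byte UTF-8 code point: %#x' % code_point)
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


manual = utf8_three_bytes(0x9497)
print(['0x%02X' % b for b in manual])       # ['0xE9', '0x92', '0x97']
print(manual == '\u9497'.encode('utf-8'))   # True
```

The lead byte 0xE9 is what tells a decoder that two continuation bytes follow, so 0x92 and 0x97 on their own cannot be decoded.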
2006-07-20, Thursday 13:44
Could you please tell me what u'钗' gives on your system?

    In [1]: u'钗'
2006-07-20, Thursday 14:38
I currently have the same problem as you here. On my system u'钗' gives \u9497.

On 7/20/06, Ren Lifeng <lfren at cad.zju.edu.cn> wrote:
> Could you please tell me what u'钗' gives on your system?
>
> In [1]: u'钗'
2006-07-22, Saturday 10:48
On 7/20/06, Ren Lifeng <lfren at cad.zju.edu.cn> wrote:
> In [3]: unicode('钗', 'utf-8')
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 4: unexpected code byte

I tried it, and this looks like an ipython bug. Using the plain python command line works fine.

--
Best Regards
Carlos
2006-07-22, Saturday 14:41
"Carlos Liu" <about.linux at gmail.com> writes:
> I tried it, and this looks like an ipython bug. Using the plain python command line works fine.

It is a problem with the python shell. Below is a session of mine running python in interactive mode under rxvt/bash.

    rlf at gforge:~$ python
    Python 2.3.5 (#2, Jun 13 2006, 23:12:55)
    [GCC 4.1.2 20060613 (prerelease) (Debian 4.1.1-4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> ss = ' '
    >>> hh = ['0x%x' % ord(s) for s in ss]
    >>> hh
    ['0x20', '0x20', '0x20', '0x20', '0x92', '0x97']
    >>>

The thing shown above that looks like a space is the character 钗 that I typed. The python shell inserts 4 spaces in front of the character's own encoding and swallows the 0xe9 byte.

I am using the Python 2.3.5 that ships with debian/testing.

This is how I work around the problem for now:

    In [3]: ed
    IPython will make a temporary file named: /tmp/ipython_edit_a5wBIK.py
    Editing...Waiting for Emacs...
    done. Executing edited code...
    Out[3]: "# -*- coding: utf-8; -*-\nss = '\xe9\x92\x97'\n"
    In [4]: !cat /tmp/ipython_edit_a5wBIK.py
    # -*- coding: utf-8; -*-
    ss = '钗'

That is, I edit and run a temporary file, and assign the string inside that file.
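The byte list reported above is enough to reproduce the original traceback without a terminal. The sketch below assumes Python 3 (the thread uses Python 2.3) and a hypothetical helper `try_decode`. It decodes both the expected bytes and the observed bytes, and reports where decoding fails.

```python
# Bytes the shell actually delivered, as listed in the session above.
observed = bytes([0x20, 0x20, 0x20, 0x20, 0x92, 0x97])
# Bytes a correct UTF-8 input of the character would contain.
expected = b'\xe9\x92\x97'


def try_decode(data):
    """Decode as UTF-8; return (text, None) or (None, (position, byte))."""
    try:
        return data.decode('utf-8'), None
    except UnicodeDecodeError as exc:
        return None, (exc.start, data[exc.start])


print(try_decode(expected))   # decodes to the single character U+9497
print(try_decode(observed))   # fails at position 4 on byte 0x92

# A workaround in the same spirit as the temporary file: spell the
# character with an escape so nothing depends on terminal input.
chai = '\u9497'
print(chai.encode('utf-8') == expected)
```

The failure position and byte match the "byte 0x92 in position 4" message in the first post, which is consistent with the lead byte 0xe9 having been replaced by spaces before the string reached the codec.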
2006-07-22, Saturday 18:44
On 7/22/06, Ren Lifeng <lfren at cad.zju.edu.cn> wrote:
> It is a problem with the python shell.
> [...]
> The python shell inserts 4 spaces in front of the character's own encoding
> and swallows the 0xe9 byte.
>
> I am using the Python 2.3.5 that ships with debian/testing.

On my Debian sid, gnome-terminal/rxvt-unicode with python2.3.5/python2.4.3 all handle the character "钗" correctly. Only ipython fails.

--
Best Regards
Carlos