Tuesday, January 23, 2007, 23:36
import httplib

if __name__ == "__main__":
    print "开始......"  # "Starting......"
    conn = httplib.HTTPConnection('www.baidu.com')
    conn.request("GET", "/index.html")
    response = conn.getresponse()
    html = response.read()
    conn.close()
    print html

When I print html, the Chinese text in it shows up as garbage. I also tried

    html=response.read().encode('gbk')

but that fails at run time with

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 124: ordinal not in range(128)

What is causing this?
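A note on the error above: in Python 2, calling .encode() on a byte string first decodes it with the default ASCII codec, which is most likely where the 'ascii' codec message comes from even though 'gbk' was requested. The sketch below spells that hidden step out. It is written for Python 3, where bytes and text are separate types; the sample text 百度 and the helper name encode_like_python2 are assumptions for illustration and do not come from the thread.

```python
# Assumed sample: two Chinese characters encoded as gb2312, standing in for
# the bytes returned by response.read().
raw = "百度".encode("gb2312")


def encode_like_python2(data, target):
    """Mimic Python 2's str.encode(): decode as ASCII first, then encode."""
    return data.decode("ascii").encode(target)


try:
    encode_like_python2(raw, "gbk")
    outcome = "ok"
except UnicodeDecodeError as exc:
    # The failure happens in the hidden ASCII decode, before gbk is ever used.
    bad_byte = exc.object[exc.start]
    outcome = "%s codec failed on byte 0x%02x at position %d" % (
        exc.encoding, bad_byte, exc.start)

print(outcome)
```

The bytes are already encoded, so the operation that turns them into text is decode, not encode.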
Tuesday, January 23, 2007, 23:48
Try

    html=response.read().decode('gbk')

or

    html=response.read().decode('utf-8')
Wednesday, January 24, 2007, 14:28
html=response.read().decode('gbk') gives this error:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-13: ordinal not in range(128)

html=response.read().decode('utf-8') gives this error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0: unexpected code byte
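The two errors in this message appear to come from different stages. With 'gbk' the decode itself probably succeeded and the failure happened later, when the resulting text was encoded for an ASCII-only output. With 'utf-8' the decode failed outright because the bytes are not UTF-8. A minimal Python 3 sketch of both stages follows; the sample text is an assumption, not the actual Baidu page.

```python
# Assumed sample standing in for the page bytes.
raw = "百度一下".encode("gb2312")

# Stage 1: decoding. gbk is a superset of gb2312, so this succeeds.
text = raw.decode("gbk")

# Stage 2: output. Encoding the text for an ASCII-only stream fails.
try:
    text.encode("ascii")
    output_error = None
except UnicodeEncodeError as exc:
    output_error = (exc.encoding, exc.start)

# Decoding the same bytes as UTF-8 fails at stage 1 instead.
try:
    raw.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = (exc.encoding, exc.start)

print(output_error, decode_error)
```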
Wednesday, January 24, 2007, 15:46
Try this:

import httplib

if __name__ == "__main__":
    print "开始......"  # "Starting......"
    conn = httplib.HTTPConnection('www.baidu.com')
    conn.request("GET", "/index.html")
    response = conn.getresponse()
    html = response.read()
    html = unicode(html, 'gb2312')  # decode the gb2312 bytes into a unicode object
    conn.close()
    print html

PS: I ran the original program and did not hit the garbled-text problem. I checked the HTML and found that Baidu sends it to me encoded as gb2312. You probably need to find out which encoding the response uses first, and then apply the matching codec.

-- 
Best Regards,
Archer
Ming Zhe Huang
Wednesday, January 24, 2007, 16:26
Strange, it still doesn't work for me. I am using Eclipse + PyDev, and this is the error I get:

开始......
Traceback (most recent call last):
  File "/home/cjj/workspace/MyPy/src/program/myprogram.py", line 15, in ?
    print html
UnicodeEncodeError: 'ascii' codec can't encode characters in position 19-32: ordinal not in range(128)

It is still an encoding problem. Could it be related to my system? I am on Ubuntu.
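The traceback points at the print statement, which suggests the decode worked and the failure is in writing text to an output stream that reports ASCII (or no encoding at all). One possible workaround is to encode explicitly for the stream before writing. The Python 3 sketch below simulates an ASCII-only console with io.TextIOWrapper; the helper name encode_for_stream and the fallback of utf-8 are assumptions, and whether the PyDev console actually reported ASCII is not confirmed by the thread.

```python
import io
import sys


def encode_for_stream(text, stream, fallback="utf-8"):
    """Encode text with the stream's declared encoding, or a fallback if none."""
    encoding = getattr(stream, "encoding", None) or fallback
    # errors="replace" substitutes "?" for unencodable characters instead of raising.
    return text.encode(encoding, errors="replace")


# Simulated consoles, one ASCII-only and one UTF-8.
ascii_console = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_console = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

as_ascii = encode_for_stream("百度", ascii_console)
as_utf8 = encode_for_stream("百度", utf8_console)

# Printing sys.stdout.encoding is a quick way to compare a terminal with an IDE console.
print(sys.stdout.encoding, as_ascii, as_utf8)
```

Comparing the value of sys.stdout.encoding in the terminal and inside the IDE is a cheap first check when the same script behaves differently in the two places.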
Wednesday, January 24, 2007, 17:08
Then could you check which encoding is declared in the head of the garbled HTML page? Perhaps on Ubuntu it is not gb2312 or gbk.

-- 
Best Regards,
Archer
Ming Zhe Huang
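Checking the declared encoding can also be done in code. The sketch below looks for a charset first in a Content-Type header value and then in a meta tag near the top of the page, and falls back to a default otherwise. It is written for Python 3; the function name sniff_charset, the regular expressions, the 2048-byte search window, and the sample page are all assumptions for illustration, and real pages can declare charsets in ways this does not cover.

```python
import re

CHARSET_IN_HEADER = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)
CHARSET_IN_META = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def sniff_charset(content_type, body, default="utf-8"):
    """Return the charset from the Content-Type header, else from a <meta> tag."""
    match = CHARSET_IN_HEADER.search(content_type or "")
    if match:
        return match.group(1).lower()
    # Only the start of the document is searched, where <head> normally sits.
    match = CHARSET_IN_META.search(body[:2048])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


# Assumed sample page: the header names no charset, the meta tag says gb2312.
page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=gb2312"></head><body>'
        + "百度".encode("gb2312") + b"</body></html>")

charset = sniff_charset("text/html", page)
text = page.decode(charset)
print(charset)
```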
Wednesday, January 24, 2007, 22:13
The encoding is gb2312. So why doesn't it display correctly?
Wednesday, January 24, 2007, 22:40
Even html = unicode(html, 'gb2312') doesn't work? Then your Ubuntu environment is probably not set up correctly, especially in the console.

-- 
Best Regards,
Archer
Ming Zhe Huang
Thursday, January 25, 2007, 08:16
Hello Mingzhe Huang,

    html = unicode(html, 'gb2312').encode('utf8')

Regards,
charles huang
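The one-liner above converts the page from one byte encoding to another: decode with the codec the bytes were written in, then encode with the codec the output expects. A minimal Python 3 sketch of the same idea is shown here; the function name transcode and the sample text are assumptions. It only helps if the terminal really expects UTF-8, which the thread does not confirm.

```python
def transcode(data, source="gb2312", target="utf-8"):
    """Re-encode bytes: decode with the source codec, encode with the target codec."""
    return data.decode(source).encode(target)


# Assumed sample standing in for the page bytes.
gb_bytes = "百度".encode("gb2312")
utf8_bytes = transcode(gb_bytes)
print(len(gb_bytes), len(utf8_bytes))
```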
Thursday, January 25, 2007, 11:36
html=response.read().decode('utf-8').encode('gbk')
Thursday, January 25, 2007, 11:37
I tried it: when I run the script in a console, everything works (using gb2312) and the Chinese displays correctly, but in Eclipse the problem always comes back. I can't figure out why.

I have another question. Given a hyperlink such as

    <a href="http://localhost/mybook">书籍</a>

how can I extract both the link address and the text "书籍" ("Books") and store them in a dictionary? Can this be done directly in start_a() of the SGMLParser class? I can get the address, but how do I get the "书籍" text?

    def start_a(self, attrs):
        url = [value for (key, value) in attrs]
        del url[len(url)-1]
        if url:
            self.urls.append(url)

Should it be handled in this method, or does it need to be handled in handle_data()?
> > > > > > > -- > > > > > Best Regards, > > > > > > > > > > Archer > > > > > > > > > > Ming Zhe Huang > > > > > _______________________________________________ > > > > > python-chinese > > > > > Post: send python-chinese在lists.python.cn > > > > > Subscribe: send subscribe to > > > > > python-chinese-request在lists.python.cn > > > > > Unsubscribe: send unsubscribe to > > > > > python-chinese-request在lists.python.cn > > > > > Detail Info: http://python.cn/mailman/listinfo/python-chinese > > > > > > > > > > > > > > > > > _______________________________________________ > > > > python-chinese > > > > Post: send python-chinese在lists.python.cn > > > > Subscribe: send subscribe to python-chinese-request在lists.python.cn > > > > Unsubscribe: send unsubscribe to > > > > python-chinese-request在lists.python.cn > > > > Detail Info: http://python.cn/mailman/listinfo/python-chinese > > > > > > > > > > > > > > > > -- > > > Best Regards, > > > > > > Archer > > > > > > Ming Zhe Huang > > > > > > _______________________________________________ > > > python-chinese > > > Post: send python-chinese在lists.python.cn > > > Subscribe: send subscribe to python-chinese-request在lists.python.cn > > > Unsubscribe: send unsubscribe to > > > python-chinese-request在lists.python.cn > > > Detail Info: http://python.cn/mailman/listinfo/python-chinese > > > > > > > > > _______________________________________________ > > python-chinese > > Post: send python-chinese在lists.python.cn > > Subscribe: send subscribe to python-chinese-request在lists.python.cn > > Unsubscribe: send unsubscribe to > > python-chinese-request在lists.python.cn > > Detail Info: http://python.cn/mailman/listinfo/python-chinese > > > > > > -- > Best Regards, > > Archer > > Ming Zhe Huang > > _______________________________________________ > python-chinese > Post: send python-chinese在lists.python.cn > Subscribe: send subscribe to python-chinese-request在lists.python.cn > Unsubscribe: send unsubscribe to 
python-chinese-request在lists.python.cn > Detail Info: http://python.cn/mailman/listinfo/python-chinese > -------------- 下一部分 -------------- Ò»¸öHTML¸½¼þ±»ÒƳý... URL: http://python.cn/pipermail/python-chinese/attachments/20070125/139f55b0/attachment-0001.html
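On the link question in this message: the link text is delivered through handle_data() rather than start_a(), so one possible approach is to remember the href when the <a> tag opens, collect text while inside it, and store the pair when the tag closes. The following is only a sketch under assumptions: it uses html.parser.HTMLParser from the Python 3 standard library in place of sgmllib's SGMLParser (sgmllib was removed in Python 3), and the names LinkCollector and links are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect {href: link text} pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = {}            # maps href -> link text
        self._current_href = None  # href of the <a> currently open, if any
        self._text_parts = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Text is recorded only while an <a href=...> is open
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        # On </a>, store the collected text under the remembered href
        if tag == "a" and self._current_href is not None:
            self.links[self._current_href] = "".join(self._text_parts).strip()
            self._current_href = None


parser = LinkCollector()
parser.feed('<a href="http://localhost/mybook">书籍</a>')
parser.close()
links = parser.links  # {'http://localhost/mybook': '书籍'}
```

With SGMLParser the same idea would presumably map onto start_a(), handle_data() and end_a(); keying the dictionary on the href is a design choice, and it means a repeated URL keeps only its last link text.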
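The suggestion quoted in this thread, to find out which encoding the page uses and then apply the matching codec, could be sketched roughly as follows. This is an illustration under assumptions, not the method anyone in the thread used: it is written for Python 3 (where bytes.decode plays the role of unicode(html, 'gb2312')), the helper names sniff_charset and decode_page are hypothetical, and a small in-memory page stands in for response.read().

```python
import re

# Matches charset=... as it appears in a <meta> declaration
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def sniff_charset(raw, default="utf-8"):
    """Return the charset named in the first charset=... declaration, if any."""
    match = _CHARSET_RE.search(raw[:2048])  # declarations normally sit near the top
    return match.group(1).decode("ascii").lower() if match else default


def decode_page(raw, default="utf-8"):
    """Decode raw page bytes with the declared charset, replacing bad bytes."""
    return raw.decode(sniff_charset(raw, default), errors="replace")


# Stand-in for response.read(): a tiny page encoded as gb2312
raw = ('<html><head><meta http-equiv="Content-Type" '
       'content="text/html; charset=gb2312"></head>'
       '<body>百度</body></html>').encode("gb2312")

text = decode_page(raw)

# Encoding explicitly avoids depending on an ASCII default for the output stream
console_bytes = text.encode("utf-8")
```

The "'ascii' codec can't encode" error on print typically means the output stream, rather than the page, is using ASCII, which would fit the script working in a console but not in Eclipse. Encoding to bytes explicitly, as in the last line, and writing them to something like sys.stdout.buffer is one way to sidestep that; an unrecognised charset name would still raise LookupError in this sketch.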