Python Forum - Discussion Board

Subject: [python-chinese] Handling GB2312-encoded ...

Thursday, December 15, 2005, 09:40

lee yaou yaoulee at gmail.com
Thu Dec 15 09:40:35 HKT 2005

Hi All,

When I use Python to fetch web pages encoded in GB2312, I handle the encoding like this:

def handle_data(self, text):
    print text.decode('gb2312').encode('utf8')

But it printed only half the page; when it reached "喆", it raised an error:
cant finish getWebPage  'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

I can't think of a fix right now. If anyone has ideas, I'd appreciate the advice.

Thanks

Yaou

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Thursday, December 15, 2005, 09:49

Zoom Quiet zoom.quiet at gmail.com
Thu Dec 15 09:49:10 HKT 2005

On 05-12-15, lee yaou <yaoulee at gmail.com> wrote:
> def handle_data(self, text):
>     print text.decode('gb2312').encode('utf8')
>
> But it printed only half the page; when it reached "喆", it raised an error:
GB2312 is a crippled character set that is missing a lot of characters! Just convert straight to UTF-8 and store it that way, once and for all!
> cant finish getWebPage  'gb2312' codec can't decode bytes in position 2-3:
> illegal multibyte sequence

--
# Time is unimportant, only life important!
## Facing open source, my heart is free!

Thursday, December 15, 2005, 09:55

Henotii henotii at gmail.com
Thu Dec 15 09:55:28 HKT 2005

def handle_data(self, text):
    print text.decode('gbk').encode('utf8')

Try gbk; it already includes the GB2312 character set :)
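
The gbk suggestion can be checked with a minimal sketch. It is written for Python 3, while the thread's own code uses Python 2 syntax. The helper name `decode_chinese` and the fallback order (gb2312, then gbk, then gb18030) are assumptions for illustration, not something taken from the thread.

```python
def decode_chinese(raw, encodings=("gb2312", "gbk", "gb18030")):
    """Decode bytes by trying progressively wider GB codecs (hypothetical helper)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, replacing undecodable bytes with U+FFFD.
    return raw.decode(encodings[-1], errors="replace"), encodings[-1]


# Bytes as a GBK-encoded page would deliver them; "喆" is the character from the thread.
raw = "陶喆".encode("gbk")

try:
    raw.decode("gb2312")
    gb2312_ok = True
except UnicodeDecodeError:
    gb2312_ok = False  # the strict gb2312 codec rejects the bytes for "喆"

text, used = decode_chinese(raw)
utf8_bytes = text.encode("utf-8")  # re-encode for output, as in the thread
```

Trying the narrow codec first keeps the sketch close to the original code; passing only `("gbk",)` would behave like the one-line fix suggested in this reply.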

Thursday, December 15, 2005, 10:08

lee yaou yaoulee at gmail.com
Thu Dec 15 10:08:43 HKT 2005

GB2312 does include the character 喆; the error happened when I called decode.

It's probably a problem with Python's gb2312 codec, isn't it?


On 12/15/05, Zoom Quiet <zoom.quiet at gmail.com> wrote:
>
> GB2312 is a crippled character set that is missing a lot of characters! Just convert straight to UTF-8 and store it that way, once and for all!
>

Thursday, December 15, 2005, 10:12

lee yaou yaoulee at gmail.com
Thu Dec 15 10:12:55 HKT 2005

Cool, it works.

Many thanks.

:)

On 12/15/05, Henotii <henotii at gmail.com> wrote:
>
> def handle_data(self, text):
>     print text.decode('gbk').encode('utf8')
>
> Try gbk; it already includes the GB2312 character set :)
>

Thursday, December 15, 2005, 10:36

Qiangning Hong hongqn at gmail.com
Thu Dec 15 10:36:07 HKT 2005

On 12/15/05, lee yaou <yaoulee at gmail.com> wrote:
> GB2312 does include the character 喆; the error happened when I called decode.
>
> It's probably a problem with Python's gb2312 codec, isn't it?

喆 is not in GB2312.

$ echo 喆 | iconv -t gb2312
iconv: illegal input sequence at position 0

--
Qiangning Hong
http://hongqn.hn.org
Registered Linux User #396996
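
The same membership check can be sketched in Python instead of iconv. This assumes Python 3 (the thread's code is Python 2), and the helper name `can_encode` is hypothetical: a character belongs to a character set only if the matching codec can encode it.

```python
def can_encode(char, encoding):
    """Return True if `char` is representable in `encoding` (hypothetical helper)."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ch = "喆"
in_gb2312 = can_encode(ch, "gb2312")  # GB2312 has no code point for this character
in_gbk = can_encode(ch, "gbk")        # GBK, a superset of GB2312, does
gbk_bytes = ch.encode("gbk")          # the two bytes the gb2312 decoder rejected

print(in_gb2312, in_gbk, gbk_bytes)
```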

Thursday, December 15, 2005, 14:37

Yskin yskins at gmail.com
Thu Dec 15 14:37:27 HKT 2005

This character probably isn't in GB2312; otherwise 陶喆's name wouldn't so often get written as 陶吉吉.

Thursday, December 15, 2005, 14:41

lannos sini caocancan at gmail.com
Thu Dec 15 14:41:36 HKT 2005

GB2312 does have this character.

2005/12/15, Yskin <yskins at gmail.com>:
>
> This character probably isn't in GB2312; otherwise 陶喆's name wouldn't so often get written as 陶吉吉.
>


--
That's all. Wishing you smooth work and a happy life.

Thursday, December 15, 2005, 14:47

july july july.lzu at gmail.com
Thu Dec 15 14:47:22 HKT 2005

If "喆" appears in the page, the page's encoding probably isn't gb2312.
Try reading the page's encoding from the meta tag:
def do_meta(self, attrs):
    # attrs is a list of (name, value) pairs from the <meta> tag
    for attr in attrs:
        for i in attr:
            if i and "charset=" in i:
                # e.g. "text/html; charset=gbk" -> "gbk"
                self.encode = i.split('=')[1]
                print 'coding of the pages is', self.encode
Then decode with that encoding and re-encode:
def handle_data(self, text):
    print text.decode(self.encode).encode('utf-8')
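
A rough Python 3 version of the same idea, assuming the standard-library `html.parser.HTMLParser` in place of the parser class used above (the post does not name it). The class name `CharsetSniffer`, the function `decode_page`, the sample page, and the gbk default are all assumptions for illustration. Because this thread shows that a page labelled gb2312 can contain GBK-only characters, the sketch widens a declared gb2312 to gbk before decoding.

```python
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Collect the charset declared in a <meta> tag, if any (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        for name, value in attrs:
            if value is None:
                continue
            if name == "charset":               # <meta charset="...">
                self.charset = value.strip().lower()
            elif "charset=" in value.lower():   # content="text/html; charset=..."
                tail = value.lower().split("charset=", 1)[1]
                self.charset = tail.split(";")[0].strip()


def decode_page(raw, default="gbk"):
    """Sniff the declared charset from raw bytes, then decode with it."""
    sniffer = CharsetSniffer()
    # The meta tag itself is ASCII, so a lossy ASCII decode is enough for sniffing.
    sniffer.feed(raw.decode("ascii", errors="ignore"))
    charset = sniffer.charset or default
    # A page labelled gb2312 may still contain GBK-only characters, so widen it.
    if charset == "gb2312":
        charset = "gbk"
    return raw.decode(charset, errors="replace")


# Sample page (made up for this sketch): declared as gb2312, but containing "喆".
page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=gb2312"></head>'
        '<body>陶喆</body></html>').encode("gbk")
text = decode_page(page)
```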