Python Forum - Discussion Board

Subject: Re: [python-chinese] Encoding conversion problem

Monday, May 1, 2006, 01:08

Shixin Zeng zeng.shixin at gmail.com
Mon May 1 01:08:25 HKT 2006

On 4/30/06, tocer <tocer.deng at gmail.com> wrote:
>
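> I want to convert the HTML files of the Chinese edition of Dive Into Python
> from UTF-8 to GB2312, so I wrote a small program. But it fails and I can't
> figure out why. Could someone take a look? Thanks.
>
> Here is the error: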
> Traceback (most recent call last):
>   File "R:\unicode2gb.py", line 23, in ?
>     utf2gb(r'r:\test')
>   File "R:\unicode2gb.py", line 16, in utf2gb
>     gb_text = unicode_text.encode('gb2312')
> UnicodeEncodeError: 'gb2312' codec can't encode character u'\xa0' in position 197: illegal multibyte sequence
>
> Source code:
>
> #! -*- coding=utf-8 -*-
>
> import os
>
> def utf2gb(htmlpath):
>     for root, dirs, files in os.walk(htmlpath):
>         for filename in files:
>             if filename.split('.')[-1] != 'html': continue
>             filepath = '\\'.join([root, filename])  # for portability, use os.path.join(root, filename) instead
>             f = open(filepath, 'r')
>             utf_text = ''.join(f.readlines())
>             f.close()
>             unicode_text = unicode(utf_text, 'utf-8')
>             gb_text = unicode_text.encode('gb2312')  # use 'GB18030' or 'GBK' instead of 'GB2312' for a larger charset
>             f = open(filepath, 'w')
>             f.write(gb_text)  # write the converted text, not the original utf_text
>             f.close()


Actually, apart from converting the file encoding, you should also replace
"CHARSET='UTF-8'" in the HTML head with "CHARSET='GB2312'".
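A minimal sketch of that header change, written in Python 3 rather than the Python 2 of the original script. It assumes each page declares its encoding as charset=... inside a <meta> tag; CHARSET_RE and replace_charset are hypothetical names, and a regular expression is only a shortcut here, not a real HTML parser:

```python
import re

# Hypothetical pattern: "charset=" (optionally followed by a quote) and then utf-8, any case.
CHARSET_RE = re.compile(r'(charset\s*=\s*["\']?)utf-8', re.IGNORECASE)


def replace_charset(html, new_charset='GB2312'):
    """Return html with every declared UTF-8 charset switched to new_charset."""
    # A function replacement keeps group 1 (the "charset=" prefix and quote) untouched.
    return CHARSET_RE.sub(lambda m: m.group(1) + new_charset, html)


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(replace_charset(page))
# <meta http-equiv="Content-Type" content="text/html; charset=GB2312">
```

Matching is case-insensitive, so both charset=UTF-8 and CHARSET='utf-8' are rewritten.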

> if __name__ == '__main__':
>     utf2gb(r'r:\test')



--
Best Regards

Shixin Zeng
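The character in the traceback, u'\xa0', is U+00A0 NO-BREAK SPACE (the character behind &nbsp; in HTML), and GB2312 has no code for it. The Python 3 sketch below contrasts two ways out, assuming either GB18030 output or HTML character references are acceptable: GB18030 is designed to cover all of Unicode, while the built-in xmlcharrefreplace error handler keeps strict GB2312 output by writing unencodable characters as &#NNN; references. encode_gb2312_html is a hypothetical helper name:

```python
text = 'Dive\xa0Into Python'  # contains U+00A0, the character from the traceback

# Strict GB2312 fails the same way as in the traceback.
try:
    text.encode('gb2312')
except UnicodeEncodeError as exc:
    print('gb2312:', exc.reason)

# GB18030 can represent U+00A0, so the text survives a round trip.
gb_bytes = text.encode('gb18030')
assert gb_bytes.decode('gb18030') == text


def encode_gb2312_html(s):
    """Encode to GB2312, replacing unsupported characters with &#NNN; references."""
    return s.encode('gb2312', errors='xmlcharrefreplace')


print(encode_gb2312_html(text))  # b'Dive&#160;Into Python'
```

A browser renders &#160; as the same non-breaking space, so the second option changes the bytes but not the displayed page.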

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Tuesday, May 2, 2006, 00:33

tocer tocer.deng at gmail.com
Tue May 2 00:33:10 HKT 2006

Thanks, you're right. I've got it working now.
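tocer did not post the finished script. One way the fixes discussed here might fit together, sketched in Python 3 under the assumption that every .html file is UTF-8 and may be overwritten in place; CHARSET_RE and the target parameter are hypothetical additions, not part of the original. It joins paths with os.path.join as suggested, rewrites the declared charset, and writes the re-encoded text (not the original UTF-8 text) back to each file, turning characters outside the target charset into &#NNN; references:

```python
import os
import re

# Hypothetical helper pattern: "charset=" (optionally quoted) followed by utf-8.
CHARSET_RE = re.compile(r'(charset\s*=\s*["\']?)utf-8', re.IGNORECASE)


def utf2gb(htmlpath, target='gb2312'):
    """Re-encode every .html file under htmlpath from UTF-8 to target, in place."""
    for root, dirs, files in os.walk(htmlpath):
        for filename in files:
            if os.path.splitext(filename)[1] != '.html':
                continue
            filepath = os.path.join(root, filename)  # portable path joining
            # newline='' keeps the original line endings on read and write.
            with open(filepath, encoding='utf-8', newline='') as f:
                text = f.read()
            # Update the declared charset so browsers decode the new bytes correctly.
            text = CHARSET_RE.sub(lambda m: m.group(1) + target, text)
            # Characters missing from target (e.g. U+00A0) become &#NNN; references.
            with open(filepath, 'w', encoding=target,
                      errors='xmlcharrefreplace', newline='') as f:
                f.write(text)
```

Calling utf2gb(r'r:\test') converts the same folder as the original. Passing target='gb18030' instead keeps every character as real text, and the charset declaration is rewritten to gb18030 to match.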

