Python forum post:

Mon May 1 01:08:25 HKT 2006

On 4/30/06, tocer <tocer.deng at gmail.com> wrote:
>
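> I want to convert the HTML files of the Chinese edition of Dive into Python
> from utf-8 to gb2312 encoding, so I wrote a small program. I don't know why
> it fails, though. Could you all take a look? Thanks.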
>
> The error is as follows:
> Traceback (most recent call last):
>   File "R:\unicode2gb.py", line 23, in ?
>     utf2gb(r'r:\test')
>   File "R:\unicode2gb.py", line 16, in utf2gb
>     gb_text = unicode_text.encode('gb2312')
> UnicodeEncodeError: 'gb2312' codec can't encode character u'\xa0' in
> position 197: illegal multibyte sequence
>
> Source program:
>
> #! -*- coding=utf-8 -*-
>
> import os
>
> def utf2gb(htmlpath):
>     for root, dirs, files in os.walk(htmlpath):
>         for filename in files:
>             if filename.split('.')[-1] != 'html': continue
>             # For portability, use os.path.join(root, filename) instead.
>             filepath = '\\'.join([root, filename])
>             f = open(filepath, 'r')
>             utf_text = ''.join(f.readlines())
>             f.close()
>             unicode_text = unicode(utf_text,'utf-8')
>             # Use 'GB18030' or 'GBK' instead of 'GB2312' for a larger charset.
>             gb_text = unicode_text.encode('gb2312')
>             f = open(filepath, 'w')
>             f.write(gb_text)  # write the converted text, not the original UTF-8 bytes
>             f.close()

Actually, apart from the file encoding conversion, you should also replace
"CHARSET='UTF-8'" in the HTML head with "CHARSET='GB2312'".

> if __name__ == '__main__':
>     utf2gb(r'r:\test')
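
As a rough sketch of the two points above (getting past the character that
gb2312 cannot encode, and rewriting the charset declaration), the conversion
could look something like the following. This is Python 3, whereas the quoted
script is Python 2, and convert_html, convert_tree and CHARSET_RE are names
made up for this illustration. The regex is only a loose match for the charset
declaration, not a real HTML parser. Rather than switching codecs, this
version keeps gb2312 and passes errors="xmlcharrefreplace", so a character
such as u'\xa0' (a no-break space, which GB2312 does not contain) is written
as &#160; instead of raising UnicodeEncodeError. As far as I know, Python's
'gbk' codec rejects u'\xa0' as well, while 'gb18030' accepts it.

```python
import os
import re

# Loose match for charset=utf-8, CHARSET='UTF-8', etc. (an assumption about
# how the declaration is written; adjust to the actual files).
CHARSET_RE = re.compile(r"""(charset\s*=\s*["']?)utf-?8""", re.IGNORECASE)


def convert_html(data, target="gb2312"):
    """Convert UTF-8 encoded HTML bytes into `target`-encoded bytes."""
    # "utf-8-sig" also accepts input without a BOM and drops one if present.
    text = data.decode("utf-8-sig")
    # Point the charset declaration at the new encoding.
    text = CHARSET_RE.sub(lambda m: m.group(1) + target, text)
    # Characters missing from the target charset become numeric character
    # references (e.g. U+00A0 -> &#160;) instead of raising an exception.
    return text.encode(target, errors="xmlcharrefreplace")


def convert_tree(htmlpath, target="gb2312"):
    """Rewrite every .html file under `htmlpath` in place."""
    for root, dirs, files in os.walk(htmlpath):
        for filename in files:
            if not filename.endswith(".html"):
                continue
            filepath = os.path.join(root, filename)
            with open(filepath, "rb") as f:
                data = f.read()
            # Convert before reopening, so a failure does not truncate the file.
            converted = convert_html(data, target)
            with open(filepath, "wb") as f:
                f.write(converted)
```

Calling convert_tree(r'r:\test') would then rewrite the files in place.
Running it a second time on already converted files will most likely fail at
the UTF-8 decode step, so it is worth keeping a backup of the directory.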

--
Best Regards

Shixin Zeng

Subject: Re: [python-chinese] Encoding conversion problem