Python forum post:

Python Forum - Discussion Board

Title: [python-chinese] How to determine the encoding of a text file

Original poster, Monday, December 25, 2006, 16:41

john john.about at gmail.com
Mon Dec 25 16:41:55 HKT 2006

>
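> I have a pile of text files and program source files. I need to read each one and replace some of its content. How can I tell which encoding a file uses? More importantly, after the replacement the file has to be written back in its original encoding. How should I do this?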
>
> --
> 武长斌
> chbin.w at gmail.com


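import codecs
# Candidate encodings; the original list is elided ("...") and should be
# extended with the other encodings Python supports.
for x in ['utf8', 'utf16']:
  try:
    with codecs.open('filename', 'rb', x) as f:
      f.read()  # decoding errors only surface when the data is read
    print(x)
  except (UnicodeError, LookupError):
    continue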

Once you know which encoding the file uses, decode it and then encode it again.
Would this work?
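
The decode/encode round trip described above could be sketched as follows, assuming the encoding is already known. The helper name `replace_in_file` and its parameters are hypothetical and do not come from the thread. Reading and writing raw bytes keeps line endings untouched, and decoding happens before the file is reopened for writing, so a decoding failure leaves the file as it was.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding):
    """Replace `old` with `new` in the file at `path`, keeping `encoding`.

    Hypothetical helper: assumes the caller already knows the encoding.
    """
    # Read raw bytes so that line endings are left untouched.
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode(encoding)            # bytes -> str
    updated = text.replace(old, new)
    with open(path, "wb") as f:
        f.write(updated.encode(encoding))  # str -> bytes, same encoding


if __name__ == "__main__":
    # Demo on a throwaway GBK-encoded file.
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    with open(demo_path, "wb") as f:
        f.write("你好, world\n".encode("gbk"))
    replace_in_file(demo_path, "world", "Python", "gbk")
    with open(demo_path, "rb") as f:
        print(f.read().decode("gbk"), end="")
    os.remove(demo_path)
```

Note that some codecs add a byte order mark when encoding (for example `utf-16`), so a byte-for-byte identical header is not guaranteed for every encoding.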

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

林旺茂

Reply 0, Monday, December 25, 2006, 16:45

3751 lwm3751 at gmail.com
Mon Dec 25 16:45:48 HKT 2006

That won't work.
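
One reason trial decoding can mislead, offered here as an illustration and not as 3751's own argument: some codecs accept nearly any byte sequence, so "decoded without an error" does not mean "this is the right encoding". The helper `encodings_that_accept` below is a hypothetical name used only for this sketch.

```python
def encodings_that_accept(data, candidates):
    """Return the candidate encodings that decode `data` without an error."""
    accepted = []
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue
        accepted.append(name)
    return accepted


# UTF-8 bytes for a short Chinese string.
data = "编码".encode("utf-8")

# latin-1 maps every byte value to a character, so it never fails,
# even though the decoded text is mojibake and not the original string.
print(encodings_that_accept(data, ["ascii", "utf-8", "latin-1"]))
# ['utf-8', 'latin-1']
print(data.decode("latin-1") == "编码")
# False
```

Because more than one candidate succeeds, the first encoding that happens not to raise depends on the order of the list, not on the file's real encoding.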

> _______________________________________________
> python-chinese
> Post: send python-chinese at lists.python.cn
> Subscribe: send subscribe to python-chinese-request at lists.python.cn
> Unsubscribe: send unsubscribe to  python-chinese-request at lists.python.cn
> Detail Info: http://python.cn/mailman/listinfo/python-chinese
>


李迎辉

Reply 0, Monday, December 25, 2006, 16:57

limodou limodou at gmail.com
Mon Dec 25 16:57:41 HKT 2006

Didn't 3751 already point to a module for this? Why not use it?

-- 
I like python!
UliPad: http://wiki.woodpecker.org.cn/moin/UliPad
My Blog: http://www.donews.net/limodou


徐继哲

Reply 0, Monday, December 25, 2006, 20:38

yetist wu2xiaotian at gmail.com
Mon Dec 25 20:38:56 HKT 2006

There is a chardet module you can use.
http://chardet.feedparser.org/docs/usage.html
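
A rough sketch of how chardet might fit the original question, under these assumptions: the third-party `chardet` package is installed, and `chardet.detect` takes a byte string and returns a dict with `"encoding"` and `"confidence"` keys. The helper name `detect_and_replace` and its `detect` parameter are hypothetical. The detection result is a statistical guess, so it is worth checking the confidence value before trusting it on important files.

```python
def detect_and_replace(path, old, new, detect=None):
    """Guess the encoding of `path`, replace `old` with `new`, write it back.

    Hypothetical helper. `detect` is any callable that takes bytes and
    returns a dict with an "encoding" key; it defaults to chardet.detect.
    """
    if detect is None:
        import chardet  # third-party package; assumed to be installed
        detect = chardet.detect

    with open(path, "rb") as f:
        raw = f.read()

    guess = detect(raw)  # a dict that includes "encoding" and "confidence"
    encoding = guess.get("encoding")
    if not encoding:
        raise ValueError("could not guess the encoding of %r" % path)

    # Decode before reopening the file, so a failure leaves it untouched.
    text = raw.decode(encoding)
    with open(path, "wb") as f:
        f.write(text.replace(old, new).encode(encoding))
    return encoding
```

Calling `detect_and_replace("filename", "old text", "new text")` would rewrite the file in whatever encoding was guessed and return that encoding's name. Making the detector a parameter keeps the round-trip logic usable with a different detection method.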
