Python Forum Post - Zeuux (哲思)

Python Forum - Discussion Board


Subject: [python-chinese] Encoding problem: converting UTF-8 to GB


徐继哲

Original post, Friday, June 23, 2006, 21:27

谢小漫 cat at ewyu.com
Fri Jun 23 21:27:35 HKT 2006

Here is the code:
import urllib2

# Fetch the feed with a browser-like User-Agent header
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

file1 = "t1.xml"
f = opener.open("http://perfector82.spaces.msn.com/feed.rss")
# Decode the downloaded UTF-8 bytes into a unicode object
unicode_text = unicode(f.read(), 'utf-8')
f.close()

# Re-encode as GB18030 and write the resulting bytes in binary mode
gb_text = unicode_text.encode('GB18030')
tempfile = open(file1, 'wb')
tempfile.write(gb_text)
tempfile.close()

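I need to fetch an RSS file and convert it to a GB encoding, but the resulting file still contains runs of ????.
Perhaps UTF-8's character set is larger than GB18030's, which would cause this. How can I fix it?
Thanks.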
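For comparison, here is a minimal Python 3 sketch of the same fetch-and-convert pipeline (the code above is Python 2, using urllib2 and unicode()). The helper names fetch_bytes, utf8_to_gb18030 and save_as_gb18030 are hypothetical. Both the decode and the encode use strict error handling, so a character that could not be represented would raise UnicodeError instead of silently becoming '?'. Because GB18030 is defined over the whole Unicode range, encoding text that decoded cleanly from UTF-8 is not expected to fail, which suggests the question marks come from somewhere other than this conversion step.

```python
import urllib.request


def fetch_bytes(url: str) -> bytes:
    """Download url with a browser-like User-Agent and return the raw body."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:  # the response is closed automatically
        return resp.read()


def utf8_to_gb18030(raw: bytes) -> bytes:
    """Decode raw as UTF-8 and re-encode it as GB18030 (strict errors on both steps)."""
    text = raw.decode("utf-8")      # raises UnicodeDecodeError on invalid UTF-8
    return text.encode("gb18030")   # raises UnicodeEncodeError if a character cannot be mapped


def save_as_gb18030(url: str, path: str) -> None:
    """Hypothetical end-to-end helper: fetch url and store it as GB18030 bytes."""
    data = utf8_to_gb18030(fetch_bytes(url))
    with open(path, "wb") as out:  # binary mode, since data is already encoded
        out.write(data)
```

It would be called as save_as_gb18030("http://perfector82.spaces.msn.com/feed.rss", "t1.xml"), although that 2006 MSN Spaces address is probably no longer reachable. Note that this only converts the bytes; any encoding declaration inside the file is left untouched.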

-- 
Flowers in bloom at Wuyi University, strolling by Xinyue Lake.
http://www.ewyu.com/
[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

李迎辉

Reply, Friday, June 23, 2006, 21:38

limodou limodou at gmail.com
Fri Jun 23 21:38:53 HKT 2006

On 6/23/06, 谢小漫 <cat at ewyu.com> wrote:
> Perhaps UTF-8's character set is larger than GB18030's, which would cause this. How can I fix it?
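That is indeed the case. So what do you want to happen instead? unicode() (and .encode()) let you choose how conversion failures are handled through the errors argument: ignore them, raise an error, or something else. Take a look at the documentation.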

So it's still best to work in UTF-8.
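A short illustration of what limodou describes, in Python 3 terms: the failure policy is the errors argument of str.encode() (Python 2's unicode() and .encode() accept the same argument). The sample string and the helper name show_error_handlers are assumptions for illustration; the emoji simply stands in for any character outside GBK. Of the standard handlers, 'replace' is the one that substitutes a literal '?', while 'xmlcharrefreplace' keeps the character as a numeric reference, which may suit an XML file such as an RSS feed.

```python
def show_error_handlers(text: str, encoding: str) -> dict:
    """Encode text with several standard codec error handlers and collect the results."""
    results = {}
    for handler in ("strict", "ignore", "replace", "xmlcharrefreplace"):
        try:
            results[handler] = text.encode(encoding, errors=handler)
        except UnicodeEncodeError as exc:
            # 'strict' stops at the first character the codec cannot map
            results[handler] = f"UnicodeEncodeError at position {exc.start}"
    return results


for name, value in show_error_handlers("中文😀", "gbk").items():
    print(name, value)
```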

-- 
I like python!
My Blog: http://www.donews.net/limodou
My Django Site: http://www.djangocn.org
NewEdit Maillist: http://groups.google.com/group/NewEdit


徐继哲

Reply, Friday, June 23, 2006, 22:00

shhgs shhgs.efhilt at gmail.com
Fri Jun 23 22:00:27 HKT 2006

Could it be a problem with your XML editor?

The declaration of the XML you read probably still says encoding="utf8", so the editor interprets the file as UTF-8.
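If shhgs is right, the fix is to keep the declaration consistent with the bytes actually written. A minimal sketch, assuming the input is UTF-8 XML and that rewriting the declaration with a regular expression is acceptable for a simple feed; the helper name transcode_xml and the pattern are illustrative, not part of any library:

```python
import re

# Matches the encoding pseudo-attribute inside an <?xml ...?> declaration,
# capturing everything up to the value and the quote character used.
_DECL_ENCODING = re.compile(r'(<\?xml[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2')


def transcode_xml(raw: bytes, target: str = "gb18030") -> bytes:
    """Re-encode UTF-8 XML bytes as `target` and update the declaration to match."""
    text = raw.decode("utf-8-sig")  # also drops a UTF-8 byte-order mark if present
    text = _DECL_ENCODING.sub(
        lambda m: m.group(1) + m.group(2) + target + m.group(2), text, count=1
    )
    return text.encode(target)
```

If the file has no encoding declaration at all, XML processors assume UTF-8 (absent a byte-order mark or other external information), so GB18030 output would then need a declaration added rather than rewritten.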


徐继哲

Reply, Friday, June 23, 2006, 23:08

谢小漫 littlecn at gmail.com
Fri Jun 23 23:08:58 HKT 2006

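> So it's still best to work in UTF-8.
The conversion didn't fail and raised no error; it just doesn't give the expected result: in a few places there are runs of question marks.

I want to use it with pymssql, and UTF-8 simply doesn't work there...
How can I use a UTF-8 string directly in a pymssql INSERT statement?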




-- 
谢小漫

李迎辉

Reply, Saturday, June 24, 2006, 09:12

limodou limodou at gmail.com
Sat Jun 24 09:12:35 HKT 2006

On 6/23/06, 谢小漫 <littlecn at gmail.com> wrote:
> I want to use it with pymssql, and UTF-8 simply doesn't work there...
> How can I use a UTF-8 string directly in a pymssql INSERT statement?
>
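I've never used MSSQL, so I don't know whether it supports UTF-8. If it doesn't, you'll still have to convert. It's true that some UTF-8 text may not convert to GBK, though. You could try GB18030, which has a larger character set.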
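limodou's point about GBK versus GB18030 can be checked character by character. A minimal sketch (the helper unencodable and the sample string are hypothetical) that lists the distinct characters a codec cannot encode. With Python's built-in codecs, the emoji in the sample is rejected by GBK but accepted by GB18030, which covers all of Unicode:

```python
def unencodable(text: str, encoding: str) -> list:
    """Return the distinct characters in text that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)  # strict: raises if the codec has no mapping
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


sample = "中文 😀"
print("gbk:", [f"U+{ord(c):04X}" for c in unencodable(sample, "gbk")])
print("gb18030:", unencodable(sample, "gb18030"))
```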


徐继哲

Reply, Saturday, June 24, 2006, 17:15

谢小漫 cat at ewyu.com
Sat Jun 24 17:15:58 HKT 2006

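I tried gb18030, but some characters still fail to convert and show up as ??.
Yet they aren't actually the ? character, and I don't know how to get rid of them.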
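When something shows up as ?? but is not really the ? character, it helps to look at the actual code points. A small diagnostic sketch (describe_non_ascii is a hypothetical helper) that lists each distinct non-ASCII character with its code point and Unicode name. For example, U+FFFD REPLACEMENT CHARACTER points to an earlier decoding error, while an ordinary named character suggests it is the viewer's font or encoding setting that fails to display it:

```python
import unicodedata


def describe_non_ascii(text: str) -> list:
    """List (code point, Unicode name) for every distinct non-ASCII character in text."""
    seen = []
    for ch in text:
        if ord(ch) > 0x7F and ch not in seen:
            seen.append(ch)
    # unicodedata.name raises ValueError for unnamed code points, so pass a default
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in seen]


for cp, name in describe_non_ascii("abc\ufffd\u4e2d"):
    print(cp, name)
```

Once the offending code point is known, it can be removed with str.replace(), e.g. text.replace("\ufffd", ""), though fixing the step that produced it is usually the better option.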



邱英波

Reply, Saturday, June 24, 2006, 21:31

Yingbo Qiu qiuyingbo at gmail.com
Sat Jun 24 21:31:23 HKT 2006

2006/6/24, 谢小漫 <cat at ewyu.com>:
>
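> I tried gb18030, but some characters still fail to convert and show up as ??.
> Yet they aren't actually the ? character, and I don't know how to get rid of them.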
>
Does Python on win32 use iconv.dll for this?

I know that older versions of libiconv (including the current 1.10) don't cover GB18030 completely.

Really, in this day and age, just handle every character set as UTF-8... there's no need to stay on GB.
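Following Yingbo Qiu's advice, the simplest path is often not to transcode at all. A sketch under the assumption that the feed really is UTF-8 (the helper keep_utf8 is hypothetical): it validates the bytes, copies them unchanged so they still agree with the feed's own declaration, and returns the decoded text for processing in Python.

```python
import io


def keep_utf8(raw: bytes, out) -> str:
    """Check that raw is valid UTF-8, copy it unchanged to the binary stream out,
    and return the decoded text for further processing."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if raw is not valid UTF-8
    out.write(raw)              # no re-encoding, so nothing can turn into '?'
    return text


buf = io.BytesIO()
title = keep_utf8("<title>中文 😀</title>".encode("utf-8"), buf)
print(title)
```

With a real file, out would be something opened with open(path, "wb").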


徐继哲

Reply, Monday, June 26, 2006, 10:17

cn-poper cn-poper at 126.com
Mon Jun 26 10:17:03 HKT 2006

What type is the column you're storing into in MSSQL?
nvarchar stores Unicode (two bytes per character); varchar does not!
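cn-poper's point has a counterpart in T-SQL string literals: a literal written as N'...' is nvarchar (Unicode), whereas a plain '...' literal is varchar and gets converted to the database's code page, where characters it cannot represent typically become '?'. The helper nvarchar_literal and the table feeds below are hypothetical, for illustration only. In real code, letting the driver bind parameters is safer than building SQL strings, and how pymssql passes Unicode parameters depends on its version and connection settings, which this sketch does not assume.

```python
def nvarchar_literal(value: str) -> str:
    """Return value as a T-SQL Unicode (N'...') string literal.

    Single quotes are escaped by doubling them, per T-SQL literal syntax.
    Illustration only: prefer the driver's parameter binding in real code.
    """
    return "N'" + value.replace("'", "''") + "'"


# Hypothetical table "feeds" with an NVARCHAR column "title".
sql = "INSERT INTO feeds (title) VALUES (" + nvarchar_literal("中文 'quoted'") + ")"
print(sql)
```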


Zeuux © 2025
