Subject: [python-chinese] A problem that has bothered me for over a month: reading a VC-written binary struct file from Python

徐继哲

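Original post, Tuesday, December 5, 2006, 09:03

yang haijun veldtwolf at gmail.com
Tue Dec 5 09:03:57 HKT 2006

Here is the problem I ran into.
I have a binary file written by a VC program. It contains many struct records, laid out roughly like this:
struct POI
{
WCHAR  wchPtName[12];
double  dLongitude;
double  dLatitude;
};

Note that the wchPtName field is stored in VC's Unicode encoding rather than the usual ANSI, and its content is Chinese text.

My code is roughly as follows:
import struct
fp = open('poi.dat', 'rb')

fmt = '8sdd'
count = struct.calcsize(fmt)

rec = fp.read(count)

pyrec = struct.unpack(fmt, rec)

When I display pyrec, the content is garbled. If I change wchPtName to ANSI encoding, there is no problem.
I suspect an encoding conversion is needed, but I have not managed to get one to work.

Requirements: wchPtName must not be converted to ANSI, and poi.dat must not be read through a Python C/C++ extension.
If the struct module cannot be used for this, is there another module that can read Unicode-encoded data?
Any help would be appreciated.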
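
A quick size check suggests why the '8sdd' read comes back garbled. Assuming WCHAR is a 2-byte UTF-16 code unit (as it is under VC on Windows), wchPtName[12] occupies 24 bytes, so '8s' consumes only a third of the name and the two doubles are then unpacked from bytes that still belong to the name. A minimal sketch in Python 3; the '<' prefix (little-endian, no padding) and the variable names are assumptions for illustration, not part of the original post:

```python
import struct

WCHAR_SIZE = 2                 # assumption: 16-bit WCHAR, as on Windows
name_bytes = 12 * WCHAR_SIZE   # size of wchPtName[12] in bytes

# Size of the format used in the post versus one that matches the struct.
asked_size = struct.calcsize('<8sdd')
record_size = struct.calcsize('<%dsdd' % name_bytes)

print(name_bytes, asked_size, record_size)  # prints: 24 24 40
```

The 40-byte result agrees with the size of the poi1.dat sample attached later in the thread.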

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Reply, Tuesday, December 5, 2006, 09:06

刘鑫 march.liu at gmail.com
Tue Dec 5 09:06:39 HKT 2006

After reading the record, try decoding the field with unicode.decode(wchPtName, "utf-16"). If that does not work, try utf-8, mbcs, or gbk.

-- 
Welcome to visit:
http://blog.csdn.net/ccat

刘鑫
March.Liu

徐继哲

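Reply, Tuesday, December 5, 2006, 12:17

yang haijun veldtwolf at gmail.com
Tue Dec 5 12:17:59 HKT 2006

That did not work. I have attached a sample file that contains a single record.
Could you try reading it for me?
Also, the struct module has no format character for Unicode: both 's' and 'p' are for char.
Is there another module that can read a Unicode-encoded binary file?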



-------------- next part --------------
A non-text attachment was scrubbed...
Name: poi1.dat
Type: application/octet-stream
Size: 40 bytes
Desc: not available
Url : http://python.cn/pipermail/python-chinese/attachments/20061205/aac323dc/attachment.obj

徐继哲

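Reply, Tuesday, December 5, 2006, 13:08

Leira Hua lhua at altigen.com.cn
Tue Dec 5 13:08:39 HKT 2006

1. Twelve wide characters take 24 bytes, so the format is 24s. The C string is terminated by 0.
2. The string is encoded as 'utf-16le'.
3. I assume the records in this file end with \r\n?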

import struct

fp = open('poi1.dat')
rec = fp.readline()

fmt = '24sdd'
pyrec = struct.unpack(fmt, rec)

name = unicode(pyrec[0].split('\x00\x00')[0], 'utf-16le')

print name.encode('gbk'), pyrec[1], pyrec[2]



It reads successfully. The content is: 甘家口大厦 116.1313 39.12345
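
For readers on Python 3, the same idea can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: it assumes fixed 40-byte records with no line terminator (so it uses read() rather than readline()), little-endian doubles, and it builds a stand-in record in memory instead of opening poi1.dat. The helper names decode_wchar_field and read_pois are hypothetical.

```python
import io
import struct

# 12 WCHARs (24 bytes) followed by two little-endian doubles, no padding.
RECORD = struct.Struct('<24sdd')

def decode_wchar_field(raw):
    """Decode a fixed-size UTF-16LE buffer and cut it at the first NUL."""
    # errors='replace' guards against garbage after the terminator.
    text = raw.decode('utf-16-le', errors='replace')
    return text.split('\x00', 1)[0]

def read_pois(fp):
    """Yield (name, longitude, latitude) for each complete record in fp."""
    while True:
        chunk = fp.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        raw_name, lon, lat = RECORD.unpack(chunk)
        yield decode_wchar_field(raw_name), lon, lat

# Stand-in for poi1.dat: name, NUL terminator, then 12 filler bytes (assumed).
name = '甘家口大厦'.encode('utf-16-le')
sample = RECORD.pack(name + b'\x00\x00' + b'\xcc' * 12, 116.1313, 39.12345)

records = list(read_pois(io.BytesIO(sample)))
print(records)
```

Decoding first and then cutting at the NUL character keeps the terminator search on 16-bit boundaries. Splitting the raw bytes on '\x00\x00' works for this sample, but it can match across two adjacent characters: 'A' followed by '一' is encoded as 41 00 00 4E.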




-- 
Leira Hua
http://my.opera.com/Leira

孙君意

Reply, Tuesday, December 5, 2006, 13:14

junyi sun ccnusjy at gmail.com
Tue Dec 5 13:14:49 HKT 2006

>>> f = open('poi1.dat', 'rb')
>>> f.seek(0,0)
>>> first=f.read(12)
>>> print first.decode('utf-16')
甘家口大厦
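
One caveat on the codec name: the field carries no byte order mark, and in CPython the plain 'utf-16' codec appears to fall back to the machine's native byte order in that case, so naming 'utf-16-le' explicitly is the safer choice for data written on Windows. A small Python 3 sketch of the difference; the bytes are built in memory as an assumed stand-in for the first 12 bytes of poi1.dat:

```python
name = '甘家口大厦'

# Five characters plus the NUL terminator, as a little-endian writer stores them.
raw = (name + '\x00').encode('utf-16-le')

as_le = raw.decode('utf-16-le').rstrip('\x00')  # explicit, correct byte order
as_be = raw.decode('utf-16-be').rstrip('\x00')  # wrong byte order, wrong text

print(len(raw), as_le == name, as_be == name)  # prints: 12 True False
```

Reading exactly 12 bytes fits this sample only because the name is five characters long. The full field is 24 bytes.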


徐继哲

Reply, Tuesday, December 5, 2006, 13:44

cun heise cunheise at hotmail.com
Tue Dec 5 13:44:34 HKT 2006

#!/usr/bin/python
import sys, struct
reload(sys)
sys.setdefaultencoding('utf-8')

fp = open('poi1.dat', 'rb')
fmt = '8sdd'
count = struct.calcsize(fmt)
rec = fp.read(count)
pyrec = struct.unpack(fmt, rec)
for i in pyrec:
  print i

Would this work?


徐继哲

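Reply, Tuesday, December 5, 2006, 14:18

yang haijun veldtwolf at gmail.com
Tue Dec 5 14:18:06 HKT 2006

That is correct.
Thanks to the two posters above.
I am new to Python and had assumed it was not suited to handling binary data.
It seems my own skills are what is lacking, and I need to catch up. :)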


李海东

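Reply, Tuesday, December 5, 2006, 18:36

hydon hydonlee at gmail.com
Tue Dec 5 18:36:25 HKT 2006

I am a beginner, still learning...

I do not understand why your file contains 12 bytes of 0xCC...0xCC.
Perhaps your struct definition is missing a field.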

#!/usr/bin/python
import struct

fp = open('poi1.dat', 'rb')
fmt = '12s12sdd'
count = struct.calcsize(fmt)
rec = fp.read(count)
pyrec = struct.unpack(fmt, rec)
print pyrec[0].decode('utf-16'), pyrec[2], pyrec[3]
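
A likely explanation for the 0xCC bytes, not confirmed anywhere in the thread: they are the unused tail of the 24-byte wchPtName buffer rather than a missing field. MSVC debug builds fill uninitialized stack memory with 0xCC, so a struct whose name was copied in without clearing the buffer first would keep that fill after the terminator. The Python 3 sketch below rebuilds such a record in memory (the fill value and the layout are assumptions) and shows that '12s12sdd' and '24sdd' describe the same 40 bytes:

```python
import struct

# Five UTF-16LE characters plus the NUL terminator: 12 bytes in use.
used_part = '甘家口大厦'.encode('utf-16-le') + b'\x00\x00'
filler = b'\xcc' * 12  # assumed debug fill in the rest of the buffer

record = struct.pack('<24sdd', used_part + filler, 116.1313, 39.12345)

# The format from the reply above splits the same buffer into two halves.
first, second, lon, lat = struct.unpack('<12s12sdd', record)

print(len(record), first.decode('utf-16-le').rstrip('\x00'), second == filler)
```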
