Post from the Python forum:

Wed Jun 29 20:23:55 HKT 2005

Hello limodou,

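    I had a look at the linecache module, but it does not seem to fit my needs. I did not explain the background of the text index in detail earlier, so here it is: my design needs frequent random access to fairly large text data files (possibly several million or even ten million lines). I therefore want to build a separate index file that gives random access to the large data file without being limited by memory size (linecache reads all lines into its cache with readlines(), so it is limited by memory).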
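A minimal sketch of such a separate index file, under assumptions that are not in the original message: the data file is opened in binary mode ('rb') so that tell() reports plain byte offsets whatever the platform or line-ending style, each offset is stored as a fixed-width 8-byte integer so that entry `row` sits at byte `row * 8` of the index file, and the names OFFSET, build_index, and read_line are hypothetical.

```python
import os
import struct
import tempfile

# Assumption: one unsigned 64-bit little-endian offset per line.
OFFSET = struct.Struct("<Q")


def build_index(data_path, index_path):
    """Write the byte offset of the start of every line to index_path."""
    count = 0
    with open(data_path, "rb") as data, open(index_path, "wb") as index:
        pos = data.tell()            # offset of the line about to be read
        line = data.readline()
        while line:
            index.write(OFFSET.pack(pos))
            count += 1
            pos = data.tell()        # start of the next line
            line = data.readline()
    return count


def read_line(data_path, index_path, row):
    """Return line `row` (0-based) as bytes, including its line ending."""
    if row < 0:
        raise IndexError(row)
    with open(index_path, "rb") as index:
        index.seek(row * OFFSET.size)
        raw = index.read(OFFSET.size)
    if len(raw) != OFFSET.size:
        raise IndexError(row)
    (pos,) = OFFSET.unpack(raw)
    with open(data_path, "rb") as data:
        data.seek(pos)
        return data.readline()


# Small demonstration on a file with mixed line endings.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    index_path = os.path.join(tmp, "data.idx")
    with open(data_path, "wb") as f:
        f.write(b"first\nsecond\r\nthird\n")
    line_count = build_index(data_path, index_path)
    second = read_line(data_path, index_path, 1)

print(line_count, second)
```

Only one offset is held in memory at a time, so the index can grow with the data file rather than with available RAM. The offset is recorded before each readline() call, so entry 0 points at the first line.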
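    As for the problem mentioned below, where the value returned by tell() counts some extra '\r' characters: I think that if I can determine whether the lines of the text file end with '\n' or '\r\n', and whether the program is running on Windows or Unix, I can correct the value returned by tell() by hand (that is, subtract the number of extra '\r' characters). I believe this would solve the problem.
    Is there a better solution? Also, how can I determine whether each line ends with '\n' or '\r\n'?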
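One possible way to answer the last question, offered as a sketch rather than a definitive method: read the first line in binary mode and inspect its trailing bytes. The helper name detect_newline is hypothetical, and the sketch assumes the file uses a single line-ending style throughout. In binary mode tell() and seek() work in raw bytes, so offsets recorded that way should need no '\r' correction at all.

```python
import os
import tempfile


def detect_newline(path):
    """Return the line ending of the first line: CRLF, LF, or None.

    Assumption: the whole file uses the same line-ending style.
    """
    with open(path, "rb") as f:      # binary mode: no newline translation
        first = f.readline()
    if first.endswith(b"\r\n"):
        return b"\r\n"
    if first.endswith(b"\n"):
        return b"\n"
    return None                      # empty file or a single unterminated line


# Small demonstration with one Unix-style and one Windows-style file.
with tempfile.TemporaryDirectory() as tmp:
    unix_path = os.path.join(tmp, "unix.txt")
    dos_path = os.path.join(tmp, "dos.txt")
    with open(unix_path, "wb") as f:
        f.write(b"alpha\nbeta\n")
    with open(dos_path, "wb") as f:
        f.write(b"alpha\r\nbeta\r\n")
    unix_kind = detect_newline(unix_path)
    dos_kind = detect_newline(dos_path)

print(repr(unix_kind), repr(dos_kind))
```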
Thank you.

======= On 2005-06-29 13:05:00 you wrote: =======

>I suggest you take a look at the linecache module; it is meant for exactly this and is very convenient.
>
>On 05-6-29, amingsc <amingsc at 163.com> wrote:
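>> I want to build a per-line index for a text data file so that I can access its lines at random. This is what I do:
>> Building the index:
>> line = file.readline()
>> while(line):
>>         pos = file.tell()
>>         index.append(pos)
>>         line = file.readline()
>> Accessing the file:
>> file.seek(index[row]) # row is the number of the line to access
>> line = file.readline()
>> But there is a problem: the line that is read does not match the actual data line; some characters are missing from the start of the line.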
>> 
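>> My analysis:
>> The data file uses Unix-style line endings, '\n'.
>> The offset returned by tell() is computed as if the lines had been converted to Windows-style '\r\n' endings,
>> so the value returned by tell() is larger than the actual position in the physical file. The difference is the number of lines
>> from the start of the file to the current position (one extra '\r' character per line).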
>> 
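>> My questions:
>> 1. Is my analysis correct?
>> 2. What is the fundamental difference between how Unix and Windows store text files?
>> Any advice from the experts here would be appreciated. Thanks in advance.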
>> amingsc
>> 　　　　　　　　amingsc at 163.com
>> 2005-06-29
>> 
>> _______________________________________________
>> python-chinese list
>> python-chinese at lists.python.cn
>> http://python.cn/mailman/listinfo/python-chinese
>> 
>> 
>> 
>
>
>-- 
>I like python! 
>My Donews Blog: http://www.donews.net/limodou
>New Google Maillist: http://groups-beta.google.com/group/python-cn
>

= = = = = = = = = = = = = = = = = = = =
			

Best regards,
 
				 
　　　　　　　　amingsc
　　　　　　　　amingsc at 163.com
　　　　　　　　　　2005-06-29

Subject: Re: Re: [python-chinese] Return value of tell() for text files on Windows and Unix