Python Forum post - Zeuux

Subject: [python-chinese] Reading the last line, or last several lines, of a very large file

李维刚

Original poster, Thursday, December 22, 2005, 17:25

Weigang LI dimens at gmail.com
Thu Dec 22 17:25:58 HKT 2005

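Hello everyone,
What is a good way to read the last line of a very large file, or the last n lines at the end of the file?
Because the file is so large, reading it sequentially from the start takes a long time. How can this be done efficiently?

Thanks.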

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

徐继哲

Reply 0, Tuesday, December 27, 2005, 13:43

bu shehui bushehui at gmail.com
Tue Dec 27 13:43:14 HKT 2005

If you use Linux, you can use a command such as

          tail -n 1 FILE   # print the last line of FILE
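
If the goal is to stay inside Python while still letting `tail` do the work, one option is to call it as a subprocess. This is only a sketch: it assumes a `tail` executable is on the PATH (so it will not work on a stock Windows install), and the function name `tail_via_command` is made up for illustration.

```python
import subprocess


def tail_via_command(path, n_lines=1):
    """Return the last n_lines lines of path as bytes, using the external tail command."""
    # "--" stops option parsing, so a path starting with "-" is still treated as a file.
    result = subprocess.run(
        ["tail", "-n", str(n_lines), "--", path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError if tail exits with an error
    )
    return result.stdout
```

The result is raw bytes; decode it with whatever encoding the file actually uses.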


good luck

曹灿灿

Reply 0, Tuesday, December 27, 2005, 14:40

lannos sini caocancan at gmail.com
Tue Dec 27 14:40:11 HKT 2005

I would like to know too. This problem has been bothering me for a long time.

--
That is all. Wishing you success at work and a happy life.

徐继哲

Reply 0, Tuesday, December 27, 2005, 15:36

Zarz zarz at tom.com
Tue Dec 27 15:36:09 HKT 2005

How does the tail command on *nix systems do it? (It shows the end of a file quickly no matter how large the file is.)

Regards,
Zarz
zarz at tom.com
2005-12-27

徐继哲

Reply 0, Tuesday, December 27, 2005, 16:54

hutuworm hutuworm at gmail.com
Tue Dec 27 16:54:03 HKT 2005

/* Print the last N_LINES lines from the end of file FD.
   Go backward through the file, reading `BUFSIZ' bytes at a time (except
   probably the first), until we hit the start of the file or have
   read NUMBER newlines.
   START_POS is the starting position of the read pointer for the file
   associated with FD (may be nonzero).
   END_POS is the file offset of EOF (one larger than offset of last byte).
   Return true if successful.  */

static bool
file_lines (const char *pretty_filename, int fd, uintmax_t n_lines,
            off_t start_pos, off_t end_pos, uintmax_t *read_pos)
{
  char buffer[BUFSIZ];
  size_t bytes_read;
  off_t pos = end_pos;

  if (n_lines == 0)
    return true;

  /* Set `bytes_read' to the size of the last, probably partial, buffer;
     0 < `bytes_read' <= `BUFSIZ'.  */
  bytes_read = (pos - start_pos) % BUFSIZ;
  if (bytes_read == 0)
    bytes_read = BUFSIZ;
  /* Make `pos' a multiple of `BUFSIZ' (0 if the file is short), so that all
     reads will be on block boundaries, which might increase efficiency.  */
  pos -= bytes_read;
  xlseek (fd, pos, SEEK_SET, pretty_filename);
  bytes_read = safe_read (fd, buffer, bytes_read);
  if (bytes_read == SAFE_READ_ERROR)
    {
      error (0, errno, _("error reading %s"), quote (pretty_filename));
      return false;
    }
  *read_pos = pos + bytes_read;

  /* Count the incomplete line on files that don't end with a newline.  */
  if (bytes_read && buffer[bytes_read - 1] != '\n')
    --n_lines;

  do
    {
      /* Scan backward, counting the newlines in this bufferfull.  */

      size_t n = bytes_read;
      while (n)
        {
          char const *nl;
          nl = memrchr (buffer, '\n', n);
          if (nl == NULL)
            break;
          n = nl - buffer;
          if (n_lines-- == 0)
            {
              /* If this newline isn't the last character in the buffer,
                 output the part that is after it.  */
              if (n != bytes_read - 1)
                xwrite_stdout (nl + 1, bytes_read - (n + 1));
              *read_pos += dump_remainder (pretty_filename, fd,
                                           end_pos - (pos + bytes_read));
              return true;
            }
        }

      /* Not enough newlines in that bufferfull.  */
      if (pos == start_pos)
        {
          /* Not enough lines in the file; print everything from
             start_pos to the end.  */
          xlseek (fd, start_pos, SEEK_SET, pretty_filename);
          *read_pos = start_pos + dump_remainder (pretty_filename, fd,
                                                  end_pos);
          return true;
        }
      pos -= BUFSIZ;
      xlseek (fd, pos, SEEK_SET, pretty_filename);

      bytes_read = safe_read (fd, buffer, BUFSIZ);
      if (bytes_read == SAFE_READ_ERROR)
        {
          error (0, errno, _("error reading %s"), quote (pretty_filename));
          return false;
        }

      *read_pos = pos + bytes_read;
    }
  while (bytes_read > 0);

  return true;
}
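
The C routine above, which looks like the `file_lines` function of a `tail` implementation, reads the file backward one block at a time and counts newlines until it has seen enough. A rough Python rendering of the same idea might look like the following. It is a sketch rather than a faithful port: the name `tail_lines` and the `bufsize` default are assumptions, it returns bytes instead of writing to stdout, and it does not handle a file that changes while being read.

```python
import os


def tail_lines(path, n_lines, bufsize=8192):
    """Return the last n_lines lines of path as bytes, scanning backward in blocks."""
    if n_lines <= 0:
        return b""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end_pos = f.tell()
        if end_pos == 0:
            return b""
        # Size of the last, possibly partial, block: 0 < size <= bufsize.
        # After this first read, every read starts on a multiple of bufsize.
        size = end_pos % bufsize or bufsize
        pos = end_pos - size
        # A final line without a trailing newline still counts as one line.
        f.seek(end_pos - 1)
        remaining = n_lines if f.read(1) == b"\n" else n_lines - 1
        while True:
            f.seek(pos)
            buf = f.read(size)
            idx = len(buf)
            while True:
                # Find the previous newline strictly before idx.
                idx = buf.rfind(b"\n", 0, idx)
                if idx == -1:
                    break  # no more newlines in this block
                if remaining == 0:
                    # This newline ends the line just before the wanted ones.
                    start = pos + idx + 1
                    f.seek(start)
                    return f.read(end_pos - start)
                remaining -= 1
            if pos == 0:
                # Fewer lines than requested: return the whole file.
                f.seek(0)
                return f.read(end_pos)
            pos -= bufsize
            size = bufsize
```

Only the tail end of the file is ever read, so the cost depends on the length of the requested lines and not on the total file size.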

--
In doG We Trust

徐继哲

Reply 0, Tuesday, December 27, 2005, 18:45

hoxide Ma hoxide at gmail.com
Tue Dec 27 18:45:08 HKT 2005

Use mmap.
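
The post gives no further detail, so the following is only one possible reading of the mmap suggestion: map the file read-only and use the map's `rfind` method to walk backward over newlines. The function name `last_lines_mmap` is an assumption made for this sketch.

```python
import mmap
import os


def last_lines_mmap(path, n_lines=1):
    """Return the last n_lines lines of path as bytes, using a read-only memory map."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # Start the search before a trailing newline so it is not counted.
            if mm[end - 1:end] == b"\n":
                end -= 1
            pos = end
            for _ in range(n_lines):
                pos = mm.rfind(b"\n", 0, pos)
                if pos == -1:
                    break  # fewer lines than requested: return everything
            return mm[pos + 1:]
```

The operating system pages in only the parts of the file that `rfind` touches, so the whole file is not read. On a 32-bit Python, mapping a file larger than the available address space fails, in which case a seek-based approach is the safer choice.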

徐继哲

Reply 0, Wednesday, December 28, 2005, 09:50

Vincent Wen vincentwen at gmail.com
Wed Dec 28 09:50:11 HKT 2005

Python can seek too: seek to the end of the file, then search backward for the newline character.
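
Taken literally, the suggestion could be sketched as below for the single-last-line case. The function name `last_line` is hypothetical, and stepping back one byte at a time is chosen for clarity, not speed; reading in larger blocks is faster when lines are long.

```python
import os


def last_line(path):
    """Return the last line of path as bytes, without its trailing newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        # Ignore one trailing newline so that b"a\nb\n" yields b"b".
        if end > 0:
            f.seek(end - 1)
            if f.read(1) == b"\n":
                end -= 1
        # Walk backward until the byte just before pos is a newline,
        # or until the beginning of the file is reached.
        pos = end
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) == b"\n":
                break
            pos -= 1
        f.seek(pos)
        return f.read(end - pos)
```

The file is opened in binary mode because text-mode files in Python 3 do not support seeking to arbitrary byte offsets.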


徐继哲

Reply 0, Wednesday, December 28, 2005, 16:17

Kevin Yuan farproc at gmail.com
Wed Dec 28 16:17:45 HKT 2005

# last lines
import sys


def last_lines(filename, lines=1):
    """Print the last several line(s) of a text file.

    Argument filename is the name of the file to print.
    Argument lines is the number of lines to print from the end.
    """
    if lines <= 0:
        return
    block_size = 1024
    block = b''
    nl_count = 0
    start = 0
    # Binary mode: text-mode files do not support arbitrary byte seeks.
    with open(filename, 'rb') as fsock:
        # seek to end
        fsock.seek(0, 2)
        # get seek position
        curpos = fsock.tell()
        while curpos > 0:  # while not at beginning of file
            # step back by block_size plus the length of the last block read
            curpos -= block_size + len(block)
            if curpos < 0:
                curpos = 0
            fsock.seek(curpos)
            # read to end
            block = fsock.read()
            nl_count = block.count(b'\n')
            # a trailing newline ends the last line, so do not count it
            if block.endswith(b'\n'):
                nl_count -= 1
            # stop once enough newlines have been read
            if nl_count >= lines:
                break
    # find the exact start position
    for n in range(nl_count - lines + 1):
        start = block.find(b'\n', start) + 1
    # print it out as raw bytes, whatever the file's encoding is
    sys.stdout.buffer.write(block[start:])


if __name__ == '__main__':
    last_lines(sys.argv[0], 5)  # print the last 5 lines of THIS file

徐继哲

Reply 0, Wednesday, December 28, 2005, 18:16

jejwe jejwester at gmail.com
Wed Dec 28 18:16:39 HKT 2005

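PHP has many text-handling classes, and a few of the text-file-based PHP forums in China have arguably pushed text-file handling to its limits ^_^, so you could borrow from them. They contain some good ideas. Here is an article I found for you:


"Solving the core problems of textdb: overload and stability"

http://www.phpchina.cn/bbs/viewthread.php?tid=788&page=2

Start reading from the fourth post.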