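Python Forum - Discussion Board

Subject: [python-chinese] PDF library?

Monday, January 1, 2007, 13:10

Beinan Li li.beinan at gmail.com
Mon Jan 1 13:10:24 HKT 2007

Looking for a library that can fully read and write PDFs, including graphics and annotation objects. Compatibility with the latest version of the format would be ideal.
Thanks

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]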

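Wednesday, January 3, 2007, 08:07

gashero harry.python at gmail.com
Wed Jan 3 08:07:42 HKT 2007

邓作霖 <pse-dengzl at pegasus.tj.cn> to python-chinese, 2006-04-28
Hello everyone:

    For a recent project I need to extract text from PDFs. My C is weak, so I
have run into serious obstacles with the Adobe PDF SDK, and I cannot afford to
spend much time on this. Does Python have a library for working with PDF files?
The main requirement is extracting the text content; the formatting does not
need to be preserved. Does anyone know of one? Thanks.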
<http://codeplayer.blogbus.com/>


_______________________________________________
python-chinese
Post: send python-chinese at lists.python.cn
Subscribe: send subscribe to python-chinese-request at lists.python.cn
Unsubscribe: send unsubscribe to  python-chinese-request at lists.python.cn
Detail Info: http://python.cn/mailman/listinfo/python-chinese

yi huang to python-chinese, 2006-04-28
A quick search turns up plenty  -_-!

-- 
http://codeplayer.blogbus.com/
		
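tocer <tocer.deng at gmail.com> to python-chinese, 2006-04-28
#  PDFlib is a library for working with PDF, with bindings for several
programming languages, including C, C++, C#, Java, Perl, PHP, Python, RPG, and Tcl.

# ReportLab's document-related products include the ReportLab Toolkit, which is used for working with PDF.

Also, please send new questions as a separate message instead of posting them as a reply to someone else's thread :)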



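邓作霖 <pse-dengzl at pegasus.tj.cn> to python-chinese, 2006-04-28
PDFlib does not seem to be free, and from what I saw, ReportLab's open-source
tools cannot read PDFs. For that you need pdfcatcher, which is commercial
software. I will keep looking. Thank you very much. :)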
		
Jerry <jetport at gmail.com> to python-chinese, 2006-04-28
I am also looking for a PDF text extraction tool, but I need tables and images extracted as well. If you only need text, try Xpdf: the Xpdf project
also includes a PDF text extractor.


-- 
If U can see it, then U can do it
If U just believe it, there's nothing to it
I believe U can fly
From Jetport at gmail.com


jacob <jacob at exoweb.net> to python-chinese, 2006-04-28
xpdf includes a tool called pdftotext.
http://www.foolabs.com/xpdf/download.html
		
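邓作霖 <pse-dengzl at pegasus.tj.cn> to python-chinese, 2006-04-29
I originally wanted to use Python to batch-export the PDFs, because the main
program is written in Delphi. A tool like Xpdf works well too, since I can
simply call it from the shell. Thank you very much.
Many thanks to jacob as well!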
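
Calling pdftotext once per file, as described above, might look roughly like the following Python sketch. It assumes the `pdftotext` executable from Xpdf is installed and on PATH. The function names, the directory layout, and the choice of `-enc UTF-8` are illustrative assumptions and do not come from this thread.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path, encoding="UTF-8"):
    """Return the argument list for a single pdftotext call."""
    # -enc selects the output text encoding; UTF-8 is an assumed choice here.
    return ["pdftotext", "-enc", encoding, str(pdf_path), str(txt_path)]


def plan_batch(src_dir, out_dir):
    """Pair every *.pdf in src_dir with a .txt target in out_dir."""
    out = Path(out_dir)
    return [
        build_command(pdf, out / (pdf.stem + ".txt"))
        for pdf in sorted(Path(src_dir).glob("*.pdf"))
    ]


def run_batch(src_dir, out_dir):
    """Run pdftotext for each PDF and return the inputs that failed.

    Raises FileNotFoundError if pdftotext is not installed.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in plan_batch(src_dir, out_dir):
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failed.append(cmd[-2])  # the PDF path is the second-to-last argument
    return failed
```

With this sketch, `run_batch("pdfs", "text")` would write one `.txt` file per PDF into `text/` and return the paths that pdftotext rejected. Building the argument list separately keeps the command easy to inspect, or to reproduce from another host program.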

bird devdoer <devdoer at gmail.com> to python-chinese, 2006-05-01
How well does xpdf support Chinese?

-- 
devdoer
devdoer at gmail.com
http://project.mytianwang.cn/cgi-bin/blog
		
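邓作霖 <pse-dengzl at pegasus.tj.cn> to python-chinese, 2006-05-08
Sorry, I did not check my mail over the May Day holiday.
I have not tried exporting Chinese. I only tried a Japanese PDF, and it did not
work: none of the Japanese characters could be exported. According to the
documentation, the relevant fonts have to be configured. I was short on time
and did not look closely at how to set them up.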

bird devdoer <devdoer at gmail.com> to python-chinese, 2006-05-08
Thanks

-- 
devdoer
devdoer at gmail.com
http://project.mytianwang.cn/cgi-bin/blog

2007/1/1, Beinan Li <li.beinan at gmail.com>:
> Looking for a library that can fully read and write PDFs, including graphics and annotation objects. Compatibility with the latest version of the format would be ideal.
> Thanks


-- 
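Once upon a time there was a very cold caterpillar who wanted a little warmth. Its only chance of getting warm was to fall from the tree into someone's collar.
A moment of warmth, and then its life was over. Yet many of its kind never got even that moment before..
Will I find warmth? I try so carefully, and still I get hurt.
I would risk everything for that one moment of warmth, but who would be willing to accept it?

Welcome to my blog: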
http://blog.csdn.net/gashero


Wednesday, January 3, 2007, 20:37

3751 lwm3751 at gmail.com
Wed Jan 3 20:37:09 HKT 2007

How is anyone supposed to read this email?


Wednesday, January 3, 2007, 21:05

shawind killzero at sohu.com
Wed Jan 3 21:05:44 HKT 2007

It is just text anyway; convert it to GB2312 and you can read it.

======== 2007-01-03 20:57:26 you wrote: ========

How is anyone supposed to read this email?




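Wednesday, January 3, 2007, 23:30

3751 lwm3751 at gmail.com
Wed Jan 3 23:30:37 HKT 2007

What I meant was the huge pile of content in gashero's message. Reading through it would drive anyone crazy, never mind working out what it is actually saying...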