Python forum post:

Python Forum - Discussion Board

Title: [python-chinese] How to correctly download a file with the urllib module on Windows

Original poster, Wednesday, October 10, 2007, 13:45

bakefish yellowfool at gmail.com
Wed Oct 10 13:45:45 HKT 2007

From cookbook2, *Downloading a File from the Web*:

#!/usr/bin/env python

"""File downloading from the web.
"""

def download(url):
	"""Copy the contents of a file from a given URL
	to a local file.
	"""
	import urllib
	webFile = urllib.urlopen(url)
	localFile = open(url.split('/')[-1], 'w')
	localFile.write(webFile.read())
	webFile.close()
	localFile.close()

if __name__ == '__main__':
	import sys
	if len(sys.argv) == 2:
		try:
			download(sys.argv[1])
		except IOError:
			print 'Filename not found.'
	else:
		import os
		print 'usage: %s http://server.com/path/to/filename' % os.path.basename(sys.argv[0])

This works fine on Linux, but on Windows the downloaded file is unusable. Can anyone explain the cause and the fix?

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

hongqn

Reply #0, Wednesday, October 10, 2007, 13:57

Qiangning Hong hongqn at gmail.com
Wed Oct 10 13:57:53 HKT 2007

On 10/10/07, bakefish <yellowfool at gmail.com> wrote:
>  localFile = open(url.split('/')[-1], 'w')

Change this 'w' to 'wb'.
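
For anyone wondering why the mode matters: a file opened in text mode on Windows has every '\n' that is written turned into '\r\n', which corrupts binary data. Here is a minimal sketch of that effect, not code from the thread. It assumes Python 3 (the thread itself uses Python 2) and uses the newline="\r\n" argument of open() to mimic the Windows translation on any platform; the payload and file names are made up for illustration.

```python
import os
import tempfile

# Hypothetical payload standing in for downloaded bytes.
payload = b"line one\nline two\n"
tmp_dir = tempfile.mkdtemp()

# Text mode with newline="\r\n" mimics what Windows text mode does on write:
# every "\n" is translated to "\r\n".
text_path = os.path.join(tmp_dir, "text_mode.txt")
with open(text_path, "w", newline="\r\n") as f:
    f.write(payload.decode("ascii"))

# Binary mode writes the bytes exactly as given.
binary_path = os.path.join(tmp_dir, "binary_mode.txt")
with open(binary_path, "wb") as f:
    f.write(payload)

# Read both files back as raw bytes to compare what actually hit the disk.
with open(text_path, "rb") as f:
    text_bytes = f.read()
with open(binary_path, "rb") as f:
    binary_bytes = f.read()

print(text_bytes == payload)    # False: extra "\r" bytes were inserted
print(binary_bytes == payload)  # True: written unchanged
```

The extra bytes are harmless in a plain text file but break images, archives, and other binary formats.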

-- 
Qiangning Hong
http://www.douban.com/people/hongqn/


壳壳

Reply #0, Wednesday, October 10, 2007, 14:00

Jiahua Huang jhuangjiahua at gmail.com
Wed Oct 10 14:00:41 HKT 2007

Try this:

import urllib
def download(url):
        file(url.split('/')[-1], 'wb').write(urllib.urlopen(url).read())
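
The one-liner above reads the whole response into memory and relies on garbage collection to close the file. A slightly more defensive variant might look like the sketch below. It assumes Python 3, where urlopen lives in urllib.request, and the dest_dir parameter is an addition for illustration that is not in the thread.

```python
import os
import shutil
import urllib.request


def download(url, dest_dir="."):
    """Copy the resource at url into dest_dir and return the local path."""
    # Same naming rule as the thread's code: last component of the URL.
    local_path = os.path.join(dest_dir, url.split("/")[-1])
    # "wb" keeps the bytes exactly as received. The with statement closes
    # both the response and the local file even if the copy fails part-way.
    with urllib.request.urlopen(url) as web_file, \
            open(local_path, "wb") as local_file:
        # Copy in chunks instead of loading the whole body into memory.
        shutil.copyfileobj(web_file, local_file)
    return local_path
```

Called as download("http://server.com/path/to/filename"), it should leave a byte-for-byte copy named filename in the current directory.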


徐继哲

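Reply #0, Wednesday, October 10, 2007, 14:01

bakefish yellowfool at gmail.com
Wed Oct 10 14:01:36 HKT 2007

That doesn't work; I tried it. It isn't a binary-mode problem. It seems that on Windows '\r' gets added automatically; try downloading a txt file with this code and you will see.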

2007/10/10, Qiangning Hong <hongqn at gmail.com>:
>
> On 10/10/07, bakefish <yellowfool at gmail.com> wrote:
> >  localFile = open(url.split('/')[-1], 'w')
>
> Change this 'w' to 'wb'
>
> --
> Qiangning Hong
> http://www.douban.com/people/hongqn/
> _______________________________________________
> python-chinese
> Post: send python-chinese at lists.python.cn
> Subscribe: send subscribe to python-chinese-request at lists.python.cn
> Unsubscribe: send unsubscribe to python-chinese-request at lists.python.cn
> Detail Info: http://python.cn/mailman/listinfo/python-chinese

vicalloy

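Reply #0, Wednesday, October 10, 2007, 14:08

vicalloy zbirder at gmail.com
Wed Oct 10 14:08:40 HKT 2007

How about trying urllib2?
Here is the function I wrote for downloading images.
If you don't check the response headers, you may end up saving a 404 page or similar as if it were the file.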
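import urllib2

def downloadPic(url, filename):
    request = urllib2.Request(url)
    opener = urllib2.build_opener()
    f = opener.open(request)
    # Only save the response when the server reports a JPEG image.
    if f.headers.dict['content-type']=='image/jpeg':
        # Save to file (binary mode, so the bytes are written unchanged)
        xfile = open(filename, 'wb')
        xfile.write(f.read())
        xfile.close()
        return True
    else:
        # TODO: raise an exception
        raise Exception('not img')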
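
The same header check can be sketched for Python 3, where urllib2 was merged into urllib.request. This is an illustration under that assumption, not vicalloy's code: the function name download_image, the expected_type parameter, and the choice of ValueError are all made up here. headers.get_content_type() returns only the type/subtype part, so a header such as "image/jpeg; charset=binary" still matches.

```python
import urllib.request


def download_image(url, filename, expected_type="image/jpeg"):
    """Save url to filename only if its Content-Type matches expected_type."""
    with urllib.request.urlopen(url) as response:
        # get_content_type() drops any parameters and lowercases the value.
        content_type = response.headers.get_content_type()
        if content_type != expected_type:
            # Refuse to save error pages or other unexpected content.
            raise ValueError("unexpected content type: %s" % content_type)
        # Binary mode so the image bytes are written unchanged.
        with open(filename, "wb") as out:
            out.write(response.read())
    return True
```

The check runs before the local file is opened, so a rejected response leaves nothing on disk.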

On 07-10-10, bakefish <yellowfool at gmail.com> wrote:
> That doesn't work; I tried it. It isn't a binary-mode problem. It seems that on Windows '\r' gets added automatically; try downloading a txt file with this code and you will see.
>
>
> 2007/10/10, Qiangning Hong <hongqn at gmail.com>:
> > On 10/10/07, bakefish <yellowfool at gmail.com> wrote:
> > >  localFile = open( url.split('/')[-1], 'w')
> >
> > Change this 'w' to 'wb'
> >
> > --
> > Qiangning Hong
> > http://www.douban.com/people/hongqn/


-- 
Blog http://vicalloy.spaces.live.com/
Old photos http://www.lzpian.com/



徐继哲

Reply #0, Wednesday, October 10, 2007, 14:19

bakefish yellowfool at gmail.com
Wed Oct 10 14:19:11 HKT 2007

Thanks everyone, the problem is solved. Changing "w" to "wb" was indeed all it took.
The trouble was in how I tested after changing the code. In the eclipse+pydev interpreter environment I did this:
>>> del module_download
>>> import module_download
>>> module_download.download("http://.........")
I found that the import has no effect at this point: even after the del, the code from before the change is still used.


On 07-10-10, bakefish <yellowfool at gmail.com> wrote:
>
> From cookbook2, *Downloading a File from the Web*:
>
> #!/usr/bin/env python
>
> """File downloading from the web.
> """
>
> def download(url):
> 	"""Copy the contents of a file from a given URL
> 	to a local file.
> 	"""
> 	import urllib
> 	webFile = urllib.urlopen(url)
> 	localFile = open(url.split('/')[-1], 'w')
> 	localFile.write(webFile.read())
> 	webFile.close()
> 	localFile.close()
>
> if __name__ == '__main__':
> 	import sys
> 	if len(sys.argv) == 2:
> 		try:
> 			download(sys.argv[1])
> 		except IOError:
> 			print 'Filename not found.'
> 	else:
> 		import os
> 		print 'usage: %s http://server.com/path/to/filename' % os.path.basename(sys.argv[0])
>
> This works fine on Linux, but on Windows the downloaded file is unusable. Can anyone explain the cause and the fix?
>
>

壳壳

Reply #0, Wednesday, October 10, 2007, 14:22

Jiahua Huang jhuangjiahua at gmail.com
Wed Oct 10 14:22:52 HKT 2007

Adding \r is just how DOS text files behave.

To re-import the module, use reload(module_download).
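
The caching behaviour bakefish ran into can be reproduced with a small throwaway module. The sketch below assumes Python 3, where reload is importlib.reload rather than a builtin; the module name module_download_demo and its MODE constant are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module in a temporary directory on sys.path.
tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
module_path = os.path.join(tmp_dir, "module_download_demo.py")
with open(module_path, "w") as f:
    f.write("MODE = 'w'\n")
importlib.invalidate_caches()

import module_download_demo
first = module_download_demo.MODE

# Edit the source on disk, as one would after fixing the bug.
with open(module_path, "w") as f:
    f.write("MODE = 'wb'\n")

# del only removes the local name; the module object stays in sys.modules,
# so the next import hands back the old, already-loaded code.
del module_download_demo
import module_download_demo
after_del = module_download_demo.MODE

# reload re-executes the module's source and picks up the edit.
importlib.reload(module_download_demo)
after_reload = module_download_demo.MODE

print(first, after_del, after_reload)  # w w wb
```

Restarting the interpreter has the same effect as the reload, at the cost of losing any other state.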

On 07-10-10, bakefish <yellowfool at gmail.com> wrote:
> Thanks everyone, the problem is solved. Changing "w" to "wb" was indeed all it took.


员旭鹏

Reply #0, Wednesday, October 10, 2007, 22:04

Xupeng Yun recordus at gmail.com
Wed Oct 10 22:04:38 HKT 2007

On Wed, 2007-10-10 at 13:45 +0800, bakefish wrote:
> From cookbook2, Downloading a File from the Web:
> #!/usr/bin/env python
> 
> """File downloading from the web.
> """
> 
> def download(url):
> 	"""Copy the contents of a file from a given URL
> 	to a local file.
> 	"""
> 	import urllib
> 	webFile = urllib.urlopen(url)
> 	localFile = open(url.split('/')[-1], 'w')
> 	localFile.write(webFile.read())
> 	webFile.close()
> 	localFile.close()
> 
> if __name__ == '__main__':
> 	import sys
> 	if len(sys.argv) == 2:
> 		try:
> 			download(sys.argv[1])
> 		except IOError:
> 			print 'Filename not found.'
> 	else:
> 		import os
> 		print 'usage: %s http://server.com/path/to/filename' % os.path.basename(sys.argv[0])
> This works fine on Linux, but on Windows the downloaded file is unusable. Can anyone explain the cause and the fix?

Wouldn't urllib.urlretrieve be a much nicer fit here?
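
A rough sketch of what that could look like, assuming Python 3, where the function is urllib.request.urlretrieve (the Python 3 documentation lists it as a legacy interface). The dest_dir parameter is an addition for illustration. When given a filename, urlretrieve opens the destination in binary mode itself, so the 'w' versus 'wb' mistake cannot happen.

```python
import os
import urllib.request


def download(url, dest_dir="."):
    """Fetch url into dest_dir with urlretrieve and return the local path."""
    # Same naming rule as the thread's code: last component of the URL.
    local_path = os.path.join(dest_dir, url.split("/")[-1])
    # urlretrieve returns a (filename, headers) pair.
    saved_path, headers = urllib.request.urlretrieve(url, local_path)
    return saved_path
```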