Wednesday, July 19, 2006, 02:38
Today I wrote a web page scraper in Python, and I have found that pages served over https cannot be fetched with urllib.urlopen() (it raises an error). I searched for material on this and could not find a solution. Please advise! Here is my code:

    # Download a single resource
    def download_resource(src, src_type):
        url = get_resource_url(src)
        usock = urllib.urlopen(url)
        data = usock.read()
        usock.close()
        # save the file
        fname = sys.path[0] + '\\' + get_new_link(src, src_type).replace('/', '\\')
        fsock = file(fname, 'w')
        fsock.write(data)
        fsock.close()
Wednesday, July 19, 2006, 09:12
After the https handshake you still have to request the certificate and carry out a few other steps. I have a Java example but have not written the Python one; I have had no time these past few days, so the work has stalled, haha.
http://netkiller.hikz.com/article/security/book.html#id497614
I don't know when I will get it finished.

----- Original Message -----
From: "Neil" <chenrong2003 at gmail.com>
To: <python-chinese at lists.python.cn>
Sent: Wednesday, July 19, 2006 2:38 AM
Subject: [python-chinese] How do I fetch the content of pages served over https?

_______________________________________________
python-chinese
Post: send python-chinese at lists.python.cn
Subscribe: send subscribe to python-chinese-request at lists.python.cn
Unsubscribe: send unsubscribe to python-chinese-request at lists.python.cn
Detail Info: http://python.cn/mailman/listinfo/python-chinese
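The handshake-then-certificate sequence mentioned in this reply can be sketched in Python. This is only a minimal sketch, assuming Python 3 and its standard ssl module, both of which postdate this thread; make_context and peer_certificate are hypothetical helper names, and the host is whatever the caller supplies.

```python
import socket
import ssl


def make_context():
    """Build a client-side TLS context that verifies the server certificate."""
    # create_default_context() loads the system CA store and turns on
    # certificate verification and hostname checking.
    return ssl.create_default_context()


def peer_certificate(host, port=443, timeout=10):
    """Connect, complete the TLS handshake, and return the server
    certificate as a dict (hypothetical helper, not from the thread)."""
    context = make_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # wrap_socket() performs the handshake; it raises ssl.SSLError
        # if the server certificate cannot be verified.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Calling peer_certificate with a host name needs network access; in everyday scraping the higher-level URL libraries do this step for you.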
Wednesday, July 19, 2006, 10:13
curl

-- 
Best regards,
IQDoctor
Wednesday, July 19, 2006, 10:18
With https even the URL is sent in encrypted form, so of course you cannot use the ordinary urlopen directly.
Python should have a corresponding library, and of course an ssl library is required.
Just guessing, heh.

Best Regards,

Zachary Wu
Software Engineer, Enterprise Content Management FVT, IBM China Software Development Lab
Tel: +86 10 82782244-3235. Fax: 82782244-2886 Tie Line: 915-2244-3235
Internet: xiaoleiw at cn.ibm.com
Notes ID: Xiao Lei Wu/China/Contr/IBM at IBMCN
Address: 8/F, Block A, Power Creative Building, No.1, East Road, Shang Di, Beijing 100085, P.R. China
Wednesday, July 19, 2006, 13:14
See the urllib2 help documentation:

    >>> o = urllib2.build_opener(urllib2.HTTPSHandler())
    >>> a = o.open('https://gmail.google.com')
    >>> a
    >>> a.read()
    ......

-- 
Zhang Jun <zhangj at foreseen-info.com>
Agility comes from Python, simplicity comes from us
丰元信信息技术有限公司
Python technical discussion group: 22507237
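For readers on a current interpreter, the same opener approach can be sketched as follows. This assumes Python 3, where urllib2 was merged into urllib.request; build_https_opener, fetch, and save are hypothetical helper names, not part of the thread. Because read() returns bytes in Python 3, the sketch writes the file in binary mode.

```python
import ssl
import urllib.request


def build_https_opener():
    """Return an opener whose HTTPS handler verifies certificates."""
    context = ssl.create_default_context()
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler)


def fetch(url, timeout=10):
    """Fetch url through the opener and return the response body as bytes."""
    opener = build_https_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def save(data, path):
    """Write the downloaded bytes unchanged; 'wb' avoids newline translation."""
    with open(path, "wb") as f:
        f.write(data)
```

A call such as save(fetch(url), fname) would then stand in for the body of download_resource, with url and fname computed as in the original question.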
Wednesday, July 19, 2006, 13:16
In httplib I seem to have seen HTTPS, and SSL as well.

-- 
devdoer
devdoer at gmail.com
http://project.mytianwang.cn/cgi-bin/blog
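The HTTPS support in httplib can be sketched like this. It assumes Python 3, where httplib was renamed http.client; make_connection and get are hypothetical helper names introduced only for illustration.

```python
import http.client
import ssl


def make_connection(host, port=443, timeout=10):
    """Create an HTTPS connection object; nothing is sent until request()."""
    context = ssl.create_default_context()
    return http.client.HTTPSConnection(host, port, timeout=timeout, context=context)


def get(host, path="/"):
    """Issue a GET request over HTTPS and return (status, body bytes)."""
    conn = make_connection(host)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

This lower-level interface takes the host and path separately rather than a full URL, so the opener approach is usually more convenient when you are following links from a page.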
Wednesday, July 19, 2006, 14:25
Thanks all, Zhang Jun's code solved the problem.