Python forum post - Zeuux

Subject: [python-chinese] A URL encoding problem

员旭鹏

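Original poster, Friday, 27 October 2006, 23:44

Xupeng Yun recordus at gmail.com
Fri Oct 27 23:44:12 HKT 2006

I just searched the earlier mails on the list but still could not solve this small problem.
I want my program to convert a URL like this:
http://www.test.org/中文/测试.html
into this:
http://www.test.org/%E4%B8%AD%E6%96%87/%E6%B5%8B%E8%AF%95.html
When I use urllib.quote for the conversion, the result is:
http%3A//www.test.org/%E4%B8%AD%E6%96%87/%E6%B5%8B%E8%AF%95.html
Why does it convert the / as well? Other characters in the URL such as & and ? get converted too.

I do not fully understand this yet and am still experimenting.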
-- 
I like Python & Linux.
Blog: http://recordus.cublog.cn

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

员旭鹏

Reply, Friday, 27 October 2006, 23:49

Xupeng Yun recordus at gmail.com
Fri Oct 27 23:49:05 HKT 2006

Correction: it is the : that urllib.quote converted, not the /.
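
The corrected observation can be reproduced with a small sketch. It assumes Python 3, where the function is available as urllib.parse.quote (the thread itself uses Python 2's urllib.quote). With the default safe='/' the slashes survive and the colon does not.

```python
from urllib.parse import quote

url = 'http://www.test.org/中文/测试.html'

# Default call: safe='/' keeps slashes, but ':' is percent-encoded as %3A
default_quoted = quote(url)
print(default_quoted)
```

The scheme separator becoming %3A is what makes the result unusable as a URL, while the Chinese path segments are encoded as intended.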

员旭鹏

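Reply, Saturday, 28 October 2006, 00:11

Xupeng Yun recordus at gmail.com
Sat Oct 28 00:11:02 HKT 2006

In the end I chose to extract the Chinese text, quote each piece separately, and substitute it back:
import re
from urllib import quote
url = 'http://www.test.org/中文/测试.html'
# Byte-level pattern built from the UTF-8 encoding of the CJK range
pattern = u'[\u4e00-\u9fa5]+'.encode('utf8')
ret = re.findall(pattern, url)
for word in ret:
    # GBK is used here, so the escapes are GBK bytes; quote(word) would keep UTF-8
    url = url.replace(word, quote(word.decode('utf8').encode('gbk')))

I wonder whether there is a better way.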
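
A possibly tidier variant of the same extract-and-replace idea is sketched below. It assumes Python 3 and UTF-8 escapes (the form asked for in the first post), and the helper name quote_non_ascii is hypothetical. re.sub with a callback encodes each run of non-ASCII characters in one pass instead of looping over str.replace.

```python
import re
from urllib.parse import quote

def quote_non_ascii(url):
    """Percent-encode every run of non-ASCII characters as UTF-8 (hypothetical helper)."""
    # The callback receives each match and returns its percent-encoded form
    return re.sub(r'[^\x00-\x7f]+', lambda m: quote(m.group(0)), url)

quoted = quote_non_ascii('http://www.test.org/中文/测试.html')
print(quoted)
```

Because only non-ASCII runs are touched, delimiters such as : / ? and & pass through unchanged. ASCII characters that are invalid in a URL, such as spaces, are not handled by this sketch.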

齐辰雄

Reply, Saturday, 28 October 2006, 10:39

麦田守望者 qcxhome at gmail.com
Sat Oct 28 10:39:37 HKT 2006

Just write a function of your own. You can work out how to use the library later.

-- 
GoogleTalk: qcxhome at gmail.com
MSN: qcxhome at hotmail.com
My Space: tkdchen.spaces.live.com
BOINC: boinc.berkeley.edu
China distributed computing site: www.equn.com

林旺茂

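Reply, Saturday, 28 October 2006, 11:28

3751 lwm3751 at gmail.com
Sat Oct 28 11:28:11 HKT 2006

Here is a bit of code I wrote that does what you want:
#coding=gbk
import urllib
import urlparse
url = 'http://www.test.org/中文/测试.html'
# Split the URL so that only the path component is quoted
scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
path = urllib.quote(path)
# Reassemble the URL from the untouched parts and the quoted path
url = urlparse.urlunsplit((scheme, netloc, path, query, fragment))
print url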
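
For comparison, the same split, quote, and reassemble steps might look as follows in Python 3. This is only a sketch: in Python 3 the urlparse module was merged into urllib.parse, the helper name quote_url_path is hypothetical, and the query string and fragment in the sample URL are added purely for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def quote_url_path(url):
    """Percent-encode only the path component of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # SplitResult is a named tuple, so _replace swaps in the quoted path
    return urlunsplit(parts._replace(path=quote(parts.path)))

result = quote_url_path('http://www.test.org/中文/测试.html?a=1#top')
print(result)
```

Since quote never sees the scheme or host, the : after http is left alone without needing a custom safe argument.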

员旭鹏

Reply, Saturday, 28 October 2006, 12:07

Xupeng Yun recordus at gmail.com
Sat Oct 28 12:07:42 HKT 2006

2006/10/28, 3751 <lwm3751 at gmail.com>:
> Here is a bit of code I wrote that does what you want

Yes, this approach is better than mine.

林旺茂

Reply, Saturday, 28 October 2006, 12:25

3751 lwm3751 at gmail.com
Sat Oct 28 12:25:25 HKT 2006

On reflection, the following two lines should be added as well:
query = urllib.quote(query)
fragment = urllib.quote(fragment)
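
One point worth checking when quoting the query this way: with the default safe='/', quote also escapes = and &, which would merge all key/value pairs into one opaque string. The sketch below assumes Python 3 and a hypothetical helper quote_url, and keeps those two delimiters via safe='=&'. Whether that is the right safe set depends on the application.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def quote_url(url):
    """Quote path, query and fragment separately (hypothetical helper)."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    path = quote(path)
    # Keep '=' and '&' so the key/value structure of the query survives
    query = quote(query, safe='=&')
    fragment = quote(fragment)
    return urlunsplit((scheme, netloc, path, query, fragment))

original = 'http://www.test.org/中文/测试.html?名=值&x=1#节'
result = quote_url(original)
print(result)
# Decoding the escapes gives back the original URL
print(unquote(result) == original)
```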

员旭鹏

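Reply, Saturday, 28 October 2006, 14:20

Xupeng Yun recordus at gmail.com
Sat Oct 28 14:20:42 HKT 2006

I read the doc string of urllib.quote carefully once more and finally understand how to use this function. It was probably too late last night and my head was too foggy to make sense of it. Here is how it works:

The prototype of quote is quote(s, safe='/'),
so it is enough to pass the characters that should not be converted as the safe argument:

>>> import string, urllib
>>> url = 'http://www.test.org/中文/测试.html'
>>> print urllib.quote(url, string.punctuation)
http://www.test.org/%E4%B8%AD%E6%96%87/%E6%B5%8B%E8%AF%95.html

Without the safe argument, only / is kept unquoted by default (apart from letters, digits and _.- which are never quoted), and that produced the result that confused me yesterday.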
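
The same call can be sketched for Python 3 (an assumption; the thread uses Python 2). The second call uses a narrower safe set, named URL_SAFE here purely for illustration: the reserved characters of RFC 3986 plus %, so that existing escapes are not encoded a second time. Passing all of string.punctuation also leaves characters such as " < > unescaped, which may or may not be wanted.

```python
import string
from urllib.parse import quote

url = 'http://www.test.org/中文/测试.html'

# Everything in string.punctuation is treated as safe, as in the post above
loose = quote(url, safe=string.punctuation)

# Hypothetical narrower choice: RFC 3986 reserved characters plus '%'
URL_SAFE = ":/?#[]@!$&'()*+,;=%"
strict = quote(url, safe=URL_SAFE)

print(loose)
print(strict)
```

For this particular URL both calls give the same result. They differ only when the input contains punctuation outside the narrower set.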

徐继哲

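Reply, Monday, 30 October 2006, 10:50

Gavin gavin at sz.net.cn
Mon Oct 30 10:50:10 HKT 2006

The best way to URL-encode is still not to encode the whole URL in one go: split it into its parts first, encode each part, and then join them back together.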
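
One reason the part-by-part approach can matter: once a URL is a single string, a / or ? that belongs to a file name cannot be told apart from one acting as a delimiter. The sketch below assumes Python 3. The helper build_url and the sample segment 'a/b?.html' are hypothetical and chosen only to show the difference.

```python
from urllib.parse import quote, urlencode, urlunsplit

def build_url(scheme, host, segments, params):
    """Assemble a URL from already separated parts (hypothetical helper)."""
    # safe='' escapes even '/' inside a single path segment
    path = '/' + '/'.join(quote(seg, safe='') for seg in segments)
    # urlencode quotes each key and value and joins them with '=' and '&'
    query = urlencode(params)
    return urlunsplit((scheme, host, path, query, ''))

result = build_url('http', 'www.test.org', ['中文', 'a/b?.html'], {'q': '测试'})
print(result)
```

Here the / and ? inside the second segment come out as %2F and %3F, while the separators between the parts stay literal.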

员旭鹏

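Reply, Monday, 30 October 2006, 13:08

Xupeng Yun recordus at gmail.com
Mon Oct 30 13:08:04 HKT 2006

2006/10/30, Gavin <gavin at sz.net.cn>:
>
> The best way to URL-encode is still not to encode the whole URL in one go: split it into
> its parts first, encode each part, and then join them back together.
>

Before I understood how quote works, splitting the URL, encoding the parts and joining them again was exactly what I did, but later I found that quote alone is enough :)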