Python forum post:

Python Forum - Discussion Board

Title: [python-chinese] Chinese text prints as garbled characters after fetching a Chinese page with the httplib module

Original poster, Thursday, January 25, 2007, 13:11

GoodGoodStudy&DayDayUp peixu.zhu at gmail.com
Thu Jan 25 13:11:16 HKT 2007

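If you are using htmllib, this is how I get the link text "书籍":

# Both methods belong to a subclass of htmllib.HTMLParser;
# mybook is a dict defined elsewhere that maps url -> book name.
def anchor_bgn(self, href, name, type):
    self.save_bgn()  # start buffering the text inside the <a> element
    htmllib.HTMLParser.anchor_bgn(self, href, name, type)

def anchor_end(self):
    # anchorlist[-1] is the href of the anchor being closed
    mybook[self.anchorlist[-1]] = self.save_end()  # url <---> book name
    htmllib.HTMLParser.anchor_end(self)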
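
For readers on Python 3, where htmllib is no longer available, the same idea can be sketched with the standard html.parser module. This is a swapped-in technique, not the code from this post: the class name LinkTextParser and its attributes are hypothetical, and the sample markup is assumed from the question quoted below.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect {href: link text} for every <a href=...>...</a> element."""

    def __init__(self):
        super().__init__()
        self.links = {}     # url -> link text
        self._href = None   # href of the <a> currently open, if any
        self._chunks = []   # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._chunks = []

    def handle_data(self, data):
        # Only buffer text while an <a> element is open
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links[self._href] = "".join(self._chunks).strip()
            self._href = None


parser = LinkTextParser()
parser.feed('<a href="http://localhost/mybook">书籍</a>')
parser.close()
print(parser.links)
```

As in the htmllib version, the address comes from the start tag and the text is buffered until the end tag, so neither the start handler nor handle_data() alone is enough.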



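> One more question: I have a hyperlink
>
> <a href="http://localhost/mybook">书籍</a>
> How can I extract both the link address and the text "书籍" and store them together in a dictionary?
> Can this be done directly with start_a() of the SGMLParser class? I can get the address, but how do I get "书籍"?
>
> def start_a(self, attrs):
>     url = [value for (key, value) in attrs]
>     del url[-1]
>     if url:
>         self.urls.append(url)
>
> Should it be handled in this method, or does it need to be handled in handle_data()?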


-------------------------------------------------------


> THINK big, DO small.

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

caijunjie

Reply 0, Friday, January 26, 2007, 12:46

俊杰蔡 yzcaijunjie at gmail.com
Fri Jan 26 12:46:11 HKT 2007

Thanks, everyone.
I understand now, and I can extract the relevant content.

Following this approach, I looked at the scripts that currently download music from baidu automatically. I don't understand how the music URL gets converted. For example, I got this URL from the page:
http://image.stareastnet.com/2007/01/19/Y2Jja2VnaHFiYmZrZm-eaWdnMw$$.mp3

I don't know what the Y2Jja2VnaHFiYmZrZm-eaWdnMw$$ part is. This address certainly cannot be downloaded, and the correct address is
http://image.stareastnet.com/2007/01/19/20070119103719g165.mp3

So here is the question: how is Y2Jja2VnaHFiYmZrZm-eaWdnMw$$ converted into 20070119103719g165?

_______________________________________________
python-chinese
Post: send python-chinese at lists.python.cn
Subscribe: send subscribe to python-chinese-request at lists.python.cn
Unsubscribe: send unsubscribe to python-chinese-request at lists.python.cn
Detail Info: http://python.cn/mailman/listinfo/python-chinese

大熊非熊

Reply 0, Friday, January 26, 2007, 22:49

大熊 bearsprite at gmail.com
Fri Jan 26 22:49:09 HKT 2007

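baidu's search results used to include the song's download address directly. Now it has replaced the address with the seemingly encrypted string you mentioned, which cannot be used directly. You have to fetch the download page as well and take the download address from it.

Maybe baidu did this to increase clicks? There are still quite a few tools that use baidu to download songs automatically.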


-- 
In the vast sea of people, you are my favorite

caijunjie

Reply 0, Saturday, January 27, 2007, 09:57

俊杰蔡 yzcaijunjie at gmail.com
Sat Jan 27 09:57:50 HKT 2007

I consulted some references and solved it like this. The link for an mp3 song looks like this:

<a href="http://202.108.23.172/m?ct=134217728&tn=baidusg,黑色翅膀 &word=mp3,http://image.stareastnet.com/2007/01/19/Y2Jja2VnaHFiYmZrZm-eaWdnMw$$.mp3,,[%BA%DA%C9%AB%B3%E1%B0%F2+%BA%CE%C8%F3%B6%AB]&lm=16777216"
target="_blank" title="请点击左键！来源网址：  http://image.stareastnet.com   请参照百度权利声明使用" onclick="return ow(this)">黑色翅膀</a>

Here, "baidusg,黑色翅膀 &word" needs to be replaced with "baidusg,recordus &word". After that, the address in the href can be used to get the song's exact link address from the next page.




大熊非熊

Reply 0, Saturday, January 27, 2007, 10:48

大熊 bearsprite at gmail.com
Sat Jan 27 10:48:35 HKT 2007

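http://202.108.23.172/m?ct=134217728&tn=baidusg,recordus &word=mp3,...

Actually, recordus can be any string. It is just passed as a parameter to the download-address page. What matters is the download address inside that download page.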
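
The substitution discussed in this thread can be sketched in Python as a single regular-expression replacement. This is only an illustration under stated assumptions: the function name replace_tn_label is hypothetical, the sample URL is shortened, and the stray space before &word in the archived copies is assumed to be a line-wrapping artifact, so it is dropped here.

```python
import re


def replace_tn_label(href, label="recordus"):
    """Replace the text after 'baidusg,' in the tn parameter with `label`."""
    # Match 'tn=baidusg,' plus everything up to the next '&'.
    # A function is used as the replacement so that `label` is inserted
    # literally, even if it contains backslashes.
    return re.sub(
        r"(tn=baidusg,)[^&]*",
        lambda m: m.group(1) + label,
        href,
        count=1,
    )


# Shortened sample based on the link quoted in this thread.
href = (
    "http://202.108.23.172/m?ct=134217728&tn=baidusg,黑色翅膀"
    "&word=mp3,http://image.stareastnet.com/2007/01/19/"
    "Y2Jja2VnaHFiYmZrZm-eaWdnMw$$.mp3"
)
print(replace_tn_label(href))
```

Because the label is only passed along to the next page, any value should work in place of "recordus". The actual download address still has to be read from the page this URL returns.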

-- 
In the vast sea of people, you are my favorite

caijunjie

Reply 0, Saturday, January 27, 2007, 14:55

俊杰蔡 yzcaijunjie at gmail.com
Sat Jan 27 14:55:33 HKT 2007

You're right. So that name is simply passed to the next page to be used as the song title.

Thanks, thanks.
