Python Forum - Discussion Board

Subject: [python-chinese] Python Chinese word segmentation module

Wednesday, May 30, 2007 01:37

junyi sun ccnusjy at gmail.com
Wed May 30 01:37:12 HKT 2007

Sorry, I swapped the redundant mode and the conservative mode in the source code comments. Running the example makes the difference between the two modes clear, though.

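On 5/30/07, junyi sun <ccnusjy at gmail.com> wrote:
>
> This module is part of my PySozone project (a search engine written in Python), and I am releasing it as open source.
>
> 1. The algorithm is backward maximum matching.
> 2. The dictionary uses bsddb in btopen mode.
> 3. The lexicon holds 150,000 words.
> 4. There are two segmentation modes: redundant and conservative.
>
> Usage:
> # Python 2; CDict is the class provided by the module
> d = CDict()
> # decode the GBK byte string and re-encode it as UTF-8
> s = "我爱北京天安门".decode('gbk').encode('utf-8')
> words = d.segWords(s)
> for w in words:
>     print w.decode('utf-8')
>
> PS:
> The lexicon is stored with bsddb in btopen mode, so you can add your own new words as needed.
>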
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/a9d81650/attachment.html

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Wednesday, May 30, 2007 15:08

风向标 vaneoooo at gmail.com
Wed May 30 15:08:06 HKT 2007

Why didn't I get the original message for this? All I have is the Re:.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/59950e1a/attachment.htm

Wednesday, May 30, 2007 15:29

junyi sun ccnusjy at gmail.com
Wed May 30 15:29:49 HKT 2007

This module is part of my PySozone project (a search engine written in Python), and I am releasing it as open source.

1. The algorithm is backward maximum matching.
2. The dictionary uses bsddb in btopen mode.
3. The lexicon holds 150,000 words.
4. There are two segmentation modes: redundant and conservative.

Usage:
# Python 2; CDict is the class provided by the module
d = CDict()
# decode the GBK byte string and re-encode it as UTF-8
s = "我爱北京天安门".decode('gbk').encode('utf-8')
words = d.segWords(s)
for w in words:
    print w.decode('utf-8')

PS:
The lexicon is stored with bsddb in btopen mode, so you can add your own new words as needed.


<http://mail.google.com/mail/?realattid=f_f2anayul&attid=0.1&disp=attd&view=att&th=112d8e60a108c5fa>
*python中文分词.rar*
2324K Download<http://mail.google.com/mail/?realattid=f_f2anayul&attid=0.1&disp=safe&view=att&th=112d8e60a108c5fa>
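
Item 1 in the message above names backward (reverse) maximum matching. The sketch below illustrates that general technique in Python 3. It is not the code from the attached module: the function name `segment_backward`, the `max_len` limit, and the in-memory `set` standing in for the bsddb lexicon are all assumptions made for illustration.

```python
def segment_backward(text, lexicon, max_len=4):
    """Segment text by backward maximum matching (illustrative sketch)."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate that ends at `end`, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as the fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                end -= size
                break
    # Words were collected from right to left, so restore reading order.
    words.reverse()
    return words


# Hypothetical in-memory lexicon; the real module keeps its words in bsddb.
lexicon = {"北京", "天安门"}
print(segment_backward("我爱北京天安门", lexicon))
```

Adding a custom word changes the result: after `lexicon.add("我爱")`, the first two characters are returned as one word instead of two single characters.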


On 5/30/07, 风向标 <vaneoooo at gmail.com> wrote:
>
> Why didn't I get the original message for this? All I have is the Re:.
> _______________________________________________
> python-chinese
> Post: send python-chinese at lists.python.cn
> Subscribe: send subscribe to python-chinese-request at lists.python.cn
> Unsubscribe: send unsubscribe to python-chinese-request at lists.python.cn
> Detail Info: http://python.cn/mailman/listinfo/python-chinese
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/41269baf/attachment.htm

Wednesday, May 30, 2007 18:29

junyi sun ccnusjy at gmail.com
Wed May 30 18:29:04 HKT 2007

I sent it again. Can you see it now?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/bb31d566/attachment.html

Wednesday, May 30, 2007 18:31

jessinio smith jessinio at gmail.com
Wed May 30 18:31:49 HKT 2007

I can see it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/c1ad4b93/attachment.htm

Wednesday, May 30, 2007 18:43

eking eking_he at mezimedia.com
Wed May 30 18:43:16 HKT 2007

<http://mail.google.com/mail/?realattid=f_f2anayul&attid=0.1&disp=attd&view=att&th=112d8e60a108c5fa>

python中文分词.rar
2324K Download
<http://mail.google.com/mail/?realattid=f_f2anayul&attid=0.1&disp=safe&view=att&th=112d8e60a108c5fa>

This is a link into your own Gmail account, isn't it? How is anyone else supposed to download it?

 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/9eeef8b0/attachment-0001.html

Wednesday, May 30, 2007 20:34

风向标 vaneoooo at gmail.com
Wed May 30 20:34:43 HKT 2007

Sure enough, it can't be downloaded.
Chinese word segmenters are suddenly popping up everywhere, haha.
I was fretting over this a while back, and today two people have shared one.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/22e53106/attachment.htm

Wednesday, May 30, 2007 20:45

jessinio smith jessinio at gmail.com
Wed May 30 20:45:11 HKT 2007

I can't download it.

What do we do, boss?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070530/5a13741a/attachment.html

Wednesday, May 30, 2007 21:44

cun heise cunheise at hotmail.com
Wed May 30 21:44:39 HKT 2007

Please send me a copy. Thanks.
cunheise at hotmail.com

Thursday, May 31, 2007 21:55

junyi sun ccnusjy at gmail.com
Thu May 31 21:55:48 HKT 2007

I don't know why the attachment can't be downloaded from the mailing list either, so I have uploaded it to CSDN.
Download link:
http://download.csdn.net/source/187315



-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070531/a0bfcac6/attachment.htm

Thursday, May 31, 2007 22:08

jessinio smith jessinio at gmail.com
Thu May 31 22:08:59 HKT 2007

I need to save this email. Hahaha.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070531/7dfb8d1f/attachment.html

Thursday, May 31, 2007 22:39

风向标 vaneoooo at gmail.com
Thu May 31 22:39:59 HKT 2007

I really can't stand CSDN: it makes you log in.
The original poster would have done better to put it on Woodpecker (啄木鸟).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070531/a5f7b9a1/attachment.htm
