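Python Forum - Discussion Board

Title: [python-chinese] Re: Python Chinese word segmentation module

Friday, June 1, 2007 15:29

郝钰 haoyu at csdn.net
Fri Jun 1 15:29:28 HKT 2007

The segmentation results are quite good. Is it possible to add my own entries?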

 

From: python-chinese-bounces at lists.python.cn
[mailto:python-chinese-bounces at lists.python.cn] On Behalf Of junyi sun
Sent: May 31, 2007 21:56
To: python-chinese at lists.python.cn
Subject: Re: [python-chinese] Python Chinese word segmentation module

 

I don't know why the attachment cannot be downloaded from the mailing list either, so I have uploaded it to CSDN.

Download link:

http://download.csdn.net/source/187315



 

On 5/30/07, cun heise <cunheise at hotmail.com> wrote: 

Please send me a copy, thanks.
cunheise at hotmail.com


>From: "eking" <eking_he at mezimedia.com>
>Reply-To: python-chinese at lists.python.cn
>To: <python-chinese at lists.python.cn>
>Subject: Re: [python-chinese] Python Chinese word segmentation module
>Date: Wed, 30 May 2007 18:43:16 +0800
>
>python中文分词.rar
>2324K Download
><http://mail.google.com/mail/?realattid=f_f2anayul&attid=0.1&disp=attd&view=att&th=112d8e60a108c5fa>
>
>That looks like a link into your own Gmail account. How is anyone else supposed to download it?
>
>
>
>   _____
> 
>From: python-chinese-bounces at lists.python.cn
>[mailto:python-chinese-bounces at lists.python.cn ] On Behalf Of junyi sun
>Sent: May 30, 2007 18:29
>To: python-chinese at lists.python.cn
>Subject: Re: [python-chinese] Python Chinese word segmentation module
>
>
>
>I have sent it again. Can you see it now?
>
>On 5/30/07, junyi sun <ccnusjy at gmail.com> wrote:
>
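>This module is part of my PySozone project (a search engine developed in Python); I am releasing it as open source.
>
>
> 
>1. The algorithm is reverse (backward) maximum matching.
>
>2. The dictionary is stored with bsddb in btopen (B-tree) mode.
>
>3. The lexicon contains 150,000 words.
>
>4. Two segmentation modes are available: a redundant mode and a conservative mode.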
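
Editorial sketch: the reverse maximum matching named in item 1 can be illustrated roughly as below. This is not the module's own code, which is not shown in the thread. The function name `backward_max_match`, the `max_len` window and the toy lexicon are assumptions made for illustration, and the example is written for Python 3 with an in-memory set rather than the bsddb dictionary.

```python
def backward_max_match(text, lexicon, max_len=4):
    """Segment text by scanning from the end and taking the longest lexicon match."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest window first, shrinking it until a lexicon word is found.
        start = max(0, end - max_len)
        while start < end - 1 and text[start:end] not in lexicon:
            start += 1
        # When nothing matches, the window has shrunk to a single character.
        words.append(text[start:end])
        end = start
    words.reverse()  # pieces were collected right to left
    return words


# Hypothetical toy lexicon, far smaller than the 150,000 words mentioned above.
lexicon = {"我", "爱", "北京", "天安门"}
print(backward_max_match("我爱北京天安门", lexicon))
# prints ['我', '爱', '北京', '天安门']
```

Scanning from the right means that when two dictionary words overlap, the one ending later in the sentence wins. Characters that match nothing come out as single-character words.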
>
>
>
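>Usage:
>
>d=CDict()  # CDict is the dictionary class provided by the module
>
># the literal is GBK-encoded in the source file; convert it to UTF-8 before segmenting
>s="我爱北京天安门".decode('gbk').encode('utf-8')
>words=d.segWords(s)
>for w in words:
>         print w.decode('utf-8')  # decode each UTF-8 result for printing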
>
>
>
>PS:
>
>The lexicon is stored with bsddb in btopen mode, so you can add your own new words as needed.
>
>On 5/30/07, 风向标 <vaneoooo at gmail.com> wrote:
>
>Why didn't I receive the original message? I only got a Re: reply.
>_______________________________________________
>python-chinese
>Post: send python-chinese at lists.python.cn 
>Subscribe: send subscribe to python-chinese-request at lists.python.cn
>Unsubscribe: send unsubscribe to python-chinese-request at lists.python.cn
>Detail Info: http://python.cn/mailman/listinfo/python-chinese

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070601/fa22f28f/attachment.htm 

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Friday, June 1, 2007 16:20

junyi sun ccnusjy at gmail.com
Fri Jun 1 16:20:02 HKT 2007

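Yes. If you are using Python 2.5, the bsddb module is bundled with it.
First find the dict.dat file under the data directory; that file is the lexicon.
Example:
import bsddb
d=bsddb.btopen("dict.dat")
newword='杨二车拉姆'.decode('gbk').encode('utf-8')  # the literal is GBK in the source file
d[newword]='8' # the word frequency (bsddb values must be strings); set it to suit your needs
d.close()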
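
Editorial sketch: the add-a-word step above could be wrapped in small helpers like these. The thread does not say how frequencies are stored in the real dict.dat, so the decimal-string encoding is an assumption. The names `add_word` and `get_freq` are hypothetical too. The code is written for Python 3 and uses a plain dict as a stand-in for the opened database handle.

```python
def add_word(store, word, freq=1):
    """Insert or update a word in a dict-like store keyed by UTF-8 bytes."""
    key = word.encode("utf-8")
    # Assumption: the frequency is stored as a decimal byte string.
    store[key] = str(freq).encode("ascii")
    return key


def get_freq(store, word):
    """Return the stored frequency as an int, or 0 when the word is absent."""
    key = word.encode("utf-8")
    return int(store[key]) if key in store else 0


store = {}  # stand-in for the opened database object
add_word(store, "杨二车拉姆", 8)
print(get_freq(store, "杨二车拉姆"))
# prints 8
```

Reading the entry back after writing it is a cheap way to check that the key encoding matches the one the segmenter expects.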




On 6/1/07, 郝钰 <haoyu at csdn.net> wrote:
>
>  The segmentation results are quite good. Is it possible to add my own entries?

Friday, June 1, 2007 16:23

jessinio smith jessinio at gmail.com
Fri Jun 1 16:23:19 HKT 2007

Is the speed acceptable? Finally there is a Chinese word segmentation tool we can use. Hahaha

Saturday, June 2, 2007 11:20

junyi sun ccnusjy at gmail.com
Sat Jun 2 11:20:48 HKT 2007

The speed is fine. Articles of up to 10,000 characters are generally handled within milliseconds.
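
Editorial sketch: anyone who wants to check the speed on their own machine could time a segmenter along these lines. The helper name `time_segmenter` is hypothetical. The built-in `list` (which simply splits a string into characters) stands in for the real segmentation call, so the number printed here says nothing about the module itself. Written for Python 3.

```python
import time


def time_segmenter(segment, text, repeats=5):
    """Return the best wall-clock time in seconds over several runs of segment(text)."""
    best = float("inf")
    for _ in range(repeats):
        started = time.perf_counter()
        segment(text)
        best = min(best, time.perf_counter() - started)
    return best


# Stand-in segmenter; swap in the real segmentation function to measure it.
sample = "我爱北京天安门" * 1500  # 10,500 characters, roughly the size discussed
elapsed = time_segmenter(list, sample)
print("best of 5: %.3f ms" % (elapsed * 1000))
```

Taking the best of several runs reduces the influence of one-off delays such as the first read of the dictionary from disk.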

On 6/1/07, jessinio smith <jessinio at gmail.com> wrote:
>
> Is the speed acceptable? Finally there is a Chinese word segmentation tool we can use. Hahaha

Saturday, June 2, 2007 23:37

alex yu yu.alex.z.y at gmail.com
Sat Jun 2 23:37:24 HKT 2007

How do I use it?
 import re
Is that the system's own re module?


On 6/1/07, jessinio smith <jessinio at gmail.com> wrote:
>
> Is the speed acceptable? Finally there is a Chinese word segmentation tool we can use. Hahaha