Python Forum - Discussion Board

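Subject: [python-chinese] I want to write a script that reads a file in any encoding, converts it to Unicode, and then uses the punctuation to take out one sentence at a time. But I have a few questions: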

Thursday, November 8, 2007, 11:17

clfff.peter clfff.peter at gmail.com
Thu Nov 8 11:17:26 HKT 2007

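I've made a version. It works reasonably well, but I haven't tested it much; anyone interested is welcome to try it out and improve it.
I didn't hard-code seps in the program: I tried, but it was awkward to get right, so in the end I used a sep_file to hold all the separators I might need.
Note that the sepfile is Unicode, and the first separator in it is the newline.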
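Since reader_assi.py is only available as an attachment, here is a minimal Python 3 sketch of the idea, not the poster's code. It assumes a hypothetical separator-file format: UTF-8 text in which every character is one separator, the first being a newline. `load_seps` and `iter_sentences` are illustrative names.

```python
import re
from typing import Iterator, List


def load_seps(path: str) -> List[str]:
    """Read separators from a UTF-8 file (assumed format: each character is
    one separator, the first being a newline)."""
    # newline="" keeps "\r" and "\n" as they are instead of translating them.
    with open(path, encoding="utf-8", newline="") as f:
        chars = f.read()
    # Drop duplicates but keep the original order, so the newline stays first.
    return list(dict.fromkeys(chars))


def iter_sentences(text: str, seps: List[str]) -> Iterator[str]:
    """Yield one sentence at a time. Closing punctuation is kept;
    surrounding whitespace is stripped. Each sep must be a single character."""
    if not seps:
        raise ValueError("need at least one separator")
    # Escape each separator so characters like "]", "-" or "\\" are literal
    # inside the character class.
    cls = "".join(re.escape(s) for s in seps)
    # Either a run of non-separators followed by one or more separators,
    # or a trailing run of non-separators at the end of the text.
    pattern = re.compile(f"[^{cls}]*[{cls}]+|[^{cls}]+$")
    for match in pattern.finditer(text):
        sentence = match.group().strip()
        if sentence:  # skip fragments made only of whitespace separators
            yield sentence
```

For example, `for s in iter_sentences(text, load_seps("used_seps.txt")): print(s)`, provided used_seps.txt really follows the assumed one-character-per-separator format. Building a single character class keeps the scan linear, and runs of consecutive separators (such as "！！" or blank lines) collapse into one sentence boundary.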
Thanks.
~～     ~～
   \ -------*



On 07-10-29, ??? ?? <clfff.peter at gmail.com> wrote:
>
> I'll give it a try first, thanks.
>
> 2007/10/28, Jiahua Huang <jhuangjiahua at gmail.com>:
> >
> > Here's a hint for you:
> >
> >
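> > #!/usr/bin/python
> > # -*- coding: UTF-8 -*-
> >
> > def zh2unicode(stri):
> >     """Auto-convert a byte string of unknown encoding to unicode.
> >
> >     Tries utf-8, gbk, big5, euc_jp, euc_kr, utf16 and utf32 in turn;
> >     the codec that worked is left in the global encc ('unk' if none)."""
> >     global encc
> >     # 'jp' is not a Python codec name; 'euc_jp' is assumed here.
> >     for c in ('utf-8', 'gbk', 'big5', 'euc_jp',
> >               'euc_kr', 'utf16', 'utf32'):
> >         encc = c
> >         try:
> >             return stri.decode(c)
> >         except UnicodeDecodeError:
> >             pass
> >     encc = 'unk'
> >     return stri
> >
> > # Separators: ASCII whitespace and punctuation, then full-width (CJK)
> > # punctuation. The source file is UTF-8, so these are UTF-8 byte strings.
> > seps = [" ", "\t", "\n", "\r", ",", "<", ">", "?", "!",
> >         ";", "#", ":", ".", "'", '"', "(", ")", "{", "}", "[", "]", "|", "_", "=",
> >         "　", "，", "？", "。", "、", "“", "”", "《", "》", "［", "］", "！", "（", "）"]
> >
> > # Decode every separator so it can be compared with the decoded text.
> > seps = [s.decode('utf-8') for s in seps]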
>
>
>
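For comparison, a Python 3 sketch of the same trial-decoding idea as zh2unicode above (a hedged adaptation, not Jiahua Huang's code): bytes go in, and the text comes back together with the codec that worked instead of being stored in a global. `decode_any` and `CANDIDATES` are hypothetical names, `euc_jp` is an assumed stand-in for the original's `'jp'`, and the lossy fallback is a design choice of this sketch.

```python
from typing import Optional, Sequence, Tuple

# Candidate codecs, tried in order (same order as the original list).
CANDIDATES = ("utf-8", "gbk", "big5", "euc_jp", "euc_kr", "utf-16", "utf-32")


def decode_any(data: bytes,
               candidates: Sequence[str] = CANDIDATES) -> Tuple[str, Optional[str]]:
    """Return (text, codec) for the first codec that decodes data strictly.

    codec is None when no candidate succeeded.
    """
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue  # not this encoding; try the next one
    # Nothing decoded strictly: fall back to a lossy UTF-8 decode
    # (invalid bytes become U+FFFD) so the caller always gets text.
    return data.decode("utf-8", errors="replace"), None
```

Usage: `with open("txt.txt", "rb") as f: text, enc = decode_any(f.read())`. Order matters here: valid Big5 bytes can often decode without error as GBK (giving the wrong characters), and UTF-16 accepts almost any even-length input, which is why it sits near the end of the list.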
-------------- next part --------------
An embedded attachment with an unknown character set was scrubbed...
Name: reader_assi.py
Url: http://python.cn/pipermail/python-chinese/attachments/20071108/72ddb672/attachment.pot
-------------- next part --------------
An embedded attachment with an unknown character set was scrubbed...
Name: txt.txt
Url: http://python.cn/pipermail/python-chinese/attachments/20071108/72ddb672/attachment.txt
-------------- next part --------------
An embedded attachment with an unknown character set was scrubbed...
Name: used_seps.txt
Url: http://python.cn/pipermail/python-chinese/attachments/20071108/72ddb672/attachment-0001.txt

[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]
