Python forum post:

Fri May 27 11:15:44 HKT 2005

Sorry, I didn't read your message carefully!


Carambo, qutr at tjub.com.cn
2005-5-27
----- Original message -----
From: limodou
To: qutr, python-chinese
Date: 2005-05-27, 11:13:12
Subject: Re: [python-chinese] A question about Python regular expressions


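What he is describing is this: the regex works on bytes, so the second byte of one Chinese character plus the first byte of the next character can together look like a third, different character. If you use unicode strings, this can't happen.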
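A minimal Python 3 sketch of the effect described above. It assumes the original Python 2 strings were GB2312/GBK-encoded bytes, which fits the byte values Carambo quotes ('\xa1\xbf' for "】", '\xa3\xa1' for "！"). In Python 3, `bytes` plays the role of the old byte string and `str` the role of `unicode`.

```python
import re

text = "【幸福】的人是很少的。"
pattern = "。|！|？"  # sentence terminators from the thread (quote variants omitted)

# Assumption: the original strings were GBK/GB2312-encoded bytes.
raw = text.encode("gbk")
byte_pattern = pattern.encode("gbk")

# Byte-level matching: 福 ends with 0xA3 and 】 starts with 0xA1, and that
# straddling pair is exactly the encoding of ！, so the regex matches mid-character.
byte_hit = re.search(byte_pattern, raw)
byte_parts = re.split(byte_pattern, raw)  # first piece ends with half of 福

# Character-level matching: the regex sees whole characters, so only the
# real sentence-ending 。 matches.
char_hit = re.search(pattern, text)
char_parts = re.split(pattern, text)

print(byte_hit.start(), byte_parts[0].decode("gbk", errors="replace"))
print(char_hit.start(), char_parts)
```

The byte-level search reports a match at an odd offset, that is, in the middle of a two-byte character, while the unicode search only finds the final "。".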

On 05-5-27, Carambo <qutr at tjub.com.cn> wrote:
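> Hello dimension,
> Your "！" is a Chinese (full-width) punctuation mark, isn't it? As I understand it, Chinese characters are generally stored as two bytes:
> >>> a = "】"
> >>> a
> '\xa1\xbf'
> >>> b = "！"
> >>> b
> '\xa3\xa1'
> The two are different, so the situation you describe shouldn't happen, should it?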
> 
> Carambo, qutr at tjub.com.cn
> 2005-5-27 
> ----- Original message ----- 
> From: dimension 
> To: python-chinese 
> Date: 2005-05-27, 09:54:37
> Subject: [python-chinese] A question about Python regular expressions
> 
> 
> 
> Hello, python-chinese!
> 
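> For example, I want to write a sentence splitter: every sentence that ends with "。", "！" or "？", or with one of
> 。"
> ！"
> ？"
> (the punctuation mark followed by a closing quotation mark), should be split off as a separate sentence.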
> 
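> I split the text with this regular expression:
> 
> import re
> # Quote-ending forms go first because alternatives are tried left to right.
> expression = r'。"|！"|？"|。|！|？'
> listSentence = re.split(expression, sentence)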
> 
> But this cuts some Chinese characters in half and produces garbled text. For example, given
> 
> str1 = "【幸福】的人是很少的。"
> 
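> the string is split in the wrong place, because the second byte of "福" together with the first byte of "】" happens to be a3 a1, which is the encoding of "！". There are certainly other cases like this.
> 
> I don't know how to solve this.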
> 
> Best regards,
> 
> dimension
> dimension at hit.edu.cn
> 2005-05-27
> _______________________________________________
> python-chinese list
> python-chinese at lists.python.cn
> http://python.cn/mailman/listinfo/python-chinese
> 
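
Combining limodou's suggestion with dimension's requirements, a sentence splitter can decode the input to unicode first and then match whole characters. This is a minimal Python 3 sketch, not code from the thread: `split_sentences` is a hypothetical helper name, GBK is an assumed input encoding, and the closing quote is assumed to be either the ASCII `"` or the full-width `”`. It uses `re.findall` instead of `re.split` so that each sentence keeps its terminating punctuation and quote.

```python
import re
from typing import List

# Hypothetical helper (not from the thread). It works on str (unicode) text, so
# the regex sees whole characters and cannot match across a character boundary.
# A sentence is the shortest run of characters ending in one or more of 。！？,
# optionally followed by a closing quote (assumed to be " or ”), or the end of
# the text for a final sentence without a terminator.
_SENTENCE = re.compile(r'.+?(?:[。！？]+["”]?|$)', re.S)


def split_sentences(text: str) -> List[str]:
    """Split unicode text into sentences, keeping terminators and quotes."""
    return [s for s in _SENTENCE.findall(text) if s.strip()]


# Byte input must be decoded first; GBK is an assumed encoding here.
data = "【幸福】的人是很少的。他说：“真的吗？”".encode("gbk")
for sentence in split_sentences(data.decode("gbk")):
    print(sentence)
```

The lazy `.+?` keeps each match to a single sentence, and `re.S` lets a sentence span line breaks.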


-- 
I like python! 
My Donews Blog: http://www.donews.net/limodou
New Google Maillist: http://groups-beta.google.com/group/python-cn

Title: Re: Re: [python-chinese] A question about Python regular expressions