A post from the Python forum:

Thu May 26 21:34:19 HKT 2005

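This happens because you are matching against byte strings, that is, the encoded bytes. If you use unicode strings instead, the problem should go away.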
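
A minimal sketch of what I mean, assuming the text is GBK-encoded (the a3a1 in the question is consistent with that) and written for Python 3, where `str` is already unicode. On Python 2 you would first decode the byte string to `unicode`. The names `SENTENCE_END` and `split_sentences` are made up for illustration, and the pattern accepting both `"` and `”` as the closing quote is my assumption about the input.

```python
import re

text = "【幸福】的人是很少的。"

# Byte level: in GBK, "福" is b8 a3 and "】" is a1 bf, so the byte pair
# a3 a1 (which also encodes "！") shows up across the character boundary.
raw = text.encode("gbk")
bang = "！".encode("gbk")      # b"\xa3\xa1"
false_hit = bang in raw        # spurious match between "福" and "】"

# Character level: the pattern runs over decoded text, so it can only
# match whole characters. The optional closing quote is consumed together
# with the terminator, so it does not leak into the next sentence.
SENTENCE_END = re.compile(r'[。！？]["”]?')

def split_sentences(s):
    """Split on sentence-ending punctuation and drop empty pieces."""
    return [part for part in SENTENCE_END.split(s) if part]

parts = split_sentences(text)
```

With unicode input, "福" and "】" are single characters, so the split can no longer land in the middle of one.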

On Thu May 12 22:35:26 2005 +0800, 李维刚 <dimension at hit.edu.cn> wrote:
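> Hello everyone,
> 
> Say I want to implement a sentence splitter.
> Any text ending in one of "。！？", or in one of
> 。"
> ！"
> ？"
> should be split off as a separate sentence.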
> 
> I am using the regular expression
> expression = r'。|！|？|。"|！"|？"'
> 
> listSentence = re.split(expression, sentence)
> 
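> But done this way, some Chinese characters get cut apart and turn into garbage. For example:
> 
> Suppose
> 
> str1 = "【幸福】的人是很少的。"
> 
> A string like this gets split, because the second byte of "福" followed by the first byte of "】" happens to be a3a1, which is the encoding of "！". There are surely other cases like it.
> 
> I don't know how to solve this.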
> _______________________________________________
> python-chinese list
> python-chinese at lists.python.cn
> http://python.cn/mailman/listinfo/python-chinese
> 
> 
> 


-- 
I like python! 
My Donews Blog: http://www.donews.net/limodou
New Google Maillist: http://groups-beta.google.com/group/python-cn

Subject: Re: [python-chinese] A question about a regular expression