How would you build a regular expression that splits a paragraph into sentences, but does not split on any punctuation that appears inside a link?
Currently, I have this regular expression to split a paragraph into sentences:
/[^\.!\?]+[\.!\?]+/g
The problem is that my paragraphs are not just plain text. They contain links like this:
This is text and here is a <value="link" href="http://link.com?param=test"> which directs to another page. So I don't want to split at the anything inside the link above.
I want to split it into an array like this:
['This is text and here is a <value="link" href="http://link.com?param=test"> which directs to another page.', 'So I don't want to split at the anything inside the link above.']
What regular expression would do this?
Try this:
(.+?[\.!\?](?!.+?>)\s*)
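To see how this pattern behaves on the sample paragraph from the question, here is a minimal sketch in JavaScript. The helper name `splitSentences` and the trimming step are assumptions added for illustration, and the `g` flag is added so that `match()` returns every sentence rather than only the first.

```javascript
// Pattern from the answer above, with the global flag added so that
// String.prototype.match returns all matches instead of just the first.
const sentencePattern = /(.+?[\.!\?](?!.+?>)\s*)/g;

// Hypothetical helper (not part of the original answer): returns the
// matched sentences with surrounding whitespace trimmed, or an empty
// array when nothing matches.
function splitSentences(paragraph) {
  const matches = paragraph.match(sentencePattern) || [];
  return matches.map((s) => s.trim());
}

// Sample paragraph from the question, built from two string literals
// so both quote styles can appear without escaping.
const paragraph =
  'This is text and here is a <value="link" href="http://link.com?param=test"> ' +
  "which directs to another page. So I don't want to split at the anything inside the link above.";

const sentences = splitSentences(paragraph);
console.log(sentences);
```

The negative lookahead `(?!.+?>)` rejects any `.`, `!` or `?` that still has a `>` somewhere after it on the same line, which is what keeps the punctuation inside the link intact. The same rule also means that a sentence ending before a later tag is not split off: for example, `'A. B <x> c.'` comes back as a single item. If the real paragraphs can have text before a link, a proper HTML parser or a tokenizing approach is probably a safer choice than this pattern.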