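Counting word occurrences, allowing special characters and newlines

I'm trying to build a function that counts the occurrences of a word in a phrase.

The function should also handle cases where the word in the phrase has non-alphabetic characters attached to it and/or is followed by an end-of-line character.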

    function countWordInText(word, phrase) {
        var c = 0;
        phrase = phrase.concat(" ");
        regex = (word, /\W/g);
        var fChar = phrase.indexOf(word);
        var subPhrase = phrase.slice(fChar);
        while (regex.test(subPhrase)) {
            c += 1;
            subPhrase = subPhrase.slice((fChar + word.length));
            fChar = subPhrase.indexOf(word);
        }
        return c;
    }

The problem is that for a simple input such as

    phrase = "hi hi hi all hi. hi";
    word = "hi" // OR word = "hi all";

it returns the wrong value.

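The algorithm you wrote shows that you spent some time trying to get this working. Still, quite a few parts of it don't work. For example, (word, /\W/g) doesn't actually create the regular expression you might think it does.

There is also a simpler way: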
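To illustrate the point about (word, /\W/g): the parentheses contain JavaScript's comma operator, which evaluates both operands and returns only the last one. The following is a minimal sketch of that behavior; the variable names are illustrative and not part of the original code.

```javascript
// The comma operator evaluates both operands and yields the last one,
// so `word` is discarded and only the literal /\W/g remains.
var word = 'hi';
var regex = (word, /\W/g);

// The pattern contains no trace of `word`.
console.log(regex.source); // \W

// It matches the space in the phrase, not the word "hi".
console.log(regex.test('hi all')); // true

// Because `word` never reaches the pattern, any text containing a
// non-word character matches, whichever word is being counted.
var matchesWithoutWord = /\W/g.test('nothing here');
console.log(matchesWithoutWord); // true
```

To build a pattern from a string held in a variable, the RegExp constructor is needed instead.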

    function countWordInText (word, phrase) {
      // Escape any characters in `word` that may have a special meaning
      // in regular expressions.
      // Taken from https://stackoverflow.com/a/6969486/4220785
      word = word.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&')

      // Replace any whitespace in `word` with `\s`, which matches any
      // whitespace character, including line breaks.
      word = word.replace(/\s+/g, '\\s')

      // Create a regex with our `word` that will match it as long as it
      // is surrounded by a word boundary (`\b`). A word boundary is any
      // character that isn't part of a word, like whitespace or
      // punctuation.
      var regex = new RegExp('\\b' + word + '\\b', 'g')

      // Get all of the matches for `phrase` using our new regex.
      var matches = phrase.match(regex)

      // If some matches were found, return how many. Otherwise, return 0.
      return matches ? matches.length : 0
    }

    countWordInText('hi', 'hi hi hi all hi. hi') // 5
    countWordInText('hi all', 'hi hi hi all hi. hi') // 1
    countWordInText('hi all', 'hi hi hi\nall hi. hi') // 1
    countWordInText('hi all', 'hi hi hi\nalligator hi. hi') // 0
    countWordInText('hi', 'hi himalayas') // 1

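I've added comments throughout the example. Hopefully this helps you get started!

Here are a few good places to learn about regular expressions in JavaScript:

  • Regular Expressions – MDN Web Docs
  • Regular Expressions in JavaScript – egghead.io

You can also use RegExr to test your regular expressions.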
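As a side note on the escaping step in the function above, here is a small sketch of why it matters when the word contains regex metacharacters. The helper name escapeRegExp and the sample phrase are assumptions made for illustration; the helper simply wraps the same replace call used in the function.

```javascript
// Hypothetical helper wrapping the same escaping used in countWordInText.
function escapeRegExp(text) {
  return text.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&');
}

var samplePhrase = 'a.b axb a.b';

// Unescaped, the dot matches any character, so "axb" is counted too.
var unescaped = samplePhrase.match(new RegExp('\\b' + 'a.b' + '\\b', 'g'));

// Escaped, the dot only matches a literal ".".
var escaped = samplePhrase.match(
  new RegExp('\\b' + escapeRegExp('a.b') + '\\b', 'g')
);

console.log(unescaped.length); // 3
console.log(escaped.length); // 2
```

Without the escaping, a search word such as "a.b" would silently over-count, which is why the function escapes the word before building the pattern.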