How to count word frequency in a text string with Python

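This article uses an example to show how to count how often each word occurs in a text string with Python. It is shared here for reference. The implementation is as follows: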

# word frequency in a text
# tested with python24 vegaseat 25aug2005
# chinese wisdom …
str1 = """man who run in front of car, get tired.
man who run behind car, get exhausted."""
print("original string:")
print(str1)
print("")
# create a list of words separated at whitespaces
wordlist1 = str1.split(None)
# strip any punctuation marks and build modified word list
# start with an empty list
wordlist2 = []
for word1 in wordlist1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a wordlist of lower case modified words
    wordlist2.append(word2.lower())
print("word list created from modified string:")
print(wordlist2)
print("")
# create a wordfrequency dictionary
# start with an empty dictionary
freqd2 = {}
for word2 in wordlist2:
    # get() returns 0 for a word that has not been seen yet
    freqd2[word2] = freqd2.get(word2, 0) + 1
# create a sorted list of keys
# all words are lower case already
keylist = sorted(freqd2.keys())
print("frequency of each word in the word list (sorted):")
for key2 in keylist:
    # left-align the word in a 10-character column, then the count
    print("%-10s %d" % (key2, freqd2[key2]))
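
For comparison, the same count can be sketched more compactly on Python 3 with collections.Counter from the standard library. This is an alternative, not the original author's method. The helper name word_frequencies is made up for illustration, and it assumes that stripping string.punctuation from both ends of each word is acceptable, which covers more marks than the five-character list used in the listing above.

```python
from collections import Counter
import string


def word_frequencies(text):
    """Count lower-cased words in text (hypothetical helper, not from the original)."""
    words = []
    for raw in text.split():
        # Assumption: strip any ASCII punctuation from both ends of the word
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted of punctuation only
            words.append(word)
    return Counter(words)


str1 = """man who run in front of car, get tired.
man who run behind car, get exhausted."""

freq = word_frequencies(str1)

# alphabetical order, same column layout as the listing above
for word in sorted(freq):
    print("%-10s %d" % (word, freq[word]))

# the three most frequent words, highest count first
print(freq.most_common(3))
```

Counter is a dict subclass, so freq[word] works exactly like freqd2[word2] above, and most_common() saves writing a separate sort when the ranking by count matters more than alphabetical order.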

I hope this article helps with your Python programming.
