Python Word Count (Filter out Punctuation, Dictionary Manipulation, and Sorting Lists)

In this tutorial, we discuss how to count words in Python: filtering out punctuation, manipulating a dictionary, and sorting lists. We also look at several approaches for producing a list of word count pairs.

How to count the number of words in a sentence, ignoring numbers, punctuation, and whitespace?

First, we take a paragraph, replace some punctuation with spaces, and convert all words to lowercase. Then we count how many times each word occurs in that paragraph.

Text="Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!The community hosts conferences and meetups, collaborates on code, and much more. Python's documentation will help you along the way, and the mailing lists will keep you in touch.Python is developed under an OSI-approved open source license, making it freely usable and distributable, even for commercial use. Python's license is administered.Python is a general-purpose coding language—which means that, unlike HTML, CSS, and JavaScript, it can be used for other types of programming and software development besides web development. That includes back end development, software development, data science and writing system scripts among other things."
for char in '-.,\n':
    Text = Text.replace(char, ' ')
Text = Text.lower()
# split returns a list of words delimited by sequences of whitespace (including tabs, newlines, etc, like re's \s) 
word_list = Text.split()
print(word_list)

Output:

['python', 'can', 'be', 'easy', 'to', 'pick', 'up', 'whether', 
"you're", 'a', 'first', 'time', 'programmer', 'or', "you're",
 'experienced', 'with', 'other', 'languages', 'the', 'following', 
'pages', 'are', 'a', 'useful', 'first', 'step', 'to', 'get', 'on', 'your', 
'way', 'writing', 'programs', 'with', 'python!the', 'community',
 'hosts', 'conferences', 'and', 'meetups', 'collaborates', 'on', 'code', 
'and', 'much', 'more', "python's", 'documentation', 'will', 'help', 'you',
 'along', 'the', 'way', 'and', 'the', 'mailing', 'lists', 'will', 'keep', 'you', 'in',
 'touch', 'python', 'is', 'developed', 'under', 'an', 'osi', 'approved', 'open',
 'source', 'license', 'making', 'it', 'freely', 'usable', 'and', 'distributable', 
'even', 'for', 'commercial', 'use', "python's", 'license', 'is', 'administered', 
'python', 'is', 'a', 'general', 'purpose', 'coding', 'language—which', 'means', 
'that', 'unlike', 'html', 'css', 'and', 'javascript', 'it', 'can', 'be', 'used', 'for', 'other', 
'types', 'of', 'programming', 'and', 'software', 'development', 'besides', 'web', 
'development', 'that', 'includes', 'back', 'end', 'development', 'software', 
'development', 'data', 'science', 'and', 'writing', 'system', 'scripts', 'among', 'other', 'things']

The output above is a list of lowercase words, not yet a list of word count pairs. Only hyphens, periods, commas, and newlines were replaced, so a few tokens such as 'python!the' are still joined.
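As a minimal sketch of a stricter cleaning step, a regular expression can extract only runs of letters, which also drops digits and any punctuation not listed in the loop above. The helper name tokenize and the sample string are assumptions made for illustration, not part of the original code. The pattern keeps apostrophes inside words, so a word like "you're" stays one token.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words (hypothetical helper).

    A word is a run of ASCII letters, optionally followed by
    apostrophe-plus-letters groups (for example "you're").
    Digits and all other punctuation act as separators.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


# Sample string made up for illustration; it mimics the joined tokens above.
sample = "Python!The community's code, OSI-approved; language—which means"
print(tokenize(sample))
# ['python', 'the', "community's", 'code', 'osi', 'approved', 'language', 'which', 'means']
```

This pattern only matches unaccented ASCII letters; text in other alphabets would need a different character class.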

Next, we discuss several approaches to counting the words.

Output a List of Word Count Pairs (Sorted from Highest to Lowest)

1. Collections Module:

The collections module approach is the easiest one, but to use it you have to know that the standard library provides the Counter class.

from collections import Counter

Counter(word_list).most_common()

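In this approach, we import Counter from the collections module and use it in our program. Note that this example splits the original text without cleaning it, so capitalization and attached punctuation are preserved in the counts.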

from collections import Counter
Text="Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!The community hosts conferences and meetups, collaborates on code, and much more. Python's documentation will help you along the way, and the mailing lists will keep you in touch.Python is developed under an OSI-approved open source license, making it freely usable and distributable, even for commercial use. Python's license is administered.Python is a general-purpose coding language—which means that, unlike HTML, CSS, and JavaScript, it can be used for other types of programming and software development besides web development. That includes back end development, software development, data science and writing system scripts among other things."
word_list = Text.split()
count=Counter(word_list).most_common()
print(count)

Output:

[('and', 7), ('a', 3), ('other', 3), ('is', 3), ('can', 2), ('be', 2), ('to', 2), 
("you're", 2), ('first', 2), ('with', 2), ('on', 2), ('writing', 2), ("Python's", 2),
 ('will', 2), ('you', 2), ('the', 2), ('it', 2), ('for', 2), ('software', 2), ('development,', 2), 
('Python', 1), ('easy', 1), ('pick', 1), ('up', 1), ('whether', 1), ('time', 1), ('programmer', 1),
 ('or', 1), ('experienced', 1), ('languages.', 1), ('The', 1), ('following', 1), ('pages', 1), ('are', 1), 
('useful', 1), ('step', 1), ('get', 1), ('your', 1), ('way', 1), ('programs', 1), ('Python!The', 1), 
('community', 1), ('hosts', 1), ('conferences', 1), ('meetups,', 1), ('collaborates', 1), ('code,', 1), 
('much', 1), ('more.', 1), ('documentation', 1), ('help', 1), ('along', 1), ('way,', 1), ('mailing', 1),
 ('lists', 1), ('keep', 1), ('in', 1), ('touch.Python', 1), ('developed', 1), ('under', 1), ('an', 1),
 ('OSI-approved', 1), ('open', 1), ('source', 1), ('license,', 1), ('making', 1), ('freely', 1),
 ('usable', 1), ('distributable,', 1), ('even', 1), ('commercial', 1), ('use.', 1), ('license', 1), 
('administered.Python', 1), ('general-purpose', 1), ('coding', 1), ('language—which', 1), ('means', 1),
 ('that,', 1), ('unlike', 1), ('HTML,', 1), ('CSS,', 1), ('JavaScript,', 1), ('used', 1), ('types', 1), ('of', 1), 
('programming', 1), ('development', 1), ('besides', 1), ('web', 1), ('development.', 1), ('That', 1), 
('includes', 1), ('back', 1), ('end', 1), ('data', 1), ('science', 1), ('system', 1), ('scripts', 1), ('among', 1), ('things.', 1)]
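Because the text was split without cleaning, the output above counts 'development,' and 'development' as different words. The following is a small sketch, under the assumption that we reuse the replace-and-lowercase step from the start of the tutorial, that feeds cleaned words to Counter and asks most_common for only the top entries. The short sample text is made up for illustration.

```python
from collections import Counter

# Made-up sample text, kept short so the result is easy to check by hand.
text = "The cat and the dog. The end, and THE rest!"

# Same cleaning idea as earlier, with '!' added to the characters replaced.
for char in "-.,!\n":
    text = text.replace(char, " ")
words = text.lower().split()

# most_common(n) returns only the n most frequent (word, count) pairs.
top_two = Counter(words).most_common(2)
print(top_two)  # [('the', 4), ('and', 2)]
```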

2. Using For Loops:

This is the second approach. Here we use a for loop and the dictionary get method.

Text="Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!The community hosts conferences and meetups, collaborates on code, and much more. Python's documentation will help you along the way, and the mailing lists will keep you in touch.Python is developed under an OSI-approved open source license, making it freely usable and distributable, even for commercial use. Python's license is administered.Python is a general-purpose coding language—which means that, unlike HTML, CSS, and JavaScript, it can be used for other types of programming and software development besides web development. That includes back end development, software development, data science and writing system scripts among other things."
word_list = Text.split()
# Initializing Dictionary
d = {}
# counting number of times each word comes up in list of words (in dictionary)
for word in word_list: 
    d[word] = d.get(word, 0) + 1
word_freq = []
for key, value in d.items():
    word_freq.append((value, key))
word_freq.sort(reverse=True) 
print(word_freq)

Output:

[(7, 'and'), (3, 'other'), (3, 'is'), (3, 'a'), (2, "you're"), (2, 'you'), (2, 'writing'),
 (2, 'with'), (2, 'will'), (2, 'to'), (2, 'the'), (2, 'software'), (2, 'on'), (2, 'it'), (2, 'for'),
(2, 'first'), (2, 'development,'), (2, 'can'), (2, 'be'), (2, "Python's"), (1, 'your'), (1, 'whether'),
 (1, 'web'), (1, 'way,'), (1, 'way'), (1, 'useful'), (1, 'used'), (1, 'use.'), (1, 'usable'), (1, 'up'), 
(1, 'unlike'), (1, 'under'), (1, 'types'), (1, 'touch.Python'), (1, 'time'), (1, 'things.'), (1, 'that,'), 
(1, 'system'), (1, 'step'), (1, 'source'), (1, 'scripts'), (1, 'science'), (1, 'programs'),
 (1, 'programming'), (1, 'programmer'), (1, 'pick'), (1, 'pages'), (1, 'or'), (1, 'open'), 
(1, 'of'), (1, 'much'), (1, 'more.'), (1, 'meetups,'), (1, 'means'), (1, 'making'), (1, 'mailing'),
 (1, 'lists'), (1, 'license,'), (1, 'license'), (1, 'language—which'), (1, 'languages.'), (1, 'keep'),
 (1, 'includes'), (1, 'in'), (1, 'hosts'), (1, 'help'), (1, 'get'), (1, 'general-purpose'), (1, 'freely'), 
(1, 'following'), (1, 'experienced'), (1, 'even'), (1, 'end'), (1, 'easy'), (1, 'documentation'),
 (1, 'distributable,'), (1, 'development.'), (1, 'development'), (1, 'developed'), (1, 'data'), 
(1, 'conferences'), (1, 'community'), (1, 'commercial'), (1, 'collaborates'), (1, 'coding'), 
(1, 'code,'), (1, 'besides'), (1, 'back'), (1, 'are'), (1, 'an'), (1, 'among'), (1, 'along'), (1, 'administered.Python'),
 (1, 'The'), (1, 'That'), (1, 'Python!The'), (1, 'Python'), (1, 'OSI-approved'), (1, 'JavaScript,'), (1, 'HTML,'), (1, 'CSS,')]

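In the above approach, we count the words with a for loop, then swap each key and value so that the resulting (count, word) tuples sort by count. Because of reverse=True, the list is sorted from highest to lowest. Words with the same count appear in reverse alphabetical order, with capitalized words last.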
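If ties should come out in alphabetical order instead, one option is to sort with a key function rather than swapping keys and values. This is a sketch on a small hand-written dictionary (the counts are taken from the output above, and the variable names are assumptions): negating the count sorts it in descending order while the word stays in ascending order.

```python
# A few counts copied from the output above, for illustration.
counts = {"and": 7, "a": 3, "other": 3, "is": 3, "can": 2}

# Tuple approach from the tutorial: ties come out in reverse alphabetical order.
by_tuple = sorted(((value, key) for key, value in counts.items()), reverse=True)
print(by_tuple)
# [(7, 'and'), (3, 'other'), (3, 'is'), (3, 'a'), (2, 'can')]

# Key-function approach: count descending, then word ascending.
by_key = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(by_key)
# [('and', 7), ('a', 3), ('is', 3), ('other', 3), ('can', 2)]
```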

3. Not using Dictionary Get Method:

In this approach, we do not use the dictionary get method. Instead, we check whether the word is already a key and initialize its count to 0 if it is not.

Text="Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!The community hosts conferences and meetups, collaborates on code, and much more. Python's documentation will help you along the way, and the mailing lists will keep you in touch.Python is developed under an OSI-approved open source license, making it freely usable and distributable, even for commercial use. Python's license is administered.Python is a general-purpose coding language—which means that, unlike HTML, CSS, and JavaScript, it can be used for other types of programming and software development besides web development. That includes back end development, software development, data science and writing system scripts among other things."
word_list = Text.split()
# Initializing Dictionary
d = {}

# Count number of times each word comes up in list of words (in dictionary)
for word in word_list:
    if word not in d:
        d[word] = 0
    d[word] += 1
word_freq = []
for key, value in d.items():
    word_freq.append((value, key))
word_freq.sort(reverse=True)
print(word_freq)

Output:

[(7, 'and'), (3, 'other'), (3, 'is'), (3, 'a'), (2, "you're"), (2, 'you'), (2, 'writing'),
 (2, 'with'), (2, 'will'), (2, 'to'), (2, 'the'), (2, 'software'), (2, 'on'), (2, 'it'), (2, 'for'), (2, 'first'), 
(2, 'development,'), (2, 'can'), (2, 'be'), (2, "Python's"), (1, 'your'), (1, 'whether'), (1, 'web'),
 (1, 'way,'), (1, 'way'), (1, 'useful'), (1, 'used'), (1, 'use.'), (1, 'usable'), (1, 'up'), (1, 'unlike'),
 (1, 'under'), (1, 'types'), (1, 'touch.Python'), (1, 'time'), (1, 'things.'), (1, 'that,'), (1, 'system'), 
(1, 'step'), (1, 'source'), (1, 'scripts'), (1, 'science'), (1, 'programs'), (1, 'programming'), 
(1, 'programmer'), (1, 'pick'), (1, 'pages'), (1, 'or'), (1, 'open'), (1, 'of'), (1, 'much'),
 (1, 'more.'), (1, 'meetups,'), (1, 'means'), (1, 'making'), (1, 'mailing'), (1, 'lists'), (1, 'license,'), 
(1, 'license'), (1, 'language—which'), (1, 'languages.'), (1, 'keep'), (1, 'includes'), (1, 'in'), (1, 'hosts'),
 (1, 'help'), (1, 'get'), (1, 'general-purpose'), (1, 'freely'), (1, 'following'), (1, 'experienced'), 
(1, 'even'), (1, 'end'), (1, 'easy'), (1, 'documentation'), (1, 'distributable,'), (1, 'development.'),
 (1, 'development'), (1, 'developed'), (1, 'data'), (1, 'conferences'), (1, 'community'), (1, 'commercial'),
 (1, 'collaborates'), (1, 'coding'), (1, 'code,'), (1, 'besides'), (1, 'back'), (1, 'are'), (1, 'an'), (1, 'among'),
 (1, 'along'), (1, 'administered.Python'), (1, 'The'), (1, 'That'), (1, 'Python!The'), (1, 'Python'), 
(1, 'OSI-approved'), (1, 'JavaScript,'), (1, 'HTML,'), (1, 'CSS,')]
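A related variation, shown here only as a sketch, lets collections.defaultdict handle the missing-key case so that neither get nor the membership check is needed. The sample sentence is made up for illustration.

```python
from collections import defaultdict

# Made-up sample sentence, short enough to verify by hand.
word_list = "the cat and the hat and the bat".split()

# defaultdict(int) creates a count of 0 the first time a word is seen.
d = defaultdict(int)
for word in word_list:
    d[word] += 1

# Sort by count, highest first; ties keep their first-seen order.
word_freq = sorted(d.items(), key=lambda item: item[1], reverse=True)
print(word_freq)
# [('the', 3), ('and', 2), ('cat', 1), ('hat', 1), ('bat', 1)]
```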

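4. Using Sorted:

In this approach, we use the built-in sorted function with a key, so the keys and values do not need to be swapped. The code reuses word_list from the examples above.

# initializing a dictionary
d = {}

# counting number of times each word comes up in list of words
for key in word_list:
    d[key] = d.get(key, 0) + 1

# sort the (word, count) pairs by count, from highest to lowest
word_freq = sorted(d.items(), key=lambda x: x[1], reverse=True)
print(word_freq)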
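For a version that runs on its own, here is a minimal sketch with a tiny made-up word list. It uses operator.itemgetter(1) in place of the lambda, which is an equivalent way to select the count, and it shows that sorted returns a new list while the dictionary itself is left unchanged.

```python
from operator import itemgetter

# Tiny made-up word list so the counts are easy to verify.
word_list = ["b", "a", "b", "c", "a", "b"]

d = {}
for key in word_list:
    d[key] = d.get(key, 0) + 1

# itemgetter(1) picks the count out of each (word, count) pair.
pairs = sorted(d.items(), key=itemgetter(1), reverse=True)
print(pairs)  # [('b', 3), ('a', 2), ('c', 1)]

# sorted() built a new list; the dictionary still holds the same counts.
print(d)  # {'b': 3, 'a': 2, 'c': 1}
```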

Conclusion:

In this article, you have seen different approaches to counting how often each word occurs in a text and sorting the resulting word count pairs from highest to lowest. Thank you!
