
Counting Specific Punctuation Symbols In A Given Text, Without Using Regex Or Other Modules

I have a text file containing a large amount of text written in paragraphs. I need to count certain punctuation symbols without using any module, not even regex. I need to count , and ; and I also need to count …

Solution 1:

You will need a dictionary that stores the count of each of those punctuation characters. For commas and semicolons, we can simply count their occurrences in each word. But we'll need to handle ' and - slightly differently, because they should only be counted when they are part of a word (as in "won't" or "well-known"), not when they are used as quotation marks or dashes.

This should take care of all the cases:

with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text_words = f.read().split()

# One counter per punctuation character we care about
punctuation_count = {',': 0, ';': 0, "'": 0, '-': 0}


def search_for_single_quotes(word):
    single_quote = "'"
    search_char_index = word.find(single_quote)
    search_char_count = word.count(single_quote)
    # Only handle words that contain exactly one apostrophe
    if search_char_index == -1 or search_char_count != 1:
        return
    index_before = search_char_index - 1
    index_after = search_char_index + 1
    # Check if the characters before and after the quote are alphabets,
    # and the alphabet after the quote is the last character of the word.
    # Will detect `won't`, `shouldn't`, but not `ab'cd`, `y'ess`
    if index_before >= 0 and word[index_before].isalpha() and \
            index_after == len(word) - 1 and word[index_after].isalpha():
        punctuation_count[single_quote] += 1


def search_for_hyphens(word):
    hyphen = "-"
    # Only the first hyphen in the word is checked
    search_char_index = word.find(hyphen)
    if search_char_index == -1:
        return
    index_before = search_char_index - 1
    index_after = search_char_index + 1
    # Check if the characters before and after the hyphen are alphabets.
    # You can also change it to accept numbers as well,
    # depending on your use case.
    if index_before >= 0 and word[index_before].isalpha() and \
            index_after < len(word) and word[index_after].isalpha():
        punctuation_count[hyphen] += 1


for word in text_words:
    # Commas and semicolons are counted wherever they appear in the word
    for search_char in [',', ';']:
        punctuation_count[search_char] += word.count(search_char)
    search_for_single_quotes(word)
    search_for_hyphens(word)

print(punctuation_count)
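The code above checks only the first hyphen in each word and only counts an apostrophe when exactly one letter follows it, so "mother-in-law" contributes one hyphen and "won't," (with a trailing comma) contributes no apostrophe. Below is a minimal sketch that keeps the same "between two letters" idea but scans every character once. `count_punctuation` is a hypothetical helper name, and the rule that ' and - count only when both neighbours are letters is an assumption you may want to relax (for example, to allow digits).

```python
def count_punctuation(text):
    """Count , and ; everywhere; count ' and - only between two letters.

    Hypothetical helper: the letter-on-both-sides rule is an assumption.
    """
    counts = {',': 0, ';': 0, "'": 0, '-': 0}
    for i, ch in enumerate(text):
        if ch in (',', ';'):
            counts[ch] += 1
        elif ch in ("'", '-'):
            # Positions past either end of the text count as non-letters
            before = text[i - 1] if i > 0 else ''
            after = text[i + 1] if i + 1 < len(text) else ''
            if before.isalpha() and after.isalpha():
                counts[ch] += 1
    return counts


sample = "I won't go; my mother-in-law -- who's 'busy', she says -- will."
print(count_punctuation(sample))
# prints {',': 1, ';': 1, "'": 2, '-': 2}
```

To run it on the file, pass the whole text in, e.g. `count_punctuation(f.read())` inside the same `with open(...)` block. Scanning the raw text instead of `split()` words means trailing commas or quotes never hide a word-internal apostrophe.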

Solution 2:

The following should work:

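with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text = f.read()

# Treat "--" (a dash) as a word separator so it is not counted as hyphens
text = text.replace("--", " ")

# Remove hyphens and quotes that touch a space; they are not inside a word
for symbol in "-'":
    text = text.replace(symbol + " ", "")
    text = text.replace(" " + symbol, "")

for symbol in ",;'-":
    print(symbol, text.count(symbol))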
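To see what the `replace` calls actually do, here is the same idea wrapped in a function and run on an in-memory string instead of the file. The name `count_symbols` is hypothetical, and the symbol set `",;'-"` is an assumption matching the characters the other answers track.

```python
def count_symbols(text, symbols=",;'-"):
    # Treat double hyphens (dashes) as spaces so they are not counted
    text = text.replace("--", " ")
    # Drop hyphens and quotes that touch a space: they are not inside a word
    for symbol in "-'":
        text = text.replace(symbol + " ", "")
        text = text.replace(" " + symbol, "")
    return {symbol: text.count(symbol) for symbol in symbols}


sample = "I won't go; my mother-in-law -- who's 'busy', she says -- will."
print(count_symbols(sample))
# prints {',': 1, ';': 1, "'": 3, '-': 2}
```

Note the apostrophe count of 3: the closing quote in 'busy', touches a comma rather than a space, so it survives the replacements and is counted. A quote or hyphen next to a newline or at the very start or end of the text is likewise not removed.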

Solution 3:

Because you don't want to import anything, this will be slow, but it should work:

file_path = "your_file.txt"  # hypothetical: enter your file path here
with open(file_path) as file:
    lines = file.readlines()  # read every line of the document

search_values = {',': 0, ';': 0, "'": 0, '-': 0}  # number of occurrences of each symbol
# ' and - are only counted when neither neighbour is in this set;
# '-' stops both halves of a "--" dash from being counted.
# Add to this set whatever you need (e.g. more digits).
whitespaces = {' ', '\t', '\n', '-', '1', '2'}

for line in lines:
    for ch_index, ch in enumerate(line):
        if ch == ',' or ch == ';':
            search_values[ch] += 1
        elif ch == "'" or ch == '-':
            # Positions past either end of the line are treated as whitespace
            before = line[ch_index - 1] if ch_index > 0 else ' '
            after = line[ch_index + 1] if ch_index + 1 < len(line) else ' '
            if before not in whitespaces and after not in whitespaces:
                search_values[ch] += 1

for key in search_values:
    print(key + ': ' + str(search_values[key]))

This is obviously not optimal and it is better to use regex here, but it should work.
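For comparison only (the question rules out modules), this is roughly what the regex route could look like with the standard `re` module. Lookbehind `(?<=...)` and lookahead `(?=...)` make an apostrophe or hyphen count only when an ASCII letter sits on both sides; `count_with_regex` is a hypothetical name, and restricting to `[A-Za-z]` is an assumption that ignores accented letters.

```python
import re


def count_with_regex(text):
    # Commas and semicolons need no context, so plain str.count is enough
    counts = {',': text.count(','), ';': text.count(';')}
    # Match ' or - only when an ASCII letter is directly before and after it
    counts["'"] = len(re.findall(r"(?<=[A-Za-z])'(?=[A-Za-z])", text))
    counts['-'] = len(re.findall(r"(?<=[A-Za-z])-(?=[A-Za-z])", text))
    return counts


sample = "I won't go; my mother-in-law -- who's 'busy', she says -- will."
print(count_with_regex(sample))
# prints {',': 1, ';': 1, "'": 2, '-': 2}
```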

Feel free to ask if any questions should arise.
