Simple Acronyms Finder Using Regular Expressions

ECX 30 Days of Code and Design

Day 13

What are the Acronyms?

Task

Write a program that does the following:

  • Ask the user to enter (input) a sentence containing one or more acronyms.
  • Print out a list containing all the acronyms in the sentence. For example:
      • Input: "I need to get this done ASAP."; Output -> ["ASAP"]
      • Input: "SMH. The NPF is really a joke!"; Output -> ["SMH", "NPF"]
      • Input: "LOOOL. I thought you were at KFC"; Output -> ["LOOOL", "KFC"]

Note: An "acronym", for our purposes, is defined as any continuous sequence of UPPERCASE LETTERS, not separated by a white space or a symbol.

My Approach

First, we import the re module, which we use to search for the pattern (capital letters in this case). Next, we ask the user to input a text containing acronyms. Then we specify the match pattern, (?:[A-Z](?:\.|\s)?){2,15}. It looks for capital letters that are adjacent, separated by dots, or separated by a single whitespace character, which is slightly broader than the definition given in the task.

We use {2,15} to prevent the capital first letter of a title-cased word from being matched: only runs of two or more (up to 15 in this code) uppercase letters, adjacent or separated by a dot or a space, are considered. The ?: makes the groups (the patterns in parentheses) non-capturing. This matters because, when a pattern contains capturing groups, findall() returns the text captured by those groups (a list of strings for one group, a list of tuples for several) instead of the entire match. (?:\.|\s)? optionally matches a dot or a whitespace character after each capital letter.

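import re

# Ask the user for a sentence that contains acronyms.
user_input = input('Please, input a statement with acronyms (UPPERCASE): ')

# 2 to 15 capital letters, each optionally followed by a dot or a whitespace character.
acronyms = re.compile(r'(?:[A-Z](?:\.|\s)?){2,15}')
acronyms_match = acronyms.findall(user_input)

# Print the list of matches.
print(acronyms_match)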

In the code above, we pass the user’s input to the compiled pattern’s findall() method, and finally, we print the list of matches found.
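
To see why the groups are made non-capturing, here is a minimal sketch that runs the same pattern twice, once with a capturing outer group and once without. The sample text and the variable names are only illustrative and are not part of the original program.

```python
import re

# Illustrative sample text (an assumption, not from the original program).
text = "GMAT, S.A.T"

# Capturing outer group: findall() returns only what that group matched
# on its last repetition, not the whole acronym.
capturing = re.findall(r'([A-Z](?:\.|\s)?){2,15}', text)

# Non-capturing outer group: findall() returns each full match.
non_capturing = re.findall(r'(?:[A-Z](?:\.|\s)?){2,15}', text)

print(capturing)      # ['T', 'T']
print(non_capturing)  # ['GMAT', 'S.A.T']
```

With the capturing version, each result is just the last letter the group consumed, which is why the pattern in this post uses ?: throughout.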

Output

If we input “The acronyms are GMAT, T O E F L, S.A.T”, we get:

Please, input a statement with acronyms (UPPERCASE): The acronyms are GMAT, T O E F L, S.A.T
['GMAT', 'T O E F L', 'S.A.T']
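
One thing to watch for: because the optional dot or whitespace may follow every letter, including the last one, a match can carry a trailing separator when the acronym is followed by a period or a space. The sketch below compares the pattern from this post with two hypothetical alternatives that are not part of the original solution: one that only allows a separator between two letters, and one that follows the task's stricter definition. The names ORIGINAL, BETWEEN and STRICT are made up for illustration.

```python
import re

# The pattern from this post: an optional separator may follow every letter,
# including the last one.
ORIGINAL = re.compile(r'(?:[A-Z](?:\.|\s)?){2,15}')

# Hypothetical variant: a dot or whitespace is only allowed BETWEEN two letters,
# still limited to 2-15 letters in total.
BETWEEN = re.compile(r'[A-Z](?:[.\s]?[A-Z]){1,14}')

# Hypothetical strict pattern matching the task's definition:
# two or more uppercase letters with nothing in between.
STRICT = re.compile(r'[A-Z]{2,}')

sentence = "SMH. The NPF is really a joke!"

print(ORIGINAL.findall(sentence))  # ['SMH.', 'NPF ']
print(BETWEEN.findall(sentence))   # ['SMH', 'NPF']
print(STRICT.findall(sentence))    # ['SMH', 'NPF']
```

The trade-off is that STRICT no longer recognizes spaced or dotted forms such as T O E F L or S.A.T, while BETWEEN keeps them and drops only the trailing separator.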
