8.4 Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order. You can download the sample data at http://www.py4e.com/code3/romeo.txt

Python File Handling Example: Extract Unique Words from a Text File

Working with files is one of the most important skills when learning Python. In many real-world applications, data is stored in files such as logs, documents, or datasets. A programmer often needs to read the file, process the text, and extract useful information.

In this tutorial, we will learn how to read a file line by line in Python and extract unique words from it. The goal of this program is to open a file called romeo.txt, read its contents, and build a list containing only unique words. After collecting the words, the program sorts them alphabetically and prints the result.

This is a classic beginner exercise (Exercise 8.4 in Python for Everybody) that helps students practice Python lists, loops, and file handling.


Problem Statement

The program should perform the following steps:

  1. Open the file romeo.txt.

  2. Read the file line by line.

  3. Split each line into individual words.

  4. Check if each word is already in the list.

  5. If the word is not present, add it to the list.

  6. After reading the entire file, sort the words alphabetically.

  7. Print the final list.

The sample file romeo.txt contains text similar to:

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon

The goal is to extract all unique words from the file.


Python Program

Below is the Python program that performs the required task.

fname = input("Enter file name: ")
fh = open(fname)

lst = list()

for line in fh:
    word = line.rstrip().split()

    for element in word:
        if element in lst:
            continue
        else:
            lst.append(element)

lst.sort()
print(lst)
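
To try the program without typing a file name, the same logic can be wrapped in a function. The following is only a sketch and not part of the original exercise: the function name unique_sorted_words and the temporary file that stands in for romeo.txt are assumptions made for illustration.

```python
import os
import tempfile


def unique_sorted_words(path):
    """Return the unique words in the file at `path`, sorted."""
    lst = []
    # The with statement closes the file automatically when the block ends.
    with open(path) as fh:
        for line in fh:
            for element in line.split():
                if element not in lst:
                    lst.append(element)
    lst.sort()
    return lst


# Hypothetical stand-in for romeo.txt: the three sample lines shown earlier.
sample = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

result = unique_sorted_words(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(result)
```

Returning the list instead of printing it inside the function makes the result easy to check or reuse in another program.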

Step-by-Step Explanation

Let us understand how this program works step by step.


1. Asking the User for File Name

fname = input("Enter file name: ")

This line asks the user to enter the name of the file. For example:

Enter file name: romeo.txt

The file name is stored in the variable fname.


2. Opening the File

fh = open(fname)

The open() function opens the file so Python can read its contents. When no mode is given, the file is opened for reading in text mode.

The file object fh now represents the file and allows us to read it line by line.
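
If the user types the name of a file that does not exist, open() raises FileNotFoundError and the program stops with a traceback. One possible way to handle this is sketched below. The helper name open_or_none and the file name are hypothetical and only serve the example.

```python
def open_or_none(fname):
    """Return an open file object, or None if the file does not exist."""
    try:
        return open(fname)
    except FileNotFoundError:
        print("File cannot be opened:", fname)
        return None


# A name that is assumed not to exist in the current directory.
fh = open_or_none("no_such_file_for_this_demo.txt")
print(fh)
```

A real program would typically stop or ask for the name again when the helper returns None.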


3. Creating an Empty List

lst = list()

Here we create an empty list called lst. This list will store all the unique words found in the file.

Lists in Python are useful because they allow us to store multiple values and manipulate them easily.


4. Reading the File Line by Line

for line in fh:

This loop reads the file one line at a time. Each iteration processes a single line from the file.

For example, a line might be:

But soft what light through yonder window breaks
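
Each line read this way still carries its newline character at the end. The short sketch below makes that visible. It uses io.StringIO as a stand-in for an open file (an assumption for this example, so that no real file is needed).

```python
import io

# io.StringIO behaves like an open text file, so no real file is needed.
fh = io.StringIO(
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
)

lines = []
for line in fh:
    # repr() makes the trailing newline character visible as \n.
    print(repr(line))
    lines.append(line)
```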

5. Removing Extra Spaces and Splitting Words

word = line.rstrip().split()

Two important operations happen here.

rstrip()

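This removes all trailing whitespace, including the newline character at the end of the line. The call is optional here, because split() with no arguments already ignores leading and trailing whitespace.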

split()

The split() method divides the line into individual words. Called with no arguments, it splits on any run of whitespace.

Example:

"But soft what light".split()

Result:

['But', 'soft', 'what', 'light']

Now word becomes a list containing the words from that line.
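
The behavior of both methods can be checked with a small experiment. This is a sketch on a made-up line with extra spaces. The variable names are chosen for the example only.

```python
line = "But  soft what light \n"

# rstrip() removes all trailing whitespace, not only the newline.
stripped = line.rstrip()
print(repr(stripped))

# split() with no arguments splits on any run of whitespace and
# ignores leading and trailing whitespace.
with_rstrip = line.rstrip().split()
without_rstrip = line.split()
print(with_rstrip)
print(without_rstrip)
```

Both lists are the same, which shows that line.split() alone would also work in this program.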


6. Checking Each Word

for element in word:

This loop goes through each word in the line.

For example:

['But', 'soft', 'what', 'light']

Each word will be checked individually.


7. Checking if Word Already Exists

if element in lst:
    continue

This condition checks whether the word is already present in the list.

If the word already exists, the program skips it using continue.

This prevents duplicates.


8. Adding New Words

else:
    lst.append(element)

If the word is not already in the list, it is added to the end of the list using the append() method.

Example:

lst = ['But', 'soft']

After appending a new word:

lst = ['But', 'soft', 'what']
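
The check and the append can also be written with not in, which avoids the continue and else branches. The sketch below uses a few made-up words and is an alternative spelling of the same test, not the form used in the program above.

```python
lst = ["But", "soft"]

for element in ["soft", "what", "but"]:
    # "not in" expresses the same test without needing continue/else.
    if element not in lst:
        lst.append(element)

print(lst)
```

Note that the comparison is case-sensitive, so "but" is treated as a different word from "But" and is added to the list.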

9. Sorting the List

lst.sort()

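After reading the entire file, the program sorts the words in place. Python compares strings character by character using their Unicode code points, so capitalized words such as 'But' come before lowercase words such as 'and'.

Sorting makes the final list easier to read and to search by eye.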

Example:

Before sorting:

['But', 'soft', 'what', 'light']

After sorting:

['But', 'light', 'soft', 'what']
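
Because the default comparison is case-sensitive, the result is not strictly dictionary order. The sketch below compares the default order with a case-insensitive sort. The key=str.lower variant is an addition for illustration and is not required by the exercise.

```python
words = ["soft", "But", "and", "Juliet", "east"]

# Default sort: uppercase ASCII letters compare lower than lowercase ones.
default_order = sorted(words)

# Case-insensitive sort: compare the lowercased form of each word.
ignore_case = sorted(words, key=str.lower)

print(default_order)
print(ignore_case)
```

The exercise expects the default order, so the program above uses lst.sort() without a key.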

10. Printing the Final Result

print(lst)

Finally, the program prints the sorted list of unique words.

Example output for the three sample lines shown earlier:

['Arise', 'But', 'It', 'Juliet', 'and', 'breaks', 'east', 'envious', 'fair', 'is', 'kill', 'light', 'moon', 'soft', 'sun', 'the', 'through', 'what', 'window', 'yonder']

Why This Program is Useful

This program teaches several fundamental Python concepts:

File Handling

Reading data from files is essential in many applications such as data analysis and automation.

Lists

Lists allow you to store multiple values and perform operations like sorting and searching.

Loops

Loops help automate repetitive tasks like reading lines or checking words.

Condition Checking

Using if statements allows programs to make decisions.


Possible Improvements

This program works well, but it can be improved.

For example:

Using Sets

A Python set stores each value only once, so adding a word that is already present has no effect.

Example:

words = set()

for line in fh:
    for word in line.split():
        words.add(word)

print(sorted(words))

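This version is shorter, and it is faster on large files: checking whether a word is in a set takes roughly constant time on average, while the in operator on a list has to scan the list element by element.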
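
To confirm that both approaches produce the same result, the sketch below runs them on the same text. The sample text and the use of io.StringIO in place of a real file are assumptions for the example. update() is used here as a shorthand for adding every word of a line at once.

```python
import io

text = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
)

# List version: "in" scans the list, so each check takes longer as it grows.
lst = []
for line in io.StringIO(text):
    for word in line.split():
        if word not in lst:
            lst.append(word)
lst.sort()

# Set version: adding a word that is already present has no effect.
words = set()
for line in io.StringIO(text):
    words.update(line.split())

print(sorted(words) == lst)
```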


Conclusion

In this tutorial, we learned how to read a text file in Python and extract unique words from it. The program opens a file, processes each line, splits it into words, and stores only unique words in a list. After processing the entire file, the list is sorted alphabetically and printed.

This simple exercise demonstrates important programming concepts such as file handling, lists, loops, and conditional statements. Mastering these basics is essential for writing more complex Python programs in the future.

If you are learning Python, practicing such exercises regularly will improve your understanding of programming logic and problem solving.
