Python File Handling Example: Extract Unique Words from a Text File
Working with files is one of the most important skills when learning Python. In many real-world applications, data is stored in files such as logs, documents, or datasets. A programmer often needs to read the file, process the text, and extract useful information.
In this tutorial, we will learn how to read a file line by line in Python and extract unique words from it. The goal of this program is to open a file called romeo.txt, read its contents, and build a list containing only unique words. After collecting the words, the program sorts them alphabetically and prints the result.
This is a classic beginner exercise, commonly used in Python for Everybody, that helps students practice Python lists, loops, and file handling.
Problem Statement
The program should perform the following steps:
- Open the file romeo.txt.
- Read the file line by line.
- Split each line into individual words.
- Check if each word is already in the list.
- If the word is not present, add it to the list.
- After reading the entire file, sort the words alphabetically.
- Print the final list.
The sample file romeo.txt contains text similar to:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
The goal is to extract all unique words from the file.
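To follow along without downloading anything, the three sample lines above can be written to a local romeo.txt first. This is only a convenience sketch: the variable names sample_text and line_count and the choice of UTF-8 encoding are assumptions, not part of the original exercise.

```python
# Create a small romeo.txt in the current directory for testing.
sample_text = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
)

with open("romeo.txt", "w", encoding="utf-8") as out:
    out.write(sample_text)

# Read the file back to confirm it contains three lines.
with open("romeo.txt", encoding="utf-8") as fh:
    line_count = sum(1 for _ in fh)

print(line_count)
```

Note that opening a file in "w" mode overwrites any existing romeo.txt in the current directory.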
Python Program
Below is the Python program that performs the required task.
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    word = line.rstrip().split()
    for element in word:
        if element in lst:
            continue
        else:
            lst.append(element)
lst.sort()
print(lst)
Step-by-Step Explanation
Let us understand how this program works step by step.
1. Asking the User for File Name
fname = input("Enter file name: ")
This line asks the user to enter the name of the file. For example:
Enter file name: romeo.txt
The file name is stored in the variable fname.
2. Opening the File
fh = open(fname)
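The open() function opens the file so Python can read its contents. With no mode argument, the file is opened for reading in text mode. If no file with that name exists, open() raises a FileNotFoundError.
The file object fh now represents the file and allows us to read it line by line.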
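The program above never closes the file and stops with a traceback if the name is mistyped. One possible way to handle both, sketched here under assumptions, is a with statement combined with try/except. The helper name count_lines, the error message, and the temporary file standing in for romeo.txt are all hypothetical and not part of the original program.

```python
import os
import tempfile

def count_lines(path):
    """Return the number of lines in the file, or None if it does not exist."""
    try:
        # The with statement closes the file automatically, even on errors.
        with open(path, encoding="utf-8") as fh:
            return sum(1 for _ in fh)
    except FileNotFoundError:
        print("File cannot be opened:", path)
        return None

# Demonstration with a temporary file standing in for romeo.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "romeo.txt")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write("But soft what light\nIt is the east\n")
    found = count_lines(demo_path)
    missing = count_lines(os.path.join(tmp, "no_such_file.txt"))

print(found, missing)
```

The same pattern could wrap the word-extraction loop, so the file is closed as soon as the loop finishes.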
3. Creating an Empty List
lst = list()
Here we create an empty list called lst. This list will store all the unique words found in the file.
Lists in Python are useful because they allow us to store multiple values and manipulate them easily.
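As a small illustration of the list operations this program relies on, the sketch below shows that list() and the literal [] both create an empty list, and that append() grows the list in place. The variable name same is only for illustration.

```python
lst = list()      # the form used in the program
same = []         # the literal form; also an empty list

print(lst == same)   # both are empty, so they compare equal
print(len(lst))

lst.append("But")    # lists grow as items are appended
lst.append("soft")
print(lst)
```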
4. Reading the File Line by Line
for line in fh:
This loop reads the file one line at a time. Each iteration processes a single line from the file.
For example, a line might be:
But soft what light through yonder window breaks
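Each line delivered by the loop still carries its trailing newline character, which is easy to miss when printing. The sketch below makes it visible with repr(). It uses io.StringIO as a stand-in for an open file so that it runs without a file on disk, and the name raw_lines is an assumption for illustration.

```python
import io

# io.StringIO behaves like an open text file for the purposes of iteration.
fh = io.StringIO("But soft what light\nIt is the east\n")

raw_lines = []
for line in fh:
    raw_lines.append(line)
    print(repr(line))   # repr() shows the trailing "\n" explicitly
```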
5. Removing Extra Spaces and Splitting Words
word = line.rstrip().split()
Two important operations happen here.
rstrip()
This removes trailing whitespace from the line, including the newline character at the end.
split()
The split() method divides the line into individual words, treating any run of whitespace as a separator.
Example:
"But soft what light".split()
Result:
['But', 'soft', 'what', 'light']
Now word becomes a list containing the words from that line.
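The effect of the two calls can be checked on a line with extra spaces. This sketch uses a made-up input line. It also shows that split() with no arguments already ignores leading and trailing whitespace, so the rstrip() call in the program is harmless but not strictly required.

```python
line = "But soft   what light  \n"

stripped = line.rstrip()          # trailing spaces and the newline are removed
words = line.rstrip().split()     # the form used in the program
words_no_rstrip = line.split()    # split() alone discards the same whitespace

print(repr(stripped))
print(words)
print(words == words_no_rstrip)
```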
6. Checking Each Word
for element in word:
This loop goes through each word in the line.
For example:
['But', 'soft', 'what', 'light']
Each word will be checked individually.
7. Checking if Word Already Exists
if element in lst:
    continue
This condition checks whether the word is already present in the list.
If the word already exists, the program skips it using continue.
This prevents duplicates.
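It helps to know exactly what the in operator compares. For a list, it checks whether any item is equal to the given value, so the match is case-sensitive and applies to whole words only. The sketch below uses made-up variable names to show this.

```python
lst = ["But", "soft"]

exact_match = "soft" in lst     # an identical item exists
wrong_case = "Soft" in lst      # comparison is case-sensitive
partial_word = "sof" in lst     # list membership compares whole items

print(exact_match, wrong_case, partial_word)
```

As a consequence, the program treats "The" and "the" as two different words.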
8. Adding New Words
else:
    lst.append(element)
If the word is not already in the list, it is added using the append() method.
Example:
lst = ['But', 'soft']
After appending a new word:
lst = ['But', 'soft', 'what']
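The if/continue/else pattern can also be written more compactly with not in. This is an equivalent alternative, not a change to the program's behavior. The input list words is made up for illustration.

```python
words = ["the", "east", "and", "the", "sun", "and"]

lst = []
for element in words:
    if element not in lst:      # replaces the if/continue/else pattern
        lst.append(element)

print(lst)
```

The words keep the order in which they first appeared, because the list is only sorted later.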
9. Sorting the List
lst.sort()
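After reading the entire file, the program sorts the words alphabetically. The sort() method sorts the list in place and compares strings character by character, so all words that start with an uppercase letter come before words that start with a lowercase letter.
Sorting helps organize the words in a readable format.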
Example:
Before sorting:
['But', 'soft', 'what', 'light']
After sorting:
['But', 'light', 'soft', 'what']
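Because uppercase letters sort before lowercase ones, the default order is not a dictionary-style ordering. If a case-insensitive order is preferred, one option is to pass key=str.lower. This is a variation on the original program, sketched here with a made-up word list.

```python
words = ["what", "But", "soft", "and", "light"]

default_order = sorted(words)                 # uppercase letters sort first
folded_order = sorted(words, key=str.lower)   # case is ignored when comparing

print(default_order)
print(folded_order)
```

The same key argument works with lst.sort(key=str.lower), which sorts the existing list in place instead of returning a new one.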
10. Printing the Final Result
print(lst)
Finally, the program prints the sorted list of unique words.
Example output:
['Arise', 'But', 'It', 'Juliet', 'and', 'breaks', 'east', 'envious', 'fair', 'is', 'kill', 'light', 'moon', 'soft', 'sun', 'the', 'through', 'what', 'window', 'yonder']
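To check this output without typing a file name, the same logic can be wrapped in a function and fed the three sample lines directly. This is a sketch under assumptions: the function name unique_words is hypothetical, and io.StringIO stands in for the opened romeo.txt.

```python
import io

def unique_words(lines):
    """Return a sorted list of the unique words in an iterable of lines."""
    lst = []
    for line in lines:
        for element in line.split():
            if element not in lst:
                lst.append(element)
    lst.sort()
    return lst

# The three sample lines stand in for the contents of romeo.txt.
sample = io.StringIO(
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
)
result = unique_words(sample)
print(result)
```

Since any iterable of lines is accepted, an open file object can be passed in place of sample.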
Why This Program is Useful
This program teaches several fundamental Python concepts:
File Handling
Reading data from files is essential in many applications such as data analysis and automation.
Lists
Lists allow you to store multiple values and perform operations like sorting and searching.
Loops
Loops help automate repetitive tasks like reading lines or checking words.
Condition Checking
Using if statements allows programs to make decisions.
Possible Improvements
This program works well, but it can be improved.
For example:
Using Sets
Python sets automatically remove duplicate values.
Example:
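words = set()
for line in fh:
    for word in line.split():
        words.add(word)
print(sorted(words))
Here fh is assumed to be a freshly opened file. This version is shorter, and it is faster on large files: checking whether a word is in a list scans the list item by item, while a set finds it in roughly constant time on average. Because a set has no defined order, sorted() is used to produce the final alphabetical list.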
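To confirm that the two approaches agree, the sketch below runs both on the same text and compares the results. The sample text and the names set_result and list_result are made up for illustration, and io.StringIO again stands in for an open file.

```python
import io

text = (
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
)

# Set-based version: add() has no effect when the word is already present.
words = set()
for line in io.StringIO(text):
    for word in line.split():
        words.add(word)
set_result = sorted(words)

# List-based version, for comparison.
lst = []
for line in io.StringIO(text):
    for word in line.split():
        if word not in lst:
            lst.append(word)
list_result = sorted(lst)

print(set_result == list_result)
```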
Conclusion
In this tutorial, we learned how to read a text file in Python and extract unique words from it. The program opens a file, processes each line, splits it into words, and stores only unique words in a list. After processing the entire file, the list is sorted alphabetically and printed.
This simple exercise demonstrates important programming concepts such as file handling, lists, loops, and conditional statements. Mastering these basics is essential for writing more complex Python programs in the future.
If you are learning Python, practicing such exercises regularly will improve your understanding of programming logic and problem solving.