Python Program to Extract Email Addresses from a File
This Python program reads a file line by line and extracts email addresses from lines that start with the word “From ”. It also counts how many such lines exist in the file. This type of program is very useful in data processing, email log analysis, and text mining, where we need to extract specific information from large text files.
Let us understand the program step by step.
1. Taking File Name as Input
The program begins by asking the user to enter the name of the file.
fname = input("Enter file name: ")
The input() function is used to read input from the user. The user types the file name (for example, mbox-short.txt), and it is stored in the variable fname.
This makes the program flexible because it allows the user to choose any file instead of hardcoding the file name inside the program.
2. Initializing the Counter
counter = 0
A variable named counter is created and initialized with the value 0. This variable will keep track of how many lines in the file start with the word “From ”.
Counters are commonly used in programming when we need to count occurrences of something, such as lines, words, or numbers.
3. Opening the File
fh = open(fname)
The open() function is used to open the file specified by the user. The file handle is stored in the variable fh.
A file handle allows the program to read data from the file. Once the file is opened successfully, the program can process its contents line by line.
If the file does not exist, open() raises a FileNotFoundError and the program stops with a traceback. More robust programs wrap the call in exception handling (try/except) so they can show a friendly message instead.
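As a minimal sketch of that idea, the helper below wraps open() in try/except. The function name open_or_none and the missing file name are made up for illustration; they are not part of the original program.

```python
def open_or_none(fname):
    """Return an open file handle, or None if the file does not exist."""
    try:
        return open(fname)
    except FileNotFoundError:
        # Report the problem instead of letting the program crash
        print("File cannot be opened:", fname)
        return None


# Hypothetical file name that is assumed not to exist
fh = open_or_none("no-such-file-for-this-demo.txt")
if fh is None:
    print("Nothing to process")
```

Returning None lets the caller decide what to do next, for example asking for the file name again or exiting.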
4. Reading the File Line by Line
for line in fh:
This line creates a loop that reads the file one line at a time. Instead of loading the entire file into memory, Python processes it line by line. This approach is efficient when working with large files.
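Each iteration of the loop stores a single line from the file in the variable line. The statements in steps 5 to 9 form the body of this loop, so in the actual program they are indented beneath the for statement.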
5. Removing Extra Whitespace
line = line.rstrip()
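The rstrip() method, called with no arguments, returns a copy of the string with whitespace characters (such as spaces, tabs, or newline characters) removed from the end. Strings in Python cannot be changed in place, which is why the result is assigned back to line.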
Text files usually contain newline characters at the end of each line, and removing them helps ensure cleaner processing of the text.
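A small sketch of the effect, using the sample line from this article with a newline added at the end, the way it would arrive from the file:

```python
raw = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n"

# rstrip() returns a new string; raw itself is left unchanged
clean = raw.rstrip()

print(raw.endswith("\n"))    # True
print(clean.endswith("\n"))  # False
print(len(raw) - len(clean)) # 1
```

Only the end of the string is affected. Whitespace at the start or in the middle of the line is kept.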
6. Filtering Lines That Start with “From ”
if not line.startswith('From '): continue
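This line checks whether the line starts with the text “From ” (note the space after the word). The space matters: mail files such as mbox-short.txt also contain header lines that begin with “From:”, and the space keeps those lines from matching.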
The startswith() method returns True if the line begins with the specified string.
If the line does not start with “From ”, the continue statement tells the program to skip the current iteration and move to the next line in the loop.
This ensures that the program only processes lines that begin with “From ”.
Example of such a line in an email log file:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
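To see the filter at work, here is a short sketch. The list lines is sample data made up for illustration; only the first entry comes from the example above.

```python
lines = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",  # header line, colon instead of space
    "Subject: sample subject line",      # unrelated header line
]

matches = []
for line in lines:
    if not line.startswith("From "):
        continue  # skip everything that is not a "From " line
    matches.append(line)

print(len(matches))  # 1
```

Only the first line passes the check, because the second one has a colon right after “From” rather than a space.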
7. Splitting the Line into Words
words = line.split()
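The split() method divides the line into a list of individual words. When called with no arguments, it treats any run of whitespace (spaces, tabs, newlines) as a single separator, so repeated spaces do not produce empty words.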
For example, the line:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
will become the list:
['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']
Each word can be accessed using its index position.
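The sketch below shows the split and the index lookup together. The sample line deliberately has two spaces before the day number, an assumption about how such log lines are often padded, to show the difference between split() and split(" ").

```python
line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

words = line.split()          # any run of whitespace is one separator
by_space = line.split(" ")    # every single space is a separator

print(words[0])       # From
print(words[1])       # stephen.marquard@uct.ac.za
print(len(words))     # 7
print(len(by_space))  # 8, because of the empty string between the two spaces
```

Because the email address comes right after “From”, index 1 is the same in both cases, but split() with no argument is the safer habit for text with irregular spacing.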
8. Printing the Email Address
print(words[1])
In the list created by split(), the second element (index 1) contains the email address.
So the program prints the sender’s email address for each line that starts with “From ”.
Example output:
stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu
9. Increasing the Counter
counter += 1
Each time a valid line starting with “From ” is found, the counter is increased by 1.
This helps the program keep track of how many matching lines were processed.
10. Printing the Final Result
After the loop finishes processing the entire file, the program prints the total number of lines that started with “From ”.
print("There were", counter, "lines in the file with From as the first word")
Example output might look like this:
There were 27 lines in the file with From as the first word
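Putting the steps together, the whole program might be arranged as follows. This is one possible sketch, not the only layout: the function name count_from_lines is made up for illustration, and the with statement is an addition that closes the file automatically, which the step-by-step version above does not do.

```python
def count_from_lines(fname):
    """Print the sender of every 'From ' line and return how many there were."""
    counter = 0
    with open(fname) as fh:          # the file is closed when the block ends
        for line in fh:
            line = line.rstrip()     # drop the trailing newline
            if not line.startswith("From "):
                continue             # not a 'From ' line, skip it
            words = line.split()
            print(words[1])          # second word is the email address
            counter += 1
    return counter
```

To use it the same way as the original program, read the name with fname = input("Enter file name: "), call counter = count_from_lines(fname), and then print the final message with the returned value.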
Conclusion
This Python program demonstrates several important programming concepts such as file handling, loops, string processing, conditional statements, and list operations. It efficiently reads a file, filters specific lines, extracts email addresses, and counts how many relevant lines exist.
Such techniques are widely used in log analysis, email processing, data extraction, and text analytics. By understanding programs like this, beginners can learn how Python can handle real-world data stored in text files and transform it into meaningful information.