Python Program to Analyze Email Distribution by Hour (mbox-short.txt Solution)

✅ Python Program to Find Email Distribution by Hour (mbox-short.txt)

10.2 Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.


📘 Problem Statement

Write a Python program to read through the mbox-short.txt file and determine the distribution of emails by hour of the day.

Each email message contains a line starting with "From " (From followed by a space) that includes a timestamp like:

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

Your task is to:

  • Extract the hour from the time (09:14:16)
  • Count how many emails were sent during each hour
  • Display results sorted by hour

🧠 Logic Behind the Program

  1. Open the mailbox file.
  2. Read each line.
  3. Select only lines starting with "From " (note the trailing space).
  4. Split the line on whitespace and take the time field (the sixth word).
  5. Split the time on : and keep the first part, the hour.
  6. Store counts using a dictionary.
  7. Sort and print results.

💻 Optimized Python Program

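name = input("Enter file: ")
if len(name) < 1:
    name = "mbox-short.txt"  # default when the user just presses Enter

handle = open(name)

counts = {}  # hour as a string such as "09" -> number of messages

for line in handle:
    # "From " with a trailing space skips the "From:" header lines
    if line.startswith("From "):
        words = line.split()
        time = words[5]            # e.g. "09:14:16"
        hour = time.split(":")[0]  # e.g. "09"
        counts[hour] = counts.get(hour, 0) + 1

# Sort by hour
result = sorted(counts.items())

for hour, count in result:
    print(hour, count)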

📊 Example Output

04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1


🔍 Code Explanation

✔ Dictionary Counting

counts.get(hour, 0) + 1
  • If the hour is already a key → get() returns its current count, which is increased by 1
  • If not → get() returns the default 0, so the count starts at 1
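
To see the counting pattern on its own, here is a minimal sketch that runs without the mailbox file. The hours list is made-up sample data, not taken from mbox-short.txt.

```python
# Made-up sample hours, used only to illustrate the counting pattern
hours = ["09", "10", "09", "11", "09"]

counts = {}
for hour in hours:
    # get() returns the current count, or 0 if this hour has not been seen yet
    counts[hour] = counts.get(hour, 0) + 1

print(counts)
```

The same loop works for any hashable key, which is why the pattern shows up in most word-count and frequency exercises.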

✔ Extracting Hour

time.split(":")[0]

Converts:

09:14:16 → 09
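
As a quick check, the two splits can be tried on the sample line from the exercise. This is only a sketch of the extraction step; note that split() with no arguments treats the run of two spaces before the day number as a single separator.

```python
# Sample "From " line from the exercise text
line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

words = line.split()       # split on any run of whitespace
time = words[5]            # sixth word: the time field
hour = time.split(":")[0]  # first part before the colon

print(time)
print(hour)
```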

✔ Sorting Output

sorted(counts.items())

Sorts the (hour, count) pairs by hour. The hours are zero-padded two-digit strings, so alphabetical order is the same as numeric order.
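
The short sketch below shows the sort on a small hand-written dictionary (sample values, not the real file). It also shows why the zero padding matters: the keys are strings, and strings without padding would not sort in numeric order.

```python
# Hand-written sample counts; keys are two-digit hour strings
counts = {"10": 3, "04": 3, "09": 2}

# items() yields (hour, count) tuples; tuples sort by their first element
result = sorted(counts.items())
for hour, count in result:
    print(hour, count)

# Without zero padding, string order differs from numeric order
unpadded = sorted(["9", "10"])
print(unpadded)
```

If the hours were ever stored without the leading zero, converting them with int() before sorting would be one way to keep the order numeric.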


⚠️ Common Mistakes (Avoid These)

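❌ Matching "From:" instead of "From " (the "From:" header lines carry no timestamp)
❌ Forgetting the space after From (startswith("From") also matches the "From:" lines)
❌ Naming a variable list (shadows Python's built-in list type)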
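
The first two mistakes are easy to reproduce. In the sketch below, the two sample lines are illustrative: the first is the line from the exercise and the second is a typical "From:" header of the kind found in a mailbox file.

```python
# Two illustrative lines: an envelope "From " line and a "From:" header
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
]

# With the trailing space, only the timestamped line matches
with_space = [line for line in sample if line.startswith("From ")]

# Without the space, the header matches too
without_space = [line for line in sample if line.startswith("From")]

print(len(with_space), len(without_space))

# The header has only two words, so words[5] would raise IndexError
print(len(sample[1].split()))
```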

