Scraping HTML Data with BeautifulSoup

Scraping Numbers from HTML using BeautifulSoup

In this assignment you will write a Python program similar to http://www.py4e.com/code3/urllink2.py. The program will use urllib to read the HTML from the data files below, parse the data, extract the numbers, and compute the sum of the numbers in the file.

We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.

Sample data: http://py4e-data.dr-chuck.net/comments_42.html (Sum=2553)
Actual data: http://py4e-data.dr-chuck.net/comments_558261.html (Sum ends with 74)
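The two published results above can be turned into a quick self-check before submitting. The sketch below is only an illustration: the helper name `looks_right` and its parameters are hypothetical and not part of the assignment, and the totals 1874 and 1875 are made-up values used to show the suffix check.

```python
def looks_right(total, expected=None, suffix=None):
    """Return True if total matches the known sum and/or ends with the given digits."""
    if expected is not None and total != expected:
        return False
    if suffix is not None and not str(total).endswith(str(suffix)):
        return False
    return True


# Sample file: the full sum is published, so compare against it directly.
print(looks_right(2553, expected=2553))  # True

# Actual file: only the last two digits are published.
print(looks_right(1874, suffix="74"))    # True (1874 is a made-up total)
print(looks_right(1875, suffix="74"))    # False
```

Running your program against the sample file first and confirming the full sum is the stronger test, since a matching two-digit suffix alone can happen by coincidence.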

You do not need to save these files to your folder since your program will read the data directly from the URL.
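Reading directly from a URL can be sketched as a small function built on `urllib.request.urlopen`. The function name `fetch_html` is hypothetical, and decoding the bytes as UTF-8 is an assumption about the page's encoding (BeautifulSoup also accepts the raw bytes, so the decode step is optional). The demonstration uses a `data:` URL so that it runs without network access; for the assignment you would pass your own data URL instead.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url):
    """Download the document at url and return its body as text."""
    with urlopen(url) as response:
        # read() returns bytes; decode them so the result is a str.
        # UTF-8 is assumed here, not confirmed by the assignment.
        return response.read().decode("utf-8")


# Offline demonstration: a data: URL carries its own content.
demo_url = "data:text/html," + quote('<span class="comments">97</span>')
print(fetch_html(demo_url))
```

Using `with` closes the connection as soon as the body has been read.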

Note: Each student will have a distinct data URL for the assignment, so use only your own data URL for analysis.

SOLUTION

import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Read the page; replace this address with your own data URL
url = 'http://py4e-data.dr-chuck.net/comments_676725.html'
html = urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

total = 0

# Retrieve all of the span tags
tags = soup('span')

for tag in tags:
    # Pull every run of digits out of the tag and add it to the total
    y = str(tag)
    x = re.findall("[0-9]+", y)
    for i in x:
        total = total + int(i)

print(total)
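The solution above applies the regular expression to `str(tag)`, which is the whole tag including its attributes. That works as long as the attributes contain no digits. As a hedged alternative, the sketch below matches only the text inside each span by using `get_text()`. The function name `sum_span_numbers` and the HTML fragment are made up for illustration; the real data file may be laid out differently.

```python
import re

from bs4 import BeautifulSoup


def sum_span_numbers(html):
    """Add up every run of digits found in the text of <span> tags."""
    soup = BeautifulSoup(html, "html.parser")
    total = 0
    for tag in soup("span"):
        # get_text() returns only the text inside the tag, not its attributes.
        for number in re.findall(r"[0-9]+", tag.get_text()):
            total += int(number)
    return total


# Made-up fragment in the style of a comments table.
sample = """
<table>
<tr><td>Alice</td><td><span class="comments">97</span></td></tr>
<tr><td>Bob</td><td><span class="comments" id="row2">3</span></td></tr>
</table>
"""
print(sum_span_numbers(sample))  # 100
```

On this fragment, matching against `str(tag)` instead would also pick up the 2 in `id="row2"` and give 102. Wrapping the logic in a function that takes an HTML string also makes it easy to test without a network connection.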
