Easiest way to crawl data from a web page in Python

April 03, 2019

Tried & Tested

This post shows you an easy, proven way to crawl (extract, scrape) data from a web page.

Data Claw Machine

Let's say we want to stay up to date with all the latest EPL (English Premier League) news, but we don't have time to read through every article. So we decide to collect just the titles of the featured news on the EPL web page.

Step-by-Step Guide

1) Launch Anaconda Navigator

2) Launch Jupyter

Launch "Jupyter" in Anaconda Navigator

3) Create a new Python notebook

Create new Python notebook

4) Import all the libraries

Import libraries
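
The import cell was shown as a screenshot. A minimal sketch of the imports a walkthrough like this needs, assuming the standard library's urllib.request for downloading the page and the bs4 package for BeautifulSoup (the original notebook may have used a different download library, such as requests):

```python
# Download helper from the standard library
# (an assumption; the requests package would work as well).
from urllib.request import urlopen

# BeautifulSoup turns raw HTML into a tree that can be searched.
from bs4 import BeautifulSoup
```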

5) Get the URL of EPL web page

Get URL
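
The URL cell was also a screenshot. Here is a sketch of this step, assuming the page of interest is the Premier League home page; the exact address used in the original notebook is not known, and fetch_html is a hypothetical helper name introduced for illustration:

```python
from urllib.request import Request, urlopen

# Assumed address of the EPL web page; replace it with the page you want.
URL = "https://www.premierleague.com/"


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML as a string (hypothetical helper)."""
    # Some sites reject the default Python user agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In the notebook you would then call html = fetch_html(URL) and pass html on to the next step.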

6) Use BeautifulSoup to parse the web page

Make soup to parse the web page
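
A small sketch of "making the soup". To keep it runnable offline, it parses a made-up HTML string (sample_html is an assumption standing in for the downloaded page):

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page so the example runs without a network.
sample_html = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# "html.parser" is Python's built-in parser, so nothing extra needs installing.
soup = BeautifulSoup(sample_html, "html.parser")

print(soup.title.text)  # Sample page
```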

7) Check the HTML structure of the web page

Check the HTML structure

8) Use BeautifulSoup to crawl all the "Featured Articles"

find_all looks through the page's HTML and returns every element that has the class "featuredArticle", as shown in the picture above
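
A sketch of that call on made-up markup. Only the class name "featuredArticle" comes from this post; the div and span tags, the "title" class and the article texts are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Made-up markup; the real page's structure may differ.
sample_html = """
<div class="featuredArticle"><span class="title">Match report</span></div>
<div class="featuredArticle"><span class="title">Transfer news</span></div>
<div class="otherArticle"><span class="title">Club statement</span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_ (with a trailing underscore) is used because "class" is a Python keyword.
featured_articles = soup.find_all(class_="featuredArticle")

print(len(featured_articles))  # 2
```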

9) Crawl all the titles of the featured articles

Get title
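
A sketch of pulling the title element out of each featured article. It assumes the title sits in a span with the class "title"; that class name is hypothetical, so check the real one in the page's HTML:

```python
from bs4 import BeautifulSoup

# Made-up markup; the "title" class is an assumption.
sample_html = """
<div class="featuredArticle"><span class="title">Match report</span></div>
<div class="featuredArticle"><span class="title">Transfer news</span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

title_tags = []
for article in soup.find_all(class_="featuredArticle"):
    # find returns the first match inside this article, or None if there is none.
    title_tag = article.find("span", class_="title")
    if title_tag is not None:
        title_tags.append(title_tag)

# At this point each item is still a tag, not plain text.
print(title_tags[0])  # <span class="title">Match report</span>
```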

10) Last step: Print results!

.text returns only the text inside an element, so the printed result contains no tags such as <span> or </span>

Print results
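
Putting the last steps together, here is a sketch that prints the clean titles. The featured_titles helper, the sample markup and the "title" class are assumptions for illustration, not the original notebook's code:

```python
from bs4 import BeautifulSoup


def featured_titles(html):
    """Return the text of every featured article title (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for article in soup.find_all(class_="featuredArticle"):
        title_tag = article.find("span", class_="title")
        if title_tag is not None:
            # .text drops the tags; strip() removes surrounding whitespace.
            titles.append(title_tag.text.strip())
    return titles


# Made-up markup standing in for the downloaded page.
sample_html = """
<div class="featuredArticle"><span class="title"> Match report </span></div>
<div class="featuredArticle"><span class="title"> Transfer news </span></div>
"""

for title in featured_titles(sample_html):
    print(title)
```

On this sample the loop prints "Match report" and "Transfer news", one per line.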

That's all! We hope you have learnt something useful.
