Easiest way to crawl data from a web page in Python


Tried & Tested

This post shows you how to crawl (extract, scrape) data from a web page in an easy, proven way.

Data Claw Machine

Let's say we want to stay up to date with all the latest EPL (English Premier League) news, but we don't have time to read through every article. Instead, we can grab just the titles of the featured news on the EPL web page.


Step-by-Step Guide


1) Launch Anaconda Navigator 


2) Launch Jupyter 

Launch "Jupyter" in Anaconda Navigator


3) Create a new Python notebook

Create new Python notebook


4) Import all the libraries

Import libraries


5) Get the URL of EPL web page

Get URL


6) Use BeautifulSoup to parse the web page

Make soup to parse the web page
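
Steps 4 to 6 can be sketched roughly as follows. The original screenshots are not reproduced here, so the exact imports and address are assumptions: this sketch uses the standard library's urllib to download the page, the address in EPL_URL is only a guess at the page the post used, and fetch_html and make_soup are illustrative helper names.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumed address; replace it with the page you actually want to crawl.
EPL_URL = "https://www.premierleague.com/"


def fetch_html(url, timeout=10):
    """Download the raw HTML of a page (hypothetical helper)."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def make_soup(html):
    """Parse HTML (str or bytes) with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")
```

In the notebook, soup = make_soup(fetch_html(EPL_URL)) would then give a parsed page to work with in the next steps.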


7) Check the HTML structure of the web page

Check the HTML structure
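
Besides the browser's "Inspect" tool, one way to look at the structure from inside the notebook is BeautifulSoup's prettify(), which prints the parsed HTML with one tag per line and indentation. The sketch below runs on a small made-up snippet (SAMPLE_HTML, with an assumed span of class "title"), not on the real EPL page.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the real page is much larger.
SAMPLE_HTML = """
<section class="featuredArticle">
  <a href="/news/1"><span class="title">Match report</span></a>
</section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# prettify() returns the document as an indented string,
# which makes the nesting of tags and their classes easy to read.
pretty = soup.prettify()
print(pretty)
```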



8) Use BeautifulSoup to crawl all the "Featured Articles" 
  • find_all looks at every element in the page's HTML and returns those that have the class "featuredArticle", as shown in the picture above
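
A minimal sketch of this step, again on a made-up snippet rather than the live page. Only the class name "featuredArticle" comes from the post; the surrounding tags and headlines are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML: two featured articles and one that is not featured.
SAMPLE_HTML = """
<div>
  <section class="featuredArticle"><span class="title">First headline</span></section>
  <section class="featuredArticle"><span class="title">Second headline</span></section>
  <section class="otherArticle"><span class="title">Not featured</span></section>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (with a trailing underscore) filters by CSS class,
# because "class" is a reserved word in Python.
featured = soup.find_all(class_="featuredArticle")
print(len(featured))
```

find_all returns a list-like result, so it can be looped over or indexed like a normal Python list.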


9) Crawl all the titles of the featured articles 

Get title


10) Last step: Print results!


  • text returns only the text content of an element, so the printed result contains no tags such as <span> or </span>

Print results
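
Steps 9 and 10 might look something like the sketch below. It assumes each featured article keeps its title in a span with the class "title"; that tag and class are guesses, so check them against the structure you saw in step 7.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page.
SAMPLE_HTML = """
<section class="featuredArticle"><span class="title"> First headline </span></section>
<section class="featuredArticle"><span class="title">Second headline</span></section>
<section class="featuredArticle"><p>No title here</p></section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

titles = []
for article in soup.find_all(class_="featuredArticle"):
    # Assumed location of the title inside each article.
    title_tag = article.find("span", class_="title")
    if title_tag is None:
        continue  # skip articles that have no title element
    # .text drops the surrounding tags; strip() removes stray whitespace.
    titles.append(title_tag.text.strip())

for title in titles:
    print(title)
```

Printing title_tag itself instead of title_tag.text would show the whole element, including the <span> tags.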


That's all! We hope you have learnt something useful.
