Tried & Tested
This post shows you how to crawl (extract, or scrape) data from a web page in a simple, proven way.
(Image: Data Claw Machine)
Let's say we want to stay up to date with the latest English Premier League (EPL) news, but we don't have time to read every article. Instead, we can collect just the titles of the featured news on the EPL web page.
Step-by-Step Guide
1) Launch Anaconda Navigator
2) Launch Jupyter
(Image: Launch "Jupyter" in Anaconda Navigator)
3) Create a new Python notebook
(Image: Create new Python notebook)
4) Import the required libraries
(Image: Import libraries)
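The screenshot's exact imports are not reproduced in the text. Here is a minimal sketch of what this walkthrough needs, assuming the page is downloaded with Python's built-in `urllib.request` (the original notebook may have used another library such as `requests`) and parsed with BeautifulSoup from the `bs4` package (install it with `pip install beautifulsoup4` if it is missing):

```python
# Standard-library HTTP client (an assumption; the original may have used requests)
from urllib.request import Request, urlopen

# BeautifulSoup turns raw HTML into a searchable tree (package: beautifulsoup4)
import bs4
from bs4 import BeautifulSoup

# Confirm the parser library is installed and see which version is active
print(bs4.__version__)
```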
5) Get the URL of the EPL web page
(Image: Get URL)
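With the URL in hand, the page's HTML has to be downloaded before it can be parsed. A minimal sketch using `urllib.request`; the address in `EPL_URL` and the helper name `fetch_html` are assumptions for illustration, so check the real news page in your browser first:

```python
from urllib.request import Request, urlopen

# Assumed address of the EPL news page (hypothetical; verify it in a browser)
EPL_URL = "https://www.premierleague.com/news"


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string (hypothetical helper)."""
    # Some sites reject requests that lack a browser-like User-Agent header
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as response:
        # Decode with the charset the server declares, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# In the notebook:
# html = fetch_html(EPL_URL)
```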
6) Use BeautifulSoup to parse the web page
(Image: Make soup to parse the web page)
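"Making soup" means handing the downloaded HTML to BeautifulSoup together with a parser name. A minimal sketch, using a tiny hypothetical HTML string in place of the real page so it runs offline:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded EPL page
html = "<html><head><title>EPL News</title></head><body><p>Hello</p></body></html>"

# "html.parser" ships with Python, so nothing extra needs installing;
# "lxml" is a faster alternative if it is available.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # EPL News
```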
7) Check the HTML structure of the web page
(Image: Check the HTML structure)
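The usual way to check the structure is your browser's developer tools (right-click a headline, then Inspect). As a programmatic complement, you can count the CSS classes used in the soup to spot the one that marks featured news. This sketch uses hypothetical markup modelled on the "featuredArticle" class named in step 8; the `<span class="title">` and `sidebar` elements are assumptions:

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup; the real EPL page is far larger
html = """
<section class="featuredArticle"><span class="title">Title A</span></section>
<section class="featuredArticle"><span class="title">Title B</span></section>
<div class="sidebar"><span class="title">Not featured</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag; "class" is parsed as a list of class names
class_counts = Counter(
    cls for tag in soup.find_all(True) for cls in tag.get("class", [])
)
print(class_counts.most_common())
```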
8) Use BeautifulSoup to find all the "Featured Articles"
- find_all searches the whole parsed page and returns every element whose class is "featuredArticle", as shown in the picture above
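A minimal sketch of the find_all call, run against hypothetical markup. Matching on the class alone, rather than also on a tag name such as "section", is a choice made here for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the EPL page
html = """
<section class="featuredArticle"><span class="title">Title A</span></section>
<section class="featuredArticle"><span class="title">Title B</span></section>
<div class="sidebar"><span class="title">Not featured</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ has a trailing underscore because "class" is a reserved word in Python
featured = soup.find_all(class_="featuredArticle")
print(len(featured))  # 2
```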
9) Extract the titles of the featured articles
(Image: Get title)
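Each featured article is itself a soup element, so you can search inside it for the title. This sketch assumes the headline sits in a `<span class="title">`, a hypothetical layout suggested only by the `</span>` mentioned in step 10. Adjust the tag and class to match what you found in step 7:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the EPL page
html = """
<section class="featuredArticle"><span class="title">Title A</span></section>
<section class="featuredArticle"><span class="title">Title B</span></section>
<section class="featuredArticle"><p>No title element here</p></section>
"""
soup = BeautifulSoup(html, "html.parser")
featured = soup.find_all(class_="featuredArticle")

title_tags = []
for article in featured:
    # find returns the first match inside this article, or None if there is none
    title_tag = article.find("span", class_="title")
    if title_tag is not None:  # skip articles without the assumed title element
        title_tags.append(title_tag)

print(title_tags)
```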
10) Last step: Print results!
- .text returns only the text inside an element, so the output contains no tags such as </span>
(Image: Print results)
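Steps 6 to 10 can be sketched as one function. It again assumes the hypothetical `<span class="title">` layout, and the helper name `featured_titles` is illustrative. It uses `get_text(strip=True)`, which drops tags just as `.text` does and also trims surrounding whitespace:

```python
from bs4 import BeautifulSoup


def featured_titles(html):
    """Return the plain-text titles of all featured articles (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for article in soup.find_all(class_="featuredArticle"):
        span = article.find("span", class_="title")  # assumed title element
        if span is not None:
            # Like .text this removes tags; strip=True also trims whitespace
            titles.append(span.get_text(strip=True))
    return titles


sample = (
    '<section class="featuredArticle"><span class="title"> Title A </span></section>'
    '<section class="featuredArticle"><span class="title">Title B</span></section>'
)
for number, title in enumerate(featured_titles(sample), start=1):
    print(f"{number}. {title}")
```

In the notebook you would pass in the downloaded page instead of `sample`.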
That's all! We hope you have learnt something useful.