
@wtf | |
(Extract titles from HTML) |
@wtf | 4 days |
*Web Scraping (Mocked) in Python*

*What is Web Scraping?*

Web scraping means writing code to fetch and extract data from web pages. We usually use:
- requests → to download the HTML
- BeautifulSoup → to parse and extract data from HTML

*Step-by-Step Mock Scraping Example*

Imagine this is our webpage (stored as a string):

    html = """
    <html>
      <body>
        <h1>Top Programming Languages</h1>
        <ul>
          <li>Python</li>
          <li>JavaScript</li>
          <li>Java</li>
        </ul>
      </body>
    </html>
    """

*Parse It Using BeautifulSoup*

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")

    # Extracting the heading
    heading = soup.find("h1").text
    print("Heading:", heading)

    # Extracting the list items
    languages = soup.find_all("li")
    for lang in languages:
        print("Language:", lang.text)

*Output*:

    Heading: Top Programming Languages
    Language: Python
    Language: JavaScript
    Language: Java

*Why Use Web Scraping?*
- Collect product prices from e-commerce sites
- Grab job listings from job boards
- Gather headlines or articles from news sites
- Monitor stock prices or crypto values

(Always check and follow a website's terms of service before scraping it.)

*Mini Task: Extract Blog Titles*

Here's another HTML string you can experiment with:

    from bs4 import BeautifulSoup

    html = """
    <div>
      <h2>Python for Beginners</h2>
      <h2>Data Analysis with Pandas</h2>
      <h2>Machine Learning Crash Course</h2>
    </div>
    """

    soup = BeautifulSoup(html, "html.parser")
    titles = soup.find_all("h2")
    for t in titles:
        print("Blog Title:", t.text)

*What You Practiced:*
- Parsing HTML with BeautifulSoup
- Using find() and find_all()
- Navigating a page's structure
- Extracting content programmatically
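
*Optional Sketch: Reusable Helpers*

The find() and find_all() steps above could be wrapped into small functions, which also shows one way of navigating from a heading to the list next to it. This is only a sketch: the names `SAMPLE_HTML`, `extract_texts`, and `list_items_under_heading` are made up for illustration, and it assumes the list is a direct sibling of the heading, as in the mock page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, modelled on the mock HTML used in the post.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Top Programming Languages</h1>
    <ul>
      <li>Python</li>
      <li>JavaScript</li>
      <li>Java</li>
    </ul>
  </body>
</html>
"""


def extract_texts(html, tag):
    """Return the stripped text of every <tag> element in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.find_all(tag)]


def list_items_under_heading(html):
    """Return (h1 text, [li texts]) for the <ul> that follows the first <h1>."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    if heading is None:
        # No heading on the page: nothing to navigate from.
        return None, []
    # Assumption: the list is a sibling of the heading, as in the mock page.
    ul = heading.find_next_sibling("ul")
    items = [li.get_text(strip=True) for li in ul.find_all("li")] if ul else []
    return heading.get_text(strip=True), items


if __name__ == "__main__":
    print(extract_texts(SAMPLE_HTML, "li"))
    print(list_items_under_heading(SAMPLE_HTML))
```

Returning plain lists and tuples instead of printing inside the loop makes the results easier to reuse, for example to save them to a file later. The blog-title task can be handled the same way by calling `extract_texts(html, "h2")`.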


