二手产品经理

THIS IS RENO

Scraping - Days 95-96 - Learning Python online in 100 days

Record

  1. Day 95: retrieve 5 news articles from "news", send them to openai to generate keywords, then pass the keywords to spotipy to get songs back. Because the exercise depends on openai, today's practice was skipped.
  2. Day 96: learn how to retrieve and parse HTML content, and finally get to one of the most powerful things Python can do: web scraping!
  3. Use response = requests.get(url) and html = response.text to retrieve the HTML content of a webpage.
  4. Use soup = BeautifulSoup(html, 'html.parser') to parse the HTML into a searchable tree. Before that, import the library: from bs4 import BeautifulSoup.
  5. Use soup.find_all("span", {"class": "titleline"}) to retrieve specific elements. span is the tag name; the dictionary maps the class attribute to the class name to match.
  6. Today's practice: retrieve the titles from Hacker News and print any that contain "python" or "replit". At the time, no title contained either keyword, so a third keyword, "SQL", was added.
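
Steps 4 and 5 can be tried without any network access. The sketch below parses a small, made-up HTML fragment (SAMPLE_HTML is a hypothetical stand-in, not real Hacker News markup or data) and selects the title spans in two equivalent ways: with an attribute dictionary and with the class_ keyword argument.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating a listing page; not real site data.
SAMPLE_HTML = """
<table>
  <tr><td><span class="titleline"><a href="https://example.com/a">Learning Python in 100 days</a></span></td></tr>
  <tr><td><span class="titleline"><a href="https://example.com/b">A tour of SQL window functions</a></span></td></tr>
  <tr><td><span class="subline">42 points</span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A dict maps an attribute name to the value it must have.
by_dict = soup.find_all("span", {"class": "titleline"})

# class_ is the keyword-argument form of the same filter.
by_keyword = soup.find_all("span", class_="titleline")

# Pull the visible text out of each matching span.
titles = [span.get_text(strip=True) for span in by_dict]
print(titles)
```

Both calls return the same two spans here; the span with class "subline" is skipped because its class does not match.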

CODE

main.py

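from bs4 import BeautifulSoup
import requests

url = "https://news.ycombinator.com"

# Download the page and keep its HTML as a string.
response = requests.get(url)
html = response.text

# Parse the HTML, then collect every <span class="titleline"> element.
soup = BeautifulSoup(html, 'html.parser')
titles = soup.find_all("span", {"class": "titleline"})
print(len(titles))

# Print each title that contains one of the keywords (case-sensitive match).
for title in titles:
    if "python" in title.text or "replit" in title.text or "SQL" in title.text:
        print(title.text)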
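
One possible reason the "python" keyword found nothing is that Python's in operator is case-sensitive, so a title spelled "Python" does not match "python". A minimal sketch of a case-insensitive check follows; matches_keyword, KEYWORDS and the sample titles are hypothetical names and data introduced only for illustration.

```python
# Hypothetical keyword list, written in lowercase for comparison.
KEYWORDS = ("python", "replit", "sql")


def matches_keyword(title, keywords=KEYWORDS):
    """Return True if any keyword occurs in the title, ignoring case."""
    lowered = title.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


# Made-up titles, used only to exercise the function.
sample_titles = [
    "Show: A Python profiler",
    "Why SQL still matters",
    "A new text editor",
]

hits = [title for title in sample_titles if matches_keyword(title)]
for title in hits:
    print(title)
```

Because this is a substring test, "sql" would also match words such as "PostgreSQL", which may or may not be what is wanted.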
