Record#
- Day 95 is to retrieve 5 news articles from "news", submit them to OpenAI to generate keywords, and then send the keywords to spotipy to return songs. Because the exercise depends on OpenAI, that day's practice is skipped.
- Day 96 is to learn how to retrieve and parse HTML content, and finally learn the most powerful feature of Python: web scraping!
- Use `response = requests.get(url)` and `html = response.text` to retrieve the HTML content of a webpage.
- Use `soup = BeautifulSoup(html, 'html.parser')` to parse the HTML. Before that, import the library: `from bs4 import BeautifulSoup`.
- Use `soup.find_all("span", {"class": "titleline"})` to retrieve specific content. `span` is the tag name, followed by a dictionary that maps the attribute name (`class`) to the class name.
- Today's practice is to retrieve the titles from Hacker News and print those that contain "python" or "replit". During the process, it was found that no titles contained either keyword, so another keyword, "SQL", was added.
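The parsing and `find_all` steps above can be tried without a network connection. The following is a minimal sketch, assuming a made-up HTML fragment: `SAMPLE_HTML` and the helper `extract_titles` are hypothetical names introduced for illustration and are not part of the original exercise.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, loosely modeled on a news listing page.
SAMPLE_HTML = """
<table>
  <tr><td><span class="titleline"><a href="https://example.com/a">Show HN: A tiny SQL engine</a></span></td></tr>
  <tr><td><span class="titleline"><a href="https://example.com/b">Why Python is slow</a></span></td></tr>
  <tr><td><span class="subline">42 points</span></td></tr>
</table>
"""


def extract_titles(html):
    """Return the text of every <span class="titleline"> in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # The dictionary maps the attribute name to the value it must have.
    spans = soup.find_all("span", {"class": "titleline"})
    return [span.get_text(strip=True) for span in spans]


titles = extract_titles(SAMPLE_HTML)
print(len(titles))
for title in titles:
    print(title)
```

The span with class `subline` is skipped because only elements whose `class` is `titleline` are requested.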
CODE#
main.py#
from bs4 import BeautifulSoup
import requests

url = "https://news.ycombinator.com"
response = requests.get(url)
html = response.text

# Parse the page and collect every <span class="titleline"> element
soup = BeautifulSoup(html, 'html.parser')
title = soup.find_all("span", {"class": "titleline"})
print(len(title))

# Print only the titles that contain one of the keywords (case-sensitive)
for txt in title:
    if "python" in txt.text or "replit" in txt.text or "SQL" in txt.text:
        print(txt.text)
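One possible reason no titles matched "python" is that the `in` operator is case-sensitive, so a title spelled "Python" would not count. The source does not confirm this was the cause. As a sketch under that assumption, the filter could ignore letter case. The helper `matching_titles` and the sample titles below are hypothetical and only illustrate the idea.

```python
def matching_titles(titles, keywords):
    """Return the titles that contain any keyword, ignoring letter case."""
    lowered = [kw.lower() for kw in keywords]
    return [t for t in titles if any(kw in t.lower() for kw in lowered)]


# Made-up titles for illustration only.
sample_titles = [
    "Why Python is slow",
    "Show HN: A tiny SQL engine",
    "A post about gardening",
]

for title in matching_titles(sample_titles, ["python", "replit", "SQL"]):
    print(title)
```

In the scraper, the list passed in would be the text of each `titleline` span instead of `sample_titles`. Note that this is plain substring matching, so a short keyword can also match inside a longer word.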