[Python] Python Web Scraper Code Example – 编程技术之美-IT之美

Here is a simple example of scraping a web page with Python:

## Python code from www.itzhimei.com
import requests

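# Fetch the page; the timeout keeps the request from hanging indefinitely
resp = requests.get('http://example.com', timeout=10)
# Raise an exception on HTTP error responses (4xx/5xx)
resp.raise_for_status()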

html = resp.text

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

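# Guard against pages without an <h1>, and skip <a> tags that have no href
h1_tag = soup.find('h1')
h1 = h1_tag.get_text(strip=True) if h1_tag else ''
links = [a['href'] for a in soup.find_all('a', href=True)]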

# Save the heading on the first line, then one link per line
with open('data.txt', 'w', encoding='utf-8') as f:
    f.write(h1 + '\n')
    f.write('\n'.join(links) + '\n')

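The main tools include Requests for fetching pages, Beautiful Soup for parsing content, and Selenium for simulating a browser. Follow the target site's robots.txt rules and do not send requests too frequently.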
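The robots.txt and request-frequency advice can be sketched roughly as follows, using the standard library's `urllib.robotparser` and a fixed pause between requests. This is only a minimal sketch under stated assumptions: the sample robots.txt content, the `my-crawler` user agent, the one-second delay, and the helper names `build_parser`, `allowed_urls`, and `crawl_politely` are all hypothetical and chosen for illustration.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content used for illustration; a real crawler
# would download the file from the target site instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent="my-crawler"):
    """Keep only the URLs that robots.txt allows for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def crawl_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Assumed fixed delay; pick a value suited to the target site.
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

In a real script, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`, and the parser could load the live file with `set_url(...)` followed by `read()` instead of parsing a hard-coded string.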

Python's scraping frameworks make it easy to collect web data, but pay attention to legal and ethical issues.