# -*- coding: utf-8 -*-
import urllib.request
from bs4 import BeautifulSoup
import xlwt
# Target URL and request headers (spoof a browser User-Agent)
url = "https://book.douban.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36'}
request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)
page_source = response.read().decode('utf-8')
# Parse the page with BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')
# Scrape the book-title links
links = soup.find_all('a', class_="name", target="_blank")
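# Note: this assumes the page still renders its bestseller titles as
# <a class="name" target="_blank"> links; if the markup has changed, links will be empty.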
# Store the link text (the book titles)
lists = []
for link in links:
    lists.append(link.string)
# Print the results: the first 10 titles after the "京东" label, the rest after "当当"
number = 0
for i in lists:
    if number == 0:
        print(end="京东:")
    print(end="%s" % i)
    print(end=" ")
    number += 1
    if number == 10:
        print()
        print(end="当当:")
# Create a workbook and set the encoding
workbook = xlwt.Workbook(encoding='utf-8')
# Create a worksheet
worksheet = workbook.add_sheet('The New York Times Bestseller')
# Write the header row and row labels to Excel; write() takes row, column, value
for i in range(1, 11):
    worksheet.write(0, i, label='第%d名' % i)
worksheet.write(1, 0, label='京东')
worksheet.write(2, 0, label='当当')
# Counter, row, and column for filling in the titles:
# the first 10 titles go into row 1, the remainder into row 2
number = 0
row = 1
column = 1
for i in lists:
    number += 1
    if number == 11:
        row = 2
        column = 1
    worksheet.write(row, column, label=i)
    column += 1
# Set the column widths (xlwt widths are in 1/256 of a character, so 256*23 is roughly 23 characters)
for i in range(1, 11):
    worksheet.col(i).width = 256 * 23
# Save the workbook
workbook.save('The New York Times Bestseller.xls')
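
# Optional sanity check: read the saved sheet back and print the header row.
# A minimal sketch, assuming the companion xlrd package is installed; remove if not needed.
import xlrd

book = xlrd.open_workbook('The New York Times Bestseller.xls')
sheet = book.sheet_by_index(0)
print([sheet.cell_value(0, col) for col in range(sheet.ncols)])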