daomubiji.com (is daomubiji.com the official 盗墓笔记 site?)
Crawl the site following this directory structure. items, settings, and middlewares use the standard configuration; the spider and pipelines are shown below.
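The post does not show the settings file. A minimal sketch of what "standard configuration" presumably means here is registering the pipeline in settings.py; the package name `daomu` is an assumption, substitute your own project name:

```python
# settings.py -- a minimal sketch; 'daomu' is an assumed project package name
BOT_NAME = 'daomu'

# Register the pipeline that writes each chapter to a .txt file.
# The number (0-1000) is the pipeline's execution order.
ITEM_PIPELINES = {
    'daomu.pipelines.DaomuPipeline': 300,
}
```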
spider:

    import os

    import scrapy


    class DmbjSpider(scrapy.Spider):
        name = 'dmbj'
        allowed_domains = ['www.daomubiji.com']

        def start_requests(self):
            # Volumes 1-8 share one URL pattern; 9-11 live at special URLs.
            for i in range(1, 12):
                if i < 9:
                    start_url = 'http://www.daomubiji.com/dao-mu-bi-ji-{}'.format(i)
                elif i == 9:
                    start_url = 'http://www.daomubiji.com/dao-mu-bi-ji-2015'
                elif i == 10:
                    start_url = 'http://www.daomubiji.com/sha-hai'
                elif i == 11:
                    start_url = 'http://www.daomubiji.com/zang-hai-hua'
                yield scrapy.Request(start_url, callback=self.list_parse)

        def list_parse(self, response):
            list_urls = response.xpath('//article[@class="excerpt excerpt-c3"]/a/@href')
            for url in list_urls:
                # item must be created inside the loop; otherwise every
                # request would end up carrying the last url
                item = {}
                detail_url = url.get()
                item['url'] = detail_url
                if 'qi-xing-lu-wang' in item['url']:
                    item['path'] = '盗墓笔记/七星鲁王/'
                elif 'nu-hai-qian-sha' in item['url']:
                    item['path'] = '盗墓笔记/怒海潜沙/'
                elif 'qin-ling-shen-shu' in item['url']:
                    item['path'] = '盗墓笔记/秦岭神树/'
                elif 'yun-ding-tian-gong' in item['url']:
                    item['path'] = '盗墓笔记/云顶天宫/'
                elif 'she-zhao-gui-cheng' in item['url']:
                    item['path'] = '盗墓笔记/蛇沼鬼城/'
                elif 'mi-hai-gui-chao' in item['url']:
                    item['path'] = '盗墓笔记/谜海归巢/'
                elif '2-yin-zi' in item['url']:
                    item['path'] = '盗墓笔记/第二季/引子/'
                elif 'yin-shan-gu-lou' in item['url']:
                    item['path'] = '盗墓笔记/第二季/阴山古楼/'
                elif 'qiong-long-shi-ying' in item['url']:
                    item['path'] = '盗墓笔记/第二季/邛笼石影/'
                elif 'dao-mu-bi-ji-7' in item['url']:
                    item['path'] = '盗墓笔记/第二季/盗墓笔记7/'
                elif 'dajieju' in item['url']:
                    item['path'] = '盗墓笔记/第二季/大结局/'
                elif '2015' in item['url']:
                    item['path'] = '盗墓笔记/2015年更新/'
                elif 'shahai' in item['url']:
                    item['path'] = '盗墓笔记/沙海/'
                elif 'zang-hai-hua' in item['url']:
                    item['path'] = '盗墓笔记/藏海花/'
                else:
                    print('No save path matched for this page:', item['url'])
                    continue  # skip it, or the item['path'] lookups below raise KeyError
                if not os.path.exists(item['path']):
                    os.makedirs(item['path'])
                yield scrapy.Request(detail_url, meta={'item': item}, callback=self.parse)

        def parse(self, response, **kwargs):
            item = response.meta['item']
            # '?' is stripped because it is not allowed in Windows file names
            item['name'] = response.xpath('//h1/text()').get().replace('?', '')
            contents = response.xpath('//article//text()')
            content = ''
            for i in contents:
                # strip each text node, drop full-width (U+3000) spaces, join by line
                content += i.get().strip().replace('\u3000', '') + '\n'
            item['content'] = content
            yield item

pipelines:

    class DaomuPipeline:
        def process_item(self, item, spider):
            file_name = item['name'] + '.txt'
            with open(item['path'] + file_name, 'w', encoding='utf-8') as f:
                f.write(item['content'])
            print(file_name + ' --> saved to /{} --> done!'.format(item['path']))
            return item
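The long if/elif chain in list_parse that routes a URL slug to a save directory could equivalently be written as a lookup table; order matters (first match wins), so the dict below keeps the same order as the chain above. A sketch of that alternative:

```python
# Slug-to-directory table; same slugs, paths, and match order as the
# if/elif chain in DmbjSpider.list_parse.
PATHS = {
    'qi-xing-lu-wang': '盗墓笔记/七星鲁王/',
    'nu-hai-qian-sha': '盗墓笔记/怒海潜沙/',
    'qin-ling-shen-shu': '盗墓笔记/秦岭神树/',
    'yun-ding-tian-gong': '盗墓笔记/云顶天宫/',
    'she-zhao-gui-cheng': '盗墓笔记/蛇沼鬼城/',
    'mi-hai-gui-chao': '盗墓笔记/谜海归巢/',
    '2-yin-zi': '盗墓笔记/第二季/引子/',
    'yin-shan-gu-lou': '盗墓笔记/第二季/阴山古楼/',
    'qiong-long-shi-ying': '盗墓笔记/第二季/邛笼石影/',
    'dao-mu-bi-ji-7': '盗墓笔记/第二季/盗墓笔记7/',
    'dajieju': '盗墓笔记/第二季/大结局/',
    '2015': '盗墓笔记/2015年更新/',
    'shahai': '盗墓笔记/沙海/',
    'zang-hai-hua': '盗墓笔记/藏海花/',
}


def path_for(url):
    """Return the save directory for a detail-page url, or None if nothing matches."""
    for slug, path in PATHS.items():  # dicts preserve insertion order (Python 3.7+)
        if slug in url:
            return path
    return None
```

This keeps the routing data in one place and makes the unmatched case (`None`) explicit instead of leaving `item['path']` unset.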
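The cleanup step in parse (strip each extracted text node, drop full-width U+3000 spaces, join with newlines) can be exercised without Scrapy; a small sketch over plain strings:

```python
def clean_text(nodes):
    """Mimic the cleanup in DmbjSpider.parse: strip each text node,
    remove full-width (U+3000) spaces, and join the nodes line by line."""
    content = ''
    for text in nodes:
        content += text.strip().replace('\u3000', '') + '\n'
    return content
```

Note that `str.strip()` already removes leading/trailing U+3000 (it is Unicode whitespace); the `replace` call is what removes the full-width spaces sitting *inside* a line.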
- Editor: 李松一