教程 > scrapy 教程 > 阅读:20

scrapy 使用item——迹忆客-ag捕鱼王app官网

item 对象是 python 的常规字典。 我们可以使用下面的语法来访问类的属性

>>> item = dmozitem()
>>> item['title'] = 'sample title'
>>> item['title']
'sample title'

将上面的代码添加到下面的例子中

import scrapy
from tutorial.items import dmozitem
class myprojectspider(scrapy.spider):
   name = "project"
   allowed_domains = ["dmoz.org"]
   
   start_urls = [
      "http://www.dmoz.org/computers/programming/languages/python/books/",
      "http://www.dmoz.org/computers/programming/languages/python/resources/"
   ]
   def parse(self, response):
      for sel in response.xpath('//ul/li'):
         item = dmozitem()
         item['title'] = sel.xpath('a/text()').extract()
         item['link'] = sel.xpath('a/@href').extract()
         item['desc'] = sel.xpath('text()').extract()
         yield item

上述蜘蛛的输出将是

[scrapy] debug: scraped from <200 
http://www.dmoz.org/computers/programming/languages/python/books/>
   {'desc': [u' - by david mertz; addison wesley. book in progress, full text, 
      ascii format. asks for feedback. [author website, gnosis software, inc.\n],
   'link': [u'http://gnosis.cx/tpip/'],
   'title': [u'text processing in python']}
[scrapy] debug: scraped from <200 
http://www.dmoz.org/computers/programming/languages/python/books/>
   {'desc': [u' - by sean mcgrath; prentice hall ptr, 2000, isbn 0130211192, 
      has cd-rom. methods to build xml applications fast, python tutorial, dom and 
      sax, new pyxie open source xml processing library. [prentice hall ptr]\n'],
   'link': [u'http://www.informit.com/store/product.aspx?isbn=0130211192'],
   'title': [u'xml processing with python']}

查看笔记

扫码一下
查看教程更方便
网站地图