Item objects behave like regular Python dicts, so the fields of the class can be read and written with the usual dict syntax:
>>> item = DmozItem()
>>> item['title'] = 'sample title'
>>> item['title']
'sample title'
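To make the dict-style access concrete, here is a minimal sketch of a dict-like container built only from the standard library. SketchItem is a hypothetical stand-in and not Scrapy's implementation; the field names title, link and desc are the ones the spider in this tutorial fills in. It also shows one way such a class can differ from a plain dict: like Scrapy items, it rejects fields that were not declared.

```python
from collections.abc import MutableMapping


class SketchItem(MutableMapping):
    """Hypothetical stand-in for an item: a dict limited to declared fields."""

    # Assumed field names, taken from the spider in this tutorial.
    fields = ("title", "link", "desc")

    def __init__(self):
        self._values = {}

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        # Only declared fields may be assigned.
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


item = SketchItem()
item["title"] = "sample title"
print(item["title"])
print(dict(item))
```

Because the class implements the mapping interface, dict(item) and item.items() work the same way they do for an ordinary dict.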
Adding the item assignment to the spider gives the following example:
import scrapy
from tutorial.items import DmozItem


class MyProjectSpider(scrapy.Spider):
    name = "project"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computers/programming/languages/python/resources/",
    ]

    def parse(self, response):
        # Each <li> inside a <ul> is treated as one directory entry.
        for sel in response.xpath('//ul/li'):
            item = DmozItem()
            # extract() returns a list of strings for every field.
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            yield item
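The per-entry loop in parse can be tried without a network connection. The sketch below is an approximation under stated assumptions: it uses the standard library's xml.etree.ElementTree in place of Scrapy selectors, plain dicts in place of items, and SAMPLE is made-up markup (titles and links are borrowed from this tutorial's records, descriptions are shortened). ElementTree only accepts well-formed markup and supports a small XPath subset, so it is not a drop-in replacement for response.xpath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup shaped like a directory listing.
SAMPLE = (
    "<html><body><ul>"
    '<li><a href="http://gnosis.cx/tpip/">text processing in python</a>'
    " - by david mertz</li>"
    '<li><a href="http://www.informit.com/store/product.aspx?isbn=0130211192">'
    "xml processing with python</a> - by sean mcgrath</li>"
    "</ul></body></html>"
)


def parse(markup):
    """Yield one dict per <li>, mirroring the fields the spider fills in."""
    root = ET.fromstring(markup)
    for li in root.findall(".//ul/li"):
        anchor = li.find("a")
        yield {
            "title": [anchor.text],       # text inside the <a> element
            "link": [anchor.get("href")],  # value of the href attribute
            "desc": [anchor.tail],         # text following the <a> inside the <li>
        }


for entry in parse(SAMPLE):
    print(entry)
```

Each field is wrapped in a list here only to match the shape of what extract() returns in the spider.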
The output of the "project" spider above will be:
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/computers/programming/languages/python/books/>
    {'desc': [u' - by david mertz; addison wesley. book in progress, full text, ascii format. asks for feedback. [author website, gnosis software, inc.]\n'],
     'link': [u'http://gnosis.cx/tpip/'],
     'title': [u'text processing in python']}
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/computers/programming/languages/python/books/>
    {'desc': [u' - by sean mcgrath; prentice hall ptr, 2000, isbn 0130211192, has cd-rom. methods to build xml applications fast, python tutorial, dom and sax, new pyxie open source xml processing library. [prentice hall ptr]\n'],
     'link': [u'http://www.informit.com/store/product.aspx?isbn=0130211192'],
     'title': [u'xml processing with python']}
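Every field in this output is a list of strings, with leading spaces and trailing newlines left in, because extract() returns a list. A minimal sketch of tidying one record after scraping follows; flatten_item is a hypothetical helper, not part of Scrapy, and a plain dict stands in for the item.

```python
def flatten_item(item):
    """Return a new dict with each list field joined into one stripped string."""
    cleaned = {}
    for field, values in item.items():
        # Trim each piece, drop empty ones, then join what is left with spaces.
        parts = [value.strip() for value in values]
        cleaned[field] = " ".join(part for part in parts if part)
    return cleaned


# The second record from the log output, written as a plain dict.
scraped = {
    "desc": [
        " - by sean mcgrath; prentice hall ptr, 2000, isbn 0130211192, "
        "has cd-rom. methods to build xml applications fast, python tutorial, "
        "dom and sax, new pyxie open source xml processing library. "
        "[prentice hall ptr]\n"
    ],
    "link": ["http://www.informit.com/store/product.aspx?isbn=0130211192"],
    "title": ["xml processing with python"],
}

flat = flatten_item(scraped)
print(flat["title"])
print(flat["link"])
```

The helper only relies on items(), so it should accept any mapping with list values, including an item converted with dict(item).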