Hello,

I am studying the Scrapy tutorial. To test the process I created a new project with these files:

See my post in Scrapy group for links to scripts, I cannot post more than 1 link here.

The spider runs well: it scrapes the text between the title tags and puts it in a FirmItem:

[whitecase.com] INFO: Passed FirmItem(title=[u'White & Case LLP - Lawyers - Rachel B. Wagner '])

But I am stuck on the pipeline step. I want to write this FirmItem to a CSV file so that I can load it into the database.

I am new to Python and am learning as I go. I would appreciate it if someone gave me a clue about how to make pipelines.py work so that the scraped data ends up in items.csv.

Thank you.

A: 

Open a file and write to it:

f = open('my.csv', 'w')
f.write('h1\th2\th3\n')  # header row, tab-separated
f.write(my_class.v1 + '\t' + my_class.v2 + '\t' + my_class.v3 + '\n')  # my_class is your object holding the data
f.close()

Or print your results to stdout and redirect stdout to a file: ./my_script.py >> res.txt

Elalfer
This didn't work for me:
>>> f = open('my.csv', 'rw')
Traceback (most recent call last):
  File "<pyshell#63>", line 1, in <module>
    f = open('my.csv', 'rw')
IOError: [Errno 22] invalid mode ('rw') or filename: 'my.csv'
Zeynel
Oh, sorry. The mode is wrong; use 'w'.
Elalfer
A: 

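Python has a module for reading and writing CSV files. This is safer than writing the output yourself (and getting all the quoting and escaping right...):

import csv
# csv.writer has no close() method, so keep a handle on the file and let 'with' close it.
# On Python 2 open the file with mode 'wb'; on Python 3 use open('items.csv', 'w', newline='').
with open('items.csv', 'w') as f:
    csvfile = csv.writer(f)
    # firmitem is your object holding the scraped data
    csvfile.writerow([firmitem.title, firmitem.url])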
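To illustrate the quoting point, here is a minimal Python 3 sketch. The row contents (a title containing a comma and double quotes, and an example.com URL) are made up for illustration and are not from the question's data.

```python
import csv
import io

# Hypothetical row: the title contains a comma and double quotes,
# which would break a hand-written "join with commas" approach.
row = ['Smith, Jones & "Partners" LLP', 'http://example.com/profile']

# Write the row into an in-memory buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()

# Reading the line back with csv.reader recovers the original two fields.
parsed = next(csv.reader(io.StringIO(line)))
print(line)
print(parsed == row)
```

The writer wraps the first field in quotes and doubles the embedded quote characters, so the comma inside the title is not mistaken for a field separator when the file is read back.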
Wim
Thanks, but I don't know where FirmItem is. That's why I was trying to use pipelines.py. (Sorry, I am putting 4 spaces before lines but it's not formatting properly.)
>>> csvfile = csv.writer(open('items.csv', 'w'))
>>> csvfile.writerow([ firmitem.title, firmitem.url])
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    csvfile.writerow([ firmitem.title, firmitem.url])
NameError: name 'firmitem' is not defined
Zeynel
`firmitem` is your object with data
Elalfer
I am sorry, this is what I don't understand. How do I get FirmItem (I assume it should match the [class name](http://dpaste.org/fiSg/)?) When I run the code in IDLE I get "NameError: name 'FirmItem' is not defined"
Zeynel
No, FirmItem is the class name. The variable name isn't printed in the line you showed in your question. But I think leeo has the answer.
Wim
A: 

I think they address your specific question in the Scrapy Tutorial.

It suggests, as others have here, using the csv module. Place the following in your pipelines.py file.

import csv

class CsvWriterPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('items.csv', 'wb'))

    def process_item(self, domain, item):
        self.csvwriter.writerow([item['title'][0], item['link'][0], item['desc'][0]])
        return item

Don’t forget to enable the pipeline by adding it to the ITEM_PIPELINES setting in your settings.py, like this:

ITEM_PIPELINES = ['dmoz.pipelines.CsvWriterPipeline']

Adjust to suit the specifics of your project.
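For readers on a later Scrapy release and Python 3, the same idea might look like the sketch below. It assumes the later pipeline interface, in which process_item receives (item, spider) and the optional open_spider/close_spider hooks are called once per crawl; check the documentation for your installed version. The `path` attribute and the single 'title' column are choices made for this illustration, matching the FirmItem in the question.

```python
import csv


class CsvWriterPipeline:
    """Sketch of an item pipeline that writes one CSV row per scraped item."""

    # Output file; a class attribute so it is easy to override (illustrative choice).
    path = 'items.csv'

    def open_spider(self, spider):
        # newline='' lets the csv module control line endings on Python 3.
        self.file = open(self.path, 'w', newline='', encoding='utf-8')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['title'])  # header row

    def close_spider(self, spider):
        # Close the file so all buffered rows reach the disk.
        self.file.close()

    def process_item(self, item, spider):
        # The spider in the question stores the title as a one-element list, hence [0].
        self.writer.writerow([item['title'][0]])
        return item
```

In those later releases ITEM_PIPELINES is a dict that maps the class path to an order number rather than a list, for example `ITEM_PIPELINES = {'dmoz.pipelines.CsvWriterPipeline': 300}`. The pipeline never needs to know the variable name used in the spider, because Scrapy hands each scraped item to process_item.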

leeo
Yes, thanks. That fixed it.
Zeynel