Here's the code:

import csv

def do_work():
    global data
    global b
    get_file()
    samples_subset1()
    return

def get_file():

    start_file='thefile.csv'

    with open(start_file, 'rb') as f:
        data = list(csv.reader(f))
        import collections
        counter = collections.defaultdict(int)

    for row in data:
        counter[row[10]] += 1
    return

def samples_subset1():

    with open('/pythonwork/samples_subset1.csv', 'wb') as outfile:
        writer = csv.writer(outfile)
        sample_cutoff=5000
        b_counter=0
        global b
        b=[]
        for row in data:
            if counter[row[10]] >= sample_cutoff:
                global b
                b.append(row)
                writer.writerow(row)
                #print b[b_counter]
                b_counter+=1
    return

I am a beginner at Python. The way my code runs is that I call do_work, and do_work calls the other functions. Here are my questions:

  1. If I need data to be seen by only 2 functions, should I make it global? If not, how should I call samples_subset1? Should I call it from get_file or from do_work?

  2. The code works, but can you please point out other good/bad things about the way it is written?

  3. I am processing a CSV file and there are multiple steps. I am breaking the steps down into different functions like get_file and samples_subset1, and there are more that I will add. Should I continue doing it the way I am doing it right now, where I call each individual function from do_work?

Here is the new code, revised according to one of the answers below:

import csv
import collections

def do_work():
    global b
    (data,counter)=get_file('thefile.csv')
    samples_subset1(data, counter,'/pythonwork/samples_subset1.csv')
    return

def get_file(start_file):

    with open(start_file, 'rb') as f:
        global data
        data = list(csv.reader(f))
        counter = collections.defaultdict(int)

    for row in data:
        counter[row[10]] += 1
    return (data,counter)

def samples_subset1(data,counter,output_file):

    with open(output_file, 'wb') as outfile:
        writer = csv.writer(outfile)
        sample_cutoff=5000
        b_counter=0
        global b
        b=[]
        for row in data:
            if counter[row[10]] >= sample_cutoff:
                global b
                b.append(row)
                writer.writerow(row)
                #print b[b_counter]
                b_counter+=1
    return
+5  A: 

As a rule of thumb, avoid global variables.

Here, it's easy: let get_file return data; then you can write

data = get_file()
samples_subset1(data)
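A minimal sketch of that idea, written for Python 3 (where CSV files are opened in text mode with `newline=''` instead of `'rb'`). Here `samples_subset1` is only a hypothetical stand-in that counts the rows; the point is that `data` travels as a return value and an argument rather than through a global.

```python
import csv


def get_file(start_file):
    """Read the whole CSV file and return its rows to the caller."""
    with open(start_file, newline='') as f:
        return list(csv.reader(f))


def samples_subset1(data):
    """Hypothetical stand-in for the real processing step: it works only
    on the rows passed in as an argument, never on a global."""
    return len(data)


def do_work(start_file):
    data = get_file(start_file)   # keep the result in a local variable...
    return samples_subset1(data)  # ...and hand it explicitly to the next step
```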

Also, I'd put all the imports at the top of the file.

Nicolas78
Thank you very, very much.
I__
So that this doesn't just sound like "do it like this!": imagine you want to import several files, or apply several datasets to your processing function. Then you want to be able to tell the function what it's supposed to work on, which would get ugly with the global-variable approach. I don't think I've ever found a need for global variables in my code that wasn't based on my personal laziness ;)
Nicolas78
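To make the multiple-datasets point concrete: once a function receives its input as a parameter, the same function can be run on as many datasets as you like. This sketch swaps in `collections.Counter` (a dict subclass for counting) for the question's `defaultdict(int)`; `count_column` and the two small datasets are made up for illustration.

```python
from collections import Counter


def count_column(rows, column):
    """Count how often each value occurs in one column of the given rows."""
    return Counter(row[column] for row in rows)


# Two independent (made-up) datasets run through the same function;
# nothing is shared between the calls, so neither result can clobber the other.
dataset_a = [["x", "red"], ["y", "red"], ["z", "blue"]]
dataset_b = [["p", "green"]]

counts_a = count_column(dataset_a, 1)
counts_b = count_column(dataset_b, 1)
print(counts_a["red"], counts_b["green"])  # prints: 2 1
```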
Oh, and make start_file a parameter for get_file while you're at it :)
Nicolas78
Oh, and finally: I'd assume samples_subset1 cannot see counter and that Python should complain (with a NameError)? Even if it doesn't, you might want to return counter as well and pass it to samples_subset1. This is easily done in Python by returning (var1, var2) and then writing (data, counter) = get_file().
Nicolas78
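A sketch of returning two values at once, assuming Python 3. The `column` parameter is a hypothetical addition (the question hard-codes `row[10]`) so that the column is no longer a magic number.

```python
import collections
import csv


def get_file(start_file, column=10):
    """Return both the rows and a count of the values in one column.

    `return data, counter` packs the two values into a tuple; the caller
    unpacks it with `data, counter = get_file(...)`.
    """
    with open(start_file, newline='') as f:
        data = list(csv.reader(f))
    counter = collections.defaultdict(int)
    for row in data:
        counter[row[column]] += 1
    return data, counter
```

In do_work this becomes `data, counter = get_file('thefile.csv')`, after which both values can be passed on to samples_subset1; the parentheses around the tuple are optional.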
Thank you very much, I am making all the changes you suggest, please hold!!
I__
Nicolas, I've updated my question. Can you please take a look at the revised code and advise?
I__
Yeah, just erase every line that contains global and you're done :) (It won't change anything as far as I can tell, but then I don't know what you're up to with b. But again, if you need it later on, return it from samples_subset1.)
Nicolas78
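One way to "return it from samples_subset1", sketched for Python 3: the selected rows are collected in a local list and handed back, so `b` no longer needs to be global. The `sample_cutoff` and `column` keyword parameters are hypothetical additions that default to the question's hard-coded 5000 and 10.

```python
import csv


def samples_subset1(data, counter, output_file, sample_cutoff=5000, column=10):
    """Write the rows whose column value occurs at least sample_cutoff
    times to output_file, and return those rows to the caller
    (replacing the global list b)."""
    selected = []
    with open(output_file, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for row in data:
            if counter[row[column]] >= sample_cutoff:
                selected.append(row)
                writer.writerow(row)
    return selected  # len(selected) replaces the manual b_counter
```

The caller then writes `b = samples_subset1(data, counter, '/pythonwork/samples_subset1.csv')` and keeps `b` as an ordinary local variable in do_work.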
I *have* come across uses for global "variables" that aren't really variables, i.e. constants, but they're rare: things like the value of pi, or something like `HOURLY_RATE`. Other than that, I'm not sure I've ever come across a good use for global variables (meaning outside one-off scripts, etc.).
Wayne Werner
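A small sketch of such module-level constants. In Python, `global` is only required when a function *rebinds* a module-level name; reading the name, or mutating the object it refers to (as `b.append(row)` does), works without it. `SAMPLE_CUTOFF` mirrors the question's `sample_cutoff`, and the `HOURLY_RATE` value is made up.

```python
# Module-level constants: assigned once and only read afterwards, so no
# function needs a `global` statement to use them.
SAMPLE_CUTOFF = 5000
HOURLY_RATE = 42.5  # hypothetical value, for illustration only


def is_frequent(count):
    # Reading a module-level name works without `global`.
    return count >= SAMPLE_CUTOFF


def wage(hours):
    return hours * HOURLY_RATE
```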
+3  A: 

If you must use a global (and sometimes we must), you can define it in a Pythonic way and give access to it only to the modules that import it, without the nasty global keyword at the top of all of your functions/classes.

Create a new module containing only global data (in your case, let's say csvGlobals.py):

# csvGlobals.py: create an instance of some data you want to share across modules
data=[]

And then each file that needs access to this data can use it like this:

import csvGlobals

csvGlobals.data = [1,2,3,4]
for i in csvGlobals.data:
    print(i)
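To show that every importer really sees the same module object, here is a single-file sketch for Python 3. The `types.ModuleType` / `sys.modules` registration is only a trick to avoid creating a real csvGlobals.py on disk; in an actual project you would just write that file. The function names `load_data` and `total` are hypothetical.

```python
import sys
import types

# Stand-in for a separate file csvGlobals.py containing `data = []`.
# Registering the module object in sys.modules lets `import csvGlobals`
# find it, which keeps this sketch in a single file.
_module = types.ModuleType("csvGlobals")
_module.data = []
sys.modules["csvGlobals"] = _module


def load_data():
    import csvGlobals
    csvGlobals.data = [1, 2, 3, 4]  # rebinds the attribute on the shared module


def total():
    import csvGlobals
    return sum(csvGlobals.data)     # sees whatever load_data() stored
```

Assigning through the module (`csvGlobals.data = ...`) is what makes the change visible everywhere; a name brought in with `from csvGlobals import data` is a separate binding in the importing module and will not see a later reassignment.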
dls
true. the file should be called cfg.py then
Nicolas78
good catch nicolas78! updated.
dls
+2  A: 

If you want to share data between two or more functions, it is generally better to use a class: turn the functions into methods and the global variables into attributes on the class instance.
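Applied to the question's code, that could look roughly like the sketch below (Python 3). The class name `SampleProcessor` and the `column`/`sample_cutoff` constructor parameters are hypothetical; the former globals `data`, `counter` and `b` become attributes, and each step becomes a method that reads them through `self`.

```python
import collections
import csv


class SampleProcessor:
    """Hypothetical class: the former globals (data, counter, b) become
    instance attributes, and each step becomes a method using self."""

    def __init__(self, start_file, column=10, sample_cutoff=5000):
        self.start_file = start_file
        self.column = column
        self.sample_cutoff = sample_cutoff
        self.data = []
        self.counter = collections.defaultdict(int)
        self.b = []

    def get_file(self):
        with open(self.start_file, newline='') as f:
            self.data = list(csv.reader(f))
        self.counter = collections.defaultdict(int)  # fresh count on every load
        for row in self.data:
            self.counter[row[self.column]] += 1

    def samples_subset1(self, output_file):
        self.b = []
        with open(output_file, 'w', newline='') as outfile:
            writer = csv.writer(outfile)
            for row in self.data:
                if self.counter[row[self.column]] >= self.sample_cutoff:
                    self.b.append(row)
                    writer.writerow(row)

    def do_work(self, output_file):
        # Same pipeline as before; the steps share state through self.
        self.get_file()
        self.samples_subset1(output_file)
        return self.b
```

Usage would be `SampleProcessor('thefile.csv').do_work('/pythonwork/samples_subset1.csv')`; further processing steps can be added as more methods and called from do_work in order.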

BTW, you do not need the return statement at the end of every function. A function that reaches its end without a return statement returns None anyway, so you only need an explicit return when you want to return a value or to leave in the middle of the function.
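A quick check of that behavior: the first two functions below are equivalent, since both give back None. The third, a made-up helper, shows the cases where an explicit return does earn its place.

```python
def with_return():
    print("working")
    return


def without_return():
    print("working")


def first_column(rows):
    # An explicit return is useful to leave early or to hand back a value.
    if not rows:
        return None
    return [row[0] for row in rows]
```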

Dave Kirby
Thank you very much, this is helpful. Can you please demonstrate how I would use a class with this code?
I__