ansaurus

Question

output in two rows for multiple columns in python

Answer 1

A:

You can create a plain text file with a ".csv" extension. Separate the fields (columns) with commas. Optionally, put quotation marks around text fields, especially if a field may contain the delimiter (a comma). You can even include Excel formulas (preceded by '='), and Excel will parse them correctly.

Double-clicking a CSV file opens it in Excel (unless your computer is configured otherwise).

You can also use the csv module.
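As a minimal sketch of the csv module (Python 3 assumed; the sample rows are made up for illustration), the writer adds quotation marks on its own whenever a field contains the delimiter, so you do not have to quote fields by hand:

```python
import csv
import io

# Hypothetical sample rows; the second row has a field containing a comma.
rows = [
    ['name', 'note', 'value'],
    ['chr1', 'contains, a comma', 1.5],
    ['chr2', 'plain text', 2],
]

# Write to an in-memory buffer so the result can be inspected directly.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()

# Reading the text back recovers the original fields (as strings).
parsed = list(csv.reader(io.StringIO(csv_text)))
print(csv_text)
```

Only the field with the embedded comma is quoted in the output; the other fields are written as they are.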

The Learning Python book contains examples with more complex control (formatting, spreadsheets) using Windows COM components.

EDIT: I have just seen this site. The PDF tutorial seems to be very detailed. I have never used it.

bgbg 2009-10-15 17:07:20

Answer 2

A:

Here's one approach. I made the simplifying assumption that there is a small finite limit to the possible number of observations, so I just loop from 1 to 6 explicitly. You can easily expand the upper limit of the loop, although if you go past 9 the logic in the get_obs function will need to change. You could also write something more complex to first scan through all the data and get all the possible observation names, but I didn't want to put in that effort if it's not necessary.

This could be somewhat simplified if you used a dictionary instead of a list of tuples to hold the observation data for each row.

data = [
    [59000, 59500, 'chr1',
        [('cn_04', '1.362352462'), ('cn_01', '1.802001235')]],
    [100000, 110000, 'chr1',
        [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')]],
    [63500, 64000, 'chr1',
        [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')]],
]

def get_obs( num, obslist ):
  keyval = 'cn_0' + str(num)
  for obs in obslist:
    if obs[0] == keyval:
      return obs[1]
  return "."

for data_row in data:
  output_row = ""
  for obs in range(1,7):
    output_row += get_obs( obs, data_row[3] ) + '\t'
  output_row += str(data_row[0]) + '\t'
  output_row += str(data_row[1]) + '\t'
  output_row += str(data_row[2])
  print(output_row)
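The two variations mentioned in the text (using a dictionary per row, and scanning the data first for all observation names) might look roughly like this. It is a sketch in Python 3, and the helper names collect_keys and format_row are made up for illustration. Note that it only emits columns for names that actually occur in the data, so the sample data gives four observation columns rather than six.

```python
def collect_keys(data):
    """Return every observation name that occurs in the data, sorted."""
    return sorted({name for record in data for name, _ in record[3]})


def format_row(record, keys, missing='.'):
    """Build one tab-separated line: observation values first, then the fixed fields."""
    start, end, chromosome, observations = record
    values = dict(observations)  # list of (name, value) tuples -> dictionary
    cells = [values.get(k, missing) for k in keys]
    cells += [str(start), str(end), chromosome]
    return '\t'.join(cells)


data = [
    [59000, 59500, 'chr1',
        [('cn_04', '1.362352462'), ('cn_01', '1.802001235')]],
    [100000, 110000, 'chr1',
        [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')]],
    [63500, 64000, 'chr1',
        [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')]],
]

keys = collect_keys(data)
lines = [format_row(record, keys) for record in data]
for line in lines:
    print(line)
```

Plain string sorting keeps the columns in numeric order only while the names stay zero-padded to the same width (cn_01 ... cn_10); differently formatted names would need a custom sort key.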

Dave Costa 2009-10-15 17:27:17

I love this answer! It looks beautiful, exactly what I needed. Thank you so much.

Jill Jo 2009-10-15 17:41:01

Answer 3

A:

Avoid nested lists/dictionaries like these; they are not Pythonic and are very likely to lead to errors.

Instead, either use a class:

>>> class Gene(object):
...     def __init__(self, start, end, chromosome, transcripts):
...         self.start = start
...         self.end = end
...         self.chromosome = chromosome
...         self.transcripts = transcripts
...
>>> gene1 = Gene(59000, 59500, 'chr1', [('cn_04', '1.362352462'), ('cn_01', '1.802001235')])
>>> gene2 = Gene(100000, 110000, 'chr1', [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')])
>>> genes = [gene1, gene2]  # plus any further genes
>>> gene1.start
59000
>>> genes[1].start
100000

or use numpy's record arrays (recarrays) and matrices.

To read and write CSV files, you can use numpy's recarrays together with the csv2rec and rec2csv functions from matplotlib.mlab.

>>> from matplotlib.mlab import csv2rec, rec2csv
>>> import numpy as np
>>> d = np.array([(0, 10, 'chr1', [1, 2]), (20, 30, 'chr2', [1, 2])], dtype=[('start', int), ('end', int), ('chromosome', 'S8'), ('transcripts', list)])

# all values in the 'chromosome' column
>>> d['chromosome']
array(['chr1', 'chr2'], 
      dtype='|S8')

# records in which chromosome == 'chr1'
>>> d[d['chromosome'] == 'chr1']   

# print first record
>>> d[0]
(0, 10, 'chr1', [1, 2])

# save it to a csv file:
>>> rec2csv(d, 'csvfile.txt', delimiter='\t')

dalloliogm 2009-10-15 17:27:30

Your initial comment is nonsense. How are nested lists 'not Pythonic'? How is using a third-party library like numpy more Pythonic than using Python's built-in features?

Daniel Roseman 2009-10-15 18:00:09

I said that because I know what the user wanted to ask and why. A few years ago I was in the same situation, and I can tell you that it is the wrong approach. In any case, the standard way to read and write CSV files is with the csv module, or with numpy's recarrays, which are an extension of that. Using a list of lists that way is not Pythonic; it is more Perl-ish, because in Python you have better data structures to handle these situations, and you also have objects.

dalloliogm 2009-10-16 08:22:09

Answer 4

+2 A:

For sending data to Excel, I would use CSV instead of a fixed-length text format; that way, if it turns out (say) that you need more significant figures in your float values, the format of your output doesn't change. Also, you can just open CSV files in Excel; you don't have to import them. And the csv.writer deals with all of the data-type conversion issues for you.

I'd also take advantage of the (apparent) fact that the 4th item in each observation is a set of key/value pairs, which the dict function can turn into a dictionary. Assuming that you know what all of the keys are, you can specify the order in which you want them to appear in your output simply by putting them in a list (called keys in the code below). Then it's simple to create an ordered list of values with a list comprehension; keys missing from a row come back as None, which csv.writer writes as an empty field. Thus:

>>> import sys
>>> import csv
>>> keys = ['cn_01', 'cn_02', 'cn_03', 'cn_04', 'cn_05', 'cn_06']
>>> data = [[59000, 59500, 'chr1', [('cn_04', '1.362352462'), ('cn_01', '1.802001235')]], [100000, 110000, 'chr1', [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')]], [63500, 64000, 'chr1', [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')]]]
>>> writer = csv.writer(sys.stdout)
>>> writer.writerow(keys + ['start', 'stop', 'chromosome'])
cn_01,cn_02,cn_03,cn_04,cn_05,cn_06,start,stop,chromosome
>>> for obs in data:
...     d = dict(obs[3])
...     row = [d.get(k, None) for k in keys] + obs[0:3]
...     writer.writerow(row)

1.802001235,,,1.362352462,,,59000,59500,chr1
4.302275763,1.990457407,1.887268908,,,,100000,110000,chr1
4.302275763,1.990457407,1.887268908,,,,63500,64000,chr1

The above writes the data to sys.stdout; to create a real CSV file you'd do something like:

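# Python 3: pass newline='' so the csv module controls line endings.
# On Python 2, open the file with mode 'wb' instead.
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # now use the writer to write out the data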
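Putting the pieces together, a complete file-writing version could look something like the sketch below. It assumes Python 3; the function name write_csv is hypothetical, and the example writes into a temporary directory only so that it can be run without leaving a file behind.

```python
import csv
import os
import tempfile

KEYS = ['cn_01', 'cn_02', 'cn_03', 'cn_04', 'cn_05', 'cn_06']


def write_csv(path, data, keys=KEYS):
    """Write a header plus one row per record: observation columns, then start/stop/chromosome."""
    # newline='' lets the csv module control line endings (Python 3).
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(list(keys) + ['start', 'stop', 'chromosome'])
        for start, stop, chromosome, observations in data:
            values = dict(observations)
            # Missing observations become empty fields.
            writer.writerow([values.get(k, '') for k in keys] + [start, stop, chromosome])


data = [
    [59000, 59500, 'chr1', [('cn_04', '1.362352462'), ('cn_01', '1.802001235')]],
    [100000, 110000, 'chr1', [('cn_03', '1.887268908'), ('cn_02', '1.990457407'), ('cn_01', '4.302275763')]],
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'file.csv')
    write_csv(path, data)
    # Read the file back to check what was written.
    with open(path, newline='') as f:
        rows = list(csv.reader(f))

print(rows)
```

In real use you would pass an ordinary path such as 'file.csv' instead of the temporary one.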

Robert Rossney 2009-10-15 19:06:11

Answer 5

A:

You can also use xlwt to write .xls files directly, without touching Excel. More info.

Here is some sample code to get you started (far from perfect):

import xlwt as xl

def list2xls(data, fn=None, col_names=None, row_names=None):
    # Write a 2-D list to an .xls file, with optional column and row headers.
    wb = xl.Workbook()
    ws = wb.add_sheet('output')
    if col_names:
        _write_1d_list_horz(ws, 0, 1, col_names)
    if row_names:
        _write_1d_list_vert(ws, 1, 0, row_names)
    _write_matrix(ws, 1, 1, data)
    if not fn:
        fn = 'test.xls'
    wb.save(fn)

def _write_matrix(ws, row_start, col_start, mat):
    # Write each row of mat, starting at cell (row_start, col_start).
    for irow, row in enumerate(mat):
        _write_1d_list_horz(ws, irow + row_start, col_start, row)

def _write_1d_list_horz(ws, row, col, values):
    # Write values left to right along one row.
    for i, val in enumerate(values):
        ws.write(row, i + col, val)

def _write_1d_list_vert(ws, row, col, values):
    # Write values top to bottom down one column.
    for i, val in enumerate(values):
        ws.write(row + i, col, val)

Call list2xls with data as a 2-D list and, optionally, column and row names as lists. If no file name is given, the output is saved as test.xls.

nazca 2009-10-16 06:31:44
