Hi All,

I have a tab-delimited file with trailing newline characters, e.g.,

123   abc
456   def
789   ghi

I wish to write a function to convert the contents of the file into a nested list. To date I have tried:

def ls_platform_ann():
    keyword = []
    for line in open( "file", "r" ).readlines():
        for value in line.split():
            keyword.append(value)

and

def nested_list_input():
    nested_list = []
    for line in open("file", "r").readlines():
        for entry in line.strip().split():
            nested_list.append(entry)
            print nested_list

.

The former creates a nested list but includes \n and \t characters. The latter does not make a nested list but rather lots of equivalent lists without \n and \t characters.

Anyone help?

Regards, S ;-)

+3  A: 

First off, have a look at the csv module; it should handle the tab delimiters and line endings for you. If you keep your own loop, you may also want to call strip() on each value/entry.
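If you would rather keep a hand-written loop, here is a minimal sketch of one way to do it, written in Python 3 syntax. The point is to build one inner list per line and append that, rather than appending each value to the outer list. The helper name `nested_list_from_lines` and the sample data are made up for illustration.

```python
def nested_list_from_lines(lines):
    """Return one inner list per non-blank line (hypothetical helper)."""
    nested = []
    for line in lines:
        # Split on tabs, then strip stray whitespace and the newline from each value.
        fields = [value.strip() for value in line.split("\t")]
        if fields != [""]:  # skip blank lines, e.g. a trailing newline
            nested.append(fields)
    return nested

# Sample lines standing in for what iterating over the file would yield.
sample = ["123\tabc\n", "456\tdef\n", "789\tghi\n", "\n"]
result = nested_list_from_lines(sample)
print(result)
```

Passing an open file object instead of `sample` should work the same way, since iterating over a file yields its lines.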

Dana the Sane
+8  A: 

You want the csv module.

import csv

source = "123\tabc\n456\tdef\n789\tghi"
lines = source.split("\n")

reader = csv.reader(lines, delimiter='\t')

print(list(reader))

Output:

[['123', 'abc'], ['456', 'def'], ['789', 'ghi']]

In the code above I've put the content of the file right in the source for easy testing. If you're reading a file from disk you can pass the file object to the reader directly (which might be considered cleaner):

import csv

with open("source.csv") as f:
    reader = csv.reader(f, delimiter='\t')
    print(list(reader))
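Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is only a sketch in Python 3 syntax: the function name `read_tab_delimited` is hypothetical, the temporary file just stands in for your real input, and dropping empty rows is my assumption about how you want blank trailing lines handled.

```python
import csv
import os
import tempfile

def read_tab_delimited(path):
    """Return the rows of a tab-delimited file as a list of lists."""
    # The csv docs recommend newline="" when opening files in Python 3.
    with open(path, newline="") as handle:
        # csv.reader yields [] for blank lines; filter those out.
        return [row for row in csv.reader(handle, delimiter="\t") if row]

# Demo with a throwaway file, including a blank line at the end.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("123\tabc\n456\tdef\n789\tghi\n\n")
    demo_path = tmp.name

rows = read_tab_delimited(demo_path)
os.remove(demo_path)
print(rows)
```

Returning the list instead of printing it lets the caller decide what to do with the data.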
mizipzor
+2  A: 

Another option that doesn't involve the csv module is:

data = [[item.strip() for item in line.rstrip('\r\n').split('\t')] for line in open('input.txt')]

Written out as a multi-line loop, it would look like this:

data = []
for line in open('input.txt'):
    items = line.rstrip('\r\n').split('\t')   # strip new-line characters and split on column delimiter
    items = [item.strip() for item in items]  # strip extra whitespace off data items
    data.append(items)
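One reason to split on '\t' explicitly rather than calling split() with no arguments: a bare split() collapses runs of whitespace, so an empty column silently disappears. A small sketch in Python 3 syntax, using a made-up line with an empty middle column:

```python
line = "123\t\tabc \n"  # made-up line: the second column is empty

# Bare split() treats any run of whitespace as a single separator.
by_whitespace = line.split()

# Splitting on the tab keeps the empty field in place.
by_tab = [item.strip() for item in line.rstrip("\r\n").split("\t")]

print(by_whitespace)
print(by_tab)
```

If your file never has empty columns or spaces inside values, the two approaches should give the same result.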
tgray