+1  Q: 

lists and sublists

I use this code to split some data into a list with three sublists, splitting wherever there is a * or a -. But it also picks up the \n\n * sequences, and I don't know why. I don't want those to be read. Can someone tell me what I'm doing wrong? This is the data:

*Quote of the Day -Education is the ability to listen to almost anything without losing your temper or your self-confidence - Robert Frost -Education is what survives when what has been learned has been forgotten - B. F. Skinner *Fact of the Day -Fractals, an important part of chaos theory, are very useful in studying a huge amount of areas. They are present throughout nature, and so can be used to help predict many things in nature. They can also help simulate nature, as in graphics design for movies (animating clouds etc), or predict the actions of nature. -According to a recent survey by Just-Eat, not everyone in The United Kingdom actually knows what the Scottish delicacy, haggis is. Of the 1,623 British people polled:\n\n * 18% of Brits thought haggis was some sort of Scottish animal.\n\n * 15% thought it was a Scottish musical instrument.\n\n * 4% thought it was a character from Harry Potter.\n\n * 41% didn't even know what Scotland's national dish was.\n\nWhile a small number of Scots admitted not knowing what haggis was either, they also discovered that 68% of Scots would like to see Haggis delivered as takeaway. -With the growing concerns involving Facebook and its ever changing privacy settings, a few software developers have now engineered a website that allows users to trawl through the status updates of anyone who does not have the correct privacy settings to prevent it.\n\nNamed Openbook, the ultimate aim of the site is to further expose the problems with Facebook and its privacy settings to the general public, and show people just how easy it is to access this type of information about complete strangers. The site works as a search engine so it is easy to search terms such as 'don't tell anyone' or 'I hate my boss', and searches can also be narrowed down by gender. *Pet of the Day -Scottish Terrier -Land Shark -Hamster -Tse Tse Fly END

I use this code:

contents = open("data.dat").read()
data = contents.split('*') #split the data at the '*'

newlist = [item.split("-") for item in data if item]

The result is wrong, but similar to the list I need to get.

+2  A: 

The "\n\n" is part of the input data, so it's preserved in Python. Just add a strip() to remove it. Since newlist is a list of lists, strip each string inside each sublist:

finallist = [[sub.strip() for sub in item] for item in newlist]

See the strip() docs: http://docs.python.org/library/stdtypes.html#str.strip

UPDATED FROM COMMENT:

finallist = [[sub.replace("\\n", "\n").strip() for sub in item] for item in newlist]
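
To illustrate the difference between the two versions above, here is a minimal sketch. The sample string and the clean() helper are made up for illustration; it assumes the file really contains a backslash followed by 'n' rather than a newline character.

```python
# Hypothetical sample, not the asker's full file: the text holds a literal
# backslash followed by 'n', not a real newline character.
raw = r"*Fact of the Day -Of the 1,623 British people polled:\n\n * 18% of Brits thought haggis was an animal.\n\n "


def clean(text):
    """Turn literal backslash-n pairs into real newlines, then trim whitespace."""
    return text.replace("\\n", "\n").strip()


stripped_only = raw.strip()  # only removes the trailing space
cleaned = clean(raw)         # replaces the pairs first, so strip() can remove them

print("\\n" in stripped_only)  # is the two-character sequence still there?
print("\\n" in cleaned)
```

strip() only removes real whitespace from the ends of a string, so the two-character sequence has to be converted (or removed) first.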
sunetos
Nope, `strip()` won't do it! I also got confused at first, but look closely: in the file there are literal sequences of '\' and 'n': `British people polled:\n\n * 18% of Brits`
Nas Banov
If for some reason your input data is escaped oddly and you actually have '\' followed by 'n', then just do: finallist = [[sub.replace("\\n", "\n").strip() for sub in item] for item in newlist]
sunetos
+1  A: 

open("data.dat").read() reads every character in the file, not only the ones you want. If you don't need the '\n' characters, you can try contents.replace("\n", ""), or read the file line by line (instead of the whole content at once) and trim the trailing '\n' from each line.
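
A rough sketch of both options. io.StringIO stands in for open("data.dat") so the snippet runs without a file, and the sample text is made up. It assumes the file holds real newline characters; if it holds a literal backslash followed by 'n', replace "\\n" instead.

```python
import io

# Stand-in for open("data.dat"); the contents are a made-up sample.
fake_file = io.StringIO("*Pet of the Day\n-Scottish Terrier\n-Hamster\n")

# Option 1: read everything, then drop every newline character.
contents = fake_file.read()
no_newlines = contents.replace("\n", "")

# Option 2: iterate line by line and trim the trailing newline of each line.
fake_file.seek(0)
lines = [line.rstrip("\n") for line in fake_file]

print(no_newlines)
print(lines)
```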

Max
A: 

This is going to split on any asterisk you have in the text as well.

A better implementation would be something like:

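lines = []

for line in open("data.dat"):
    # lstrip is a method, so it must be called before startswith
    if line.lstrip().startswith("*"):
        lines.append([line.strip()])  # start a new sublist with this heading line
    elif line.lstrip().startswith("-"):
        lines[-1].append(line.strip())  # add the item to the latest sublist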

For more homework, research what's happening when you use the open() function in this way.
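
As a hint for that homework: a file object yields one line at a time when you loop over it, so the grouping logic works on any iterable of lines. Below is a sketch with a hypothetical helper, group_lines(), run on a made-up in-memory sample. It assumes every heading and item sits on its own line.

```python
def group_lines(lines):
    """Group '-' lines under the most recent '*' heading line."""
    groups = []
    for line in lines:
        text = line.strip()
        if text.startswith("*"):
            groups.append([text])      # start a new sublist
        elif text.startswith("-") and groups:
            groups[-1].append(text)    # attach to the latest heading
    return groups


# Made-up sample standing in for the lines of data.dat.
sample = [
    "*Pet of the Day\n",
    "-Scottish Terrier\n",
    "-Land Shark\n",
    "*Fact of the Day\n",
    "-Fractals\n",
]
groups = group_lines(sample)
print(groups)
```

With a real file you could write: with open("data.dat") as f: groups = group_lines(f). The with block closes the file once the loop is done.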

Sean Woods
A: 

The following solves your problem, I believe:

# split on newline + '*' for sections, then on newline + '-' for items
result = [  [subitem.replace(r'\n\n', '\n') for subitem in item.split('\n-')]
            for item in open('data.dat').read().split('\n*')  ]

# now let's pretty print the result
for i in result:
    print('***', i[0], '***')
    for j in i[1:]:
        print('\t--', j)
    print()

Note that I split on newline + * or newline + -, so it won't split on dashes inside the text. I also replace the literal character sequence \n\n (r'\n\n') with a newline character '\n'. The one-liner is a list comprehension, a way to construct a list in one go, without multiple .append() or + calls.
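
To try the idea without a file, here is a small sketch on a made-up in-memory sample. It assumes each * and - marker is preceded by a real newline, while the \n\n inside the text is a literal backslash followed by 'n'. The extra .strip() is an addition for tidiness, not part of the one-liner above.

```python
# Made-up sample: real newlines before the markers, literal backslash-n
# pairs and a stray " - " inside the item text.
sample = (
    "*Fact of the Day\n"
    "-Of the 1,623 people polled:\\n\\n * 18% thought haggis was an animal.\n"
    "-Named Openbook - a search site\n"
    "*Pet of the Day\n"
    "-Hamster\n"
)

result = [
    [sub.replace(r"\n\n", "\n").strip() for sub in item.split("\n-")]
    for item in sample.split("\n*")
]

for section in result:
    print(section)
```

Since the very first * has no newline in front of it, the first heading keeps its leading asterisk; call .lstrip('*') on it if that matters.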

Nas Banov