ansaurus

Question

Question on how to develop and then parse a data structure

Answer 1

A:

You can use sqlite, or any other kind of database such as MySQL, to store your data.

ghostdog74 2010-10-13 04:16:35

As I mentioned, for this program, which I intend to be light with no dependencies other than pygtk, I don't want to require sqlite databases nor mess with them. If I were storing hundreds, perhaps. But this is just for me and my extended family for scraping weather data from various websites.

narnie 2010-10-13 04:21:21

sqlite comes with Python 2.5 onwards.
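For illustration only (this sketch is not from the thread), here is roughly what storing and looking up the weather locations could look like with the bundled sqlite3 module. The table name `stations`, the helper names, and the in-memory database are assumptions; the two sample rows are borrowed from the code posted in Answer 5.

```python
import sqlite3

# Sample rows as (city, state, zip, metar), taken from the thread's own sample data.
ROWS = [
    ("Pawhuska", "OK", "74056", "KBVO"),
    ("Temple", "TX", "76504", "KTPL"),
]


def build_db(path=":memory:"):
    """Create the (hypothetical) stations table and load the sample rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stations "
        "(city TEXT, state TEXT, zip TEXT, metar TEXT)"
    )
    conn.executemany("INSERT INTO stations VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def find_by_zip(conn, zip_code):
    """Return the first row whose zip matches, or None if there is none."""
    cur = conn.execute(
        "SELECT city, state, zip, metar FROM stations WHERE zip = ?",
        (zip_code,),
    )
    return cur.fetchone()


conn = build_db()
match = find_by_zip(conn, "76504")
```

Passing a file name instead of ":memory:" would keep the data between runs, still without any dependency beyond the standard library.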

ghostdog74 2010-10-13 04:24:38

Answer 2

+2 A:

To store data, you may use XML. Then read it back using any XML parser, SAX or DOM, both of which are included with Python.
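As a rough sketch of the DOM route, the snippet below parses a small XML document with xml.dom.minidom from the standard library. The element name `station` and the attribute-per-field layout are assumptions made for the example, not something the asker specified.

```python
from xml.dom import minidom

# Hypothetical layout: one <station> element per entry, fields as attributes.
XML_TEXT = """<stations>
  <station city="Pawhuska" state="OK" zip="74056" metar="KBVO"/>
  <station city="Temple" state="TX" zip="76504" metar="KTPL"/>
</stations>"""

FIELDS = ("city", "state", "zip", "metar")


def load_stations(xml_text):
    """Parse the XML text into a list of [city, state, zip, metar] lists."""
    dom = minidom.parseString(xml_text)
    return [
        [node.getAttribute(name) for name in FIELDS]
        for node in dom.getElementsByTagName("station")
    ]


stations = load_stations(XML_TEXT)
```

For a file on disk, minidom.parse(filename) could replace parseString.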

Since the amount of data is very small (only around 20-25 entries per user), you can start by finding out what kind of search term you have, whether it's a state name, a pin code, etc. (ask the user to specify this in the GUI).

Assuming that the data is stored as [City State Pin Metar], you can then search in the respective column. E.g. if the user enters 12345, and you know it's a pin, you only need to search the 3rd column of each record and then return the matching record. For a state name you would search in the 2nd column.

And this approach works even if the number of records is large, say a couple of hundred.
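A minimal sketch of the column search described above, assuming the records are kept as plain lists in [city, state, pin, metar] order. The names `COLUMNS`, `RECORDS`, and `search` are made up for the example.

```python
# Column positions for the assumed [city, state, pin, metar] layout.
COLUMNS = {"city": 0, "state": 1, "pin": 2, "metar": 3}

RECORDS = [
    ["Pawhuska", "OK", "74056", "KBVO"],
    ["Temple", "TX", "76504", "KTPL"],
]


def search(records, column, term):
    """Return every record whose given column equals the search term."""
    index = COLUMNS[column]
    return [rec for rec in records if rec[index] == term]


by_pin = search(RECORDS, "pin", "74056")
by_state = search(RECORDS, "state", "TX")
```

Each lookup is a linear scan, which is unlikely to matter for a few hundred records.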

tushartyagi 2010-10-13 04:38:03

And by asking user to enter the info in GUI I mean for each category create a text box and the user searches according to that category. For more than one, use ANDing.
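One way the ANDing could be sketched: every non-empty text box contributes one (column, value) condition, and a record is kept only if it satisfies all of them. The function name `search_all` and the keyword-argument interface are assumptions for illustration.

```python
# Column positions for the assumed [city, state, pin, metar] layout.
COLUMNS = {"city": 0, "state": 1, "pin": 2, "metar": 3}

RECORDS = [
    ["Pawhuska", "OK", "74056", "KBVO"],
    ["Temple", "TX", "76504", "KTPL"],
]


def search_all(records, **criteria):
    """Return records matching every non-empty criterion (logical AND)."""
    # Empty strings stand for text boxes the user left blank and are ignored.
    wanted = [(COLUMNS[name], value) for name, value in criteria.items() if value]
    return [rec for rec in records if all(rec[i] == v for i, v in wanted)]


both_match = search_all(RECORDS, city="Temple", state="TX")
conflict = search_all(RECORDS, city="Temple", state="OK")
blank_box = search_all(RECORDS, city="", state="OK")
```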

tushartyagi 2010-10-13 04:41:20

I had thought about XML. To be honest, I haven't found a tutorial on either SAX or DOM that makes me comfortable enough with it to use. I think it is just mental block though, as I have no probs with HTML parsers, etc. I will check into XML more. I know I'll need to learn it. XML seems to be taking over so much, so I think that is part of my roadblock. Things that could be simple text are being put in XML. Seems a waste a lot of times to me. Oh well, I'll just slap myself upside the back of the head and deal with it. :)

narnie 2010-10-14 18:12:49

Turning simple text into XML is overkill. But for more complex data XML and JSON are really nice and useful. Learn it, you won't regret it.

tushartyagi 2010-10-15 04:16:12

Answer 3

A:

I would not recommend using an ini-formatted file like you've posted.

I think you have two general options:

* Use a database, such as sqlite as @ghostdog74 suggests
* Use a flat file using a common, easily-parsable data structure

For a flat file, XML or JSON is probably the best bet. There are built-in parsers for both in most languages now. JSON is a bit more efficient, but XML is a bit more readable and structured.

The downside to this of course is that you basically have to parse/read the entire file in order to do a lookup. The upside is it should be a trivial amount of code.
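To give a sense of how little code the flat-file route needs, here is a sketch using the standard json module. The list-of-objects layout and the `lookup` helper are assumptions; the JSON is kept in a string here, where a real program would read it from a file.

```python
import json

# Hypothetical flat-file contents: a JSON array with one object per location.
JSON_TEXT = """[
  {"city": "Pawhuska", "state": "OK", "zip": "74056", "metar": "KBVO"},
  {"city": "Temple", "state": "TX", "zip": "76504", "metar": "KTPL"}
]"""


def lookup(json_text, field, value):
    """Parse the whole document, then scan for the first matching entry."""
    for entry in json.loads(json_text):
        if entry.get(field) == value:
            return entry
    return None


hit = lookup(JSON_TEXT, "metar", "KTPL")
```

Note that the entire document is parsed on every call, which is exactly the trade-off described above.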

You are talking about creating your own indexes, effectively, so you can easily look up by ZIP or metar, etc. I'd suggest it's unnecessary. If you REALLY wanted to, you could use a hash table in memory to store references to the objects, but unless you're reading the file once and then doing multiple lookups, the overhead of building the hash tables will likely be way more than just looping through to find what you need. I get the impression this is a web app where you read the file, do one or two lookups, and spit out a web page.
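For comparison, an in-memory hash-table index might look like the sketch below: one dict per searchable column, each mapping a value to the records that hold it. `build_index` and the record layout are assumptions for the example.

```python
from collections import defaultdict

RECORDS = [
    ["Pawhuska", "OK", "74056", "KBVO"],
    ["Temple", "TX", "76504", "KTPL"],
]


def build_index(records, position):
    """Map each value found at `position` to the list of records holding it."""
    index = defaultdict(list)
    for rec in records:
        # The index stores references to the same record objects, not copies.
        index[rec[position]].append(rec)
    return index


by_zip = build_index(RECORDS, 2)
by_metar = build_index(RECORDS, 3)
temple = by_zip.get("76504")
```

Building each index costs one full pass over the data, so it only pays off when many lookups follow a single load.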

If your data is at the point where the act of looping through all records is causing a noticeable performance impact, then you're into the territory where you should be using a real database, such as sqlite, mysql, postgresql, etc. Anything less, and you'll just be re-inventing what they've already done with indexing and file storage in a not-as-good way.

gregmac 2010-10-13 04:39:14

This is good to know and think about. I'm not making a web app. This is just a GUI run locally. I have made a class I'm calling SuperDict that behaves somewhat like a dictionary with multiple keys, but I'm not making separate dictionaries for each key. It is working great. I'll post the code when I'm sure I don't want to add any more methods to it.

narnie 2010-10-14 19:00:39

Answer 4

+1 A:

For the syntax you've proposed there is the ConfigParser module, so you can parse the file easily, getting all the strings grouped by section or however you want. But later it will lead to other troubles: just think about massive updates to the file.

So I'd recommend using sqlite too. It's a standard module and it's tiny, so no overhead or extra dependencies.
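A rough sketch of the ConfigParser route follows. The asker's actual file layout is not shown here, so the one-section-per-city format below is purely an assumption. The snippet is written for Python 3, where the module is named configparser; in the Python 2 used elsewhere in this thread it is ConfigParser.

```python
import configparser  # named ConfigParser in Python 2

# Hypothetical ini layout: one section per city, one option per field.
INI_TEXT = """
[Temple]
state = TX
zip = 76504
metar = KTPL

[Pawhuska]
state = OK
zip = 74056
metar = KBVO
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Group the options of every section into a plain dict keyed by section name.
stations = {section: dict(parser.items(section)) for section in parser.sections()}
```

All values come back as strings, and updating many entries means rewriting the whole file, which is the trouble hinted at above.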

eGlyph 2010-10-14 12:50:33

Oh wow, I didn't know sqlite was a standard module. That is big news to me! Now you guys have given me a tough choice. Do XML or sqlite. Both great options, neither of which I'm facile with (YET!). I will also check into ConfigParser just for learning purposes. My thanks

narnie 2010-10-14 19:03:04

Answer 5

A:

As far as the data structure, I taught myself how to use xml.dom. It turns out, I really like it. I see now why programmers are using it more for config files, etc.

I decided to develop my own class that acts sort of like a dictionary but can have multiple keys. Unlike a dictionary, it can have more than one key with the same value (but I designed the keys() method to return only unique values).

Here is the code:

#! /usr/bin/python2.6
# -*- coding: utf-8 -*-
'''makes a new dictionary-type class

This class allows for multiple keys unlike the dictionary type.
The keys are passed first as a list or tuple. Then, the data is to
be passed in a tuple or list of tuples or lists with the exact 
number of data items per list/tuple.

Example:

    >>> from superdict import SuperDict
    >>> keynames = ['fname', 'lname', 'street', 'city', 'state', 'zip']
    >>> names = [
    ...     ['Jim', 'Smith', '123 Cherry St', 'Topeka', 'KS', '73135'],
    ...     ['Rachel', 'West', '456 Bossom Rd', 'St Louis', 'MO', '62482']
    ...     ]
    >>> dictionary = SuperDict(keynames, names)

There is a SuperDict.keys method that shows all the keys for a given keyname
to be accessed as in:

    >>> print dictionary.keys('city')
    ['Topeka', 'St Louis']

The add method is used to pass a list/tuple but must have the same number of
'fields'.
    >>> dictionary.add(['John', 'Richards', '6 Tulip Ln', 'New Orleans', 'LA', '69231'])

The extend method is used to pass multiple lists/tuples inside a list/tuple as above.

    >>> new_names = [
    ...     ['Randy', 'Young', '54 Palm Tree Cr', 'Honolulu', 'HA', '98352'],
    ...     ['Scott', 'People', '31932 5th Ave', 'New York', 'NY', '03152']
    ...     ]
    >>> dictionary.extend(new_names)

The data attribute can be used to access the raw data as in:
    >>> dictionary.data
    [['Jim', 'Smith', '123 Cherry St', 'Topeka', 'KS', '73135'], ['Rachel', 'West', '456 Bossom Rd', 'St Louis', 'MO', '62482'], ['Randy', 'Young', '54 Palm Tree Cr', 'Honolulu', 'HA', '98352'], ['Scott', 'People', '31932 5th Ave', 'New York', 'NY', '03152']]

The data item is retrieved with the find method as below:

    >>> dictionary.find('city', 'Topeka')
    ['Jim', 'Smith', '123 Cherry St', 'Topeka', 'KS', '73135']

What if there is more than one match? Use the extended options of find to
match on a second field as in:

    >>> dictionary.find('city', 'Topeka', 'state', 'KS')
    ['Jim', 'Smith', '123 Cherry St', 'Topeka', 'KS', '73135']

To find all the fields that match, use findall (second keyname 
is available for this just as in the find method):

    >>> dictionary.findall('city', 'Topeka')
    [['Jim', 'Smith', '123 Cherry St', 'Topeka', 'KS', '73135'], ['Ralph', 'Johnson', '513 Willow Way', 'Topeka', 'KS', '73189']]


The delete method allows one to remove data, if needed:
    >>> dictionary.delete(new_names[0])
    >>> dictionary.data
    [['Jim', 'Smith', '123 Cherry St', 'Topeka', 'KS', '73135'], ['Rachel', 'West', '456 Bossom Rd', 'St Louis', 'MO', '62482'], ['Scott', 'People', '31932 5th Ave', 'New York', 'NY', '03152']]

maintainer: <[email protected]>

LICENSE: GPL version 2
Copyright 2010
'''

__version__ = 0.4

indexnames = ['city','state','zip','metar']
datasample = [
    ['Pawhuska', 'OK', '74056', 'KBVO'],
    ['Temple', 'TX', '76504', 'KTPL']
    ]


class SuperDict(object):
    '''
    superdict = SuperDict(keynames, data)

    Keynames are a list/tuple of the entry names
    Data is a list/tuple of lists/tuples containing the data

    All should be of the same length

    See module doc string for more information
    '''

    def __init__(self, indexnames, data=None):
        self.keydict = dict()
        # Always define self.data so add() and keys() also work when no data is passed.
        self.data = data if data else []
        for index, name in enumerate(indexnames):
            self.keydict[name.lower()] = index

    def keys(self, index, sort=False):
        '''
        SuperDict.keys(keyname, sort=False)

        Returns all the "keys" for the keyname(field name) but duplicates
        are removed.

        If sort=True, then the keys are sorted
        '''
        index = index.lower()
        keynames = []
        for item in self.data:
            key = item[self.keydict[index]]
            if key:
                keynames.append(key)
        keynames = list(set(keynames))
        if sort:
            keynames = sorted(keynames)
        return keynames

    def add(self, data):
        '''
        SuperDict.add(list/tuple)

        Adds another entry to the dataset
        '''
        if self.data:
            if not len(data) == len(self.data[0]):
                print 'data length mismatch'
                return
        self.data.append(data)

    def extend(self, data):
        '''
        SuperDict.extend([list1, list2])
        SuperDict.extend((tuple1, tuple2))

        Extends the dataset by more than one entry at a time
        '''
        for datum in data:
            self.add(datum)

    def delete(self, data):
        '''
        SuperDict.delete(list/tuple)

        Deletes an entry matching the list or tuple passed to the method
        '''
        # question for later: should I return true or false if something delete or not
        # Rebuild the list in place; deleting while iterating would skip entries.
        self.data[:] = [item for item in self.data if item != data]

    def find(self, keyname1, data1, keyname2=None, data2=None):
        '''
        SuperDict.find(keyname1, data1, keyname2=None, data2=None)

        Look for the first entry based on the value of a keyname(s).
        '''
        keyname1 = keyname1.lower()
        if keyname2:
            keyname2 = keyname2.lower()
        for item in self.data:
            if data1 == item[self.keydict[keyname1]]:
                if not data2:
                    return item
                elif data2 == item[self.keydict[keyname2]]:
                    return item

    def findall(self, keyname1, data1, keyname2=None, data2=None):
        '''
        SuperDict.findall(keyname1, data1, keyname2=None, data2=None)

        Returns a list of every entry matching the given keyname(s).
        '''
        keyname1 = keyname1.lower()
        if keyname2:
            keyname2 = keyname2.lower()
        items = []
        for item in self.data:
            if data1 == item[self.keydict[keyname1]]:
                if not data2:
                    items.append(item)
                elif data2 == item[self.keydict[keyname2]]:
                    items.append(item)
        return items

narnie 2010-10-16 05:33:18
