I've often seen people use Perl data structures in lieu of configuration files; i.e. a lone file containing only:

%config = (
    'color' => 'red',
    'numbers' => [5, 8],
    qr/^spam/ => 'eggs'
);

What's the best way to convert the contents of these files into Python-equivalent data structures, using pure Python? For the time being we can assume that there are no real expressions to evaluate, only structured data.

A: 

Is using pure Python a requirement? If not, you can load it in Perl and convert it to YAML or JSON. Then use PyYAML or something similar to load them in Python.

codelogic
I'd like to use pure Python, but this is helpful nonetheless. :)
cdleary
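If calling Perl is acceptable, as the first answer suggests, here is a minimal sketch that shells out to perl and parses JSON rather than YAML, so the Python side needs only the standard library. It assumes `perl` is on the PATH with the CPAN JSON module installed, and that the file assigns a package hash whose name you pass in (hypothetically `%config`). The function names are made up for illustration.

```python
import json
import os
import re
import subprocess

def perl_json_command(path, var="config"):
    """Build the argv that makes perl print %<var> from `path` as JSON.

    `var` is the name of the hash the file assigns (hypothetical default
    "config"). The path is made absolute because recent Perls no longer
    include "." in @INC, which `do` searches for relative file names.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", var):
        raise ValueError("not a plain Perl variable name: %r" % var)
    script = f"do shift; print to_json(\\%{var})"
    return ["perl", "-MJSON", "-e", script, os.path.abspath(path)]

def load_via_perl(path, var="config"):
    """Run perl on the config file and parse its JSON output in Python."""
    result = subprocess.run(perl_json_command(path, var),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

`load_via_perl("myconfig.pm")` would return an ordinary dict. Any qr// key arrives as a stringified regex, since Perl hash keys are plain strings.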
A: 

Not sure what the use case is. Here's my assumption: you're going to do a one-time conversion from Perl to Python.

Perl has this

%config = (
    'color' => 'red',
    'numbers' => [5, 8],
    qr/^spam/ => 'eggs'
);

In Python, it would be

import re

config = {
    'color': 'red',
    'numbers': [5, 8],
    re.compile(r"^spam"): 'eggs',
}

So, I'm guessing it comes down to a few regex substitutions:

  • %variable = ( with variable = {
  • ); with }
  • 'key' => value with 'key' : value
  • qr/.../ => value with re.compile( r"..." ) : value
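Those substitutions can be sketched in Python roughly as follows. This is a minimal sketch under loud assumptions: the text is a single flat `%name = ( ... );` assignment, `=>` never appears inside a quoted string, and regexes contain no `/` or `"` characters and no trailing flags. The function names are hypothetical, and because the rewritten source is run with `exec()`, only use it on files you trust.

```python
import re

# The substitutions listed above, applied in order (assumptions as stated).
SUBSTITUTIONS = [
    (re.compile(r"^\s*%(\w+)\s*=\s*\(", re.M), r"\1 = {"),  # %name = (  ->  name = {
    (re.compile(r"\)\s*;\s*$"), "}"),                        # final );   ->  }
    (re.compile(r"qr/([^/]*)/"), r're.compile(r"\1")'),     # qr/../     ->  re.compile(r"..")
    (re.compile(r"=>"), ":"),                                # =>         ->  :
]

def perl_hash_to_python(perl_text):
    """Rewrite the Perl hash assignment as Python source text."""
    source = perl_text
    for pattern, replacement in SUBSTITUTIONS:
        source = pattern.sub(replacement, source)
    return source

def load_perl_hash(perl_text):
    """Execute the rewritten source and return (name, dict).

    exec() is not a sandbox: a malicious file can run arbitrary code.
    """
    match = SUBSTITUTIONS[0][0].search(perl_text)
    if match is None:
        raise ValueError("no '%name = (' assignment found")
    namespace = {"re": re}
    exec(perl_hash_to_python(perl_text), namespace)
    return match.group(1), namespace[match.group(1)]
```

For the file in the question this produces `config = {` followed by the three entries with `:` separators and `re.compile(r"^spam")` as the last key.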

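However, Python's built-in dict treats a compiled regex key like any other object: looking up a string never tries to match it against the pattern. If you want lookups to fall back to regex matching, you'd have to write your own subclass of dict and override __getitem__ to check the pattern keys separately.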

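import re

class PerlLikeDict(dict):
    """A dict whose lookups fall back to matching compiled-regex keys."""
    pattern_type = type(re.compile(""))

    def __getitem__(self, key):
        # An exact key match wins, just like a normal dict.
        if key in self:
            return super().__getitem__(key)
        # Otherwise try each compiled-pattern key against a string key.
        if isinstance(key, str):
            for k in self:
                if isinstance(k, self.pattern_type) and k.match(key):
                    return super().__getitem__(k)
        raise KeyError("key %r not found" % (key,))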

Here's an example of using a PerlLikeDict.

>>> import re
>>> pat = re.compile("hi")
>>> a = {pat: 'eggs'}  # native dict, no features.
>>> x = PerlLikeDict(a)
>>> x['b'] = 'c'
>>> x
{re.compile('hi'): 'eggs', 'b': 'c'}
>>> x['b']
'c'
>>> x['ji']
Traceback (most recent call last):
  ...
KeyError: "key 'ji' not found"
>>> x['hi']
'eggs'
S.Lott
It can get more complicated when the data structures are nested, which might require a recursive parser of some sort.
codelogic
If it does get more complex, we'd have to see the use cases in some detail. This kind of thing seems like a one-shot convert-by-hand thing. But maybe it isn't. Can't tell from the question what the actual use case is.
S.Lott
This general idea works with a few little tweaks. (Regex conversion is kind of tricky.)
cdleary
(And by regex conversion I mean regexes are being used as keys in the hashes; changing them to re.compile form is non-trivial.)
cdleary
"regexes are being used as keys in the hashes" What does this mean?
S.Lott
Exactly that. The following is perfectly legal Perl: `my %hash = ( qr/".*?"/ => sub { print "got a character" } );`
Robert P
cdleary, there may be a different, more Pythonic way to implement what you're doing than storing regexes as keys in a hash. If we knew the application, it might make helping find an answer easier. :)
Robert P
@Robert P: May be legal, but I can't interpret it. I don't know how you do a lookup with a regex as the key to a hash. Sorry, but I'm unable to help because I don't get the semantics.
S.Lott
Is this answer a joke? This is obviously way too complicated to do with regular expressions (because of nested data structures, etc.).
Horace Loeb
@Horace Loeb: The question didn't show nested data structures. Perhaps it doesn't much matter for the questioner's specific situation.
S.Lott
@S. Lott: What you would do is iterate over the keys of the hash, like so: `foreach my $regex (keys %hash) { if ($input =~ $regex) { $hash{$regex}->('some', 'call', 'to', 'the', 'subroutine') } }`. You'd use it like a list of pairs, without needing two actual lists (though also without any guarantee of order).
Robert P
What he probably really needs is a list of pairs - a 'regex' as the "left" object, and a "whatever" on the right.
Robert P
@Robert P: does your idea match the code I posted last night in this answer?
S.Lott
@S.Lott: PerlLikeDict breaks many dict invariants, e.g. `key in perl_like_dict` doesn't work as expected. A better way is to subclass `str` and override the `__eq__` method (and probably `__hash__`), thus qr/../ -> ReString(".."). In this case ordinary dicts can be used.
J.F. Sebastian
@J.F. Sebastian: I'm not sure a PerlLikeDict needs to keep the same invariants as a standard dict. I think it would be better to override __contains__.
S.Lott
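Robert P's "list of pairs" suggestion maps directly onto Python. A minimal sketch, with a hypothetical helper name and table: store (compiled pattern, value) tuples in a list and collect the values whose pattern matches the input. Perl's `=~` is an unanchored match, so this uses `search()` rather than `match()`.

```python
import re

def matching_values(pairs, text):
    """Return the value of every (pattern, value) pair whose regex matches text.

    A Python take on iterating `keys %hash` and testing `$input =~ $regex`;
    unlike a Perl hash, the list keeps the pairs in a predictable order.
    """
    return [value for pattern, value in pairs if pattern.search(text)]

# Hypothetical table: the question's qr/^spam/ entry plus a second pattern.
pairs = [
    (re.compile(r"^spam"), "eggs"),
    (re.compile(r'".*?"'), "got a quoted string"),
]

print(matching_values(pairs, 'spam and "ham"'))  # ['eggs', 'got a quoted string']
```

The values could just as well be functions, as in the Perl example, with the caller invoking each one it gets back.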
A: 

I've also found PyPerl, but it doesn't seem to be maintained. I guess something like this is what I was looking for -- a module that did some basic interpretation of Perl and passed the result as a Python object. A Perl interpreter that died on anything too complex would be fine. :-)

cdleary
PyPerl is *not* pure Python. It is a Python extension module written in C.
J.F. Sebastian
Point taken -- I guess I was looking for something like PyPerl that was implemented in pure Python. :-)
cdleary
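The kind of module described in this answer, a pure-Python reader that interprets only plain Perl data and dies on anything else, could look roughly like the recursive-descent parser below. It is a minimal sketch, not an existing library: all names are hypothetical, and it understands only quoted strings (no interpolation), numbers, `undef`, `qr/.../` without flags, `[ ]` arrays, `{ }` hash refs and barewords directly before `=>`. Everything else raises ValueError. Nesting is handled by `value()` and `items()` calling each other.

```python
import re

TOKEN = re.compile(r"""
    \s+ | \#[^\n]*                               # whitespace, comments (skipped)
  | (?P<sq>'(?:[^'\\]|\\.)*')                    # 'single-quoted'
  | (?P<dq>"(?:[^"\\$@]|\\.)*")                  # "double-quoted", no $ or @
  | (?P<qr>qr/(?:[^/\\]|\\.)*/)                  # qr/regex/ without flags
  | (?P<num>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)    # 5, -3, 2.5, 1e3
  | (?P<word>[A-Za-z_]\w*)                       # undef, unquoted hash keys
  | (?P<var>%[A-Za-z_]\w*)                       # %name
  | (?P<op>=>|[=,;()\[\]{}])                     # punctuation
""", re.VERBOSE)
DQ_ESCAPES = {"n": "\n", "t": "\t"}

def tokenize(text):
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("cannot parse Perl near %r" % text[pos:pos + 20])
        pos = m.end()
        if m.lastgroup:  # None for skipped whitespace and comments
            yield m.lastgroup, m.group()

def to_dict(flat):
    if len(flat) % 2:
        raise ValueError("odd number of elements in hash")
    return dict(zip(flat[0::2], flat[1::2]))

class PerlDataParser:
    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def at(self, *ops):
        kind, tok = self.peek()
        return kind == "op" and tok in ops

    def take(self, op=None):
        kind, tok = self.peek()
        if kind is None:
            raise ValueError("unexpected end of input")
        if op is not None and not self.at(op):
            raise ValueError("expected %r, got %r" % (op, tok))
        self.i += 1
        return kind, tok

    def items(self, close):
        """Parse ','/'=>' separated values up to the `close` token."""
        out = []
        while not self.at(close):
            out.append(self.value())
            if self.at(",", "=>"):
                self.take()
            elif not self.at(close):
                raise ValueError("expected ',' or %r, got %r" % (close, self.peek()[1]))
        self.take(close)
        return out

    def value(self):
        kind, tok = self.take()
        if kind == "sq":  # only \\ and \' are escapes in Perl '...'
            return re.sub(r"\\([\\'])", r"\1", tok[1:-1])
        if kind == "dq":
            return re.sub(r"\\(.)", lambda m: DQ_ESCAPES.get(m.group(1), m.group(1)),
                          tok[1:-1], flags=re.S)
        if kind == "num":
            return float(tok) if any(c in tok for c in ".eE") else int(tok)
        if kind == "qr":
            return re.compile(tok[3:-1])
        if kind == "word":
            if tok == "undef":
                return None
            if self.at("=>"):
                return tok  # Perl auto-quotes a bareword before =>
            raise ValueError("bareword %r is not plain data" % tok)
        if (kind, tok) == ("op", "["):
            return self.items("]")
        if (kind, tok) == ("op", "{"):
            return to_dict(self.items("}"))
        raise ValueError("unsupported Perl construct %r" % tok)

def load_perl_config(text):
    """Parse text holding one `%name = ( ... );` and return (name, dict)."""
    p = PerlDataParser(text)
    kind, name = p.take()
    if kind != "var":
        raise ValueError("expected '%name = ( ... );'")
    p.take("=")
    p.take("(")
    result = to_dict(p.items(")"))
    if p.at(";"):
        p.take()
    if p.peek()[0] is not None:
        raise ValueError("unexpected trailing %r" % (p.peek()[1],))
    return name[1:], result
```

Because nothing in the file is evaluated, an overly clever config file fails with ValueError instead of running code.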
A: 

I'd just turn the Perl data structure into something else. Since I haven't seen the actual file, there might be some extra work that my solution doesn't do.

If the only thing in the file is the one variable declaration (so, no 1; at the end, and so on), it can be really simple to turn your %config into YAML:

perl -MYAML -le 'print YAML::Dump( { do shift } )' filename

The do returns the last thing it evaluated, so in this little code it returns the list of hash key-value pairs. Functions such as YAML::Dump prefer references, since a reference tells them the top-level structure, so I turn the list into a hash reference by surrounding the do with curly braces. For your example, I'd get this YAML output:

---
(?-xism:^spam): eggs
color: red
numbers:
  - 5
  - 8

I don't know how Python will like that stringified regex, though. Do you really have a key that is a regex? I'd be curious to know how that's being used as part of the configuration.


If there's extra stuff in the file, life is a bit tougher. There's probably a really clever way around that, but I used the same idea and just hard-coded the name of the variable I wanted.

I tried this on the Perl data structure that the CPAN.pm module uses, and it looks like it came out fine. The only ugliness is that you have to know the variable name in advance. Now that you've seen the error of configuration in Perl code, avoid making the same mistake with Python code. :)

YAML:

perl -MYAML -le 'do shift; print YAML::Dump( $CPAN::Config )' MyConfig.pm

JSON:

perl -MJSON::Any -le 'do shift; my $j = JSON::Any->new; print $j->objToJson( $CPAN::Config )' MyConfig.pm

or

# suggested by J.F. Sebastian
perl -MJSON -le 'do shift; print to_json( $CPAN::Config )' MyConfig.pm

XML::Simple doesn't work out so well because it treats everything like an attribute, but maybe someone can improve on this:

perl -MXML::Simple -le 'do shift; print XMLout( $CPAN::Config )' MyConfig.pm
brian d foy
For the OP's myconfig.pm, it could be `perl -MJSON -E'do shift; say to_json \%config' myconfig.pm`. But Python will not understand the '(?-xism:^spam)' regexp.
J.F. Sebastian
Note that your one-liner is Perl 5.10 only :)
brian d foy
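On the stringified-regex problem raised in these comments: Perl hash keys are plain strings, so the qr// key comes out of the JSON or YAML dump as text such as `(?-xism:^spam)` (Perl 5.14 and later write `(?^:^spam)`). Here is a minimal sketch for reviving such keys while loading the JSON with Python's standard json module. It assumes the pattern inside the wrapper uses syntax Python's re also accepts, the helper names are hypothetical, and it would also treat any ordinary key shaped like `(?...:...)` as a regex.

```python
import json
import re

# Matches "(?-xism:PATTERN)", "(?^:PATTERN)", "(?^i:PATTERN)", "(?i-xsm:PATTERN)".
_PERL_QR = re.compile(r"\(\?\^?([a-z]*)(?:-[a-z]*)?:(.*)\)\Z", re.S)
_FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}

def revive_perl_regex(key):
    """Compile a stringified Perl regex key; return other keys unchanged."""
    m = _PERL_QR.match(key)
    if m is None:
        return key
    flags = 0
    for letter in m.group(1):              # flags switched on, e.g. "i"
        flags |= _FLAG_MAP.get(letter, 0)  # Perl-only flags are ignored
    return re.compile(m.group(2), flags)

def revive_hook(pairs):
    """object_pairs_hook for json.loads: applies revive_perl_regex to every key."""
    return {revive_perl_regex(k): v for k, v in pairs}

# JSON of the shape the to_json one-liners above would print (Perl 5.10 era).
text = '{"color":"red","numbers":[5,8],"(?-xism:^spam)":"eggs"}'
config = json.loads(text, object_pairs_hook=revive_hook)
```

After loading, `config` has a compiled `^spam` pattern as a key, which can then be used with a lookup scheme like the ones discussed in the first thread.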