ansaurus

Question

Ignoring case, punctuation, and whitespace in Strings

Answer 1

+2 A:

If you want iteration on a String instance to iterate on its self.__string, as your __iter__ method indicates, the only sensible choice for length is also to return the length of __string -- it would be truly peculiar if len(x) and sum(1 for _ in x) resulted in different values.

I have to admit I don't understand the purpose of this class (and in particular why you made the terrible choice of having it old-style, and why you use such a contorted way to build __simple), but internal consistency is important anyway. So, either change __iter__, or make __len__ logically compatible with it.
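To illustrate the consistency point, here is a minimal sketch. The `Words` class is a hypothetical stand-in, since the question's original class is not shown here, and it assumes the object is simply a tuple of whitespace-separated words. Both `__iter__` and `__len__` read from the same underlying sequence, so they cannot disagree.

```python
class Words:
    """Hypothetical stand-in for the question's String class."""

    def __init__(self, text):
        # Assumption: the object is a tuple of whitespace-separated words.
        self._words = tuple(text.split())

    def __iter__(self):
        # Iteration walks the stored words...
        return iter(self._words)

    def __len__(self):
        # ...and the length counts those same words.
        return len(self._words)


w = Words("Hello,   world! How are you?")
# len(x) and sum(1 for _ in x) agree because both use self._words.
assert len(w) == sum(1 for _ in w) == 5
```

If iteration should instead run over some filtered form, the same rule applies: `__len__` would need to count that filtered form too.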

Your slicing logic also totally escapes me -- why are you building the slice's __simple in a way that's likely to differ from what you'd get by rebuilding it from the slice's __string? E.g., if self.__string is '?Boh!' and therefore self.__simple is 'boh', why would you want self[1:-1] to have a __string of 'Boh' but a __simple of 'o', which is inconsistent with the 'boh' you'd get by recomputing __simple from the slice?
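One way to keep a slice consistent is to build it through the normal constructor, so that the simplified form is always recomputed from the sliced text. The sketch below is only an illustration: the `Text` class, its attribute names, and the rule "keep alphanumerics, lowercased" are assumptions, not the asker's actual code.

```python
class Text:
    """Hypothetical character-based class with a simplified shadow form."""

    def __init__(self, string):
        self._string = string
        # Assumed simplification rule: drop anything that is not
        # alphanumeric, then lowercase what remains.
        self._simple = "".join(ch.lower() for ch in string if ch.isalnum())

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Rebuild via __init__ so _simple is derived from the slice.
            return Text(self._string[index])
        return self._string[index]


t = Text("?Boh!")
part = t[1:-1]
assert part._string == "Boh"
assert part._simple == "boh"  # same as recomputing from 'Boh'
```

With this design there is only one place where the simplified form is computed, so a slice can never carry a simplified form that its own text would not produce.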

I guess that's not germane to this Q about length, but I'm just curious about these many, extremely peculiar design choices that you're making...

Alex Martelli 2010-01-30 19:51:47

Since Python 3.0 came out, I have only been writing code in that language (currently 3.1). If you are familiar with how classes work in 3.1, then you know that this is a new-style class (all classes in Python 3.x are new-style classes). Therefore, it is unnecessary to explicitly inherit from `object`. As for how slices work, please notice this line: `self.__string = tuple(string.split())` These string objects work on words and not characters.
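As a small check of the two claims in this comment, the sketch below assumes Python 3. The class names and the sample sentence are made up for illustration. It shows that a class with no explicit base still inherits from `object`, and that slicing a tuple produced by `split()` selects whole words rather than characters.

```python
class Plain:
    pass


class Explicit(object):
    pass


# In Python 3 both spellings give the same method resolution order.
assert Plain.__mro__ == (Plain, object)
assert Explicit.__mro__ == (Explicit, object)

# split() with no arguments collapses runs of whitespace.
words = tuple("Ignoring case, punctuation, and   whitespace".split())
assert len(words) == 5

# Slicing the tuple works on words, not characters.
assert words[1:3] == ("case,", "punctuation,")
```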

Noctis Skytower 2010-01-31 03:21:54

@Noctis, you've changed the code so drastically (I see no String class any more, for example!) that it's impossible to reconnect my comments to your current code. As for Python 3, that's great, but mentioning it in your Q would obviously be better since that's *not* what the vast majority of Python users are using today;-).

Alex Martelli 2010-01-31 05:27:10

Answer 2

+2 A:

"What is the best way to go about fixing this problem?"

The best -- and only -- way is to define what this object "means" and what the length of this object "means".

The object appears to be a list of words. Nothing more. That seems to be the value in _string.

It's not clear what _simple is, other than an inaccessible filtered subset of the words in _string.

So what's the length? The number of words in _string, or the number of words in the filtered subset?

Only you can define what this class means. The meaning will then determine how to implement __len__. Until you define the meaning, it's impossible to determine how anything should be implemented.

S.Lott 2010-01-30 20:32:01
