ansaurus

Question

I'm writing a spellchecking program, how do I replace ch in a string?

Answer 1

A:

Check out the re (regular expression) module. It has a "sub" function to replace strings that match regular expressions.
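As a minimal sketch of that suggestion, assuming the goal is to replace every ASCII punctuation character with a space (the function name, the pattern and the sample text are illustrative assumptions, not taken from the question), re.sub could be used like this:

```python
import re
import string

# Character class matching any single ASCII punctuation character.
# re.escape keeps characters such as ']' and '\' from being read as regex syntax.
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')

def replace_punctuation(text, replacement=' '):
    """Return a copy of text with each punctuation character replaced."""
    return PUNCT_RE.sub(replacement, text)

print(replace_punctuation("Hello, world!"))
```

Compiling the pattern once avoids rebuilding it on every call, which may matter if the function is applied to many words.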

Andrew E. Falcon 2010-05-23 23:17:14

Answer 2

+1 A:

In Python, strings are immutable, so you need to create a new string with your changes.

There are a few ways to do this:

One is using a generator expression to inspect the characters and keep only the non-punctuation ones.

import string
def remove(file):
  # keep only the characters that are not ASCII punctuation
  return ''.join(ch for ch in file if ch not in string.punctuation)

You could also call functions that test or translate each character; those functions could raise an exception on a "weird character" or do some other processing:

def remove(file):
  # CheckCh and TranslateCh are placeholders for functions you would write yourself
  return ''.join(TranslateCh(ch) for ch in file if CheckCh(ch))
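To make that pattern concrete, here is one possible sketch. The bodies of CheckCh and TranslateCh below are purely hypothetical (dropping digits, turning punctuation into spaces, lowercasing the rest); any per-character test and mapping would fit the same shape.

```python
import string

def CheckCh(ch):
    """Hypothetical filter: keep everything except ASCII digits."""
    return ch not in string.digits

def TranslateCh(ch):
    """Hypothetical mapping: punctuation becomes a space, the rest is lowercased."""
    if ch in string.punctuation:
        return ' '
    return ch.lower()

def remove(text):
    # Filter first, then translate each surviving character.
    return ''.join(TranslateCh(ch) for ch in text if CheckCh(ch))

print(remove("Hello, World 42!"))
```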

Another alternative is the string methods replace and translate. translate provides a nice mechanism for this (and a more efficient one than building a list); see Alex's answer.

Or... you could collect a list in a for loop and join it at the end, but that's a little "unpythonic".
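For comparison, the explicit-loop version might look like the sketch below (the name kept is an assumption made for illustration). It produces the same result as the generator expression, just with more lines:

```python
import string

def remove(text):
    kept = []  # characters we decide to keep
    for ch in text:
        if ch not in string.punctuation:
            kept.append(ch)
    # Build the new string once, at the end, instead of concatenating in the loop.
    return ''.join(kept)

print(remove("a,b.c!"))
```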

Stephen 2010-05-23 23:18:12

+1. No need for the brackets.

Adam Bernier 2010-05-23 23:23:11

thank you so much for that, greatly appreciated

Ajay Hopkins 2010-05-23 23:28:45

@Adam : true, thanks.

Stephen 2010-05-23 23:57:29

`string.maketrans` is for byte strings, and deprecated in Python 3 in favor of `bytes.maketrans` -- definitely not what the OP needs in Python 3.

Alex Martelli 2010-05-24 00:08:54

@Alex : hm, interesting, thanks. removed the suggestion.

Stephen 2010-05-24 00:17:02

Answer 3

+2 A:

In this code...:

for ch in file:
        if len(ch) > 1:

the weirdly-named file (besides breaking the best practice of not hiding builtin names with your own identifiers) is not a file, it's a string. In Python 3 that means a unicode string, but that makes no difference to the fact that the loop yields single characters (unicode characters, not bytes, in Python 3), so len(ch) == 1 is absolutely guaranteed by the rules of the Python language. I'm not sure what you're trying to accomplish with that test (rule out some subset of unicode characters?), but whatever it is you think you're achieving, I assure you that you're not achieving it, and you should recode that part.
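A quick way to check this for yourself, using an arbitrary sample string with some non-ASCII characters (the sample is an assumption for illustration):

```python
# Iterating over a str yields one-character strings, non-ASCII or not.
text = "na\u00efve caf\u00e9, d\u00e9j\u00e0 vu!"
lengths = {len(ch) for ch in text}
print(lengths)
```

The set contains only the value 1, so a test such as len(ch) > 1 can never be true inside that loop.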

Apart from this, you're returning immediately, and thereby exiting the function with just one character (the first one in the file, or a space if that first one was a punctuation character).
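The effect of returning inside the loop can be seen in the sketch below. The first function is a hypothetical reconstruction of that kind of bug, not the asker's actual code; the second accumulates the result and returns only after the loop has finished.

```python
import string

def first_only(text):
    # Hypothetical buggy shape: both branches return during the first iteration.
    for ch in text:
        if ch in string.punctuation:
            return ' '
        return ch
    return ''  # only reached for an empty string

def whole_string(text):
    out = []
    for ch in text:
        out.append(' ' if ch in string.punctuation else ch)
    return ''.join(out)  # return once, after every character was processed

print(repr(first_only("hi, there")))
print(repr(whole_string("hi, there")))
```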

The suggestion to use the translate method, which I saw in another answer, is the right one, but that answer used the wrong version of translate (one applying to byte strings, not to unicode strings as you need for Python 3). The proper unicode version is simpler, and transforms the whole body of your function into just two statements:

trans = dict.fromkeys(map(ord, string.punctuation), ' ')
return file.translate(trans)
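Put together as a complete function, those two statements might look like the sketch below. The parameter is renamed to text so that it no longer hides a builtin name, and the second function (its name is an assumption) shows a variant that deletes punctuation instead of replacing it, using the three-argument form of str.maketrans.

```python
import string

# Built once: code point of each punctuation character -> a single space.
PUNCT_TO_SPACE = dict.fromkeys(map(ord, string.punctuation), ' ')

# Variant table: the third argument of str.maketrans lists characters to delete.
PUNCT_DELETE = str.maketrans('', '', string.punctuation)

def remove(text):
    """Replace each ASCII punctuation character with a space."""
    return text.translate(PUNCT_TO_SPACE)

def strip_punctuation(text):
    """Delete each ASCII punctuation character."""
    return text.translate(PUNCT_DELETE)

print(repr(remove("Hello, world!")))
print(repr(strip_punctuation("Hello, world!")))
```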

Alex Martelli 2010-05-24 00:12:14
