ansaurus

Question

Predefined text replacements in C++ and Python

Answer 1

+1 A:

In Python, strings are immutable, so probably (depending on what you're doing with the output), you want to create a list of all the simple substitution results. Something like:

MORSE = {'A': '.-', ...}

def morsify(data):
    return [MORSE[c] for c in data if c in MORSE]

You'd need to get corresondingly fancier if you wanted to support different national versions of Morse code etc.

(Edited to deal with the fact that Morse code is apparently not a prefix code.)

llasram 2010-09-15 14:17:33

You can't join with an empty string--Morse is not a prefix code--but hopefully that doesn't matter since the questioner is on top of that aspect.

Steve Jessop 2010-09-15 14:24:18

@Steve: Morse *should* be a prefix code! According to every source I know, it is.

Konrad Rudolph 2010-09-15 15:05:51

@Konrad: "E" is "." "T" is "-". "M" is "--". "O" is "---". That's about as non-prefix as a code can possibly be - *every* symbol other than "E" and "T" has a prefix which is a symbol. Operators leave a pause between letters.

Steve Jessop 2010-09-15 16:31:41

Being stupid … [ ](http://example.com)

Konrad Rudolph 2010-09-16 13:21:38

Answer 2

+1 A:

There's no simple optimal structure - for any given fixed mapping, there might be fiendish bit-twiddling optimisations for that precise mapping, that are better or worse on different architectures and different inputs. A map/dictionary should be pretty good all the time, and the code is pretty simple.

My official advice is to stick with that. Lookup performance is very unlikely to be an issue for code like this, since most likely you can easily encode/decode faster than you can input/output.

As it's a learning exercise, and you want to try out different possibilities: for the text -> morse you could use an array rather than a map/dictionary. Perhaps surprisingly, this is difficult to do in C++ and be completely portable. The following assumes that all uppercase letters have char values greater than A, which is not guaranteed by the standard:

std::string encode['Z'-'A'];
encode['A' - 'A'] = ".-";
encode['B' - 'A'] = "-...";
// etc.
encode['Z' - 'A'] = "--..";

If you're willing to assume that your code will only ever run on machines whose basic character set has the letters in a continuous run (true of ASCII, but not EBCDIC), you can tidy it up a bit:

std::string encode[26] = {".-", "-...", /* etc */ "--.."};

To look up the character stored in the variable c:

morse = encode[c - 'A'];

The Python version can assume ASCII (I think), and you'd have to throw in some use of ord.

To cope with anything other than upper case letters, you need either a bigger array (to contain an entry for every possible char value), or else multiple arrays with bounds checks, special case code for punctuation and so on.

Steve Jessop 2010-09-15 14:17:41

Interesting strategy. Thanks for the answer.

sukhbir 2010-09-15 16:08:20

Answer 3

+1 A:

The first hit on Google.

Depending on the quantity of text, 'foo'.replace('f', '..-.').replace('o','---') would work.

Unless you're translating thousands of lines of text, you probably won't notice a whole lot of difference in any method that you use - though you can easily use the timeit module to test each different method.

Wayne Werner 2010-09-15 14:18:32

For this case, I agree, speed should not matter. But I wanted to know it otherwise, in cases where it did.

sukhbir 2010-09-15 16:09:49

Answer 4

+2 A:

You could use str.translate:

m = {ord('S'): '---', ord('O'): '...'}
print('S O S'.translate(m))

will print:

--- ... ---

Joschua 2010-09-15 14:19:56

Python 3.x only?

Nick T 2010-09-15 14:40:25

I will read up. Thanks.

sukhbir 2010-09-15 16:10:12

No, but you have to made some conversions. In Python 2.6 it's a little bit complicated.

Joschua 2010-09-15 16:20:40

Answer 5

+2 A:

from collections import defaultdict

morsecode = [('a','._'), ('b','_...'), ('c','_._.')]
codedict = defaultdict(lambda:' ')
for k,v in morsecode:
    codedict[k] = v

tomorse = lambda x: ' '.join([codedict[chr] for chr in x])

print tomorse('bab cab')

Gives:

_... ._ _...   _._. ._ _...

Nick T 2010-09-15 14:43:44

Answer 6

+1 A:

On the Python side the string classes translate function is the way to go. On the C++ side I would go with a std::map to hold the character mapping. Then I would probably use std::for_each to do the look up and swap.

stonemetal 2010-09-15 14:47:24

ansaurus

tags:

views:

answers:

Predefined text replacements in C++ and Python

related questions