I am trying to create a map from the first character of each word to its position in a sentence or paragraph. I am currently using a regex pattern to do this, but regex is a costly operation. Are there any other ways to achieve this?

Regex way:

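import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static Map<Character, Integer> getFirstChar(String paragraph) {
    Pattern pattern = Pattern.compile("(?<=\\b)[a-zA-Z]");
    Map<Character, Integer> newMap = new HashMap<Character, Integer>();

    Matcher fit = pattern.matcher(paragraph);
    while (fit.find()) {
        // Key: the matched first letter; value: its index in the paragraph.
        newMap.put(fit.group().charAt(0), fit.start());
    }
    return newMap;
}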
A: 

Python:

wmap = {}
prev = 0
for word in "the quick brown fox jumps over the lazy dog".split():
    wmap[word[0]] = prev
    prev += len(word) + 1

print(wmap)

If a letter starts more than one word, it maps to the position of the last such word. To keep a list of all positions instead, change `wmap[word[0]] = prev` to:

if word[0] in wmap:
    wmap[word[0]].append(prev)
else:
    wmap[word[0]] = [prev]
bboe
I want a solution in Java :)
Radhika
It should be pretty easy to translate this code to Java. I'm pretty sure `split` works the same way, and your use of a map is the same. Can't make it too easy for ya :)
bboe
This `split` works for spaces. The original code finds each letter following a non-letter, which might be an apostrophe (it finds the `m` in `I'm`).
Christian Semrau
A: 

You can do your own linear scan if you really need to squeeze every bit of performance:

                 //0123456789012345678901
    String text = "Hello,my name is=Helen";
    Map<Character,Integer> map = new HashMap<Character,Integer>();

    boolean lastIsLetter = false;
    for (int i = 0; i < text.length(); i++) {
        char ch = text.charAt(i);
        boolean currIsLetter = Character.isLetter(ch);
        if (!lastIsLetter && currIsLetter) {
            map.put(ch, i);
        }
        lastIsLetter = currIsLetter;
    }

    System.out.println(map);
    // prints "{n=9, m=6, H=17, i=14}" (HashMap iteration order is unspecified, so the entry order may differ)
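
For reference, here is a minimal sketch of the same scan wrapped in a method that keeps every position for a letter instead of only the last one. The class and method names (`FirstLetterIndex`, `firstLetterPositions`) are made up for illustration, and it assumes Java 8 or later for `computeIfAbsent`. As in the scan above, a word start is taken to be any letter that does not directly follow another letter, which is not exactly what the `\b` regex matches in every case (digits and underscores are treated differently).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstLetterIndex {

    // Maps each word-initial letter to every index at which it starts a word.
    // Assumption: a "word start" is a letter whose preceding character is not
    // a letter, as decided by Character.isLetter.
    static Map<Character, List<Integer>> firstLetterPositions(String text) {
        // LinkedHashMap keeps first-seen order, so the printed output is predictable.
        Map<Character, List<Integer>> positions = new LinkedHashMap<>();
        boolean lastIsLetter = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean currIsLetter = Character.isLetter(ch);
            if (!lastIsLetter && currIsLetter) {
                // Create the list on first sight of this letter, then record the index.
                positions.computeIfAbsent(ch, k -> new ArrayList<>()).add(i);
            }
            lastIsLetter = currIsLetter;
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(firstLetterPositions("Hello,my name is=Helen"));
        // prints "{H=[0, 17], m=[6], n=[9], i=[14]}"
    }
}
```

Note that upper- and lower-case letters are separate keys here; if they should be merged, one option is to apply `Character.toLowerCase(ch)` before using `ch` as the key.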


polygenelubricants
Thanks. This gave better performance than the regex approach.
Radhika
@Radhika: consider accepting the answer (by clicking the green check mark on the left) if it is satisfactory. If you still need help with the question, explain your concerns and I'll try to address them.
polygenelubricants