Hello all.

Problem:

I have to design an algorithm that does the following:

Say I have a line such as:

alert tcp 192.168.1.1 (caret is currently here)

The algorithm should process this line, and return a value of 4.

I coded something for it. I know it's sloppy, but it partly works.

    private int counter = 0;

    public void determineRuleActionRegion(String str, int index) {
        if (str.length() == 0 || str.indexOf(" ") == -1) {
            triggerSuggestionList(1);
            return;
        }

        // remove duplicate spaces and leading/trailing spaces before searching
        int num = str.trim().replaceAll(" +", " ").indexOf(" ", index);
        // check for occurrences of spaces, recursively
        if (num == -1) { // there is no further space
            // no need to check for 0 occurrences; the result is then 1
            triggerSuggestionList(counter + 1);
            counter = 0;
            return; // set to rule action
        } else { // there is a space
            counter++;
            determineRuleActionRegion(str, num + 1);
        }

    } // end of determineRuleActionRegion()

So basically I search for spaces to determine the region (the number of words typed). However, I want the result to change as soon as the user presses the space bar (types a space character).

How can I achieve this with the current code?

Or better yet, what would be the correct way to do it? I'm looking into BreakIterator for this case...

In addition, I believe my algorithm won't work for multiple sentences. How should I address that problem as well?

--

The String str comes from the JTextPane, via textPane.getText(0, pos + 1).

Thanks in advance. Do let me know if my question is still not specific enough.

--

More examples:

alert tcp $EXTERNAL_NET any -> $HOME_NET 22 <caret>

return -1 (maximum of the typed text is 7 words)

alert tcp 192.168.1.1 any<caret> 

return 4 (as it is still at the 4th arg)

alert tcp<caret>

return 2 (as it is still at 2nd arg)

alert tcp <caret>

return 3

alert tcp $EXTERNAL_NET any -> <caret>

return 6

It is something like shell commands, as above. I don't think it differs much; I just want to know how many arguments have been typed. Thanks.

--

Pseudocode

Get whole paragraph from textpane
  if more than 1 line -> process the last line
      count how many arguments typed and return appropriate number
  else
    process current line
      count how many arguments typed and return appropriate number
End
+1  A: 

What about this: get last line, count what's between spaces...

String text = ...
String[] lines = text.split("\n"); // or \r\n depending on how you get the string
String lastLine = lines[lines.length-1];
StringTokenizer tokenizer = new StringTokenizer(lastLine, " ");
// note that StringTokenizer ignores empty tokens, that is, whatever lies between two consecutive spaces
int count = 0;
while (tokenizer.hasMoreTokens()) {
  tokenizer.nextToken();
  count++;
}
return count;

Edit: you could check whether the line has a trailing space (lastLine.endsWith(" ")), which means a new word is being started. It's a basic approach for you to build on :)
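A minimal sketch of that idea, for illustration only: the class and method names are hypothetical, and splitting lines with a limit of -1 (so a trailing newline yields an empty last line) is an assumption on top of the snippet above. An empty line counts as position 1, and a trailing space moves the position to the next word.

```java
import java.util.StringTokenizer;

class TokenCounter {
    /**
     * Returns the 1-based position of the word being typed on the last line.
     * Hypothetical helper; not part of the original answer.
     */
    static int countArguments(String text) {
        // limit -1 keeps a trailing empty line, e.g. right after pressing enter
        String[] lines = text.split("\r?\n|\r", -1);
        String lastLine = lines[lines.length - 1];

        // StringTokenizer skips empty tokens, so repeated spaces are harmless
        int count = new StringTokenizer(lastLine, " ").countTokens();

        // nothing typed yet, or a trailing space: the caret is on the next word
        if (count == 0 || lastLine.endsWith(" ")) {
            count++;
        }
        return count;
    }
}
```

For example, TokenCounter.countArguments("alert tcp ") gives 3, while "alert tcp" gives 2.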

helios
@helios: Hi thanks for replying. Your code is short, nice. I'll try it. If there are no other replies, or better answers, I will accept this as is.
Alex Cheng
Is this a typo? `String[] lines[]` The trailing [] behind lines
Alex Cheng
Definitely. I'll correct it.
helios
+3  A: 

This uses String.split; I think this is what you want.

    String[] texts = {
        "alert tcp $EXTERNAL_NET any -> $HOME_NET 22 ",
        "alert tcp 192.168.1.1 any",
        "alert tcp",
        "alert tcp ",
        "alert tcp $EXTERNAL_NET any -> ",
        "multine\ntest\ntest  1   2   3",
    };

    for (String text : texts) {
        String[] lines = text.split("\r?\n|\r");
        String lastLine = lines[lines.length - 1];

        String[] tokens = lastLine.split("\\s+", -1);
        for (String token : tokens) {
            System.out.print("[" + token + "]");
        }

        int pos = (tokens.length <= 7) ? tokens.length : -1;
        System.out.println(" = " + pos);
    }

This produces the following output:

[alert][tcp][$EXTERNAL_NET][any][->][$HOME_NET][22][] = -1
[alert][tcp][192.168.1.1][any] = 4
[alert][tcp] = 2
[alert][tcp][] = 3
[alert][tcp][$EXTERNAL_NET][any][->][] = 6
[test][1][2][3] = 4
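The empty [] token at the end of some lines comes from the -1 limit passed to split. A small sketch of the difference (the class and method names are made up for illustration): without a limit, String.split drops trailing empty strings, so a trailing space would not advance the position. Note also that leading whitespace produces an empty first token, which this approach counts as a word.

```java
class SplitLimitDemo {
    /** Number of tokens on a line, with or without trailing empty strings. */
    static int tokenCount(String line, boolean keepTrailingEmpty) {
        return keepTrailingEmpty
                ? line.split("\\s+", -1).length  // keeps "" after a trailing space
                : line.split("\\s+").length;     // trailing "" is discarded
    }
}
```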
polygenelubricants
@polygenelubricants: Having connection problem, might not be able to reply promptly. Will try. Thanks
Alex Cheng
@Alex: I just modified it a bit after your edit. Make sure you try the latest version.
polygenelubricants
@polygenelubricants: I got your code to work. However, there's one problem. After I press enter at the end of the sentence in the JTextPane, pos returns the same number as before instead of 0. Only after I press enter again does it detect the new line. The same happens with helios's algorithm.
Alex Cheng
@Alex: try `text.split("\r?\n|\r", -1);` and tell me if that's what you need. (Basically add an additional argument of `-1` to the first `split`)
polygenelubricants
@polygenelubricants: Oh, yes that's what I need. I did a workaround though before I noticed your comment. I manually concatenated a $ to the end of the string.
Alex Cheng
A: 

Is the sample line representative? An editor for some rule based language (ACLs)?

How about going for a full Information Extraction/named entity recognition solution, the one that will be able to recognize entities (keywords, ip addresses, etc)? You don't have to write everything from scratch, there're existing tools and libraries.

UPDATE: Here's a piece of Snort code that I believe does the parsing:

Function ParseRule()
if (*args == '(') {
   // "Preprocessor Rule detected"

} else {
    /* proto ip port dir ip port r*/
    toks = mSplit(args, " \t", 7, &num_toks, '\\');

    /* A rule might not have rule options */
    if (num_toks < 6) {
        ParseError("Bad rule in rules file: %s", args);
    }
..
 }
 otn = ParseRuleOptions(sc, rtn, roptions, rule_type, protocol);
..

mSplit is defined in mstring.c, a function to split a string into tokens.

In your case, ParseRuleOptions should return one for the whole string inside brackets I guess.

UPDATE 2: btw, is your first example correct, since in snort, you can add options to rules? For example this is a valid rule being written (options section not completed):

alert tcp any any -> 192.168.1.0/24 111 (content:"|00 01 86 a5|"; <caret>

In some cases you can have either 6 or 7 'words', so your algorithm should have a bit more knowledge, right?
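One way to give the algorithm that extra bit of knowledge, as a rough sketch only: treat everything from the first '(' onward as the options section and count words only in the part before it. The class name, the OPTIONS_REGION marker value, and the assumption that '(' never appears in the rule header are all hypothetical; this is not how Snort itself parses rules.

```java
class RuleRegion {
    /** Hypothetical marker meaning "the caret is inside the options section". */
    static final int OPTIONS_REGION = 8;

    /** Returns the header word position, OPTIONS_REGION, or -1 if too many words. */
    static int region(String line) {
        // assumption: the first '(' starts the rule options
        if (line.indexOf('(') >= 0) {
            return OPTIONS_REGION;
        }
        String[] tokens = line.split("\\s+", -1);
        return (tokens.length <= 7) ? tokens.length : -1;
    }
}
```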

milan
@milan: Yes I'm creating an editor for Snort Ruleset Language. However, the only module I know in relation to Snort Ruleset is coded in Perl, and that also converts the arguments into a rule hash. So I don't think it is applicable.
Alex Cheng
Well, that simplifies things a lot. Very interesting programming problem to have ;) I guess you can try counting spaces, but you will probably end up frustrated because it's difficult to manually cover all the possible cases. I'd still suggest at least a simple parser. P.S. Perl is not such a terrible language that it must be avoided at all costs :)
milan
@milan: Thanks for the reply. Yes, it is quite frustrating. Haha. I would love to learn Perl, but this is for my Final Year Project, so I believe I should stick to the language I'm most fluent in. Also, I believe that Perl's GUI coding is quite a PITA. CMIIW.
Alex Cheng
That was a joke, I wasn't saying you should do it in Perl. But do consider a simple parser. Try to find the Snort rule syntax definition (let me know if you do find it). If you're lucky, you may even find it defined in a lexer/parser syntax understandable by a Java lib.
milan
@milan: I get you, no worries, it is just the case for me that I will not use Perl for this. Will do.
Alex Cheng
+1  A: 

The code provided by polygenelubricants and helios works, to a certain extent. It addresses the problem I stated, but not with multiple lines. helios's code is more straightforward.

However, neither solution handles pressing enter in the JTextPane: it still returns the old count instead of 1, because split() treats the text as one sentence instead of two.

E.g. alert tcp <enter is pressed>. It should return 1, since it is a new sentence, but it returned 2 for both algorithms. Also, if I select all and delete, both algorithms throw a NullPointerException as there is no string to be split.

I added one line, and it solved the problems mentioned above:

public void determineRuleActionRegion(String str) {
    //remove repetitive spaces and concat $ for new line indicator
    str = str.trim().replaceAll(" +", " ") + "$";
    String[] lines = str.split("\r?\n|\r");
    String lastLine = lines[lines.length - 1];
    String[] tokens = lastLine.split("\\s+", -1);
    int pos = (tokens.length <= 7) ? tokens.length : -1;
    triggerSuggestionList(pos);
    System.out.println("Current pos: " + pos);
    return;
} //end of determineactionRegion()

With that, when split() parses str, the "$" creates another line, which is always the last line, so the count returns to one. Also, there is no NullPointerException, as the "$" is always there.

However, without the help of polygenelubricants and helios, I don't think I will be able to figure it out so soon. Thanks guys!

EDIT: Okay... apparently split("\r?\n|\r",-1) works the same. Question is should I accept polygenelubricants or my own? Hmm.

2nd EDIT: One bad thing about concatenating '$' to the end of str: lastLine.endsWith(" ") will return false. So the complete solution has to use split("\r?\n|\r", -1) together with lastLine.endsWith(" ").
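For reference, a sketch of the variant without the "$" marker, using split("\r?\n|\r", -1) as suggested in the comments. The class and method names are hypothetical, and the UI call (triggerSuggestionList) is left out so the logic can be checked on its own. Because the word split also uses a limit of -1, a trailing space already shows up as an extra empty token.

```java
class RulePosition {
    /** Returns the 1-based word position on the last line, or -1 beyond 7 words. */
    static int position(String str) {
        // limit -1 keeps the empty line that follows a trailing line break
        String[] lines = str.split("\r?\n|\r", -1);
        String lastLine = lines[lines.length - 1];

        // limit -1 keeps the empty token that follows a trailing space
        String[] tokens = lastLine.split("\\s+", -1);
        return (tokens.length <= 7) ? tokens.length : -1;
    }
}
```

With this, "alert tcp " followed by a line break gives 1, and an empty string also gives 1, since splitting an empty string yields a single empty token.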

Alex Cheng
ANSWERING YOUR OWN QUESTION? HOW DARE YOU? Just kidding. +1. I hope you get {Self learner} badge from it =)
polygenelubricants
I'm sorry I didn't notice your reply. I blame the broken cable problem in the Mediterranean Sea that is causing internet disruptions from Asia to Europe. Haha, but I did find it out myself. Hope you don't mind ;D Anyway, thanks a lot!
Alex Cheng