
I'm working on writing a simple Prolog interpreter in Java.

How can I find the index of the last character of the first element (the head element) of a string in "List Syntax"?

List Syntax looks like:

(X)
(p a b)
(func (func2 a) (func3 X Y))
(equal eve (mother cain))

The heads of those strings, in order, are:
Head: "X", Index: 1
Head: "p", Index: 1
Head: "func", Index: 4
Head: "equal", Index: 5

Basically, I need to match the substring that immediately follows the first "(" and ends just before either a space or a closing ")", whichever comes first. I need the character index of the last character of the head element.

How can I match and get this index in Java?
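For the non-nested examples above, a regular expression anchored at the opening "(" is enough. This is a minimal sketch, assuming the head is an atom or variable made only of characters other than spaces and brackets; the class and method names `HeadRegex` and `headEnd` are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadRegex {

    // Hypothetical pattern: "(" followed by one or more characters that are
    // neither a space nor a bracket. Group 1 captures the atomic head.
    private static final Pattern ATOMIC_HEAD = Pattern.compile("\\(([^ ()]+)");

    /** Returns the index of the head's last character, or -1 if the head is not atomic. */
    public static int headEnd(String list) {
        Matcher m = ATOMIC_HEAD.matcher(list);
        // lookingAt() only matches at the start of the input, so the "(" must be at index 0.
        return m.lookingAt() ? m.end(1) - 1 : -1;
    }

    public static void main(String[] args) {
        String[] data = {"(X)", "(p a b)", "(func (func2 a) (func3 X Y))", "(equal eve (mother cain))"};
        for (String s : data) {
            System.out.println(s + " -> " + headEnd(s));
        }
    }
}
```

The sketch returns -1 when the head is itself a bracketed list such as "((b X) Y)", because a plain regular expression cannot find the matching close bracket.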


Brabster's solution is really close. However, consider the case of:
((b X) Y)

Where the head element is (b X). I attempted to fix it by removing "(" from the scanner delimiters, but it still hiccups because of the space between "b" and "X".

Similarly: ((((b W) X) Y) Z)

Where the head is (((b W) X) Y).

A:

Java's Scanner class (introduced in Java 1.5) might be a good place to start.

Here's an example that I think does what you want (updated to count brackets when the head is itself a list):

import java.util.Scanner;

public class Test {

    public static void main(String[] args) {
        String[] data = new String[] {
            "(X)",
            "(p a b)",
            "(func (func2 a) (func3 X Y))",
            "(equal eve (mother cain))",
            "((b X) Y)",
            "((((b W) X) Y) Z)"
        };

        for (String line : data) {
            int headIdx;
            if (line.charAt(1) == '(') {
                // Nested head: find the ")" that closes the "(" at index 1
                headIdx = countBrackets(line);
            } else {
                // Atomic head: the first token delimited by brackets or spaces
                Scanner s = new Scanner(line);
                s.useDelimiter("[() ]");
                String head = s.next();
                s.close();
                headIdx = line.indexOf(head) + head.length() - 1;
            }
            System.out.println(headIdx);
        }
    }

    // Returns the index of the ")" that closes the "(" at index 1
    private static int countBrackets(String line) {
        int bracketCount = 0;
        for (int i = 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '(') {
                bracketCount++;
            } else if (c == ')') {
                bracketCount--;
            }
            if (bracketCount == 0) {
                return i;
            }
        }
        throw new IllegalStateException("Brackets not nested properly");
    }
}

Output:

1
1
4
5
5
13

It's not a very elegant solution, but regular expressions can't count (i.e. they can't match nested brackets), so the nested case needs the bracket-counting loop. I'd be thinking about using a parser generator if there's any more complexity in there :)

Brabster
It might also be worthwhile looking at parser generators like ANTLR or JavaCC if you don't actually want to deal with the parsing yourself.
Brabster
@Brabster, how would you deal with the string "((b X) Y)", where (b X) is the head of the list?
@Brabster, I tried removing "(" from the list of delimiters, but I still run into trouble because of the space in between b and X.
Good question. I'll have a think, it's probably beyond the capabilities of the simple regex I have here... hmmm
Brabster
I suppose you can keep on nesting, i.e. ((((b W) X) Y) Z)?
Brabster
And in the case above, what is the head of the list?
Brabster
The head of the list above is: (((b W) X) Y)
I guess you just count brackets then. If the second character is a bracket, count the brackets in and out until you find the matching close bracket for the second one - and that's your head. Else, second character not a bracket, use the algorithm above to get the head.
Brabster
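Brabster's two-case idea (count brackets if the second character is "(", otherwise scan to the first space or ")") can be folded into one method without Scanner. A minimal sketch, assuming the input starts with "(" and has balanced brackets; `HeadIndex.headEnd` is a hypothetical name:

```java
public class HeadIndex {

    /**
     * Hypothetical helper: returns the index of the last character of the head
     * element of a list such as "(p a b)" or "((b X) Y)".
     * Assumes the list starts with "(" and its brackets are balanced.
     */
    public static int headEnd(String list) {
        if (list.length() < 3 || list.charAt(0) != '(') {
            throw new IllegalArgumentException("Not a non-empty list: " + list);
        }
        if (list.charAt(1) == '(') {
            // Nested head: find the ")" that closes the "(" at index 1.
            int depth = 0;
            for (int i = 1; i < list.length(); i++) {
                char c = list.charAt(i);
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                    if (depth == 0) {
                        return i;
                    }
                }
            }
            throw new IllegalArgumentException("Unbalanced brackets: " + list);
        }
        // Atomic head: it ends just before the first space or bracket.
        int i = 1;
        while (i < list.length() && " ()".indexOf(list.charAt(i)) < 0) {
            i++;
        }
        if (i == 1 || i == list.length()) {
            throw new IllegalArgumentException("Malformed list: " + list);
        }
        return i - 1;
    }

    public static void main(String[] args) {
        String[] data = {
            "(X)", "(p a b)", "(func (func2 a) (func3 X Y))",
            "(equal eve (mother cain))", "((b X) Y)", "((((b W) X) Y) Z)"
        };
        for (String s : data) {
            System.out.println(s + " -> " + headEnd(s));
        }
    }
}
```

For the six inputs in the question this should give the same indices as the Scanner version (1, 1, 4, 5, 5, 13).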
A: 

I suggest you write a proper parser (operator precedence in the case of Prolog) and represent the terms as trees of Java objects for further processing.
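A minimal recursive-descent sketch of that idea, covering only the bracketed list syntax from the question (not Prolog's full operator-precedence grammar). The `Term`, `Atom`, and `ListTerm` classes are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class ListSyntaxParser {

    /** Hypothetical term hierarchy: an atom/variable name, or a bracketed list of terms. */
    public interface Term {}

    public static final class Atom implements Term {
        public final String name;
        public Atom(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    public static final class ListTerm implements Term {
        public final List<Term> elements;
        public ListTerm(List<Term> elements) { this.elements = elements; }
        public Term head() { return elements.get(0); } // throws for the empty list "()"
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < elements.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(elements.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final String src;
    private int pos;

    private ListSyntaxParser(String src) { this.src = src; }

    /** Parses one complete term; throws IllegalArgumentException on malformed input. */
    public static Term parse(String src) {
        ListSyntaxParser p = new ListSyntaxParser(src);
        Term t = p.term();
        p.skipSpaces();
        if (p.pos != src.length()) throw new IllegalArgumentException("Trailing input at " + p.pos);
        return t;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private Term term() {
        skipSpaces();
        if (pos >= src.length()) throw new IllegalArgumentException("Unexpected end of input");
        if (src.charAt(pos) == '(') {
            pos++; // consume "("
            List<Term> elements = new ArrayList<>();
            skipSpaces();
            while (pos < src.length() && src.charAt(pos) != ')') {
                elements.add(term()); // recurse for each element, nested lists included
                skipSpaces();
            }
            if (pos >= src.length()) throw new IllegalArgumentException("Missing ')'");
            pos++; // consume ")"
            return new ListTerm(elements);
        }
        // Atom or variable: read until whitespace or a bracket.
        int start = pos;
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && src.charAt(pos) != '(' && src.charAt(pos) != ')') {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("Unexpected ')' at " + pos);
        return new Atom(src.substring(start, pos));
    }

    public static void main(String[] args) {
        ListTerm t = (ListTerm) parse("((((b W) X) Y) Z)");
        System.out.println(t.head()); // (((b W) X) Y)
    }
}
```

Once the input is a tree, the head is just the first element, so no index arithmetic on the string is needed.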

starblue
I'm using this as a minimal, minimal example of one feature in prolog (unification), where I don't think a lexer/parser is warranted.
Then I would build the object structures directly and not bother with strings.
starblue
@starblue, that is what I would probably do if I were doing this, build the object structures by hand.
Simucal
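Building the terms by hand, as suggested in the comments above, skips string handling entirely. A minimal sketch; the class names and the `a`/`l` factory helpers are hypothetical and only meant to make the construction read like the list syntax:

```java
import java.util.Arrays;
import java.util.List;

public class DirectTerms {

    // Hypothetical minimal term classes; names are illustrative only.
    public interface Term {}

    public static final class Atom implements Term {
        public final String name;
        public Atom(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    public static final class ListTerm implements Term {
        public final List<Term> elements;
        public ListTerm(Term... elements) { this.elements = Arrays.asList(elements); }
        public Term head() { return elements.get(0); }
        public List<Term> tail() { return elements.subList(1, elements.size()); }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < elements.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(elements.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Short factory helpers so terms read almost like the list syntax.
    public static Atom a(String name) { return new Atom(name); }
    public static ListTerm l(Term... elements) { return new ListTerm(elements); }

    public static void main(String[] args) {
        // (equal eve (mother cain)), built without any string parsing
        ListTerm goal = l(a("equal"), a("eve"), l(a("mother"), a("cain")));
        System.out.println(goal.head()); // equal
        System.out.println(goal.tail()); // [eve, (mother cain)]
        System.out.println(goal);        // (equal eve (mother cain))
    }
}
```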
A:

Is there a reason you can't just brute force it? Something like this?

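public int firstIndex(String exp) {
    int parenCount = 0;
    // Stop one short of the end so exp.charAt(i + 1) stays in bounds
    for (int i = 1; i < exp.length() - 1; i++) {
        char c = exp.charAt(i);
        if (c == '(') {
            parenCount++;
        } else if (c == ')') {
            parenCount--;
        }
        // At depth 0, the head ends where the next character is a space or ")"
        char next = exp.charAt(i + 1);
        if (parenCount == 0 && (next == ' ' || next == ')')) {
            return i;
        }
    }
    throw new IllegalArgumentException("No head element found in: " + exp);
}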

I may be missing something here, but I think that would work.

Morinar
I missed the extra comment from Brabster where he mentions doing basically exactly this.
Morinar