If I understand correctly, you're looking to extract substrings delimited by double-quotation marks ("). You could use capture-groups in regular expressions:
String text = "Vulcans are a humanoid species in the fictional \"Star Trek\"" +
" universe who evolved on the planet Vulcan and are noted for their " +
"attempt to live by reason and logic with no interference from emotion." +
" They were the first extraterrestrial species officially to make first" +
" contact with Humans and later became one of the founding members of the" +
" \"United Federation of Planets\"";
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String[] entities = new String[10]; // An array to hold matched substrings (holds at most 10 matches)
Pattern pattern = Pattern.compile("[\"](.*?)[\"]"); // Lazily capture everything between one " character and the next
Matcher matcher = pattern.matcher(text); // The matcher that runs the regex against our text
int count = 0; // An index for the array of matches
while (count < entities.length && matcher.find()) { // find() resumes after the previous match and returns false when no match remains
    entities[count++] = matcher.group(1); // Add the match to the array, without its " marks
}
Or write a small "parser" with String methods:
int startFrom = text.indexOf('"'); // The index position of the first " character
int nextQuote = text.indexOf('"', startFrom + 1); // The index position of the next " character
int count = 0; // An index for the array of matches
while (startFrom > -1 && nextQuote > -1 && count < entities.length) { // Keep looping while a complete pair of " characters remains (indexOf returns -1 when nothing is found)
    entities[count++] = text.substring(startFrom + 1, nextQuote); // Retrieve the substring between the quotes and add it to the array
    startFrom = text.indexOf('"', nextQuote + 1); // Find the next opening " character after nextQuote
    nextQuote = (startFrom > -1) ? text.indexOf('"', startFrom + 1) : -1; // Find its closing " character, if an opening one was found
}
In both versions, the sample text is hard-coded for the sake of the example, and the second snippet presumes the same text and entities variables are already present.
If you want to inspect the contents of the entities array:
for (int i = 0; i < count; i++) { // Only the first count slots hold matches
    System.out.println(entities[i]);
}
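For anyone who wants to run the regex approach as-is, here is a minimal self-contained sketch. The class name QuoteExtractor and the method name extractQuoted are made up for illustration, and it collects matches in a List instead of the fixed-size array used above, so it does not need to know the number of matches in advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
class QuoteExtractor {
    // Lazily captures whatever sits between one " character and the next.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    // Returns every substring enclosed in a pair of " characters, in order.
    public static List<String> extractQuoted(String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(text);
        while (matcher.find()) {            // false once no complete pair remains
            matches.add(matcher.group(1));  // group(1) excludes the " marks
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "the fictional \"Star Trek\" universe and the \"United Federation of Planets\"";
        for (String entity : extractQuoted(sample)) {
            System.out.println(entity);
        }
    }
}
```

Run on that sample, it should print Star Trek and United Federation of Planets on separate lines.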
A word of warning: there may be issues with boundary cases (i.e. when a " character is at the very beginning or end of the string). These examples will also not work as expected if there is an odd number of " characters in the text, because the last one has no closing partner. You could run a simple parity check beforehand:
static int countQuoteChars(String text) {
    int nextQuote = text.indexOf('"'); // Find the first " character
    int count = 0; // A counter for " characters found
    while (nextQuote != -1) { // While there is another " character ahead
        count++; // Increase the count by 1
        nextQuote = text.indexOf('"', nextQuote + 1); // Find the next " character
    }
    return count; // Return the result
}
static boolean quoteCharacterParity(int numQuotes) {
    if (numQuotes % 2 == 0) { // If the number of " characters modulo 2 is 0
        return true; // Return true for even
    }
    return false; // Otherwise return false
}
Note that if numQuotes happens to be 0, this method still returns true (because 0 modulo any nonzero number is 0, so (numQuotes % 2 == 0) will be true). You wouldn't want to go ahead with the parsing if there are no " characters, though, so you'd want to check for this condition somewhere.
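One way to fold the zero check into the parity check is sketched below. The class name QuoteGuard and the method name canParse are hypothetical, and countQuoteChars is repeated only so the snippet compiles on its own.

```java
// Hypothetical wrapper class; canParse is an illustrative name.
class QuoteGuard {
    // Same counting approach as countQuoteChars above.
    public static int countQuoteChars(String text) {
        int count = 0;
        for (int i = text.indexOf('"'); i != -1; i = text.indexOf('"', i + 1)) {
            count++;
        }
        return count;
    }

    // True only when there is at least one " character and each one has a partner.
    public static boolean canParse(String text) {
        int numQuotes = countQuoteChars(text);
        return numQuotes > 0 && numQuotes % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(canParse("say \"hello\"")); // true: one complete pair
        System.out.println(canParse("say \"hello"));   // false: odd number of quotes
        System.out.println(canParse("say hello"));     // false: nothing to extract
    }
}
```

Calling canParse(text) before either extraction loop would skip input that has no quotes or an unmatched one.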
Hope this helps!