Should I use RegexParsers or StandardTokenParsers, or are these even suitable for parsing this kind of syntax? An example of the syntax can be found here.
+4
A:
This format was designed to be easy to parse; you can do it without any regular expressions and without tokenizing your input. Just go line by line and look at the first couple of characters. The file header and chunk headers will require a little more attention, but it's nothing you can't do with split.
Of course, if you want to learn how to use some parsing libraries, then go for it.
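A minimal sketch of the line-by-line approach described above. It assumes the file headers separate path and timestamp with a tab and that the numbers in chunk headers are well-formed; all names (`DiffLine`, `LineClassifier` and so on) are hypothetical and chosen only for illustration.

```scala
// Hypothetical result types, one per kind of line.
sealed trait DiffLine
case class OldFileHeader(path: String) extends DiffLine
case class NewFileHeader(path: String) extends DiffLine
case class ChunkHeader(oldStart: Int, oldSize: Int, newStart: Int, newSize: Int) extends DiffLine
case class Added(text: String) extends DiffLine
case class Removed(text: String) extends DiffLine
case class Context(text: String) extends DiffLine
case class Other(text: String) extends DiffLine

object LineClassifier {
  // Path is everything after the 4-character marker, up to the first tab (assumed separator).
  private def path(line: String): String = line.drop(4).takeWhile(_ != '\t')

  // "-12,5" -> (12, 5); "+12" -> (12, 1), treating an omitted size as 1.
  // Assumes the digits are valid; toInt throws otherwise.
  private def range(field: String): Option[(Int, Int)] =
    field.drop(1).split(",") match {
      case Array(start, size) => Some((start.toInt, size.toInt))
      case Array(start)       => Some((start.toInt, 1))
      case _                  => None
    }

  // "@@ -1,3 +1,4 @@" split on spaces gives the two range fields.
  private def chunkHeader(line: String): DiffLine =
    line.split(" ") match {
      case Array("@@", oldField, newField, "@@", _*) =>
        (range(oldField), range(newField)) match {
          case (Some((os, ol)), Some((ns, nl))) => ChunkHeader(os, ol, ns, nl)
          case _                                => Other(line)
        }
      case _ => Other(line)
    }

  // Decide what a line is by looking only at its first few characters.
  def classify(line: String): DiffLine =
    if (line.startsWith("--- ")) OldFileHeader(path(line))
    else if (line.startsWith("+++ ")) NewFileHeader(path(line))
    else if (line.startsWith("@@ ")) chunkHeader(line)
    else if (line.startsWith("+")) Added(line.drop(1))
    else if (line.startsWith("-")) Removed(line.drop(1))
    else if (line.startsWith(" ")) Context(line.drop(1))
    else Other(line)
}
```

Note that classifying each line in isolation is ambiguous inside a chunk: a removed line whose content begins with `-- ` looks like a file header. A stricter reader would use the sizes from the chunk header to count how many lines belong to the chunk.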
Radomir Dopieralski
2010-08-24 19:13:08
Years ago, in my first week of programming, I started creating patches for Nethack. Not knowing about `diff`, I started writing the damn things by hand. You can imagine my embarrassment when someone in the newsgroup politely informed me that I might be out of my mind. Well, anyway not only are unified diffs easy to parse, they're actually not so hard to write by hand either. :)
guns
2010-08-24 19:25:07
I'd rather not keep the state explicitly myself, and learning to use parser combinators is also one of my goals. Most of the examples parse some kind of programming-language syntax, but I was wondering whether parser combinators could also be used for parsing diff syntax or even binary formats.
JtR
2010-08-24 20:25:16
+2
A:
I'd use regex. It simplifies a few things, and makes the rest standard.
def process(src: scala.io.Source): Unit = {
  import scala.util.matching.Regex

  // Header lines are expected to look like: --- path ''timestamp''
  val FilePattern = """(.*) ''(.*)''"""
  val OriginalFile = new Regex("--- " + FilePattern, "path", "timestamp")
  // '+' is a regex metacharacter, so literal plus signs must be escaped
  val NewFile = new Regex("""\+\+\+ """ + FilePattern, "path", "timestamp")
  val Chunk = new Regex("""@@ -(\d+),(\d+) \+(\d+),(\d+) @@""", "orgStarting", "orgSize", "newStarting", "newSize")
  val AddedLine = """\+(.*)""".r
  val RemovedLine = """-(.*)""".r
  val UnchangedLine = """ (.*)""".r

  src.getLines() foreach {
    case OriginalFile(path, timestamp) => println("Original file: " + path)
    case NewFile(path, timestamp) => println("New file: " + path)
    // Captured groups are Strings, so format them with %s rather than %d
    case Chunk(l1, s1, l2, s2) => println("Modifying %s lines at line %s, to %s lines at %s" format (s1, l1, s2, l2))
    case AddedLine(line) => println("Adding line " + line)
    case RemovedLine(line) => println("Removing line " + line)
    case UnchangedLine(line) => println("Keeping line " + line)
    // Avoid a MatchError on lines that fit none of the patterns
    case other => println("Ignoring line " + other)
  }
}
Daniel
2010-08-24 19:56:02
I was hoping to use parser combinators so that I wouldn't have to keep the state myself. I'm building an object graph out of the patch details: a Patch contains FileModifications, which contain Chunks. It seems that parser combinators could offer an easier way to create objects from the parsed input, instead of building the object graph in variables along the way and tracking the parsing state.
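For comparison, here is a rough sketch of how such an object graph could be built without mutable variables, by folding over the lines. The case classes are only a guess at the model described above (the names `Patch`, `FileModification` and `Chunk` come from the comment; the fields and `PatchBuilder` are hypothetical), and the header text is stored unsplit.

```scala
// Hypothetical model: a Patch contains FileModifications, which contain Chunks.
case class Chunk(header: String, lines: List[String])
case class FileModification(oldFile: String, newFile: String, chunks: List[Chunk])
case class Patch(files: List[FileModification])

object PatchBuilder {
  // The fold's accumulator is the list of files seen so far, newest first.
  // Chunks and chunk lines are also prepended, then everything is reversed at the end.
  def build(lines: Iterator[String]): Patch = {
    val reversed = lines.foldLeft(List.empty[FileModification]) {
      // "--- " starts a new file (ambiguous for removed lines starting with "-- ")
      case (files, line) if line.startsWith("--- ") =>
        FileModification(line.drop(4), "", Nil) :: files
      // "+++ " directly after "--- " completes the file header
      case (file :: done, line) if line.startsWith("+++ ") && file.chunks.isEmpty =>
        file.copy(newFile = line.drop(4)) :: done
      // "@@" opens a new chunk in the current file
      case (file :: done, line) if line.startsWith("@@") =>
        file.copy(chunks = Chunk(line, Nil) :: file.chunks) :: done
      // any other line inside a chunk belongs to the most recent chunk
      case (FileModification(o, n, chunk :: earlier) :: done, line) =>
        FileModification(o, n, chunk.copy(lines = line :: chunk.lines) :: earlier) :: done
      // lines outside any chunk are ignored
      case (files, _) => files
    }
    Patch(reversed.reverse.map { f =>
      f.copy(chunks = f.chunks.reverse.map(c => c.copy(lines = c.lines.reverse)))
    })
  }
}
```

The state is still there, but it lives in the fold's accumulator rather than in variables; a parser combinator expresses the same nesting declaratively through `rep1` and `~`.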
JtR
2010-08-24 20:22:52
Btw, I didn't know that regexps can be used in pattern matching like that, very neat!
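A small standalone illustration of that feature, using a hypothetical `RegexExtractorDemo` object: a `scala.util.matching.Regex` can be used as an extractor in a `case`, where the whole input must match and each capture group is bound to one variable as a `String`.

```scala
import scala.util.matching.Regex

object RegexExtractorDemo {
  // Four capture groups, so the extractor takes four arguments.
  val ChunkHeader: Regex = """@@ -(\d+),(\d+) \+(\d+),(\d+) @@""".r

  def describe(line: String): String = line match {
    case ChunkHeader(oldStart, oldSize, newStart, newSize) =>
      s"old $oldStart+$oldSize, new $newStart+$newSize"
    case _ =>
      "not a chunk header"
  }
}
```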
JtR
2010-08-24 20:26:01
+1
A:
Here is a solution using RegexParsers.
// Since Scala 2.11 this package ships in the separate scala-parser-combinators module
import scala.util.parsing.combinator.RegexParsers

object UnifiedDiffParser extends RegexParsers {

  // case classes representing the data of the diff
  case class UnifiedDiff(oldFile: File, newFile: File, changeChunks: List[ChangeChunk])
  case class File(name: String, timeStamp: String)
  case class ChangeChunk(rangeInformation: RangeInformation, changeLines: List[String])
  case class RangeInformation(oldOffset: Int, oldLength: Int, newOffset: Int, newLength: Int)

  // whitespace and newlines are significant in a diff, so do not skip them
  override def skipWhitespace = false

  def unifiedDiff: Parser[UnifiedDiff] = oldFile ~ newFile ~ rep1(changeChunk) ^^ {
    case of ~ nf ~ l => UnifiedDiff(of, nf, l)
  }

  def oldFile: Parser[File] = ("--- " ~> filename) ~ ("""\s+""".r ~> timestamp <~ newline) ^^ {
    case f~t => File(f, t)
  }
  def newFile: Parser[File] = ("+++ " ~> filename) ~ ("""\s+""".r ~> timestamp <~ newline) ^^ {
    case f~t => File(f, t)
  }
  def filename: Parser[String] = """[\S]+""".r
  def timestamp: Parser[String] = """.*""".r

  def changeChunk: Parser[ChangeChunk] = rangeInformation ~ (newline ~> rep1(lineChange)) ^^ {
    case ri ~ l => ChangeChunk(ri, l)
  }
  def rangeInformation: Parser[RangeInformation] = ("@@ " ~> "-" ~> number) ~ ("," ~> number) ~ (" +" ~> number) ~ ("," ~> number) <~ " @@" ^^ {
    case a ~ b ~ c ~ d => RangeInformation(a, b, c, d)
  }

  def lineChange: Parser[String] = contextLine | addedLine | deletedLine
  def contextLine: Parser[String] = """ .*""".r <~ newline
  def addedLine: Parser[String] = """\+.*""".r <~ newline
  def deletedLine: Parser[String] = """-.*""".r <~ newline

  def newline: Parser[String] = """\n""".r
  def number: Parser[Int] = """\d+""".r ^^ {_.toInt}

  def main(args: Array[String]): Unit = {
    val reader = {
      if (args.length == 0) {
        // read from stdin
        Console.in
      } else {
        new java.io.FileReader(args(0))
      }
    }
    println(parseAll(unifiedDiff, reader))
  }
}
michael.kebe
2010-08-25 09:32:58