ansaurus

Question

Is the valid state domain of a program a regular language?

Answer 1

A:

If your program is decorated with assert statements, then each assert statement defines a valid state. The program statements between the assertions define the valid state changes.

A program that crashes has violated enough assertions that something broken.

A program that's incorrect but "flaky" has violated at least one assertion but hasn't failed.

It's not at all clear what you're looking for. The valid states are -- sometimes -- hard to define but -- usually -- easy to represent as simple assert statements.

Since a crashed program has violated one or more assertions, a program with explicit, executable assertions, doesn't need an crash debugging. It will simply fail an assert statement and die visibly.

If you don't want to put in assert statements then it's essentially impossible to know what state should have been true and which (never-actually-stated) assertion was violated.

Unwinding the call stack to work out the position and the nesting is trivial. But it's not clear what that shows. It tells you what broke, but not what other things lead to the breakage. That would require guessing what assertions where supposed to have been true, which requires deep knowledge of the design.

Edit.

"seem related" and "seem unrelated" are undefinable without recourse to the actual design of the actual application and the actual assertions that should be true in each stack frame.

If you don't know the assertions that should be true, all you have is a random puddle of variables. What can you claim about "related" given a random pile of values?

Crash 1: a = 2, b = 3, c = 4 

Crash 2: a = 3, b = 4, c = 5

Related? Unrelated? How can you classify these without knowing everything about the code? If you know everything about the code, you can formulate standard assert-statement conditions that should have been true. And then you know what the actual crash is.

S.Lott 2010-04-09 21:28:39

I'm not looking at the state transition of a program but at the instantaneous state as defined only by the call stack return pointers. The thought is that it might be possible to use regular languages to differentiate crashes/assert-failures that manifest at the same point but are caused by different root causes. This *would* be a heuristic so it shouldn't be expected to work all the time but if you are in triage mode, very much alpha mode or have a big enough user base, you may legitimately expect to have enough crash reports that you need tools for dealing with them.

BCS 2010-04-09 21:44:04

The stack traces is an unsophisticated state machine. It's just a stack trace -- a singly-linked list. Matching the variables in each state against some condition is equally trivial. I don't get the question, since the stack frames are just a singly-linked list.

S.Lott 2010-04-10 11:28:58

I'm not sure why you are talking about "a random puddle of variables". In what I'm proposing, I wouldn't even have that much. The only thing I'm proposing looking at is the addresses that would be executed upon return.

BCS 2010-04-10 14:42:35

What I'm proposing is for dealing with automatic crash reports (e.i. every time an assert fails, it sends a bunch of data to a server). I'm basing my idea on the assumption that many bugs do not manifest at the site of the bug but rather are detected somewhere else, so rather than just keep track of the assert that failed, it keeps the whole stack. I'm assuming that these will form a pattern of; a string of calls from main to the bug, a string of calls relating to the bug and a string of calls that aren't the but but are effected by it. Only the middle bit might be unique to each bug.

BCS 2010-04-10 14:50:16

@BCS: "I'm assuming that these will form a pattern" I doubt they will form a "pattern" of any kind. You can -- given enough information about the program -- determine a root cause. But there won't be any "pattern". There will be duplicates (known bugs) and non-duplicates (distinct, irreproducible bugs). Unless you analyze the actual code, the states of the variables are just random values.

S.Lott 2010-04-11 02:58:22

@BCS: The variables *are* the state of the program. The address is even less useful for determining the crash. That will not form any "pattern" at all. It will either match or it won't. The call stack is just a list of addresses that either match or don't. There's no "pattern", just exact matches or not. Analyze some real crash dumps before wasting too much more time on this.

S.Lott 2010-04-11 03:00:33

Ahhh! I think I see where we are thinking differently. I'm **NOT** thinking of this as a diagnostic tool. It would not be used for finding the reason or root cause of a bug, but only as a heuristic for partitioning program failure incidents by common causes (root and/or proximal).

BCS 2010-04-11 14:53:04

As for patterns vs. only exact matches; One trivial case that most definitely results in a pattern is an inadvertent infinite indirect recursion. The prefix will start with `main` but can differ hugely for different trigger cases of the same bug. (Oh, and I have looked at a number of crashes so I at least have *some* clue as to what I'm talking about :)

BCS 2010-04-11 14:57:12

@BCS: "partitioning program failure incidents" still makes no sense to me. Please update your question to provide some examples or something. Failure tracebacks are trivially available in languages like Java and Python. You don't need any tools at all to parse them; they're directly available in an exception handler. They trivially form exact matches. A particular line of code is a known failure. There's no subtlety or "pattern matching" to this. Please UPDATE the question with some kind of tangible example. If you've examined dumps, use REAL dumps.

S.Lott 2010-04-12 01:05:55

OK. Updated it.

BCS 2010-04-12 03:48:24

Answer 2

A:

The return pointer on the stack is just a pointer to memory. In theory if you look at the call stack of a program that just makes one function call, the return pointer (for that one function) can have different value for every execution of the program. How would you analyze that?

In theory you could read through a core dump using a map file. But doing so is extremely platform and compiler specific. You would not be able to create a general tool for doing this with any program. Read your compiler's documentation to see if it includes any tools for doing postmortem analysis.

jmucchiello 2010-04-09 22:17:56

Position independent code aside (like in a DLL/so) the location of executable code is fixed at compile time. (Note I'm not looking at the frame pointer but the instruction pointer.) Also, I think that the location of even DLL code is only a function of the versions of the libraries loaded and if it isn't, then a little work would even be able to back-out that. In short, replace literal return pointers with line numbers and nothing I care about changes

BCS 2010-04-09 23:14:59

ansaurus

tags:

views:

answers:

Is the valid state domain of a program a regular language?

related questions