I know 8086 Assembly and am learning MIPS Assembly. I'm also learning Regular Expressions, so I want to know:

  • How can I use Regular Expressions in them?
+7  A: 

This is a challenging problem to pull off in assembly from scratch. No assembly language would support regular expressions as a first-class construct because there's too much of a difference in the abstraction level to make it a useful inclusion. That means you need to build it yourself.

Supporting regular expressions is essentially like having a compiler inside your program that translates the expression into a sequence of matching instructions. You will have to build all of the constituent pieces: a translation engine, a series of transformation rules, a DFA assembler, and a matching engine.

That said, it's not impossible! Start small, supporting tiny subsets of the real language you want to support, and then work your way up. Check out chapter 16 of Assembly Language Programming for a detailed walkthrough of how you might build your own regular expression engine. You'll need a good understanding of how they work (which this chapter will give you) and a solid understanding of assembly as well (see the earlier chapters for that).
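To give a feel for what "start small" can look like, here is a minimal sketch in C of a matcher for a tiny subset of the language: literal characters, `.`, `*`, `^` and `$`. It is modelled on the well-known small matcher by Rob Pike from The Practice of Programming, and it is a backtracking matcher rather than the compiled DFA described above, so treat it as a first stepping stone and not as the design of a full engine. The function names are only illustrative.

```c
/* Tiny backtracking matcher for a small regex subset:
 *   c   any literal character
 *   .   any single character
 *   *   zero or more of the preceding character
 *   ^   anchor at start of text
 *   $   anchor at end of text
 */

static int match_here(const char *re, const char *text);

/* Match "c*" followed by the rest of re at the start of text. */
static int match_star(int c, const char *re, const char *text)
{
    do {
        /* Try the rest of the pattern after consuming 0, 1, 2, ... copies of c. */
        if (match_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Match re at the start of text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                      /* pattern exhausted: success */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';          /* '$' only matches at end of text */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Search for re anywhere in text; returns 1 on match, 0 otherwise. */
int match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {
        /* Must try even when text is empty, so that "x*" matches "". */
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

Each function is a short loop or a chain of comparisons, so it should translate to 8086 or MIPS fairly directly: the two pointers become registers, and the recursion becomes ordinary calls with a stack frame.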

John Feminella
@John: +1 from me...
tommieb75
@Nathan Campos: Follow-up: while I applaud your desire to take on a challenge, if you're just learning 8086 assembly, this is almost certainly going to be an extremely advanced topic relative to your current skill level. If you want something that'll still be challenging, try your hand at a simpler subgoal first (perhaps just building a state machine from a string, for example).
John Feminella
I know 8086 Assembly very well, but I'm still learning MIPS. **;-)**
Nathan Campos
+1  A: 

Regular expressions do not exist in assembly, so this seems a slightly bizarre question: regexes belong to higher-level languages and do not exist at the nuts-and-bolts level...

Edit: Nathan, here is the link that might be of interest to you. Scroll down to the bottom of the page ;)

Hope this helps, Best regards, Tom.

tommieb75
They may not exist as a first-class construct, but that doesn't mean that this is a "bizarre question". Regular expressions don't exist in C# as a first-class construct either, but I think few people would say it's a bad idea that regexes are in the framework.
John Feminella
@John: I agree...yes, but I was not implying it is impossible to do, rather that it would be painful to achieve, since the referencing and pattern matching would obviously have to be done using registers...that would be some brain-strain...Your answer sums it up exactly! It's just a tad unusual as a question...hence my saying 'bizarre question'
tommieb75
Meh! Checked that link, it is not what you are looking for! Sorry Nathan :(
tommieb75
+1 You understand that was your mistake. I already have the book.
Nathan Campos
+1  A: 

The set of articles here describes how to build a very simple but powerful regex engine from scratch. It uses C++ but explains the theory in detail and the code can be translated to ASM without too much effort by an experienced programmer.

That said, I don't think it's a particularly interesting exercise, either for learning ASM or for learning regular expressions. You'll just get too bogged down in the details.

Eli Bendersky
A: 

Start off with very simple regular expressions. For example, recognise sequences of alphabetic and numeric characters, and work your way up from there. You will need to consider carefully how your code is going to deliver its results.

It may be a good idea to create a regex parser in C first, as more people in this forum will be able to help you. Once you have got it working, you can translate it to assembler code. Again, more people here will be familiar with 8086 assembly language programming than with MIPS, so it may be a good idea to use 8086 even though the CPU architecture is not very nice.
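As a rough illustration of that starting point, here is a small C sketch of a hand-written state machine that recognises one or more alphabetic characters followed by one or more digits at the start of a string (roughly `[A-Za-z]+[0-9]+` in the default C locale). The function name and the way the result is delivered (match length, or -1 for no match) are just one possible choice, not the only sensible one.

```c
#include <ctype.h>
#include <stddef.h>

/* States of the recogniser. */
enum state { START, LETTERS, DIGITS };

/* Returns the number of characters matched by letters-then-digits at the
 * start of s, or -1 if the start of s does not match. */
long match_letters_digits(const char *s)
{
    enum state st = START;
    size_t i;

    for (i = 0; ; i++) {
        /* Cast avoids undefined behaviour in isalpha/isdigit for negative chars. */
        unsigned char c = (unsigned char)s[i];

        switch (st) {
        case START:                     /* need at least one letter */
            if (!isalpha(c))
                return -1;
            st = LETTERS;
            break;
        case LETTERS:                   /* more letters, or the first digit */
            if (isalpha(c))
                break;
            if (!isdigit(c))
                return -1;
            st = DIGITS;
            break;
        case DIGITS:                    /* stop at the first non-digit */
            if (!isdigit(c))
                return (long)i;
            break;
        }
    }
}
```

With this sketch, `match_letters_digits("abc123xyz")` returns 6 and `match_letters_digits("123")` returns -1. The terminating NUL is neither a letter nor a digit, so the loop always ends. In assembler, the state can live in a register and each `case` becomes a label with a couple of compare-and-branch instructions.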

Mick Sharpe
A: 

Not sure if you want to know how to implement a regex engine in assembler or just how to easily use regular expressions on your null-terminated strings from assembly language. If it's the former, you have been given some pointers. If it's the latter, it depends on your platform, but the easiest way is to call a C-coded library from your assembly. Unix variants have POSIX regular expressions already available in the libc, and you can call them from your assembly, just following the appropriate calling conventions.
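For the second case, a thin C wrapper can keep the assembly side simple. The sketch below uses the standard POSIX `regcomp`, `regexec` and `regfree` functions from `<regex.h>`; the wrapper name `posix_match` and its return convention are my own assumptions, chosen so that an assembly routine only has to pass two string pointers and test an integer result.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches pattern (POSIX extended
 * syntax), 0 if it does not, and -1 if the pattern fails to compile. */
int posix_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: we only want a yes/no answer, not submatch positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);                          /* release the compiled pattern */

    return rc == 0 ? 1 : 0;
}
```

From assembly you would load the addresses of the two null-terminated strings into whatever your platform's calling convention uses for the first two arguments, call `posix_match`, and read the result from the return-value register. Compiling the pattern on every call is wasteful if you match the same pattern many times; in that case you would keep the `regex_t` around instead.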

Samuel
+1  A: 

Hi Nathan,

Try this: AsmRegEx - regular expression engine

It's written in FASM. Unfortunately, it seems that the project won't progress anymore...

anta40
That's awesome, I will take a look at it. Thanks. **:-)**
Nathan Campos