I'm retrieving raw text (headers and message body) from a POP server. I need to capture everything after the header block, which is separated from the user's message by a blank line.

At the same time, I want to ignore anything from the original message if the email is a reply. In the emails I'm parsing, the quoted original starts with

------Original Message------

An example email might look like this:

Return-Path: ...
...
More Email Metadata: ...

Hello from regex land, I'm glad to hear from you.
------Original Message------
Metadata: ...
...

Hey regex dude, can you help me? Thanks!

Sincerely, Me.

I need to extract "Hello from regex land, I'm glad to hear from you." and any other text/lines prior to the original message.

I'm using this regex right now (C# in multiline mode) and it seems to work, except that it captures ------Original Message------ if the body is blank. I'd rather just get an empty string in that case.

^\s*$\n(.*)(\n------Original Message------)?

Edit
I haven't downvoted anyone, and if you do downvote, it's usually helpful to include a comment.

A: 

Why not strip "\n------Original Message------" from the result after the capture?
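A minimal sketch of that post-processing idea, written in Python rather than the asker's C# and using plain string operations instead of a regex. The function name `body_before_marker` is hypothetical, and the sketch assumes the header always ends at the first blank line and that the marker text never appears inside the reply itself.

```python
MARKER = "------Original Message------"

def body_before_marker(raw: str) -> str:
    """Return the text between the header block and the reply marker."""
    # Normalize CRLF so the blank-line search works for either terminator.
    text = raw.replace("\r\n", "\n")
    # The header ends at the first blank line; everything after it is the body.
    _, separator, body = text.partition("\n\n")
    if not separator:
        return ""  # no blank line found, so there is no body to return
    # Cut off the quoted original message, if the marker is present at all.
    body = body.split(MARKER, 1)[0]
    return body.strip()
```

Because the marker is removed after the body has been isolated, a blank reply yields an empty string, and a message without the marker is returned whole.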

RC
The thing is that Original Message doesn't return on messages with text in the body, only when there is nothing in the body. So I think that my pattern is only capturing one line. I need all lines captured prior to that or none at all. I could do a replace though. Thanks
jlafay
If you run into a bug, mask it! Great advice ;-)
Timwi
A: 

Why don't you use DotnetOpenMail? Using a regex for this is the wrong approach; you'd be better off using a dedicated email handler instead.
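To illustrate what a dedicated handler does, here is a rough sketch using Python's standard `email` package, not DotnetOpenMail, whose API is not shown in this thread. The helper name `body_of` is hypothetical. It only separates headers from the body; the quoted original message would still need to be trimmed afterwards.

```python
from email import message_from_string

def body_of(raw: str) -> str:
    """Return the plain-text body of a raw message, headers removed."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # walk() visits the container first, then each sub-part in order.
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                return part.get_payload()
        return ""  # no plain-text part found
    # For a simple message the payload is everything after the blank line.
    return msg.get_payload()
```

The parser takes care of folded headers and multipart boundaries, which a single regex cannot do reliably.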

tommieb75
I'm using a POP3 client that I was told to use and instead of retrieving messages as objects (as I would prefer), I can only retrieve raw text for each message. Otherwise this wouldn't be an issue.
jlafay
Uhhh... that does not really make sense using regex for this... what pop3 client are you using - that pop3 client should be taking care of the handling of the body of the message etc... otherwise regex would not be needed!!
tommieb75
Thanks for trying to help, tommie. Let's put it in this perspective then. I have POP3 mail client code and I'm extending it to instantiate a MailMessage object for each message retrieved from the POP server. Now I'm writing methods to extract portions of the raw text to hydrate the object properties.
jlafay
And I agree.. all of this wouldn't be needed if that were the case :)
jlafay
tommie, I think I may be asking for too much in a regex capture. I'm going to try out DotnetOpenMail. Thanks for pointing me in the right direction.
jlafay
A: 

The reason for this is that you have an extra \n inside the parentheses. If the body is blank, there is no extra newline there. Therefore, try this:

^\s*$\r\n(.*)(^------Original Message------$)?

If you don’t want the newline at the end of the body, you can still use string.Trim() on the matched part.

Note: This assumes that the input uses \r\n line terminators (which is required in e-mail headers according to the MIME standard).

Timwi
This produces the same results.
jlafay
@jlafay: Yeah, sorry. It should be `\r\n` instead of just `\n`. Updated the answer.
Timwi
A: 

You need to replace (\n------Original Message------) with the lookahead (?=(\n------Original Message------)). A lookahead only checks that the marker is there; it does not include it in the match.

El Ronnoco
This is better. The problem is that it doesn't account for emails that don't contain "Original Message". Much closer though, thanks.
jlafay
What are the alternative terminators other than `original message` ?
El Ronnoco
I just want it to stop capturing before original message line. Not all emails will have that line, just most of them do. So if that line doesn't exist it's a new email and not a reply. I want all of that captured.
jlafay
Who gave me a downvote and what's the reason?! Perhaps try `(?=(\n------Original Message------|$))` which should take you to the end of the message.
El Ronnoco
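One way the lookahead-with-fallback idea from this answer could be put together, sketched in Python rather than C#. The names `BODY` and `extract_body` are hypothetical. The pattern is an adaptation, not the exact expression from the thread: it uses a lazy, dot-matches-newline capture so that more than one body line is picked up, and `\Z` (end of input in Python) as the fallback when no marker is present. In .NET the corresponding options would be `RegexOptions.Singleline | RegexOptions.Multiline`, and the anchors behave slightly differently there, so treat this as a starting point only.

```python
import re

BODY = re.compile(
    r"\r?\n\r?\n"   # the blank line that ends the header block
    r"(.*?)"        # the body, matched lazily across lines
    r"(?=^------Original Message------|\Z)",  # stop before the marker or at the end
    re.DOTALL | re.MULTILINE,
)

def extract_body(raw: str) -> str:
    """Return the reply text, or an empty string if there is none."""
    match = BODY.search(raw)
    # strip() drops the line terminator left in front of the marker.
    return match.group(1).strip() if match else ""
```

With a blank body the lazy group matches nothing and the lookahead succeeds immediately, so the result is an empty string rather than the marker line.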