I have this working regex (tested in Regex Coach):
\n[\s]*[0-9]*[\s]*[0-9]*(\.)?[0-9]*(e\+)?[0-9]*
It is supposed to pick up the first two columns of this file:
http://wwwhomes.uni-bielefeld.de/achim/highly.txt
I read through the man pages, which say that ^ matches at the beginning of a line, so I replaced \n with ^. But egrep isn't agreeing with me when I do this:
egrep -e ^[\s]*[0-9]*[\s]*[0-9]*(\.)?[0-9]*(e\+)?[0-9]* "wwwhomes.uni-bielefeld.de achim highly.txt"
EDIT: it has something to do with (e\+)?
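A likely culprit is quoting: left unquoted, the parentheses and other special characters in the pattern are interpreted by the shell before egrep ever sees them. Also, in POSIX bracket expressions [\s] means "a backslash or the letter s", so [[:space:]] is the portable spelling. A minimal sketch of the quoted form, assuming a bash-like shell and GNU grep (for -o); the sample line is made up for illustration and not taken from highly.txt:

```shell
# Hypothetical sample line: an index followed by a number in scientific notation.
line='   12   1.5e+3'

# Single quotes stop the shell from touching ( ) * ? and backslashes;
# -o prints only the part of the line that the pattern matched.
matched=$(printf '%s\n' "$line" \
  | grep -E -o '^[[:space:]]*[0-9]*[[:space:]]*[0-9]*(\.)?[0-9]*(e\+)?[0-9]*')

printf '[%s]\n' "$matched"
```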
EDIT 2: Okay, I'm simplifying the regex. Forget about trying to match numbers in scientific notation; here is what I am using:
egrep -e "^[[:space:]]*[0-9]*[[:space:]]*[0-9]*" "wwwhomes.uni-bielefeld.de achim highly.txt"
it returns the header lines:
no number divisors 2 3 5 71113171923293137414347535961677173
------------------------------------------------------------------------------
this isn't right...
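The headers show up because every quantifier in that pattern is *, so the pattern can match the empty string at the start of any line, and egrep prints every line that contains a match. A small sketch of the effect, assuming GNU or POSIX grep; the two sample lines are made up for illustration (one header-like, one data-like), not copied from the file:

```shell
# Two hypothetical lines: a header-like line and a data-like line.
sample=$(printf 'no number divisors\n   1   1   1\n')

# All-star pattern: it can match nothing at all, so both lines count as matches.
star_count=$(printf '%s\n' "$sample" \
  | grep -E -c '^[[:space:]]*[0-9]*[[:space:]]*[0-9]*')

# With + the pattern requires at least one digit per column,
# so only the data-like line matches.
plus_count=$(printf '%s\n' "$sample" \
  | grep -E -c '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+')

echo "star: $star_count, plus: $plus_count"
```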
Final edit:
I needed a combination of grep and sed to get the proper data out. grep removed the header lines (the + quantifiers require actual digits, so the headers no longer match) and sed formatted the text:
grep -E -o -e "^[[:space:]]+[0-9]+[[:space:]]+[0-9e+.]+[[:space:]]+[0-9e+.]+" "wwwhomes.uni-bielefeld.de achim highly.txt" >grepped.txt
sed -r "s/^\s*[0-9]+\s*([0-9.e+]+)\s*([0-9.e+]+)/\1,\2/" "grepped.txt" >seded.txt
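The same two steps can also be chained with a pipe, which skips the intermediate grepped.txt. This is only a sketch, assuming GNU grep and GNU sed (for -o, -r and \s); the input lines are made up to resemble the file's layout (header, rule, then index / number / divisors) and are not copied from it:

```shell
# Hypothetical input: a header line, a rule line, then three data rows.
input=$(printf 'no number divisors\n-----\n    1    1    1\n    2    2    2\n    3    4    3\n')

# grep keeps only the data rows (and only the matched part of each);
# sed drops the leading index and joins the next two columns with a comma.
result=$(printf '%s\n' "$input" \
  | grep -E -o '^[[:space:]]+[0-9]+[[:space:]]+[0-9e+.]+[[:space:]]+[0-9e+.]+' \
  | sed -r 's/^\s*[0-9]+\s*([0-9.e+]+)\s*([0-9.e+]+)/\1,\2/')

printf '%s\n' "$result"
```

To run it against the real file, replace the printf of $input with the file name as an argument to grep and redirect the output wherever it is needed.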