views:

99

answers:

4

I have 2 files, each with n lines. For example:

File 1

465466454
546545454
5454454
Data=4545454545484848484
kuh uytyer huihkuh

File 2

e4654sdfdsf
544hjklhjl
464jku
Data=4545454545484848484
kuh uytyer huihkuh

As you can see, both files have the same data from "Data=" onward (this "Data=" occurs only once in each file).

So I need to cut the lines that come before the "=" sign, then compare the two files, and get output in a CSV file stating whether the 2 files are equal.

It's not just one pair of files: there will be many files in 2 different folders, and the first file in one folder needs to be compared with the first file in the other folder, and so on.

A: 

Do you know how many lines you have before the "=" sign? (i.e., for 2 given files, is "n" the same or not?) If so, you could use the `-B` (or `--before-context`) option of grep.

Aif
Hi, we can't tell the number of lines that come before the = sign. I am just a functional guy with no knowledge of coding, especially Unix. Could you please help me with complete code?
moustafa
The `-A` is just to show context on output. If the line count before the = was known and consistent I'd probably use `head`. @moustafa - if you don't know Unix or coding why is this task assigned to you? (seriously, not sarcastically)
Stephen P
my mistake, corrected the post. Thanks.
Aif
A: 

You can use the Unix tool awk to extract the data after the "=" and then use diff to compare the results.

thomasfedb
How can I do that?
moustafa
See the answer by ghostdog74
thomasfedb
+2  A: 

This is how you use awk to get the data after the "=" sign:

awk '/Data=/{gsub("Data=","");f=1}f' file > temp1

The output is redirected to a temp file. Do the same for the second file you are comparing, then use the diff command to compare the two temp files.
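As a rough sketch of how the two steps could be tied together without temp files, the awk command above can be wrapped in a small shell function. The function names `after_data` and `same_after_data` are made up for illustration, and the comparison assumes the extracted text is small enough to hold in a shell variable (command substitution also drops trailing newlines, which should not matter here).

```shell
#!/bin/sh
# Print everything from the "Data=" line onward, with "Data=" itself removed.
# (Same awk program as above; the function name is hypothetical.)
after_data() {
    awk '/Data=/{gsub("Data=","");f=1}f' "$1"
}

# Print "equal" if two files carry the same content from "Data=" onward,
# otherwise print "different".
same_after_data() {
    if [ "$(after_data "$1")" = "$(after_data "$2")" ]; then
        echo "equal"
    else
        echo "different"
    fi
}

# Usage (file names are placeholders):
#   same_after_data file1 file2
```

With the two sample files from the question this should report them as equal, since they only differ in the lines before "Data=".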

ghostdog74
Why not `awk -F 'Data=' '/^Data=/ { print $2; }'`? And if he is using a recent-ish version of Bash, he can use process substitution `diff <(awk ... file1) <(awk ... file2)`.
janmoesen
+2  A: 

I think you should clarify your question. The answers so far suggest using awk to get the string after the '='. However, as far as I understand your question, you want to look at all lines from the beginning until the line that starts with 'Data='.

You could use

sed '/^Data=/,$d' file

to delete all lines from the first line that matches '^Data=' to the end and feed the result into diff using the syntax that janmoesen mentioned, e.g.

diff <(sed '/^Data=/,$d' file1) <(sed '/^Data=/,$d' file2)
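
For the "many files in 2 folders" part with CSV output, something along these lines might work. It is only a sketch under several assumptions: "first file with first file" is taken to mean pairing by sorted file name order, both folders contain only regular files, and file names contain no commas or newlines. The function names are invented for this example. It compares the lines before 'Data=' using the sed command above; if the part from 'Data=' onward is what matters, replace the body of `before_data` with the awk command from the other answer.

```shell
#!/bin/sh
# Keep only the lines before the first line starting with "Data=".
before_data() {
    sed '/^Data=/,$d' "$1"
}

# nth_file DIR N: print the N-th entry of DIR in sorted (glob) order,
# or fail if DIR has fewer than N entries.
nth_file() {
    n=$2
    set -- "$1"/*
    [ "$n" -le "$#" ] || return 1
    shift $((n - 1))
    [ -f "$1" ] || return 1
    printf '%s\n' "$1"
}

# compare_dirs DIR1 DIR2: print one CSV row per pair of files.
compare_dirs() {
    echo "file1,file2,result"
    i=0
    for f1 in "$1"/*; do
        [ -f "$f1" ] || continue          # skips the unmatched glob of an empty folder
        i=$((i + 1))
        f2=$(nth_file "$2" "$i") || break # stop when DIR2 runs out of files
        if [ "$(before_data "$f1")" = "$(before_data "$f2")" ]; then
            result=equal
        else
            result=different
        fi
        echo "$(basename "$f1"),$(basename "$f2"),$result"
    done
}
```

It could then be called as `compare_dirs folder1 folder2 > result.csv`, where the folder and output names are placeholders.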
bromfiets