
Hey, I'm working on a project and have run into a problem. I have searched but cannot find a satisfying answer.

I have a huge file consisting of 0s and 1s. I read 1024 bits (my chunk size) into an array chunk and then apply the SHA1() function, which is implemented in the OpenSSL library and declared in openssl/sha.h.

char chunk[1024];
while ((fgets(chunk, 1024, fp)) != NULL)

My file may contain identical chunks, and I want to count how many chunks are the same.

After reading 1024 bits into my array chunk, I apply the following to get the result of the hash function:

unsigned char obuf[20];

SHA1(chunk, strlen(chunk), obuf);

Here is how the SHA1 function is declared:

unsigned char *SHA1(const unsigned char *d, unsigned long n,unsigned char *md);

After that, I want to store the hash result in an array. Once I have read the whole file, I'll use this array to check whether any hash results are the same; that is how I can get my project started. But I'm stuck at this point: I cannot put the result obuf into an array.

I tried memcpy(), strcpy(), plain assignment (myarray[N][20] = obuf;), and so on.
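A minimal sketch of the storing step, assuming 20-byte SHA-1 digests. The names hashes, hash_count, store_hash, DIGEST_LEN and MAX_HASHES are hypothetical and not from the question. Arrays cannot be assigned with =, and strcpy() stops at the first zero byte (which a binary digest can contain), so memcpy() with an explicit length does the copy.

```c
#include <string.h>

#define DIGEST_LEN 20    /* SHA-1 digests are 20 bytes */
#define MAX_HASHES 1000  /* hypothetical capacity */

unsigned char hashes[MAX_HASHES][DIGEST_LEN]; /* one row per chunk */
int hash_count = 0;

/* Hypothetical helper: copy one digest into the next free row and
   return its index, or -1 if the table is full. memcpy() copies all
   DIGEST_LEN bytes, including any zero bytes in the digest. */
int store_hash(const unsigned char digest[DIGEST_LEN])
{
    if (hash_count >= MAX_HASHES)
        return -1;
    memcpy(hashes[hash_count], digest, DIGEST_LEN);
    return hash_count++;
}
```

After each SHA1(chunk, n, obuf) call, store_hash(obuf) would copy that digest into the next row of hashes.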

Any suggestions would be appreciated. Thanks!

So the biggest problem is finding out how many hashes are unique.

A:

Firstly, you say that the chunks of your input file are 1024 in size; however, this line will read at most 1023 characters from your file, because it reserves one byte for the null terminator (it also stops early after a newline):

char chunk[1024];
while ((fgets(chunk, 1024, fp)) != NULL)

(I think fread might well be closer to what you're trying to do here)
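A small sketch for checking the 1023-versus-1024 point on a real stream. The helper names fgets_chunk_len and fread_chunk_len are hypothetical, and CHUNK_SIZE is assumed to be 1024 as in the question.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 1024

/* Hypothetical helper: how many characters one fgets() call stores in a
   1024-byte buffer. At most CHUNK_SIZE - 1, since one byte is kept for
   the terminating '\0'; fgets() also stops after a newline. */
size_t fgets_chunk_len(FILE *fp)
{
    char buf[CHUNK_SIZE];
    if (fgets(buf, sizeof buf, fp) == NULL)
        return 0;
    return strlen(buf);
}

/* Hypothetical helper: how many bytes one fread() call reads into a
   1024-byte buffer. A full CHUNK_SIZE unless the file ends first;
   fread() adds no terminator and ignores newlines. */
size_t fread_chunk_len(FILE *fp)
{
    unsigned char buf[CHUNK_SIZE];
    return fread(buf, 1, sizeof buf, fp);
}
```

Reading with an item size of 1 and a count of 1024, as in fread_chunk_len, returns the byte count, so a short final chunk is visible; the form fread(chunk, sizeof chunk, 1, fp) instead returns 1 for a complete chunk and 0 for a partial one.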

Secondly, you can just do something like:

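#include <stdio.h>
#include <openssl/sha.h>

#define MAX_CHUNKS 1000

unsigned char chunk[1024];
unsigned char obuf[MAX_CHUNKS][20];  /* one 20-byte SHA-1 digest per chunk */
int chunk_n = 0;

/* Test the limit before reading, so no chunk is read and then discarded.
   fread() returns 1 only for a complete 1024-byte chunk, so a short
   final chunk is not hashed. */
while (chunk_n < MAX_CHUNKS && fread(chunk, sizeof chunk, 1, fp) == 1)
{
    SHA1(chunk, sizeof chunk, obuf[chunk_n++]);
}

/* Now have chunk_n SHA1s stored in obuf[0] through obuf[chunk_n - 1] */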
caf
I'm trying it now, and it makes sense. You're right that this line will read at most 1023 characters, but I can change that; my problem is just putting the result into an array...
berkay
You will need to use the function `memcmp(obuf[i], obuf[j], 20)` to compare the hashes. `memcmp` will return `0` if the two hashes are equal. I think you might need to brush up on your C basics...
caf
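A sketch of the memcmp() comparison applied across all stored hashes, for example to count how many chunks are the same as a given chunk. count_matches and DIGEST_LEN are hypothetical names, and the digests are assumed to sit in a 2D array like the answer's obuf[MAX_CHUNKS][20].

```c
#include <stddef.h>
#include <string.h>

#define DIGEST_LEN 20  /* SHA-1 digest size in bytes */

/* Hypothetical helper: count how many of the n stored digests are
   byte-for-byte equal to target. memcmp() returns 0 on a match. */
size_t count_matches(unsigned char hashes[][DIGEST_LEN], size_t n,
                     const unsigned char target[DIGEST_LEN])
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (memcmp(hashes[i], target, DIGEST_LEN) == 0)
            count++;
    return count;
}
```

With the answer's variables, count_matches(obuf, chunk_n, obuf[i]) would give the number of chunks sharing chunk i's hash, including chunk i itself.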
It's not as easy as you think: suppose my hashes are [5,4,5,4,3,3,2,1,5,2]; it's hard to implement that using memcmp. (I'm a beginner in C.)
berkay
I'm trying to find the unique elements in the array, not just compare two of them...
berkay
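One common way to build a unique count out of memcmp() comparisons is to sort first: after qsort(), equal digests sit next to each other, so each digest only has to be compared with its neighbor. A sketch, with cmp_digest, count_unique and DIGEST_LEN as hypothetical names; note that it reorders the caller's array.

```c
#include <stdlib.h>
#include <string.h>

#define DIGEST_LEN 20  /* SHA-1 digest size in bytes */

/* qsort() comparator: orders two 20-byte digests with memcmp(). */
static int cmp_digest(const void *a, const void *b)
{
    return memcmp(a, b, DIGEST_LEN);
}

/* Hypothetical helper: sort the n digests in place so equal ones become
   adjacent, then count a digest as new whenever it differs from the
   one before it. */
size_t count_unique(unsigned char hashes[][DIGEST_LEN], size_t n)
{
    if (n == 0)
        return 0;
    qsort(hashes, n, DIGEST_LEN, cmp_digest);
    size_t unique = 1;
    for (size_t i = 1; i < n; i++)
        if (memcmp(hashes[i], hashes[i - 1], DIGEST_LEN) != 0)
            unique++;
    return unique;
}
```

With ten digests whose first bytes are the comment's [5,4,5,4,3,3,2,1,5,2] (and whose remaining bytes are equal), count_unique returns 5, and n minus that result (10 - 5 = 5) is the number of chunks that repeat an earlier chunk.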
Are you sure you want to use C? Containers with unique keys and so on are a lot easier in, e.g., Python or Perl, or even C++. In C, the easiest thing is a naive linear search, but a hash table would probably be faster.
Peter Cordes
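The naive linear search from this comment can be sketched as follows: a digest counts as unique only if no earlier digest matches it. count_unique_linear and DIGEST_LEN are hypothetical names. This takes O(n^2) comparisons but leaves the array order untouched; a hash table (not shown) would typically be faster for large files.

```c
#include <stddef.h>
#include <string.h>

#define DIGEST_LEN 20  /* SHA-1 digest size in bytes */

/* Hypothetical helper: digest i is unique if no digest j < i has the
   same 20 bytes. The array is only read, never modified. */
size_t count_unique_linear(unsigned char hashes[][DIGEST_LEN], size_t n)
{
    size_t unique = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < i && memcmp(hashes[j], hashes[i], DIGEST_LEN) != 0)
            j++;
        if (j == i)  /* no earlier match found */
            unique++;
    }
    return unique;
}
```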
It's not as complicated as you think. Finding unique elements IS comparing, at least to begin with. caf definitely has pointed you in the right direction.
Dustin Fineout