I am trying to run the C files downloaded from here as follows:

gcc main.c docs_file.txt ksg_file.txt

However, I receive the following error:

/usr/bin/ld:docs_file.txt: file format not recognized; treating as linker script
/usr/bin/ld:docs_file.txt:2: syntax error
collect2: ld returned 1 exit status

I am not sure what the problem is.

Update 1:

I get the following errors while compiling:

gcc main.c -o ksg


/tmp/cc4H83rG.o: In function `main':
main.c:(.text+0xa5): undefined reference to `stree_new_tree'
main.c:(.text+0xe0): undefined reference to `stree_add_string'
main.c:(.text+0x2a7): undefined reference to `stree_match'
main.c:(.text+0x38f): undefined reference to `int_stree_set_idents'
main.c:(.text+0x422): undefined reference to `int_stree_get_parent'
main.c:(.text+0x47b): undefined reference to `int_stree_get_suffix_link'
/tmp/cc4H83rG.o: In function `count_freq':
main.c:(.text+0x96d): undefined reference to `int_stree_set_idents'
main.c:(.text+0x9a8): undefined reference to `stree_get_num_leaves'
main.c:(.text+0xa91): undefined reference to `int_stree_set_idents'
/tmp/cc4H83rG.o: In function `select_feature':
main.c:(.text+0xb34): undefined reference to `int_stree_set_idents'
main.c:(.text+0xbe7): undefined reference to `stree_get_num_children'
main.c:(.text+0xc47): undefined reference to `int_stree_get_parent'
main.c:(.text+0xc67): undefined reference to `int_stree_set_idents'
main.c:(.text+0xc94): undefined reference to `int_stree_get_parent'
main.c:(.text+0xdbb): undefined reference to `int_stree_get_suffix_link'
main.c:(.text+0xddb): undefined reference to `int_stree_set_idents'
main.c:(.text+0xe08): undefined reference to `int_stree_get_suffix_link'
collect2: ld returned 1 exit status
A: 

With C, you don't run a program using the compiler. You first compile the C file to a binary (gcc main.c -o binary). Then you execute it (./binary docs_file.txt ksg_file.txt).

Scharron
Thanks Scharron! John explained it really well.
Denzil
+4  A: 

The tarball you linked to contains source code. To run the code you need to compile it into an executable. You can then run the executable if the compilation succeeds.

Here are the files you should have to start with, directly from the tar file:

$ ls
ksg      main.c           sample_ksgs.txt  stree.h
ksg.exe  sample_docs.txt  stree.c          stree.txt

Compile

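First we'll compile the program. The -o ksg option names the executable ksg. Both source files must be listed: main.c calls functions that are defined in stree.c, and compiling main.c alone produces the undefined reference errors shown in Update 1. If gcc prints nothing, the build succeeded without errors or warnings.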

$ gcc -o ksg main.c stree.c

Run

Now we can run the ksg executable we just created. The command-line syntax is ./ksg <arguments>. For example, we can ask for help with ./ksg -?:

$ ./ksg -?
Dell Zhang, Wee Sun Lee.
Extracting Key-Substring-Group Features for Text Classification.
In Proceedings of the 12th ACM SIGKDD International Conference on
   Knowledge Discovery and Data Mining (KDD),
Philadelphia, PA, Aug 2006.

Usage: ksg [options] docs_file ksgs_file

Options:
         -?          -> help
         -s [0,1]    -> assume white-spaces are word delimiters
                        (default 1)
         -l [2..]    -> the minimum frequency
                        (default 2)
         -h [l..]    -> the maximum frequency
                        (default 8,000)
         -b [2..]    -> the minimum number of branches
                        (default 2)
         -p (0..1]   -> the maximum parent-child conditional probability
                        (default 1.0)
         -q (0..1]   -> the maximum suffix-link conditional probability
                        (default 1.0)
Arguments:
         docs_file    -> the input  file with each line as a raw document
         ksgs_file    -> the output file with each line as a bag of ksg features
John Kugelman
John Kugelman, thanks a ton, mate! I had been struggling with this for more than an hour. I will accept your answer and vote it up. :-) Pardon the naive question; I am more of a Java/Python guy! :-(
Denzil
+1  A: 

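gcc decides what to do with each file on its command line by the file's extension. .c files, by convention, hold C code and are compiled as such. A file with an extension gcc does not recognize, such as .txt, is handed straight to the linker, which is why ld complained about docs_file.txt.

.txt files are usually plain text: input for people, or for the program to read when it runs, not for the compiler.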

Try

gcc *.c

Michael Foukarakis
Thanks mfukar, John put it really well.
Denzil