I'm reading the well-known book "The C Programming Language, 2nd edition", and there is one exercise that I'm stuck on. I can't figure out what exactly needs to be done, so I would like someone to explain it to me.

It's Exercise 5-17:

  • Add a field-searching capability, so sorting may be done on fields within lines, each field sorted according to an independent set of options.

What is the program expected to accept on the command line, and what is meant by "independent set of options"?

+2  A: 

It's referring to the ability to specify subfields in each row to sort by. For example:

sort -f1:4a -f20:28d somefile.txt

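would sort on the field beginning at character position 1 and extending to position 4 in ascending order and, within that sort, on the field beginning at position 20 and extending to position 28 in descending order. That is, lines that are equal in the first field are ordered among themselves by the second field.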

Of course, there are lots of other ways to specify fields, sort order etc. Designing the command line switches is one of the points of the exercise, IMHO.
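To make "within that sort" concrete, here is a minimal sketch in C. It assumes the made-up -fSTART:END syntax above, with 1-based, inclusive character positions and a trailing a or d for the direction. The names struct colkey, cmp_columns and cmp_lines are hypothetical, invented for this illustration; they are not from K&R.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one -fSTART:END{a|d} option. */
struct colkey {
    size_t start;    /* first character position, 1-based */
    size_t end;      /* last character position, inclusive */
    int descending;  /* 0 for 'a' (ascending), 1 for 'd' (descending) */
};

/* Compare columns start..end of two lines. A line that is too short
   is treated as if the missing columns were simply absent. */
int cmp_columns(const char *a, const char *b, const struct colkey *k)
{
    assert(k->start >= 1 && k->end >= k->start);

    size_t off = k->start - 1;
    size_t width = k->end - k->start + 1;
    size_t la = strlen(a), lb = strlen(b);
    size_t na = la > off ? la - off : 0;   /* characters available in a */
    size_t nb = lb > off ? lb - off : 0;   /* characters available in b */
    if (na > width) na = width;
    if (nb > width) nb = width;

    size_t n = na < nb ? na : nb;
    int r = n > 0 ? memcmp(a + off, b + off, n) : 0;
    if (r == 0)
        r = (na > nb) - (na < nb);         /* shorter field sorts first */
    r = (r > 0) - (r < 0);                 /* normalise to -1, 0, 1 */
    return k->descending ? -r : r;
}

/* Try each key in the order given; a later key only matters
   when all earlier keys compare equal. */
int cmp_lines(const char *a, const char *b,
              const struct colkey *keys, size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++) {
        int r = cmp_columns(a, b, &keys[i]);
        if (r != 0)
            return r;
    }
    return 0;
}
```

With keys equivalent to -f1:4a -f7:11a, the three lines "bbbb  apple", "aaaa  zebra" and "aaaa  mango" would come out as "aaaa  mango", "aaaa  zebra", "bbbb  apple": columns 1 to 4 decide first, and columns 7 to 11 only break the tie between the two "aaaa" lines.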

anon
What do you mean by "within that sort" exactly? Can you please write a few lines and then write them sorted if the command-line input is, let's say, like this: sort -f1:4a -f7:11a somefile.txt
paleman
+2  A: 

Study the POSIX sort utility, ignoring the legacy options. Or study the GNU sort program; it has even more options than POSIX sort does.

You need to decide between fixed-width fields, as suggested by Neil Butterworth in his answer, and variable-width fields. For variable-width fields, you need to decide which character separates them. You need to decide which sorting modes to support for each field (string, case-folded string, phone-book string, integer, floating-point, date, etc.) as well as the sort direction (forward/reverse or ascending/descending).
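For the variable-width case, one possible way to locate a field is sketched below. It assumes a single separator character and 1-based field numbers, and that the trailing newline has already been stripped from the line. The function name find_field is hypothetical, chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Locate field number n (1-based) in line, where fields are separated
   by the single character sep. Returns a pointer to the start of the
   field and stores its length in *len. A field that does not exist is
   reported as an empty field at the end of the line. */
const char *find_field(const char *line, char sep, size_t n, size_t *len)
{
    assert(line != NULL && len != NULL && n >= 1 && sep != '\0');

    const char *p = line;
    while (n > 1) {
        const char *q = strchr(p, sep);
        if (q == NULL) {            /* ran out of fields */
            *len = 0;
            return line + strlen(line);
        }
        p = q + 1;                  /* step past the separator */
        n--;
    }

    const char *end = strchr(p, sep);
    *len = end != NULL ? (size_t)(end - p) : strlen(p);
    return p;
}
```

For the line "smith:42:1999-01-31" with ':' as the separator, field 2 would be the two characters "42", and asking for field 4 would give an empty field. The comparison routine can then work on the pointer and length without copying the line.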

The 'independent options' means that you can have different sort criteria for different fields. That is, you can arrange for field 1 to be sorted in ascending string order, field 3 to be sorted in descending integer order, and field 9 to be sorted in ascending date order.

Note that when sorting, the primary criterion is the first key field specified. When two rows are compared, if there is a difference between the first key field in the two rows, then the subsequent key fields are never considered. When two rows are the same in the first key field, then the criterion for the second key field determines the relative order; then, if the second key fields are the same, the third key field is consulted, and so on. If there are no more key fields specified, then the usual default sort criterion is "the whole line of input in ascending string order". A stable sort preserves the relative order of two rows in the original data that are the same when compared using the key field criteria (instead of using the default, whole-line comparison).
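The key-by-key comparison described above could be sketched roughly as follows. This is only an illustration under some assumptions: fields are separated by ':', each key carries its own numeric, fold and reverse flags (modelled loosely on the -n, -f and -r options from the earlier K&R exercises), and the key list lives in file-scope variables because qsort passes no context to the comparison function. All names here (struct key, copy_field, cmp_key, cmp_records, sort_records) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum { MAXFIELD = 128 };          /* assumed upper bound on field length */

struct key {
    int field;    /* 1-based field number */
    int numeric;  /* compare as numbers */
    int fold;     /* ignore case */
    int reverse;  /* descending order */
};

static const struct key *g_keys;  /* key list used by cmp_records */
static size_t g_nkeys;
static char g_sep = ':';          /* assumed field separator */

/* Copy field n of line into buf (truncated to fit); empty if missing. */
static void copy_field(const char *line, int n, char *buf, size_t bufsize)
{
    size_t i = 0;
    while (n > 1 && *line != '\0')
        if (*line++ == g_sep)
            n--;
    if (n == 1)
        while (line[i] != '\0' && line[i] != g_sep && i + 1 < bufsize) {
            buf[i] = line[i];
            i++;
        }
    buf[i] = '\0';
}

/* Compare two lines on one key, using that key's own options. */
static int cmp_key(const char *a, const char *b, const struct key *k)
{
    char fa[MAXFIELD], fb[MAXFIELD];
    int r;

    copy_field(a, k->field, fa, sizeof fa);
    copy_field(b, k->field, fb, sizeof fb);
    if (k->numeric) {
        double x = atof(fa), y = atof(fb);
        r = (x > y) - (x < y);
    } else if (k->fold) {
        const char *p = fa, *q = fb;
        while (*p != '\0' &&
               tolower((unsigned char)*p) == tolower((unsigned char)*q)) {
            p++;
            q++;
        }
        r = tolower((unsigned char)*p) - tolower((unsigned char)*q);
    } else {
        r = strcmp(fa, fb);
    }
    r = (r > 0) - (r < 0);        /* normalise to -1, 0, 1 */
    return k->reverse ? -r : r;
}

/* qsort comparison: the first key that differs decides; if every key
   is equal, fall back to the whole line in ascending string order. */
static int cmp_records(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;

    for (size_t i = 0; i < g_nkeys; i++) {
        int r = cmp_key(a, b, &g_keys[i]);
        if (r != 0)
            return r;
    }
    return strcmp(a, b);
}

void sort_records(const char **lines, size_t n,
                  const struct key *keys, size_t nkeys)
{
    assert(lines != NULL && (keys != NULL || nkeys == 0));
    g_keys = keys;
    g_nkeys = nkeys;
    qsort(lines, n, sizeof *lines, cmp_records);
}
```

With field 1 as a case-folded ascending key and field 2 as a numeric descending key, the lines "bob:30:x", "alice:25:y", "Alice:40:z" and "bob:7:w" would be ordered as "Alice:40:z", "alice:25:y", "bob:30:x", "bob:7:w". Note that qsort is not guaranteed to be stable, which is one reason this sketch uses the whole-line fallback to break remaining ties.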

Jonathan Leffler
Thank you, Jonathan and Neil Butterworth, it's much clearer now. Just one follow-up question: if I choose variable-width fields, how do I specify them on the command line?
paleman