If you're using Hadoop Streaming, your input can be in any line-based format; your mapper and reducer input comes from sys.stdin, which you can read any way you want. You don't need to use the default tab-delimited fields (although, in my experience, it's best to stick to one format across all tasks for consistency when possible).
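For example, a bare-bones mapper might look something like this (the comma-separated input layout with the key in the first column is just an assumption for the sake of the example):

```python
#!/usr/bin/env python
# Minimal Streaming mapper sketch: reads raw lines from stdin and writes
# tab-separated key/value pairs to stdout. The comma-separated input layout
# (key in the first column) is only an assumption for this example.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    if not fields or not fields[0]:
        continue
    # With the default Streaming settings, the text before the first tab is
    # treated as the key, so emit key<TAB>value.
    sys.stdout.write("%s\t%s\n" % (fields[0], ",".join(fields[1:])))
```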
However, with the default splitter and partitioner, you can't control how your input and output is partitioned or sorted, so your mappers and reducers must decide whether any particular line is a header line or a data line using only that line, since they won't know the original file boundaries.
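So any header handling has to happen per line, roughly along these lines (treating lines whose first field is the literal column name "id" as headers is just a made-up convention for illustration):

```python
#!/usr/bin/env python
# Mapper sketch that classifies each line on its own, since with the default
# splitter it can't know whether a line came from the start of a file.
# The header test (first field equal to the column name "id") is an assumed
# convention, not anything Streaming gives you.
import sys

for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    first_field = line.split("\t", 1)[0]
    if first_field == "id":
        continue                    # looks like a header line, drop it
    sys.stdout.write(line + "\n")   # data line, pass through unchanged
```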
You may be able to control how the input is split (for example, by keeping each file in a single split) so that a mapper can assume its first input line is the first line of a file, or even move away from a line-based format. This was hard to do the last time I tried with Streaming, and in my opinion mapper and reducer tasks should be input-agnostic for efficiency and reusability - it's best to think of a stream of input records rather than keeping track of file boundaries.
Another option with Streaming is to ship the header information in a separate file alongside your data; it will be available to your mappers and reducers in their working directories. One idea would be to associate each line with the appropriate header information in an initial task, perhaps by using three fields per line instead of two, rather than associating them by file.
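A rough sketch of that idea, assuming the header was shipped as a file (e.g. with Streaming's -file option) and both the file name "header.txt" and its one-line format are made up here:

```python
#!/usr/bin/env python
# Sketch of the "three fields per line" idea: a header file shipped with the
# job lands in the task's working directory, so the mapper can read it once
# and tag every record with it. The file name and format are assumptions.
import sys

with open("header.txt") as f:
    header = f.readline().rstrip("\n")

for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    key, _, value = line.partition("\t")
    # Emit key<TAB>header<TAB>value so downstream tasks carry the header with
    # each record instead of tracking which file the record came from.
    sys.stdout.write("%s\t%s\t%s\n" % (key, header, value))
```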
In general, try to treat the input as a stream, and don't rely on file boundaries, input size, or order. All of these restrictions can be implemented, but at the cost of complexity. If you do need to implement them, do so at the beginning or end of your task chain.
If you're using Jython or SWIG, you may have other options, but I found those harder to work with than Streaming.