Hi,

I have a source code file with mixed tabs and spaces, and I want to convert it so that all indentation spaces are automatically replaced by tabs for a given tab width (for example, tab = 2 spaces).

Is there an easy solution (with common Unix tools, Mac OS X, bash or zsh)? A sed script or a Python command, perhaps?

Thanks, Albert

A: 

You could use a regular expression to replace N spaces by a tab character. For example in Python:

import re
re.sub('[ ]{4}', '\t', text)
compie
It's not that easy. This for example would not only replace spaces used for indentation but also everywhere else (and it should not do that).
Albert
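
To illustrate the objection, here is a minimal sketch comparing a global replacement with one anchored to the start of each line. The function names and the `width` parameter are hypothetical, chosen only for this example, and the second function assumes the indentation consists of spaces only.

```python
import re

def tabs_everywhere(text, width=4):
    # Replaces every run of `width` spaces, including those inside strings.
    return re.sub(" {%d}" % width, "\t", text)

def tabs_in_indent_only(text, width=4):
    # Only touches the run of spaces at the very start of each line.
    def repl(m):
        n = len(m.group(0))
        # One tab per full group; leftover spaces are kept as they are.
        return "\t" * (n // width) + " " * (n % width)
    return re.sub(r"^ +", repl, text, flags=re.MULTILINE)

sample = 'if x:\n    s = "a    b"\n'
print(repr(tabs_everywhere(sample)))      # the string literal is altered too
print(repr(tabs_in_indent_only(sample)))  # the string literal is untouched
```

The anchored version leaves everything after the first non-space character alone, which is the behaviour asked for here.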
A: 

Two things,

  1. sed -i is your friend - sed -i 's/^[ ]\{2\}/\t/g' XXX.txt
  2. You can't make a regular expression multiply the tab replacement by the number of leading spaces.

Given my AWK-fu is not strong (and I don't know if it can do what #2 can't), I will write a PHP script to calculate the spaces and replace them with tabs.

yclian
Ok, that will at least only replace spaces at the beginning, though it will not replace them multiple times. I will probably write a Python script which does it for me.
Albert
A: 
sed -r 's/ {2}/\t/g' file
Ed
It's not that easy. This for example would not only replace spaces used for indentation but also everywhere else (and it should not do that).
Albert
A: 

Depending on the source language, you could try out GNU indent. It can do a large number of things relating to the indentation of source code, though it might be more complex than you need.

For example, if I give the following program to indent -di0 <inputfile>

#include <stdio.h>

int main(int argc, char **argv)
{
  int i;
    int j;
  for (i = 0; i < 10; i++)
    {
        for (j = 0; j < 10; j++)
    {
        printf("x");
    }
  }
}

It will replace it with:

#include <stdio.h>

int 
main(int argc, char **argv)
{
    int i;
    int j;
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            printf("x");
        }
    }
}

Or, if you need something stupid simple, there are the expand/unexpand commands.

mjschultz
`indent` doesn't work (it's Python, though I'm also searching for a solution which works in other cases too). `expand`/`unexpand` is too simple (basically like most other solutions given here). :)
Albert
Perhaps the reindent.py script at http://svn.python.org/projects/python/trunk/Tools/scripts/reindent.py will give a basis for what you need then?
mjschultz
Hey, that reindent.py looks mostly like what I wanted. :) Well, I didn't look into it much, so I'm not sure whether it is Python-only (which would have helped me right now but would not have been the general solution I was searching for). Coded it myself now...
Albert
A: 

Here is a possible solution in Python:

import re
import fileinput

pat = re.compile("^(  )+")

for line in fileinput.input(inplace=True):
    print(pat.sub(lambda m: "\t" * (m.end() // 2), line, count=1), end="")
Philipp
Ok, a better solution than the others, but it will not work if there are mixed spaces/tabs already. Something like `"\t \t"` should become `"\t" * 3`.
Albert
Does it work if you replace the regex with `"^( |\t)+"`? I think I don't exactly understand the requirements. __EDIT:__ that is *two* spaces in the new regex, the inline code markup unfortunately collapses spaces.
Philipp
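
A quick way to check that suggestion is sketched below. Note that with tabs in the match, dividing the match length by two no longer gives the number of indentation levels, so this sketch counts the matched units instead. The names `UNIT`, `INDENT` and `retab` are made up for the example.

```python
import re

UNIT = re.compile(r"  |\t")           # one unit: two spaces or one tab
INDENT = re.compile(r"^(?:  |\t)+")   # a run of such units at line start

def retab(line):
    # Emit one tab per matched unit (count units, not characters).
    return INDENT.sub(lambda m: "\t" * len(UNIT.findall(m.group(0))),
                      line, count=1)

print(repr(retab("\t  \tx")))  # tab, two spaces, tab: three units
print(repr(retab("\t \tx")))   # the single space stops the match early
```

With this pattern a lone space ends the match, so a line starting with `"\t \t"` comes back unchanged rather than being normalised.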
A: 

Ok, none of the given solutions satisfied me, so I coded it myself. :)

See here:

Albert
A: 

This will convert leading spaces (even interspersed with tabs) into tabs. Specify the number of spaces to convert by setting the variable. Stray spaces will be collapsed to nothing. Spaces and tabs that appear after any character other than space or tab will not be touched.

tstop=2
sed "s/^\([[:blank:]]*\)\(.*\)/\1\n\2/;h;s/[^[\n]*//;x;s/\n.*//;s/ \{$tstop\}/\t/g;s/ //g;G;s/\n//g" inputfile

Example:

[space][space][tab][tab][space][space][space][tab][space]TEXT[space][space][space]

will be converted to

[tab][tab][tab][tab][tab]TEXT[space][space][space]

If that's not exactly what you're looking for, adjustments can be made.

Dennis Williamson
Totally unreadable but looks like what I was searching for. :) Btw., shouldn't it be `...[tab][space]TEXT` in the output? At least that is what I want.
Albert
I'm removing all stray spaces. What would you want (for `tstop=2`) `[tab][space][tab]...TEXT` to look like? What about `[tab][space][space]TEXT`?
Dennis Williamson
`[t][s][t]text` should become `[t][t]text`. `[t][s][s]text` should become `[t][t]text`. `[t][s]text` should stay the same.
Albert
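
One way to get exactly those three results is to measure the visual column that the leading whitespace reaches (a tab advancing to the next tab stop) and then re-emit that width as tabs plus leftover spaces. This is only a sketch under that assumption, not necessarily what was eventually coded; the name `retab_indent` is hypothetical.

```python
def retab_indent(line, tstop=2):
    """Rewrite the leading blanks of one line as tabs plus leftover spaces."""
    col = 0  # visual column reached by the leading whitespace
    end = 0  # index of the first character that is not a space or tab
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += tstop - col % tstop  # jump to the next tab stop
        else:
            break
        end += 1
    # Full tab stops become tabs; a remainder stays as spaces.
    return "\t" * (col // tstop) + " " * (col % tstop) + line[end:]

for s in ["\t \ttext", "\t  text", "\t text"]:
    print(repr(s), "->", repr(retab_indent(s)))
```

Applied to the earlier example line, this gives five tabs followed by one space before TEXT, and the trailing spaces are left alone.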