I am building a text parser using regular expressions. I need to convert all tab characters in a string to space characters. I cannot assume how many spaces a tab should encompass; otherwise I could simply replace each tab with, say, 4 space characters. Is there a good solution for this type of problem? I need to do this in code, so I cannot use an external tool.

Thanks!

A: 

You can use the Replace function:

char tab = '\u0009';
string newLine = withTabs.Replace(tab.ToString(), "    ");
Miyagi Coder
sounds like he wants the results to still line up on tab stops
Joel Coehoorn
A: 

You want to be able to convert a tab to N spaces? One quick and dirty option is:

output = input.Replace("\t", "".PadRight(N, ' '));

Obviously N has to be defined somewhere, be it user input or elsewhere in the program.

Ian Jacobs
+1  A: 
input = Regex.Replace(input, "\t", "    ");
DannySmurf
+6  A: 

Unfortunately, you need to assume how many spaces a tab represents. You should set this to a fixed value (like the mentioned 4) or make it a user option.

The quickest way to do this in .NET is (I'm using C#):

var NewString = "This is a string with a\tTab";
var TabLength = 4;
var TabSpace = new String(' ', TabLength);

NewString = NewString.Replace("\t", TabSpace);

You can then change the TabLength variable to anything you want; as mentioned previously, 4 space characters is typical.

Tabs in all operating systems are the same length: 1 tab! What differs is the way software displays them. Typically this is the equivalent width of 4 space characters, which also assumes that the display is using a fixed-width font such as Courier New.

For example, my IDE of choice allows me to change the width of the tab character to a value that suits me.

GateKiller
Tabs account for UP TO TabSpace characters, not exactly that many characters.
Joel Coehoorn
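Since the question mentions regular expressions, here is a minimal sketch of how the "up to" behaviour from the comment above might be expressed with Regex.Replace and a match evaluator. The names RegexTabExpander, ExpandTabs and tabWidth are hypothetical, and the tab width is still an assumption the caller has to supply. It relies on the fact that, within a line, each matched segment starts either at the beginning of the line or right after a tab that has already been padded to a tab stop.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTabExpander
{
    // Hypothetical helper: tabWidth is an assumption supplied by the caller.
    public static string ExpandTabs(string input, int tabWidth)
    {
        // Group 1 is the run of non-tab text on the current line before each tab.
        return Regex.Replace(input, @"([^\t\r\n]*)\t", m =>
        {
            string segment = m.Groups[1].Value;
            // Pad the segment out to the next tab stop (1 to tabWidth spaces).
            int pad = tabWidth - (segment.Length % tabWidth);
            return segment + new string(' ', pad);
        });
    }
}
```

With a tab width of 4, "a\tb" gets three spaces in place of the tab, while "abcd\te" gets four, so the text after each tab still starts on a tab stop.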
+2  A: 

I'm not really sure what you mean by "I cannot assume how many spaces a tab should encompass", but this example will replace tabs with any number of spaces you specify.

public static string ReplaceTabs(string value, int numSpaces)
{
   string spaces = new String(' ', numSpaces);
   return value.Replace("\t", spaces);     
}
Rick
+1  A: 

I think what you mean to say is you'd like to replace tabs with the effective amount of spaces they were expanded to. The first way that comes to mind doesn't involve regular expressions (and I don't know that this problem could be solved with them).

  • Step through the string character by character, keeping track of your current position (0-based) in the current line.
  • When you find a tab, replace it with N spaces, where N = tab_length - (current_position % tab_length).
  • Add N to your current position and continue through the string.
Nick McCowin
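The steps in the answer above could be sketched in C# roughly as follows. This is only an illustration: TabExpander, ExpandTabs and tabWidth are hypothetical names, the tab width is still something the caller has to choose, and resetting the column at a line break is an added assumption so that multi-line input lines up.

```csharp
using System;
using System.Text;

public static class TabExpander
{
    // Hypothetical helper: expands each tab to the next tab stop.
    public static string ExpandTabs(string text, int tabWidth)
    {
        if (tabWidth <= 0) throw new ArgumentOutOfRangeException(nameof(tabWidth));

        var result = new StringBuilder(text.Length);
        int column = 0; // 0-based position within the current output line

        foreach (char ch in text)
        {
            if (ch == '\t')
            {
                // N = tab_length - (current_position % tab_length)
                int n = tabWidth - (column % tabWidth);
                result.Append(' ', n);
                column += n;
            }
            else if (ch == '\r' || ch == '\n')
            {
                // Assumption: a line break starts a new line at column 0.
                result.Append(ch);
                column = 0;
            }
            else
            {
                result.Append(ch);
                column++;
            }
        }
        return result.ToString();
    }
}
```

Calling TabExpander.ExpandTabs(line, 8) would treat tab stops as being every 8 columns; any other width can be passed instead.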
A: 

Unfortunately, none of these answers address the problem I am encountering. I am extracting text from external text files and I cannot assume how they were created or which operating system was used to create them. I believe the length of the tab character can vary, so if I encounter a tab when I am reading the text file, I want to know how many space characters I should replace it with. Thank you for the replies, however.

Arsalan Ahmed
Given these conditions, your problem has no simple solution. Figure out a couple of files "by hand", write down how YOU did it, implement that.
mjfgates
A: 

I'm not sure how tabs will read in from a Unix text file, or whatever your various formats are, but this works for inline text. Perhaps it will help.

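var textWithTabs = "some\tvalues\tseparated\twith\ttabs";
var textWithSpaces = string.Empty;
const int tabWidth = 8;

var textValues = textWithTabs.Split('\t');

// Pad every segment except the last out to the next tab stop.
for (var i = 0; i < textValues.Length - 1; i++)
{
    var val = textValues[i];
    textWithSpaces += val + new string(' ', tabWidth - val.Length % tabWidth);
}
// The last segment is not followed by a tab, so it gets no padding.
textWithSpaces += textValues[textValues.Length - 1];

Console.WriteLine(textWithTabs);
Console.WriteLine(textWithSpaces);
Console.Read();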
ckal
A: 

I think everyone has covered it, but a tab character is just that: one character, represented by \t. Each application can choose to display it as one space, two spaces, 4 spaces, a smiley, whatever. So there's no real answer to this.

Rob
A: 

Hey Joel Coehoorn, I'm having the same problem! Those people here are so stupid, they don't know what a tab means. It's frustrating!!

A: 

This is exactly what they are talking about needing. I wrote this back in VB6. I made a few quick VB2010 updates, but it could use some more cleanup. Just be sure to set the desired tab width; it's set to 8 in there. Just send it the string, or even fix the tabs right inside the textbox like so:

RichTextBox1.Text = strFixTab(RichTextBox1.Text)

Function strFixTab(ByVal TheStr As String) As String
    Dim c As Integer
    Dim i As Integer
    Dim T As Integer
    Dim RetStr As String = ""      ' Start from an empty string rather than Nothing
    Dim ch As String
    Dim TabWidth As Integer = 8    ' Set the desired tab width

    c = 1
    For i = 1 To TheStr.Length
        ch = Mid(TheStr, i, 1)
        If ch = vbTab Then
            T = (TabWidth + 1) - (c Mod TabWidth)
            If T = TabWidth + 1 Then T = 1
            RetStr &= Space(T)
            c += T - 1
        Else
            RetStr &= ch
        End If
        If ch = vbCr Or ch = vbLf Then
            c = 1
        Else
            c += 1
        End If
    Next
    Return RetStr
End Function
BillHudson007