I have a string that is more than 5000 characters long. I want to split it into lines of 76 characters, with a newline at the end of each 76-character chunk. How would I do this in C#?

+1  A: 
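string[] FixedSplit(string s, int len)
{
   // Requires: using System.Collections.Generic;
   var output = new List<string>();
   while (s.Length > len)
   {
      output.Add(s.Substring(0, len) + "\n");
      // Strings are immutable: Remove returns a new string, so assign it back.
      s = s.Remove(0, len);
   }
   output.Add(s + "\n");
   return output.ToArray();
}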
Aric TenEyck
...with the proviso that if you wanted a single string back, you would have to spin through the returned array and mash all of the strings back together.
Robert Harvey
How would I do that? I do need one string back.
MartGriff
string.Join("\r\n",output) + "\r\n"
Matthew Whited
That should work too, but you don't need the \r\n's, as newlines have already been put in by Aric.
Robert Harvey
Either those weren't there or I just missed it...
Matthew Whited
By the way this doesn't work
Matthew Whited
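On the question in the comments above about getting one string back: a minimal sketch, assuming a FixedSplit-style helper that already appends "\n" to each chunk (written out here so the snippet is self-contained). The class name ChunkJoinDemo and the method Rejoin are hypothetical. Because every chunk already ends in a newline, string.Concat is enough; string.Join with a separator would add a second line break.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkJoinDemo
{
    // Splits s into pieces of at most len characters, each ending in "\n".
    public static string[] FixedSplit(string s, int len)
    {
        var output = new List<string>();
        for (int pos = 0; pos < s.Length; pos += len)
        {
            int count = Math.Min(len, s.Length - pos);
            output.Add(s.Substring(pos, count) + "\n");
        }
        return output.ToArray();
    }

    // The chunks already carry their newline, so plain concatenation suffices.
    public static string Rejoin(string s, int len)
    {
        return string.Concat(FixedSplit(s, len));
    }
}
```

For the original question, the call would be ChunkJoinDemo.Rejoin(longString, 76).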
+4  A: 

A little uglier ... but much faster ;) (this version took 161 ticks... Aric's took 413)

I posted my test code on my blog. http://hackersbasement.com/?p=134 (I also found StringBuilder to be much slower than string.Join)

http://hackersbasement.com/?p=139 <= updated results

    string chopMe = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    Stopwatch sw = new Stopwatch();

    sw.Start();
    char[] chopMeArray = chopMe.ToCharArray();
    int totalLength = chopMe.Length;
    int partLength = 12;
    int partCount = (totalLength / partLength) + ((totalLength % partLength == 0) ? 0 : 1);
    int posIndex = 0;
    char[] part = new char[partLength];
    string[] parts = new string[partCount];
    int get = partLength;
    for (int i = 0; i < partCount; i++)
    {
        get = Math.Min(partLength, totalLength - posIndex);
        Array.Copy(chopMeArray, posIndex, part, 0, get);
        parts[i] = new string(part, 0, get);
        posIndex += partLength;
    }

    var output = string.Join("\r\n", parts) + "\r\n";
    sw.Stop();
    Console.WriteLine(sw.ElapsedTicks);
Matthew Whited
What do you think of my answer?
Fredou
I've updated my answer - you might want to run your benchmark again ;)
Joel Coehoorn
Also, it's " Stopwatch sw = Stopwatch.StartNew(); "
Joel Coehoorn
I added yours and Alan's to my post.
Matthew Whited
could you look again?
Fredou
I don't see it yet. Anyway, if I'm doing the benchmark correctly mine runs in about 40 ticks on average.
Joel Coehoorn
I updated my site again. (the ticks are the comments next to the Console.WriteLine()s) ... and Joel you are right I wasn't resetting the timer. This shows StringBuilder versions to be the fastest now.
Matthew Whited
I have tried to keep them all inline, but I will change them back to methods shortly and run it again.
Matthew Whited
Okay, you mean on the blog post you linked. I see it now. I don't think you really got the point of the code I posted. Note that I _never_ concatenate any strings. It's all copying from buffer to buffer, and I create the result by calling the string constructor directly.
Joel Coehoorn
Another suggestion: you really need two sets of benchmarks. One with a shorter string (say, 40 characters) and another with a very long string (say, 2000 characters).
Joel Coehoorn
Yeah, I was trying to make that part similar for all of the tests to remove it from the picture. But I am working on a cleaner version that is pretty much a direct copy of each person's code; it should be up shortly.
Matthew Whited
@Matthew, look at my latest edit again, this one should be way faster :-)
Fredou
What type of computer do you have? On my computer, taking what you have on your website and swapping my old code for my new code from this thread, I get: #1 39464, #2 48784, #3 41160, #4 45552, #5 8000, #6 1109656 (running inside Visual Studio, in debug mode)
Fredou
Windows Vista Ultimate, AMD Athlon 64 X2 Dual Core 6000+ (3.0 GHz), 4GB RAM (Dual Core 64 bit) ... with a bunch of crap running at the same time
Matthew Whited
@Fredou: 1109656 is a _long_ time in computer terms. You're doing something wrong with the benchmarks.
Joel Coehoorn
@Joel, I have an Intel processor? maybe that is why...I just copy/pasted into VS and ran it
Fredou
@Matthew, again, I updated my answer and it's about 40% faster than the previous one on my computer. If you want to update your benchmark on your website, go ahead.
Fredou
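The benchmarking advice in the comments above (start the timer with Stopwatch.StartNew, use a fresh timer per measurement, and test both a short and a long input) could be sketched roughly as follows. This is an illustration only, not the code from the linked blog posts: the class ChunkBenchmark, the method AverageTicks, and the stand-in InsertLineBreaks implementation are all hypothetical names. It also makes one warm-up call first so that JIT compilation is not counted.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ChunkBenchmark
{
    // Stand-in for whichever implementation is being measured.
    public static string InsertLineBreaks(string s, int len)
    {
        var sb = new StringBuilder(s.Length + 2 * (s.Length / len + 1));
        for (int pos = 0; pos < s.Length; pos += len)
        {
            sb.Append(s, pos, Math.Min(len, s.Length - pos));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }

    // Returns the average elapsed ticks per call over the given iterations.
    public static double AverageTicks(Func<string, int, string> method,
                                      string input, int len, int iterations)
    {
        method(input, len); // warm-up call so JIT compilation is not timed

        // A new Stopwatch per measurement avoids carrying over earlier time.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input, len);
        }
        sw.Stop();
        return (double)sw.ElapsedTicks / iterations;
    }
}
```

Running AverageTicks once with new string('x', 40) and once with new string('x', 2000) gives the two sets of numbers suggested above. Averaging over many iterations matters because a single call on a short string is close to the timer's resolution.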
+1  A: 
public static string InsertNewLine(string s, int len)
{
 StringBuilder sb = new StringBuilder(s.Length + (int)(s.Length/len) + 1);
 int start = 0;
 for (start=0; start<s.Length-len; start+=len)
 {
  sb.Append(s.Substring(start, len));
  sb.Append(Environment.NewLine);
 }
 sb.Append(s.Substring(start));
 return sb.ToString();
}

where s would be your input string and len the desired line length (76).

M4N
+1 for being the canonical solution, and EXACTLY what was asked for.
Robert Harvey
The best way to use line breaks is using: Environment.NewLine, instead of "\n".
Zanoni
Thanks for the reminder, I updated the answer accordingly.
M4N
+1  A: 
public static IEnumerable<string> SplitString(string s, int length)
{
    var buf = new char[length];
    using (var rdr = new StringReader(s))
    {
        int l;
        l = rdr.ReadBlock(buf, 0, length);
        while (l > 0)
        {
            yield return (new string(buf, 0, l)) + Environment.NewLine;
            l = rdr.ReadBlock(buf, 0, length);
        }
    }
}

Then to put them back together:

string theString = GetLongString();
StringBuilder buf = new StringBuilder(theString.Length + theString.Length/76);
foreach (string s in SplitString(theString, 76)) { buf.Append(s); }
string result = buf.ToString();

Or you could do this:

string InsertNewLines(string s, int interval)
{
    // One extra slot per chunk (including a short final chunk) for its '\n'.
    char[] buf = new char[s.Length + (int)Math.Ceiling(s.Length / (double)interval)];

    using (var rdr = new StringReader(s))
    {
        int i = 0; // write position in buf
        while (i < buf.Length)
        {
            // Read up to one chunk, leaving room for this chunk's '\n'.
            int read = rdr.ReadBlock(buf, i, Math.Min(interval, buf.Length - i - 1));
            i += read;
            buf[i++] = '\n';
        }
    }
    return new string(buf);
}
Joel Coehoorn
This would handle large files better. But it's pretty slow
Matthew Whited
The main advantage is that this makes it easy to change the function to accept a plain stream. Then you could pass something like a file stream to it and maybe never load the entire string in memory initially in the first place.
Joel Coehoorn
Added a new method that put the result directly into a buffer - should be very fast.
Joel Coehoorn
@Joel, in Matthew's post you said my tick counts were wrong. You were right, and I found out why: http://stackoverflow.com/questions/1017608/about-the-stopwatch-elapsedticks
Fredou
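The point made above about accepting a stream instead of a string could look something like the sketch below. It is an assumption about how one might adapt the ReadBlock approach, not code from the answer: StreamChunker and CopyWithLineBreaks are hypothetical names, and the method writes to a TextWriter so the full result never has to sit in memory either.

```csharp
using System;
using System.IO;

public static class StreamChunker
{
    // Copies text from reader to writer, writing a line break after every
    // `length` characters (and after a shorter final block, if there is one).
    public static void CopyWithLineBreaks(TextReader reader, TextWriter writer, int length)
    {
        var buf = new char[length];
        int read;
        // ReadBlock returns the number of characters read; 0 means end of input.
        while ((read = reader.ReadBlock(buf, 0, length)) > 0)
        {
            writer.Write(buf, 0, read);
            writer.Write(Environment.NewLine);
        }
    }
}
```

With a StreamReader over a file as the reader and a StreamWriter as the writer, only one 76-character buffer is held at a time; a StringReader and StringWriter give the in-memory equivalent.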
+3  A: 
Chris S
+3  A: 

Try this:

s = Regex.Replace(s, @"(?<=\G.{76})", "\r\n");

EDIT: Apparently, this is the slowest method of all those posted so far. I wonder how it does if you pre-compile the regex:

Regex rx0 = new Regex(@"(?<=\G.{76})");

s = rx0.Replace(s, "\r\n"); // only time this portion

Also, how does it compare to a straight matching approach?

Regex rx1 = new Regex(".{76}");

s = rx1.Replace(s, "$0\r\n"); // only time this portion

I've always wondered how expensive those unbounded lookbehinds are.

Alan Moore
+1 for doing it in one line of code.
Robert Harvey
:D One-liners: the programmers' version of haiku.
Alan Moore
Alan, your first version wasn't as slow as I originally said. Joel pointed out I wasn't resetting the timer. You can check your time on my blog post. And I will shortly be making a new post with everyone's code refactored into methods.
Matthew Whited
Cool. But you've got a copy/paste error there: AlanM_3() is doing exactly the same thing as AlanM_1().
Alan Moore
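On the "pre-compile the regex" idea in the answer above: constructing a Regex object ahead of time parses the pattern once, but the pattern is still interpreted unless RegexOptions.Compiled is passed. A minimal sketch of the straight-matching variant with that option follows; the class name RegexChunker and the field name Every76 are hypothetical, and whether compilation actually pays off for a single 5000-character string is something only a measurement can tell.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexChunker
{
    // RegexOptions.Compiled asks the runtime to compile the pattern to IL.
    // Without it, the pattern is parsed once and then interpreted.
    private static readonly Regex Every76 = new Regex(".{76}", RegexOptions.Compiled);

    public static string InsertLineBreaks(string s)
    {
        // "$0" is the whole match, so every full 76-character run gets "\r\n" appended.
        return Every76.Replace(s, "$0\r\n");
    }
}
```

Two caveats with this pattern: "." does not match "\n" by default, so input that already contains line breaks restarts the count after each one, and a final run shorter than 76 characters gets no trailing line break.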
A: 

In the end, this would be what I would use, I think

    static string fredou()
    {
        string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        int partLength = 12;

        int stringLength = s.Length;
        StringBuilder n = new StringBuilder(stringLength + (int)(stringLength / partLength) + 1);
        int chopSize = 0;
        int pos = 0;

        while (pos < stringLength)
        {
            chopSize = (pos + partLength) < stringLength ? partLength : stringLength - pos;
            n.Append(s , pos, chopSize);
            n.Append("\r\n");
            pos += chopSize;
        }

        return n.ToString();         
    }

by looking at AppendLine under reflector:

    <ComVisible(False)> _
    Public Function AppendLine(ByVal value As String) As StringBuilder
        Me.Append(value)
        Return Me.Append(Environment.NewLine)
    End Function

    Public Shared ReadOnly Property NewLine As String
        Get
            Return ChrW(13) & ChrW(10)
        End Get
    End Property

For me, speed wise, doing it manually > AppendLine

Fredou
Well you won the slowest (279097 ticks)
Matthew Whited
I fibbed... Alan's is slower
Matthew Whited
Yes, definitely the regex wins!
Robert Harvey
The first for loop doesn't count for time in Fredou's solution, since it builds the test string. Is it still 270K ticks? I read the first for loop as 7000 new string initializations.
Robert Harvey
(I really don't think you want to do this the way you did... but we will see)
Matthew Whited
Use AppendLine() instead of calling Append() twice. Also, because of the way .NET handles strings, you will probably see a decrease in memory usage as well. I have to call it a night, but feel free to post your results as a comment on my blog.
Matthew Whited
A: 

One more.... (first time through slowish, subsequent runs, similar to the faster times posted above)

private void button1_Click(object sender, EventArgs e)
{
  string chopMe = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  Stopwatch sw = new Stopwatch();
  sw.Start();
  string result = string.Join("\r\n", ChopString(chopMe).ToArray());
  sw.Stop();
  // Stopwatch.ToString() only returns the type name; show the measured ticks instead.
  MessageBox.Show(result + " " + sw.ElapsedTicks);
}


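// PARTLENGTH was not defined in the original post; 12 is assumed here to match the other tests.
private const int PARTLENGTH = 12;

public IEnumerable<string> ChopString(string s)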
{
  int i = 0;
  while (i < s.Length)
  {
    yield return i + PARTLENGTH <= s.Length ? s.Substring(i,PARTLENGTH) :s.Substring(i) ;
    i += PARTLENGTH;
  }
}

Edit: I was curious to see how fast substring was...

Tim Jarvis
You make a good point. For my next test I will loop through them all twice. Given the way strings work, the first one to run would be the slowest.
Matthew Whited
+12  A: 

If you're writing Base64 data, try writing

Convert.ToBase64String(bytes, Base64FormattingOptions.InsertLineBreaks);

This will insert a line break every 76 characters.

SLaks
I'll be damned, so it will. Should be fast too, uses unsafe code. ;)
Robert Harvey
Well hell... how do I compare this to the rest? ...
Matthew Whited
Pretty sure this is the winner, because the others would have to run through this converter anyway.
Matthew Whited
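If the 76-character lines are indeed Base64 output, the built-in option mentioned above can be exercised with a small round trip. This is a sketch with hypothetical wrapper names (Base64LineBreakDemo, Encode, Decode); the two Convert calls themselves are standard .NET methods.

```csharp
using System;

public static class Base64LineBreakDemo
{
    public static string Encode(byte[] bytes)
    {
        // InsertLineBreaks puts "\r\n" after every 76 characters of output.
        return Convert.ToBase64String(bytes, Base64FormattingOptions.InsertLineBreaks);
    }

    public static byte[] Decode(string base64)
    {
        // FromBase64String ignores white space, so the breaks need no stripping.
        return Convert.FromBase64String(base64);
    }
}
```

This only helps when the bytes are still available before encoding; for a string that has already been built, one of the splitting approaches from the other answers is still needed.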
A: 

The string is 5000 characters... I don't think speed is really of the essence unless you're doing this thousands or maybe even millions of times, especially when the OP didn't even mention speed being important. Premature optimization?

I would probably use recursion as it will, in my opinion, lead to the simplest code.

This may not be syntactically correct, as I know .NET but not C#.

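string ChunkString(string s, int chunkLength) {
    // Base case: the remainder fits on one line.
    if (s.Length <= chunkLength) return s;
    // Emit one chunk plus a line break, then recurse on the rest.
    return string.Concat(s.Substring(0, chunkLength), Environment.NewLine,
                         ChunkString(s.Substring(chunkLength), chunkLength));
}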
Daniel Straight
Yeah, we know that; we're only doing the benchmarking to satisfy our curiosity.
Alan Moore
A: 

Mostly for the fun of it, here's a different solution implemented as an extension method on string (\r\n is used explicitly, so it only supports that newline format):

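// Must be declared inside a static class to work as an extension method.
public static string Split(this string str, int len)
{
    const string newline = "\r\n";
    char[] org = str.ToCharArray();
    int parts = str.Length / len + (str.Length % len == 0 ? 0 : 1);
    // Every part, including a short final one, is followed by "\r\n".
    char[] result = new char[str.Length + parts * newline.Length];

    int dst = 0; // write position in result
    for (int src = 0; src < org.Length; src += len)
    {
        int count = Math.Min(len, org.Length - src);
        Array.Copy(org, src, result, dst, count);
        dst += count;
        result[dst++] = '\r';
        result[dst++] = '\n';
    }
    return new string(result);
}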
Rune FS