
I need to strip the "label" off the front of strings, e.g.

note: this is a note

needs to return:

note

and

this is a note

I've produced the following code example but am having trouble with the regexes.

What code do I need in the two ???????? areas below so that I get the desired results shown in the comments?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace TestRegex8822
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> lines = new List<string>();
            lines.Add("note: this is a note");
            lines.Add("test:    just a test");
            lines.Add("test:\t\t\tjust a test");
            lines.Add("firstName: Jim"); //"firstName" IS a label because it does NOT contain a space
            lines.Add("She said this to him: follow me."); //this is NOT a label since there is a space before the colon
            lines.Add("description: this is the first description");
            lines.Add("description:this is the second description"); //no space after colon
            lines.Add("this is a line with no label");

            foreach (var line in lines)
            {
                Console.WriteLine(StringHelpers.GetLabelFromLine(line));
                Console.WriteLine(StringHelpers.StripLabelFromLine(line));
                Console.WriteLine("--");
                //note
                //this is a note
                //--
                //test
                //just a test
                //--
                //test
                //just a test
                //--
                //firstName
                //Jim
                //--
                //
                //She said this to him: follow me.
                //--
                //description
                //this is the first description
                //--
                //description
                //this is the second description
                //--
                //
                //this is a line with no label
                //--

            }
            Console.ReadLine();
        }
    }

    public static class StringHelpers
    {
        public static string GetLabelFromLine(this string line)
        {
            string label = line.GetMatch(@"^?:(\s)"); //???????????????
            if (!label.IsNullOrEmpty())
                return label;
            else
                return "";
        }

        public static string StripLabelFromLine(this string line)
        {
            return ...//???????????????
        }

        public static bool IsNullOrEmpty(this string line)
        {
            return String.IsNullOrEmpty(line);
        }
    }

    public static class RegexHelpers
    {
        public static string GetMatch(this string text, string regex)
        {
            Match match = Regex.Match(text, regex);
            if (match.Success)
            {
                string theMatch = match.Groups[0].Value;
                return theMatch;
            }
            else
            {
                return null;
            }
        }
    }
}
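A minimal sketch of how the two ????? placeholders could be filled, assuming the rule from the comments above: a label is a run of characters with no whitespace and no colon at the very start of the line, directly followed by a colon. The class name `PlaceholderSketch` is hypothetical. The method names match the question, but this calls `Regex.Match` and `Regex.Replace` directly instead of the `GetMatch` helper.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class: one possible filling of the two ????? areas
public static class PlaceholderSketch
{
    // Label: non-whitespace, non-colon characters at the very start,
    // immediately followed by ':' (the lookahead keeps the colon out of the match)
    public static string GetLabelFromLine(this string line)
    {
        Match match = Regex.Match(line, @"^[^\s:]+(?=:)");
        return match.Success ? match.Value : "";
    }

    // Remove the label, the colon and any whitespace (spaces or tabs) after it;
    // a line without a label does not match, so it comes back unchanged
    public static string StripLabelFromLine(this string line)
    {
        return Regex.Replace(line, @"^[^\s:]+:\s*", "");
    }
}
```

Because "She said this to him" contains spaces, `[^\s:]+` cannot reach the colon from the start of the line, so that line gets an empty label and is returned whole.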

Added:

@Keltex, I incorporated your idea as follows, but it is not matching any of the text (all entries are blank). What do I need to tweak in the regex?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace TestRegex8822
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> lines = new List<string>();
            lines.Add("note: this is a note");
            lines.Add("test:    just a test");
            lines.Add("test:\t\t\tjust a test");
            lines.Add("firstName: Jim"); //"firstName" IS a label because it does NOT contain a space
            lines.Add("first name: Jim"); //"first name" is not a label because it contains a space
            lines.Add("description: this is the first description");
            lines.Add("description:this is the second description"); //no space after colon
            lines.Add("this is a line with no label");

            foreach (var line in lines)
            {
                LabelLinePair llp = line.GetLabelLinePair();
                Console.WriteLine(llp.Label);
                Console.WriteLine(llp.Line);
                Console.WriteLine("--");
            }
            Console.ReadLine();
        }
    }

    public static class StringHelpers
    {
        public static LabelLinePair GetLabelLinePair(this string line)
        {
            Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
            Match match = regex.Match(line); 
            LabelLinePair labelLinePair = new LabelLinePair();
            labelLinePair.Label = match.Groups["label"].ToString();
            labelLinePair.Line = match.Groups["line"].ToString();
            return labelLinePair;
        }
    }

    public class LabelLinePair
    {
        public string Label { get; set; }
        public string Line { get; set; }
    }

}

SOLVED:

OK, I got it to work, and I added a little hack to handle labels that contain spaces. It's exactly what I wanted, thanks!

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace TestRegex8822
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> lines = new List<string>();
            lines.Add("note: this is a note");
            lines.Add("test:    just a test");
            lines.Add("test:\t\t\tjust a test");
            lines.Add("firstName: Jim"); //"firstName" IS a label because it does NOT contain a space
            lines.Add("first name: Jim"); //"first name" is not a label because it contains a space
            lines.Add("description: this is the first description");
            lines.Add("description:this is the second description"); //no space after colon
            lines.Add("this is a line with no label");
            lines.Add("she said to him: follow me");

            foreach (var line in lines)
            {
                LabelLinePair llp = line.GetLabelLinePair();
                Console.WriteLine(llp.Label);
                Console.WriteLine(llp.Line);
                Console.WriteLine("--");
            }
            Console.ReadLine();
        }
    }

    public static class StringHelpers
    {
        public static LabelLinePair GetLabelLinePair(this string line)
        {
            Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
            Match match = regex.Match(line); 
            LabelLinePair llp = new LabelLinePair();
            llp.Label = match.Groups["label"].ToString();
            llp.Line = match.Groups["text"].ToString();

            if (llp.Label.IsNullOrEmpty() || llp.Label.Contains(" "))
            {
                llp.Label = "";
                llp.Line = line;
            }

            return llp;
        }

        public static bool IsNullOrEmpty(this string line)
        {
            return String.IsNullOrEmpty(line);
        }
    }

    public class LabelLinePair
    {
        public string Label { get; set; }
        public string Line { get; set; }
    }

}
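The space check in the SOLVED version could also be folded into the regex itself. In this sketch the label group is `[^\s:]+`, so a would-be label containing whitespace simply fails to match and the whole line is returned as text. `RegexLabelHelpers` is a hypothetical name; `LabelLinePair` is copied from the code above.

```csharp
using System.Text.RegularExpressions;

public class LabelLinePair
{
    public string Label { get; set; }
    public string Line { get; set; }
}

// Hypothetical helper class: same result as the SOLVED code, without the Contains(" ") hack
public static class RegexLabelHelpers
{
    // Label: non-whitespace, non-colon characters anchored at the start of the line,
    // then a colon, optional whitespace, and the rest of the line as text
    private static readonly Regex LabelRegex =
        new Regex(@"^(?<label>[^\s:]+):\s*(?<text>.*)$");

    public static LabelLinePair GetLabelLinePair(this string line)
    {
        Match match = LabelRegex.Match(line);
        if (!match.Success)
        {
            // No valid label: return the original line untouched
            return new LabelLinePair { Label = "", Line = line };
        }
        return new LabelLinePair
        {
            Label = match.Groups["label"].Value,
            Line = match.Groups["text"].Value
        };
    }
}
```

Unlike `.+`, `[^\s:]+` stops at the first colon, so in `a:b: c` the label is `a` rather than `a:b`.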
+5  A: 

Can't you simply split the string on the first colon? If there's no colon, there's no label.

public static class StringHelpers
{
    public static string GetLabelFromLine(this string line)
    {
        int separatorIndex = line.IndexOf(':');
        if (separatorIndex > 0)
        {
            string possibleLabel = line.Substring(0, separatorIndex).Trim();
            // a label must not contain a space
            if (possibleLabel.IndexOf(' ') < 0)
            {
                return possibleLabel;
            }
        }
        // no colon, or the text before it contains a space: no label
        // (every code path must return a value, otherwise this does not compile)
        return string.Empty;
    }

    public static string StripLabelFromLine(this string line)
    {
        // only strip when the text before the first colon is a valid label,
        // so "She said this to him: follow me." is returned unchanged
        if (line.GetLabelFromLine().Length > 0)
        {
            int separatorIndex = line.IndexOf(':');
            return line.Substring(separatorIndex + 1).Trim();
        }
        return line;
    }

    public static bool IsNullOrEmpty(this string line)
    {
        return String.IsNullOrEmpty(line);
    }
}
jball
Yes, I could use split instead of a regex, but I want to be able to improve this code later so that it can pull off specific labels with prefixes: e.g. ">note", "?note" and "*note" might mean three different things. So I want a regex-based code base where I can just tweak the regex instead of writing more and more complicated code with split, etc.
Edward Tanguay
Easy enough to do with this. Just have your method take in a regex or string to check against the label that is found in the string.
jball
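jball's suggestion could look roughly like this: find the candidate label with `IndexOf(':')` as in the answer above, then let the caller pass a `Regex` that decides whether it is an acceptable label. `LabelMatching` and this `GetLabelFromLine` overload are hypothetical names for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: split on the first colon, let a caller-supplied regex judge the label
public static class LabelMatching
{
    public static string GetLabelFromLine(this string line, Regex labelPattern)
    {
        int separatorIndex = line.IndexOf(':');
        if (separatorIndex <= 0)
        {
            // no colon, or colon at position 0: no label
            return string.Empty;
        }

        string possibleLabel = line.Substring(0, separatorIndex).Trim();
        // The caller decides what counts as a label, e.g. ">note" vs "?note"
        return labelPattern.IsMatch(possibleLabel) ? possibleLabel : string.Empty;
    }
}
```

For example, `new Regex(@"^>\w+$")` would accept `>note` but not `?note`, so each prefix could get its own pattern without changing the splitting code.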
Regex feels like overkill to me too. Search for the first colon, and do the pattern matching on the labels later if you find you really need it.
n8wrl
+1  A: 

It would probably look like this:

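// named groups: "label" before the colon, "text" after it (and after any whitespace)
Regex myreg = new Regex(@"(?<label>.+):\s*(?<text>.+)");

Match mymatch = myreg.Match(text);

// Match has no IsMatch property; Success reports whether the regex matched
if (mymatch.Success)
{
    Console.WriteLine("label: " + mymatch.Groups["label"]);
    Console.WriteLine("text: " + mymatch.Groups["text"]);
}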

I used named groups above, but you could do without them. Also, I think this is a little more efficient than making two method calls: one regex gets both the label and the text.

Keltex
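As Keltex notes, the named groups are optional. A sketch of the same pattern with plain numbered groups, where group 1 is the label and group 2 the text; `NumberedGroupsExample.TrySplit` is a hypothetical helper name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: Keltex's pattern without named groups
public static class NumberedGroupsExample
{
    public static bool TrySplit(string text, out string label, out string body)
    {
        Match m = Regex.Match(text, @"(.+):\s*(.+)");
        // Groups[1] = label, Groups[2] = text; fall back to the whole line on no match
        label = m.Success ? m.Groups[1].Value : string.Empty;
        body = m.Success ? m.Groups[2].Value : text;
        return m.Success;
    }
}
```

Because the first `.+` is greedy, a line with several colons gets everything up to the last one as its label, e.g. `a:b: c` splits into `a:b` and `c`.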
I incorporated this idea above, but it is not matching any text. What do I need to change in the regex?
Edward Tanguay
The \ should be escaped, that is, "(?<label>.+):\\s*(?<text>.+)". Note: it will incorrectly accept a label that contains a space.
jball
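To illustrate jball's comment: a verbatim string (`@"..."`) and a regular string with `\\s` produce the same pattern, and that pattern does accept a "label" that contains spaces. `EscapingExample` is a hypothetical class for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class: the two ways of writing Keltex's pattern in C#
public static class EscapingExample
{
    // Verbatim string: the backslash reaches the regex engine as-is
    public const string Verbatim = @"(?<label>.+):\s*(?<text>.+)";
    // Regular string: the backslash itself has to be escaped
    public const string Escaped = "(?<label>.+):\\s*(?<text>.+)";

    public static string LabelOf(string line)
    {
        Match m = Regex.Match(line, Verbatim);
        return m.Success ? m.Groups["label"].Value : string.Empty;
    }
}
```

`LabelOf("She said this to him: follow me.")` returns `She said this to him`, which is why the SOLVED code adds the `Contains(" ")` check.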
@Edward... see my comment above. You have a little typo.
Keltex
@jball... Thanks. I've added the @ in front of the string.
Keltex
+1  A: 

This regex works (see it in action on Rubular):

(?: *([^:\s]+) *: *)?(.+)

This captures the label, if any, into \1, and the body into \2.

It allows for plenty of spaces around the label and the colon, so labels can be indented, etc.

polygenelubricants
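Used from C#, polygenelubricants' pattern could be wrapped like this; `\1` and `\2` become `Groups[1]` and `Groups[2]`. `OptionalLabelRegex.Parse` is a hypothetical name, and the tuple return type assumes C# 7 or later.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around polygenelubricants' pattern
public static class OptionalLabelRegex
{
    // The whole "label:" prefix is an optional non-capturing group
    private static readonly Regex Pattern = new Regex(@"(?: *([^:\s]+) *: *)?(.+)");

    public static (string Label, string Body) Parse(string line)
    {
        Match m = Pattern.Match(line);
        // Groups[1] is empty when the optional label prefix did not participate
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

Note that ` *` only skips spaces, so in `test:\t\t\tjust a test` the tabs stay at the start of the body; replacing the ` *` after the colon with `\s*` would strip them too.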