Hi

Using VB or C#, I am getting a string of variable length from the database. The string contains sensitive information that only certain users are allowed to see.

I have two cases that will use the same logic (I think).

scenario 1: replace all characters with x

scenario 2: replace all characters with x except the last 4 characters (assume length > 4 - this check is being done).

I thought this would be easiest using Regex.Replace(input, pattern, replacestring), as opposed to a lot of string handling with substrings and forcing a length of 'x's.

But it seems that Regex will always be my kryptonite.

Any help from the regex gurus would be appreciated. Alternatively a better solution would be welcome.

+3  A: 

I'm not convinced that regular expressions are the best approach here, but these should work.

ReplaceWithX replaces every single character (specified by .) with an x.

ReplaceWithXLeave4 replaces all but the last four characters with an x. It does this by matching any single character (.) while using a zero-width negative lookahead assertion to throw out this match for the last four characters.

using System;
using System.Text.RegularExpressions;

namespace ReplaceRegex
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(ReplaceWithX("12345678"));       // prints xxxxxxxx
            Console.WriteLine(ReplaceWithXLeave4("12345678")); // prints xxxx5678
        }

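        // Masks every character. Note that "." does not match '\n'
        // unless RegexOptions.Singleline is specified.
        static string ReplaceWithX(string input)
        {
            return Regex.Replace(input, ".", "x");
        }

        // Masks every character except the last four. The negative
        // lookahead (?!.{0,3}$) rejects any character that has three
        // or fewer characters between it and the end of the string.
        static string ReplaceWithXLeave4(string input)
        {
            return Regex.Replace(input, ".(?!.{0,3}$)", "x");
        }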
    }
}
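
To make the role of the lookahead easier to see, here is a minimal sketch that builds the same pattern for an arbitrary number of visible characters. The class name LookaheadSketch, the method MaskAllButLast and the keep parameter are made up for illustration and are not part of the answer; the sketch also assumes keep is at least 1 and that the input contains no newlines.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class LookaheadSketch
{
    // Replaces every character with 'x' except the last `keep` characters.
    // Assumes keep >= 1 and an input without newlines, because "."
    // does not match '\n' by default.
    public static string MaskAllButLast(string input, int keep)
    {
        if (keep < 1)
            throw new ArgumentOutOfRangeException("keep");

        // For keep = 4 this builds ".(?!.{0,3}$)", the pattern used above:
        // a character is replaced only if more than keep - 1 characters follow it.
        string pattern = ".(?!.{0," + (keep - 1) + "}$)";
        return Regex.Replace(input, pattern, "x");
    }
}
```

With keep = 4, "12345678" becomes "xxxx5678", "12345" becomes "x2345", and a string of four characters or fewer comes back unchanged, since every character then has at most three characters after it.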

For completeness, here is the same thing without regular expressions. This approach is probably quite a bit faster than the regex one, although you are unlikely to notice the difference when it only runs once or twice, as in these examples. In other words, if you're doing this on a server handling lots of requests, avoid the regex version, since it's only marginally easier to read.

using System;
using System.Text;

namespace ReplaceNoRegex
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(ReplaceWithX("12345678"));       // prints xxxxxxxx
            Console.WriteLine(ReplaceWithXLeave4("12345678")); // prints xxxx5678
        }

        // Masks every character.
        static string ReplaceWithX(string input)
        {
            return Repeat('x', input.Length);
        }

        // Masks everything except the last four characters; strings of
        // four characters or fewer are returned unchanged.
        static string ReplaceWithXLeave4(string input)
        {
            if (input.Length <= 4)
                return input;

            return Repeat('x', input.Length - 4)
                 + input.Substring(input.Length - 4);
        }

        // Builds a string consisting of count copies of c.
        static string Repeat(char c, int count)
        {
            StringBuilder repeat = new StringBuilder(count);

            for (int i = 0; i < count; ++i)
                repeat.Append(c);

            return repeat.ToString();
        }
    }
}
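
As a possible shortening of the non-regex version, the Repeat helper could be swapped for the string(char, int) constructor, and the "leave the last four" case could lean on String.PadLeft. This is only a sketch of an alternative, not the answer's own code; the class name MaskSketch and its method names are made up for illustration.

```csharp
using System;

// Hypothetical helper names; only the .NET string members are real.
public static class MaskSketch
{
    // Masks every character using the string(char, int) constructor.
    public static string MaskAll(string input)
    {
        return new string('x', input.Length);
    }

    // Keeps the last four characters and left-pads them with 'x'
    // back up to the original length.
    public static string MaskAllButLast4(string input)
    {
        if (input.Length <= 4)
            return input;

        return input.Substring(input.Length - 4).PadLeft(input.Length, 'x');
    }
}
```

For "12345678" these return "xxxxxxxx" and "xxxx5678", the same results as the version above.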
Chris Schmich
Hi Chris. Thanks for the answer.
Kamal
A: 

It's worth pointing out here (especially given the "hide all but last four characters" point) that sensitive information can also be given away by its length.

For example, if I know someone reasonably well, finding out the length of their password may be enough for me to make a good guess at what password they've used; it would certainly help narrow down the possibilities.

Therefore I would suggest that, rather than replacing each character with an 'x', you replace the hidden part with a fixed number of 'x's, so the length cannot be guessed.

This then becomes quite a simple case of string-replacement. There's really no need for regex at all.

In the case of replacing the whole string, just display "xxxxxxxx" (or however many 'x's you prefer), regardless of what the original string was.

And in the case of showing the last four characters, just output a shorter string of 'x's followed by the last four characters using substring().
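
A minimal C# sketch of this fixed-length idea might look like the following. The class name FixedLengthMaskSketch, the method names and the two width constants (8 and 4) are assumptions chosen for illustration; the second method also assumes the caller has already checked that the string is longer than four characters, as the question says, and throws otherwise.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class FixedLengthMaskSketch
{
    // Assumed widths; pick whatever fixed values suit the display.
    private const int FullMaskLength = 8;
    private const int PrefixMaskLength = 4;

    // Always returns the same mask, so the original length is not revealed.
    public static string MaskAll(string input)
    {
        // input is deliberately ignored.
        return new string('x', FullMaskLength);
    }

    // Returns a fixed run of 'x's followed by the last four characters.
    public static string MaskAllButLast4(string input)
    {
        if (input == null || input.Length <= 4)
            throw new ArgumentException("Expected more than four characters.", "input");

        return new string('x', PrefixMaskLength)
             + input.Substring(input.Length - 4);
    }
}
```

With these values, MaskAll returns "xxxxxxxx" for any input, and MaskAllButLast4 returns "xxxx0123" for "1234567890123".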

Spudley
Good point, Spudley. But this isn't for password lengths. It's more for contact details, where certain departments are allowed to see them while others aren't. The last four digits, in this case, are the postal/zip code, which everyone is entitled to see. Business rules.
Kamal
Why not then just allow them to see the last four characters only?
El Ronnoco
Like I said, business rules. I don't make em, I just code em :)
Kamal