I have been working with a string[]
array in C# that gets returned from a function call. I was wondering what the best way to remove duplicates from this array would be? I could possibly cast to a Generic collection, but I was wondering if there was a better way to do it, possibly by using a temp array?
views:
19791answers:
14If you needed to sort it, then you could implement a sort that also removes duplicates.
Kills two birds with one stone, then.
If you are working in .NET Framework 3.x you could use array.Distinct().
This is a feature of LINQ btw :)
Otherwise, try HashSet<string> :)
You could possibly use a LINQ query to do this:
int[] s = { 1, 2, 3, 3, 4};
int[] q = s.Distinct().ToArray();
unfortunately I don't have linq (i should have mentioned it's c# 2.0 in the post). The array is not guaranteed to be sorted and I don't really have a need to sort it, I just need to make sure it's distinct because I want to bind it to a drop down list and don't want the items to appear twice
Add all the strings to a dictionary and get the Keys property afterwards. This will produce each unique string, but not necessarily in the same order your original input had them in.
If you require the end result to have the same order as the original input, when you consider the first occurance of each string, use the following algorithm instead:
- Have a list (final output) and a dictionary (to check for duplicates)
- For each string in the input, check if it exists in the dictionary already
- If not, add it both to the dictionary and to the list
At the end, the list contains the first occurance of each unique string.
Make sure you consider things like culture and such when constructing your dictionary, to make sure you handle duplicates with accented letters correctly.
The following tested and working code will remove duplicates from an array. You must include the System.Collections namespace.
string[] sArray = {"a", "b", "b", "c", "c", "d", "e", "f", "f"};
var sList = new ArrayList();
for (int i = 0; i < sArray.Length; i++) {
if (sList.Contains(sArray[i]) == false) {
sList.Add(sArray[i]);
}
}
var sNew = sList.ToArray();
for (int i = 0; i < sNew.Length; i++) {
Console.Write(sNew[i]);
}
You could wrap this up into a function if you wanted to.
This might depend on how much you want to engineer the solution - if the array is never going to be that big and you don't care about sorting the list you might want to try something similar to the following:
public string[] RemoveDuplicates(string[] myList) {
System.Collections.ArrayList newList = new System.Collections.ArrayList();
foreach (string str in myList)
if (!newList.Contains(str))
newList.Add(str);
return (string[])newList.ToArray(typeof(string));
}
NOTE : NOT tested!
string[] test(string[] myStringArray)
{
List<String> myStringList = new List<string>();
foreach (string s in myStringArray)
{
if (!myStringList.Contains(s))
{
myStringList.Add(s);
}
}
return myStringList.ToString();
}
Might do what you need...
EDIT Argh!!! beaten to it by rob by under a minute!
Here is the HashSet approach:
public static string[] RemoveDuplicates(string[] s)
{
HashSet<string> set = new HashSet<string>(s);
string[] result = new string[set.Count];
set.CopyTo(result);
return result;
}
List<String> myStringList = new List<string>(); foreach (string s in myStringArray) { if (!myStringList.Contains(s)) { myStringList.Add(s); } }
This is O(n^2), which won't matter for a short list which is going to be stuffed into a combo, but could be rapidly be a problem on a big collection.
Here is a O(n*n) approach that uses O(1) space.
void removeDuplicates(char* strIn)
{
int numDups = 0, prevIndex = 0;
if(NULL != strIn && *strIn != '\0')
{
int len = strlen(strIn);
for(int i = 0; i < len; i++)
{
bool foundDup = false;
for(int j = 0; j < i; j++)
{
if(strIn[j] == strIn[i])
{
foundDup = true;
numDups++;
break;
}
}
if(foundDup == false)
{
strIn[prevIndex] = strIn[i];
prevIndex++;
}
}
strIn[len-numDups] = '\0';
}
}
The hash/linq approaches above are what you would generally use in real life. However in interviews they usually want to put some constraints e.g. constant space which rules out hash or no internal api - which rules out using LINQ.
The following piece of code attempts to remove duplicates from an ArrayList though this is not an optimal solution. I was asked this question during an interview to remove duplicates through recursion, and without using a second/temp arraylist:
private void RemoveDuplicate() {
ArrayList dataArray = new ArrayList(5);
dataArray.Add("1");
dataArray.Add("1");
dataArray.Add("6");
dataArray.Add("6");
dataArray.Add("6");
dataArray.Add("3");
dataArray.Add("6");
dataArray.Add("4");
dataArray.Add("5");
dataArray.Add("4");
dataArray.Add("1");
dataArray.Sort();
GetDistinctArrayList(dataArray, 0);
}
private void GetDistinctArrayList(ArrayList arr, int idx)
{
int count = 0;
if (idx >= arr.Count) return;
string val = arr[idx].ToString();
foreach (String s in arr)
{
if (s.Equals(arr[idx]))
{
count++;
}
}
if (count > 1)
{
arr.Remove(val);
GetDistinctArrayList(arr, idx);
}
else
{
idx += 1;
GetDistinctArrayList(arr, idx);
}
}