I am using LINQ to select and process lines from a text file. The file has two columns delimited by the pipe character "|" and contains the following:

HAbbe|11
GABBOT|22
DABDA|33
RAchant|44
RADA|55
DABDA|66

Notice that lines 3 and 6 share the same ID (column 1). I want to use LINQ to read the posted text file, find the duplicates (and report on them), and then select from the LINQ query only the lines that are not duplicated. This is what I have so far:

    StreamReader srReader = new StreamReader(fUpload.PostedFile.InputStream);

    var query1 =
        from line in srReader.Lines()
        let items = line.Split('|')
        select new UploadVars()
        {
            ID = items[0],
            Number = items[1]
        };
    var GroupedQuery = from line in query1
                       group line by line.ID into grouped
                       where grouped.Count() > 1
                       select new {
                           ID = grouped.Key,
                           MCount = grouped.Count()
                       };

    StringBuilder sb = new StringBuilder();
    foreach (var item in GroupedQuery)
    {
        sb.AppendFormat("The following external ID's occur more than once and have not been processed:<br> {0}. Duplicated {1} times.", item.ID, item.MCount);
    }

This all works and gives me the correct results. I now want to select all the lines from the text file except the two duplicated lines. I have composed the following LINQ statement, but I am having no luck:

    // Let's start at the beginning of the posted file stream
    fUpload.PostedFile.InputStream.Position = 0;
    srReader = new StreamReader(fUpload.PostedFile.InputStream);
    var query2 = from line in srReader.Lines()
                 let items = line.Split('|')
                 select new UploadVars()
                 {
                     ID = items[0],
                     Number = items[1]
                 };

    var qryNoDupedMems = from Memb in query2
                         where !(from duped in GroupedQuery
                                 select duped.ID)
                                 .Contains(Memb.ID)
                         select Memb;

The result of qryNoDupedMems is the complete list from the text file. Could someone explain what I'm doing wrong here? Thanks in advance.

+2  A: 

In a group query, the grouped variable is itself an IEnumerable containing the items in the group.

Therefore, you can write the following:

var nonDuplicates = from line in query1
    group line by line.ID into grouped
    where grouped.Count() == 1
    select grouped.First();
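
For anyone who wants to try this outside the upload page, here is a minimal self-contained sketch of the same grouping idea, run against the sample data from the question. The `UploadVars` class is reconstructed from how the question uses it, and `DuplicateDemo`, `Parse` and `NonDuplicates` are hypothetical names chosen for illustration. Parsed rows are materialized with `ToList()` so the source is read only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's UploadVars class (not shown in the thread).
public class UploadVars
{
    public string ID { get; set; }
    public string Number { get; set; }
}

public static class DuplicateDemo
{
    // Splits each "ID|Number" line and materializes the rows into a list.
    public static List<UploadVars> Parse(IEnumerable<string> lines)
    {
        return (from line in lines
                let items = line.Split('|')
                select new UploadVars { ID = items[0], Number = items[1] }).ToList();
    }

    // Keeps only the rows whose ID occurs exactly once.
    public static List<UploadVars> NonDuplicates(IEnumerable<UploadVars> rows)
    {
        return (from row in rows
                group row by row.ID into grouped
                where grouped.Count() == 1
                select grouped.First()).ToList();
    }

    public static void Main()
    {
        var lines = new[]
        {
            "HAbbe|11", "GABBOT|22", "DABDA|33",
            "RAchant|44", "RADA|55", "DABDA|66"
        };

        // Prints HAbbe|11, GABBOT|22, RAchant|44 and RADA|55, one per line.
        foreach (var row in NonDuplicates(Parse(lines)))
            Console.WriteLine(row.ID + "|" + row.Number);
    }
}
```

Changing the filter to `grouped.Count() > 1` and selecting `grouped.Key` gives the duplicate report from the same in-memory list.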
SLaks
Thanks for your prompt help with this. I am able to retrieve the desired result. Do you happen to know why the NOT IN (!) operator was not returning the correct result?
Hawkesy
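
The thread does not answer this follow-up. One plausible cause, offered as an assumption rather than a confirmed diagnosis, is deferred execution: `GroupedQuery` is re-evaluated each time the `where` clause runs, and by then the reader behind `query1` may have nothing left to return, so the list of duplicate IDs comes back empty and every line passes the filter. The sketch below reproduces that effect with a `StringReader`. The `Lines()` extension here is a hypothetical stand-in, since the question does not show its implementation, and `DeferredDemo`, `DuplicateIds` and `CountTwice` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical stand-in for the question's Lines() extension method.
    public static IEnumerable<string> Lines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    // Deferred query: IDs (first column) that occur more than once.
    public static IEnumerable<string> DuplicateIds(IEnumerable<string> lines)
    {
        return from line in lines
               group line by line.Split('|')[0] into grouped
               where grouped.Count() > 1
               select grouped.Key;
    }

    // Enumerates the duplicate query twice and returns both counts.
    public static int[] CountTwice(bool materialize)
    {
        var reader = new StringReader("HAbbe|11\nDABDA|33\nDABDA|66");
        IEnumerable<string> dupes = DuplicateIds(reader.Lines());
        if (materialize)
            dupes = dupes.ToList(); // read the reader once, keep the result

        return new[] { dupes.Count(), dupes.Count() };
    }

    public static void Main()
    {
        int[] lazy = CountTwice(false);
        Console.WriteLine(lazy[0] + ", " + lazy[1]);   // prints "1, 0"

        int[] eager = CountTwice(true);
        Console.WriteLine(eager[0] + ", " + eager[1]); // prints "1, 1"
    }
}
```

If this is what is happening, calling `ToList()` on the parsed lines or on the duplicate query before reusing it should make the original `!Contains` filter behave as expected.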