tags:

views:

217

answers:

6

I just can't seem to wrap my head around this... I have a sequence that look a bit like this:

A  1  1  50
A  1  2  50
A  2  1  20
A  2  2  60
B  1  1  35
B  1  2  35
C  1  1  80
D  1  1  12
D  1  2  12
D  1  3  15
D  2  1  12

What I need to do is to set those last column values to 0, where they are not the last value. So for example I need to set A11 to 0 because there is an A12. And I need to set A21 to 0, because there is an A22. B11 must be 0, because there is a B12. C11 is left, since there is no C12. D11 and D12 goes 0 because of D13, and D21 is left like it is.

In the actual dataset, column 1 is usernames, column 2 is dates, column 3 is login times and the last is an amount. And I need to set all those amounts, that are not connected with the last login time on a certain date for a certain user to 0.

Anyone that is able to do this in an easy way? I'm thinking multiple nested foreach statements or joining and grouping etc etc, but I can't decide how to do it best... or how to do it at all really...

(And please do edit my title if you come up with a better one!)


More info

Sorry, I should of course have told you more about the language and such. I am using C# and .NET 3.5. And I am using Linq to SQL to fetch the data from the database, and I have all that data as objects (Of a simple kind of container type class) in a generic List. I wish I could have just adjusted the query to only get the last rows, but the problem is that this data is going into a report, and all of the rows has to be displayed. I was hoping that I could do this in the report, but this seems to be not so easy. Or at least I have no clue how to do it, and haven't gotten any working answers either. So this is why I figure I need to copy that column and clear out those duplicate values. So that I can display the column with all values, and then do the summing on the column that only has the last values. Hope that made sense :P

Oh, and also, to clear up the dates and login times in separate columns issue: Thing is that they are both DateTimes, and the date doesn't need to have the same date as the login datetime. It is a weird issue caused by that in what I am working with a day may not need to be 00:00 to 23:59. A day may start at 3:00 in the morning for example.


Update

Just figured out that this whole issue might be solved in a different way... (And probably even should be) By creating a new column (In my case a new property) and then copy only those last values over into that new property. But again I would have to find all those values... So kind of same problem as the one I already have but a bit backwards or what to call it.

A: 

If you're using .NET 3.5 then LINQ might give you the tools to construct a sensible query that is less painful to fathom than a bunch of nested foreach loops.

Assuming your data is in a database you can use LINQ to SQL or LINQ to Entities to construct strongly-typed objects to represent records in your database table.

Does this class accurately represent the data you are working with?

class User
{
    public string Username { get; set; }
    public DateTime LastLoginDateTime { get; set; }
    public int LoginCount { get; set; }
}
Richard Ev
I am using Linq to SQL yes. The resulting data I am getting from the database doesn't look exactly like that. It has username, date and loginDateTime (The date is not necessarily the same as the loginDateTime. It is a weird thing about what I am currently working on... Annoying? yes). The problem is also that I need all of the rows to be there. I can't just, not get those rows. I'll do some more explaining in the question.
Svish
A: 

Not the cleanest code in the world, but this will get it done. You should factor out a method or two, I leave that to you.

    static void ClearRepeatValues()
    {
        var arr = new[] {
        new [] {"A","1","1","50",},
        new [] {"A","1","2","50",},
        new [] {"A","2","1","20",},
        new [] {"A","2","2","60",},
        new [] {"B","1","1","35",},
        new [] {"B","1","2","35",},
        new [] {"C","1","1","80",},
        new [] {"D","1","1","12",},
        new [] {"D","1","2","12",},
        new [] {"D","1","3","15",},
        new [] {"D","2","1","12"}
        };

        if (arr == null || arr.Length == 0)
        {
            return;
        }
        var lastRow = arr[0];
        for (int i = 1; i < arr.Length; i++)
        {
            var currentRow = arr[i];
            if (lastRow[0] == currentRow[0] && lastRow[1] == currentRow[1])
            {
                lastRow[3] = "0";
            }
            lastRow = currentRow;
        }
    }

However, a simple SQL query might be a better way to get only the values that you care about, EG:

select * from Session s1 where s1.Id in 
(select top 1 s2.Id from Session s2 where s2.User = s1.User order by s2.Date)
RossFabricant
+1  A: 

If the data is in ascending user/date/time order, how about simply:

    Foo lastRow = null;
    foreach (Foo row in list)
    {
        if (lastRow != null && row.User == lastRow.User
            && row.Date == lastRow.Date)
        {
            lastRow.Value = 0;
        }
        lastRow = row;
    }
Marc Gravell
Interesting... That might work... Although I would have to sort it first then, which could be done of course.
Svish
+1  A: 

Assuming the DataTable is not sorted, you could copy the rows into an array and then use the Array.Sort function to order it so that it groups together all user logins for a day in the correct order. Then just pass over the rows and set the value to 0 excpet for the last instance.

For example:

    private void Filter(DataTable tbl)
    {
        DataRow[] rows = new DataRow[tbl.Rows.Count];
        rows.CopyTo(rows, 0);
        Array.Sort<DataRow>(rows, FilterOrder);

        for (int i = 0; i < rows.Length - 1; i++)
        {
            if ((string)rows[i][0] != (string)rows[i + 1][0])
                continue;
            if ((DateTime)rows[i][1] != (DateTime)rows[i + 1][1])
                continue;
            rows[i][3] = 0;
        }
    }

    private int FilterOrder(DataRow row1, DataRow row2)
    {
        string r1c1 = (string)row1[0];
        string r2c1 = (string)row2[0];
        if (r1c1 != r2c1) return r1c1.CompareTo(r2c1);

        DateTime r1c2 = (DateTime)row1[1];
        DateTime r2c2 = (DateTime)row2[1];
        if (r1c2 != r2c2) return r1c2.CompareTo(r2c2);

        DateTime r1c3 = (DateTime)row1[2];
        DateTime r2c3 = (DateTime)row2[2];
        return r1c3.CompareTo(r2c3);
    }
JDunkerley
A: 

Here's my SQL Server compatible solution.

You can find the table layout below.

The solution

My idea is to find out the colums you're interested in, join that result set with the table and modify every other row in the table:

with SequencesToKeep(col1, col2, col3Max) as
(
    select 
     col1, 
     col2, 
     max(col3) col3Max 
    from 
     sequences
    group by 
     col1, 
     col2
)
update
    sequences
set
    col4 = 0
from
    sequences s
    left join SequencesToKeep sk 
     on  s.col1 = sk.col1 
     and s.col2 = sk.col2
     and s.col3 = sk.col3Max
where
    sk.col1 is null

The result

select * from sequences

col1 col2        col3        col4
---- ----------- ----------- -----------
A    1           1           0
A    1           2           50
A    2           1           0
A    2           2           60
B    1           1           0
B    1           2           35
C    1           1           80
D    1           1           0
D    1           2           0
D    1           3           15
D    2           1           12

Table layout and demo data

create table sequences (col1 varchar(1), col2 int, col3 int, col4 int)
go
insert into sequences values ('A', 1, 1, 50)
insert into sequences values ('A', 1, 2, 50)
insert into sequences values ('A', 2, 1, 20)
insert into sequences values ('A', 2, 2, 60)
insert into sequences values ('B', 1, 1, 35)
insert into sequences values ('B', 1, 2, 35)
insert into sequences values ('C', 1, 1, 80)
insert into sequences values ('D', 1, 1, 12)
insert into sequences values ('D', 1, 2, 12)
insert into sequences values ('D', 1, 3, 15)
insert into sequences values ('D', 2, 1, 12)
go
VVS
This would set those columns in the database to 0, wouldn't it? (Not too stable with SQL...) If so, then this won't work, cause I can't throw the values away. And I even have to have them in the result set because they are going to be in a report.
Svish
So your changes are only temporary. You could have mentioned that. I don't understand your objection about stability of SQL at all.
VVS
A: 

Ok, I decided to do it the other way around. I added another column, and then I run this code:

data
    .GroupBy(x => new
        {
            x.Col1,
            x.Col2,
        })
    .Select(x => x.MaxBy(y => y.Col3)
    .ForEach(x =>
        {
            x.Col5 = x.Col4,
        });

This solution does not really answer my original question though. But, to give a suggestion on how one could do that, based on the way I did it:

  • Group like in other example
  • For each group sequence ordered in descending order based on Col3
    • Set all but the first item to 0 (Maybe just use a bool or something)

Untested, but in my head this should work at least :p

Note: The MaxBy and ForEach methods are from MoreLinq.

Svish