views:

97

answers:

2
+8  Q: 

Combining SQL Rows

I've got SQL Compact Database that contains a table of IP Packet Headers. The Table looks like this:

Table: PacketHeaders    

ID  SrcAddress  SrcPort  DestAddress  DestPort  Bytes
1   10.0.25.1   255      10.0.25.50   500       64
2   10.0.25.50  500      10.0.25.1    255       80
3   10.0.25.50  500      10.0.25.1    255       16
4   75.48.0.25  387      74.26.9.40   198       72
5   74.26.9.40  198      75.48.0.25   387       64
6   10.0.25.1   255      10.0.25.50   500       48

I need to perform a query to show 'conversations' going on across a local network. Packets going from A -> B is part of the same conversations as packets going from B -> A. I need to perform a query to show the on going conversations. Basically what I need is something that looks like this:

Returned Query:

SrcAddress  SrcPort  DestAddress  DestPort  TotalBytes  BytesA->B  BytesB->A
10.0.25.1   255      10.0.25.50   500       208         112        96
75.48.0.25  387      74.26.9.40   198       136         72         64

As you can see I need the query (or series of queries) to recognize that A->B is the same as B->A and break up the byte counts accordingly. I'm not a SQL guru by any means but any help on this would be greatly appreciated.

+2  A: 

I can see two basic ways of doing this...
1. Group it all up, ignoring the a->b and b->a and then self join the results.
2. Rearrange your data with the "lowest" ip address in the 'src' field, but also create a 'direction' field.

Option 2 is probably the way I'd go...

SELECT
    SrcAddress,
    SrcPort,
    DestAddress,
    DestPort,
    SUM(AtoB) + SUM(BtoA),
    SUM(AtoB),
    SUM(BtoA)
FROM
(
    SELECT
       CASE WHEN SrcAddress < DestAddress THEN SrcAddress  ELSE DestAddress END AS SrcAddress,
       CASE WHEN SrcAddress < DestAddress THEN SrcPort     ELSE DestPort    END AS SrcPort,
       CASE WHEN SrcAddress < DestAddress THEN DestAddress ELSE SrcAddress  END AS DestAddress,
       CASE WHEN SrcAddress < DestAddress THEN DestPort    ELSE ScrPort     END AS DestPort,
       CASE WHEN SrcAddress < DestAddress THEN Bytes       ELSE 0           END AS AtoB,
       CASE WHEN SrcAddress < DestAddress THEN 0           ELSE Bytes       END AS BtoA
    FROM
      PacketHeaders
)
    AS [data]
GROUP BY
    SrcAddress,
    SrcPort,
    DestAddress,
    DestPort

EDIT

A couple of other answers have version of what I called option 1. I'll have a go at it too rather than spamming comments on people's answers :(

SELECT
   ISNULL([AtoB].SrcAddress,  [BtoA].DestAddress)
   ISNULL([AtoB].SrcPort,     [BtoA].DestPort)
   ISNULL([AtoB].DestAddress, [BtoA].SrcAddress)
   ISNULL([AtoB].DestPort,    [BtoA].SrcPort)
   ISNULL([AtoB].Bytes,0) + ISNULL([BtoA].Bytes,0),
   ISNULL([AtoB].Bytes,0),
   ISNULL([BtoA].Bytes,0)
FROM
   (
      SELECT   SrcAddress, SrcPort, DestAddress, DestPort, SUM(Bytes) AS Bytes
      FROM     PacketHeaders
      WHERE    SrcAddress <= DestAddress
      GROUP BY SrcAddress, SrcPort, DestAddress, DestPort
   )
   AS [AtoB]
FULL OUTER JOIN
   (
      SELECT   SrcAddress, SrcPort, DestAddress, DestPort, SUM(Bytes) AS Bytes
      FROM     PacketHeaders
      WHERE    SrcAddress > DestAddress
      GROUP BY SrcAddress, SrcPort, DestAddress, DestPort
   )
   AS [BtoA]
      ON  [AtoB].SrcAddress  = [BtoA].DestPort
      AND [AtoB].SrcPort     = [BtoA].DestAddress
      AND [AtoB].DestAddress = [BtoA].SrcPort
      AND [AtoB].DestPort    = [BtoA].SrcAddress

But I did say I wouldn't do it that way...

Dems
If I want to see a Total Bytes Column I have to encapsulate this all within another SELECT statement. Is there a cleaner way?
lumberjack4
I added the line "SUM(AtoB) + SUM(BtoA)," for you...
Dems
Are you sure that SQL Server CTE supports IsNull? I found this post: http://sqlserverce.org/blogs/faq/archive/2007/02/16/isnull-v-s-coalesce.aspx
Mark Byers
In that case you can (as you're already pointed out) use COALESCE instead. Still, however, I would use the option of "reorder/clean the data" THEN "agregate the clean data". I wouldn't try the JOIN method, there are many more corner cases to avoid, it's harder to read, and is quite possibly slower... (The 'cleaning' that I do with CASE statements can be doen with UNION ALL and WHERE clauses instead, which may be faster)
Dems
+2  A: 

Try this:

SELECT
    T1.SrcAddress,
    T1.SrcPort,
    T1.DestAddress,
    T1.DestPort,
    T1.Bytes + COALESCE(T2.Bytes, 0) AS TotalBytes,
    T1.Bytes AS A_to_B,
    COALESCE(T2.Bytes, 0) AS B_to_A
FROM (
    SELECT SrcAddress, SrcPort, DestAddress, DestPort, SUM(Bytes) AS Bytes
    FROM PacketHeaders
    GROUP BY SrcAddress, SrcPort, DestAddress, DestPort) AS T1
LEFT JOIN (
    SELECT SrcAddress, SrcPort, DestAddress, DestPort, SUM(Bytes) AS Bytes
    FROM PacketHeaders
    GROUP BY SrcAddress, SrcPort, DestAddress, DestPort) AS T2
ON T1.SrcAddress = T2.DestAddress
AND T1.SrcPort = T2.DestPort
AND T1.DestAddress = T2.SrcAddress
AND T1.DestPort = T2.SrcPort
WHERE T1.SrcAddress < T1.DestAddress OR
    (T1.SrcAddress = T1.DestAddress AND T1.SrcPort = T1.DestPort) OR
    T2.DestAddress IS NULL

On this test data:

CREATE TABLE PacketHeaders (ID INT, SrcAddress NVARCHAR(100), SrcPort INT, DestAddress NVARCHAR(100), DestPort INT, Bytes INT);
INSERT INTO PacketHeaders (ID, SrcAddress, SrcPort, DestAddress, DestPort, Bytes) VALUES
(1, '10.0.25.1', 255, '10.0.25.50', 500, 64),
(2, '10.0.25.50', 500, '10.0.25.1', 255, 80),
(3, '10.0.25.50', 500, '10.0.25.1', 255, 16),
(4, '75.48.0.25', 387, '74.26.9.40', 198, 72),
(5, '74.26.9.40', 198, '75.48.0.25', 387, 64),
(6, '10.0.25.1', 255, '10.0.25.50', 500, 48),
(7, '10.0.25.2', 255, '10.0.25.50', 500, 48),
(8, '10.0.25.52', 255, '10.0.25.50', 500, 48);

This gives the following results:

'10.0.25.1', 255, '10.0.25.50', 500, 208, 112, 96
'10.0.25.2', 255, '10.0.25.50', 500, 48, 48, 0
'10.0.25.52', 255, '10.0.25.50', 500, 48, 48, 0
'74.26.9.40', 198, '75.48.0.25', 387, 136, 64, 72

The way it works is to first group one-way conversations and total the byte counts. This assures that every conversation will be represented exactly twice - once for each direction. This result is then self-joined to give the result you need, filtering the duplicates by enforcing that the (address, port) of A must be less than B. A left join is used to allow one-way conversations.

Mark Byers
Assumes each path has a matching opposite entry. So it all needs to be an OUTER JOIN, which then means you'd need to GROUP BY ISNULL(T1.SrcAddress, T2.DestAddress), etc.
Dems
An edit removed the outer GROUP BY, etc. This still only works if every packet corresponds to a packet going in the opposite direction. (But if that condition is true, it works, I think...)
Dems
This only works for 2-way conversations. Not all conversations will be 2-way. How can I modify the join to mitigate this?
lumberjack4
@lumberjack: You could use LEFT JOIN instead of JOIN, but you'll also have to add a null check on the SUM.
Mark Byers
Stupid question. Can I do a null check within the query and set the value to 0 or will do I have to do it after I get the result back?
lumberjack4
In the WHERE clause you have the condition [T1.SrcAddress < T1.DestAddress], but then also factor in the Ports in the OR portion. What if it's [192.0.0.1:999] to [192.0.0.1:666]? To check for "self talk", you'd need to do something like a LEFT OUTER JOIN and check that T1.MinID != T2.MinID in it's predicate.
Dems
@Dems: you're right... this doesn't quite work for one-way conversations - I thought it would be a simple fix, but it's a little more involved. I'll try to fix it.
Mark Byers
With conversions that go in One direction only, you'd need a FULL OUTER JOIN; Because you enforce "low ip" to "high ip" for [t1] in the WHERE clause.
Dems
I think it works for one-way conversations now. Test data attached.
Mark Byers
@lumberjack: You can check for null is SQL Server CE using the function COALESCE. I've included this null check in my updated query.
Mark Byers
What if the one way conversation is [55.55.55.55] to [11.11.11.11]? Your WHERE clause filters it out. I have to leave now, but I've written my version of this in my answer...
Dems
Thanks Dems and Mark. You guys are awesome!
lumberjack4
The chosen answer as written right now doesn't work :) Night night...
Dems
Works for me on SQLCE.
lumberjack4
@lumberjack: see my previous comment. If you have traffic from [55.55.55.55] to [11.11.11.11], but not in the opposite direction, it won't be inculded in your final results.
Dems
@Dems: Are you sure? Have you tested it? I haven't tested on that specific example, but I've tested similar examples and it seems to work for those.
Mark Byers
There are two factors. Firstly, the WHERE clause says T1.SrcAddress < T1.DestAddress, this forces all T1 records to be LowIPAddress -> HighIPAddress. Secondly, the LEFT JOIN means that T2 will only ever have an entry if it has a corresponding T1 records. As my example is High->Low instead of Low->High, it doesn't appear in T1. Equally, it can't appear in T2 as it is a 'one way' conversation and has no corresponding T1 record to join against. Thus, High->Low one way conversations will not be included in your results.
Dems
@Dems: I think I have that case covered because the WHERE clause has an extra `OR T2.DestAddress IS NULL` which will allow it to match all one-way converstaions even if the row is High->Low. I included the row `(8, '10.0.25.52', 255, '10.0.25.50', 500, 48);` in my test bed specifically to test this behaviour, and as you can see from the results, it seems to work as expected. Is your example any different from this?
Mark Byers
OMG. Ich Bin Ein Berlinner. I owe you an apology. +1, but if I could give you +2 I would. My copy of your code didn't have the IS NULL condition...
Dems
@Dems: No worries! :)
Mark Byers