views:

5823

answers:

7

Hello everyone.

So I'm trying to migrate a MySQL-based app over to MS Sql Server 2005 (not by choice, but that's life).

In the original app, we used almost entirely ANSI-SQL compliant statements, with one significant exception -- we used MySQL's group_concat function fairly frequently.

group_concat, by the way, does this: given a table of, say, employee names and projects...

SELECT empName, projID FROM project_members;

returns:

ANDY   |  A100
ANDY   |  B391
ANDY   |  X010
TOM    |  A100
TOM    |  A510

... and here's what you get with group_concat:

SELECT empName, group_concat(projID SEPARATOR ' / ') 
FROM project_members GROUP BY empName;

returns:

ANDY   |  A100 / B391 / X010
TOM    |  A100 / A510

...So what I'd like to know is: Is it possible to write, say, a user-defined function in MS SQL which emulates the functionality of group_concat? I have almost no experience using UDFs, stored procedures, or anything like that -- just straight-up SQL -- so please err on the side of too much explanation :)

+6  A: 

No REAL easy way to do this. Lots of ideas out there, though.

Best one I've found:

SELECT table_name, LEFT(column_names , LEN(column_names )-1) AS column_names
FROM information_schema.columns AS extern
CROSS APPLY
(
    SELECT column_name + ','
    FROM information_schema.columns AS intern
    WHERE extern.table_name = intern.table_name
    FOR XML PATH('')
) pre_trimmed (column_names)
GROUP BY table_name, column_names;
BradC
This example worked for me, but I tried doing another aggregation and it didn't work, gave me an error: "the correlation name 'pre_trimmed' is specified multiple times in a FROM clause."
ZaijiaN
+3  A: 

There is no easy way to do this with just T-SQL in SQL Server.

My recommendation is to create a CLR aggregate (which requires programming in .NET) which will take the values and concatenate them into a single value, and then use that aggregate in a SQL Statement with a simple GROUP BY.

For more information on CLR aggregates, see here:

http://msdn.microsoft.com/en-us/library/91e6taax(VS.80).aspx

casperOne
This does sound like the best way to do it - I have no .NET programming experience, though... is this something that'd be pretty easy to learn enough to figure it out in a hurry? Or (as I'm on a short timeline here) would I be better off finding a good .NET programmer?
DanM
@DanM: For something like this, it's not that hard at all. I wouldn't go and hire a separate .NET programmer for just this.
casperOne
+1  A: 

With the below code you have to set PermissionLevel=External on your project properties before you deploy, and change the database to trust external code (be sure to read elsewhere about security risks and alternatives [like certificates]) by running "ALTER DATABASE database_name SET TRUSTWORTHY ON".

using System;
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined,
MaxByteSize=8000,
IsInvariantToDuplicates=true,
IsInvariantToNulls=true,
IsInvariantToOrder=true,
IsNullIfEmpty=true)]
    public struct CommaDelimit : IBinarySerialize
{


[Serializable]
 private class StringList : List<string>
 { }

 private StringList List;

 public void Init()
 {
  this.List = new StringList();
 }

 public void Accumulate(SqlString value)
 {
  if (!value.IsNull)
   this.Add(value.Value);
 }

 private void Add(string value)
 {
  if (!this.List.Contains(value))
   this.List.Add(value);
 }

 public void Merge(CommaDelimit group)
 {
  foreach (string s in group.List)
  {
   this.Add(s);
  }
 }

 void IBinarySerialize.Read(BinaryReader reader)
 {
    IFormatter formatter = new BinaryFormatter();
    this.List = (StringList)formatter.Deserialize(reader.BaseStream);
 }

 public SqlString Terminate()
 {
  if (this.List.Count == 0)
   return SqlString.Null;

  const string Separator = ", ";

  this.List.Sort();

  return new SqlString(String.Join(Separator, this.List.ToArray()));
 }

 void IBinarySerialize.Write(BinaryWriter writer)
 {
  IFormatter formatter = new BinaryFormatter();
  formatter.Serialize(writer.BaseStream, this.List);
 }
    }

I've tested this using a query that looks like:

SELECT 
 dbo.CommaDelimit(X.value) [delimited] 
FROM 
 (
  SELECT 'D' [value] 
  UNION ALL SELECT 'B' [value] 
  UNION ALL SELECT 'B' [value] -- intentional duplicate
  UNION ALL SELECT 'A' [value] 
  UNION ALL SELECT 'C' [value] 
 ) X 

And yields: A, B, C, D

GregTSmith
+6  A: 

Possibly too late to be of benefit now, but is this not the easiest way to do things?

SELECT     empName, projIDs = replace
                          ((SELECT Surname AS [data()]
                              FROM project_members
                              WHERE  empName = a.empName
                              ORDER BY empName FOR xml path('')), ' ', REQUIRED SEPERATOR)
FROM         project_members a
WHERE     empName IS NOT NULL
GROUP BY empName
J Hardiman
Interesting. I've already finished the project at hand, but I'll give this method a try. Thanks!
DanM
Nice trick -- only problem is for surnames with spaces it will replace the space with the separator.
Mark E
I have encountered such a problem myself, Mark. Unfortunately, until MSSQL gets with the times and introduces GROUP_CONCAT, this is the least of the overhead-intensive methods I have been able to come up with for what is needed here.
J Hardiman
A: 

About J Hardiman's answer, how about:

SELECT empName, projIDs=
  REPLACE(
    REPLACE(
      (SELECT REPLACE(projID, ' ', '-somebody-puts-microsoft-out-of-his-misery-please-') AS [data()] FROM project_members WHERE empName=a.empName FOR XML PATH('')), 
      ' ', 
      ' / '), 
    '-somebody-puts-microsoft-out-of-his-misery-please-',
    ' ') 
  FROM project_members a WHERE empName IS NOT NULL GROUP BY empName

By the way, is the use of "Surname" a typo or am i not understanding a concept here?

Anyway, thanks a lot guys cuz it saved me quite some time :)

A: 

This is very cool, but I don't understand it well enough to apply it to my problem.

I want to assign a value to an empty '' field that will be the same for all records that have a number of other fields the same.

i.e. manpn = somethingthesame if manname is the same suffix is the same etc...

I am assuming you could make the thing that is the same, just a cocatenation of a the one field that differs modelno

This is to get rid of the blanks intelligently

Then I have a further problem, I want to take all the records with the same manufacturer, suffix and manpn and get rid of the duplicates while building a list of modelno separated by commas.

So in the end there will be only one record that has the same manname, manpn, suffix but that indicates all the compatible models.

Any chance someone can show me how step 1 and step 2 could be accomlished with this technique? After the modelno field has been fixed, I could then use a standard technique to remove the duplicates.

andrew
A: 

Tried these but for my purposes in MS SQL Server 2005 the following was most useful, which I found at xaprb [link below]:

declare @result varchar(8000);

set @result = '';

select @result = @result + name + ' '

from master.dbo.systypes;

select rtrim(@result);

@Mark as you mentioned it was the space character that caused issues for me.

[http://www.xaprb.com/blog/2005/09/28/simulating-the-group-concat-function/][1]

isoughtajam