views:

106

answers:

6

Hi. I have a bunch of names in alphabetical order with multiple instances of the same name all in alphabetical order so that the names are all grouped together. Beside each name, after a coma, I have a role that has been assigned to them, one name-role pair per line, something like whats shown below

name1,role1
name1,role2
name1,role3
name1,role8
name2,role8
name2,role2
name2,role4
name3,role1
name4,role5
name4,role1
...
..
.

I am looking for an algorithm to take the above .csv file as input create an output .csv file in the following format

name1,role1,role2,role3,role8
name2,role8,role2,role4
name3,role1
name4,role5,role1
...
..
.

So basically I want each name to appear only once and then the roles to be printed in csv format next to the names for all names and roles in the input file.

The algorithm should be language independent. I would appreciate it if it does NOT use OOP principles :-) I am a newbie.

+4  A: 

Obviously has some formatting bugs but this will get you started.

var lastName = "";

do{
  var name = readName();
  var role = readRole();
  if(lastName!=name){
    print("\n"+name+",");
    lastName = name;
  }
  print(role+",");
}while(reader.isReady());
Stefan Mai
Thanks Stefan, appreciate your solution, but when will lastName get updated in your algorithm?
QuickMist
I patched up Stefan's code (hope you don't mind) to add the place where to refresh `lastName` content.
Matthieu M.
@Matthieu - Thanks for clarifying.
QuickMist
@Matthieu Thanks! I had someone call my attention away and didn't look it over.
Stefan Mai
A: 

This is easy to do if your language has associative arrays: arrays that can be indexed by anything (such as a string) rather than just numbers. Some languages call them "hashes," "maps," or "dictionaries."

On the other hand, if you can guarantee that the names are grouped together as in your sample data, Stefan's solution works quite well.

Barry Brown
A: 

It's kind of a pity you said it had to be language-agnostic because Python is rather well-qualified for this:

import itertools
def split(s):
  return s.strip().split(',', 1)
with open(filename, 'r') as f:
  for name, lines in itertools.groupby(f, lambda s: split(s)[0])
    print name + ',' + ','.join(split(s)[1] for s in lines)

Basically the groupby call takes all consecutive lines with the same name and groups them together.

Now that I think about it, though, Stefan's answer is probably more efficient.

David Zaslavsky
Very cryptic. I can't understand it very well. What is lamda s:?
QuickMist
`lambda s: split(s)[0]` is a function that calls `split(s)` and then returns the first element of the returned tuple - which in this case is the name, the first thing on the line. `itertools.groupby` uses that function to group the lines by name. Then `lambda s: split(s)[1]` returns the second thing on the line, the role, and that line joins all the roles together with commas. (I really just posted it as a curiosity, it's not the answer you were looking for.)
David Zaslavsky
Didn't know about groupby, thanks David!
Stefan Mai
A: 

Here is a solution in Java:

Scanner sc = new Scanner (new File(fileName));
Map<String, List<String>> nameRoles = new HashMap<String, List<String>> ();
while (sc.hasNextLine()) {
  String line = sc.nextLine();
  String args[] = line.split (",");
  if (nameRoles.containsKey(args[0]) {
    nameRoles.get(args[0]).add(args[1]);
  } else {
    List<String> roles = new ArrayList<String>();
    roles.add(args[1]);
    nameRoles.put(args[0], roles);
  }
}

// then print it out
for (String name : nameRoles.keySet()) {
   List<String> roles = nameRoles.get(name);
   System.out.print(name + ",");
   for (String role : roles) {
     System.out.print(role + ",");
   }
   System.out.println();
}

With this approach, you can work with an random input like:

name1,role1

name3,role1

name2,role8

name1,role2

name2,role2

name4,role5

name4,role1

vodkhang
Thanks for the solution, but I can't understand what Map<String, List<String>> nameRoles = new HashMap<String, List<String>> (); means.
QuickMist
Also, if you could tell me the logic that you followed it would be very helpful.
QuickMist
They are some specific implementation of java but you can found that in almost all other programming languages. The data structure is like this: imaging that you have a table with 2 columns, where one column contains a unique key, and an array of data in the other column. Everytime you have a new data like name1 - role2 . You find the name1 in the column key (by hashing, looping ...), and then you get out the array of data corresponding in the other column. Then you add role2 into that array of data
vodkhang
A: 

Here it is in C# using nothing fancy. It should be self-explanatory:

static void Main(string[] args) 
{
    using (StreamReader file = new StreamReader("input.txt"))
    {
        string prevName = "";
        while (!file.EndOfStream)
        {
            string line = file.ReadLine(); // read a line

            string[] tokens = line.Split(','); // split the name and the parameter

            string name = tokens[0]; // this is the name
            string param = tokens[1]; // this is the parameter

            if (name == prevName) // if the name is the same as the previous name we read, we add the current param to that name. This works right because the names are sorted.
            {
                Console.Write(param + " ");
            }
            else // otherwise, we are definitely done with the previous name, and have printed all of its parameters (due to the sorting).
            {
                if (prevName != "") // make sure we don't print an extra newline the first time around
                {
                    Console.WriteLine();
                }
                Console.Write(name + ": " + param + " "); // write the name followed by the first parameter. The output format can easily be tweaked to print commas.
                prevName = name; // store the new name as the previous name.
            }
        }
    }
}
IVlad
Thanks for this solution. This one actually helped me understand the algorithm that I should use, which was what I wanted from the beginning. Its pretty easy to understand and self-explanatory.
QuickMist
A: 

I finally got my program working using C. Can you suggest any improvements to the writing style or programming style that was used in my program?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    FILE *in, *out;
    char lineIn[80], nameArr1[30], nameArr2[30], roleArr[50], lineOut[1024];
    char *namePtr, *rolePtr;
    int i, linesRead, linesWrote;

    //Open our input file in read mode
    if((in=fopen("UsersAndRoles.csv","r")) == NULL)
    {
        printf("Error opening input file...Exiting!\n");
        exit(EXIT_FAILURE);
    }
    //Open our output file in write mode
    if((out=fopen("MergedUsersAndRoles.csv","w")) == NULL)
    {
        printf("Error creating output file...Exiting!\n");
        exit(EXIT_FAILURE);
    }
    printf("Input and output files are open...\n");
    //Initialise our variables
    memset(lineIn,0,80);
    memset(nameArr1,0,30);
    memset(nameArr2,0,30);
    memset(roleArr,0,50);
    memset(lineOut,0,1024);
    namePtr=NULL;
    rolePtr=NULL;
    i=0;
    linesRead=0;
    linesWrote=0;

    while(!feof(in))
    {
        //Read a line from the input file
        fgets(lineIn,80,in);
        //Tokenise the input line into its constituents
        namePtr=strtok(lineIn,",");
        rolePtr=strtok(NULL,"\r");
        if (namePtr !=  NULL && rolePtr != NULL)
        {
            //Assign the name to the array nameArr1
            for(i=0;*namePtr != '\0';i++)
            {
                nameArr1[i]=*namePtr;
                namePtr++;
            }   //end for
            nameArr1[i]='\0';
            //Assign the role to the array roleArr
            for(i=0;*rolePtr != '\0';i++)
            {
                roleArr[i]=*rolePtr;
                rolePtr++;
            }   //end for
            roleArr[i]='\0';
            //If names are the same, add the role to current line
            if(strcmp(nameArr1,nameArr2) == 0)
            {
                strcat(lineOut,",");
                strcat(lineOut,roleArr);
            }   //end if
            //If names are not same print our line and start the new line
            else
            {
                memset(lineOut+strlen(lineOut)+1,'\0',1);
                //Checking to ensure that we are compared 2 names and not NULLS
                if (linesRead >= 2)
                {
                    //Write out to the file
                    fputs(lineOut,out);
                    fputs("\n",out);
                    linesWrote++;
                }
                memset(lineOut,0,1024);
                strcat(lineOut,nameArr1);
                strcat(lineOut,",");
                strcat(lineOut,roleArr);
                memset(nameArr2,0,30);
                memcpy(nameArr2, nameArr1, strlen(nameArr1));
            }   //end else
        }   //end if
        //Increment the line counter
        linesRead++;
    }   //end while
    //Write out the last line to the file
    memset(lineOut+strlen(lineOut)+1,'\0',1);
    fputs(lineOut,out);
    fputs("\n",out);
    linesWrote++;
    //Close input and output files
    fclose(in);
    fclose(out);
    //Print some statistics
    printf("\nTotal number of lines read: %d",--linesRead);
    printf("\nTotal number of lines wrote: %d\n",linesWrote);
    printf("\nInput and output files are closed...\n");

    return (EXIT_SUCCESS);
}
QuickMist
This belongs as a separate question, not an answer (and actually, a general "can you suggest any improvements" question isn't really the kind of thing StackOverflow is meant for)
David Zaslavsky