I am importing a CSV file into the database using BULK INSERT. The file is comma-delimited, and the fields are not consistently wrapped in a text qualifier.

However, some fields may contain a comma as part of the data, for example the ADDRESS field. Those values are surrounded with double quotes, but only when the value contains a comma; otherwise they are not quoted. So in some rows the ADDRESS value is surrounded with double quotes and in other rows it is not. Is there a way to specify the text qualifier in the BULK INSERT command?

I tried BULK INSERT with the FORMATFILE option:

    BULK INSERT Test_Imported FROM 'C:\test.csv'
    WITH (FIRSTROW = 0, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FORMATFILE = 'C:\test.Fmt')

However, I cannot find a way to declare the double quotes as an optional text qualifier in the format file.

PS: This function is part of a larger module written in C#; the BULK INSERT command is called from C#.

The CSV file arrives by email from another automated system, and I have no control over its format. There are around 150 columns, and each file contains about 12,000 rows on average. I forgot to specify the database: it is SQL Server 2005.

A:

Unfortunately, you'll have to pre-process the file to make it consistent. SQL Server's bulk operations split each row on the field delimiter and do not treat quotes specially.

Some options:

  • Pre-process the file in C# to replace the commas that are not inside quotes with a pipe (|)
  • Split the file in two: rows that contain quoted values and rows that do not. This works only if the quotes always appear in the same field
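For the first option, here is a minimal C# sketch. It assumes the file follows the usual CSV convention, where an embedded quote is written as "", and that no quoted value spans more than one line. CsvLineRewriter and RewriteLine are hypothetical names. The method walks each character and tracks whether it is inside quotes. Commas outside quotes become the new delimiter, and the enclosing quotes are dropped so they are not loaded as data.

```csharp
using System;
using System.Text;

static class CsvLineRewriter
{
    // Hypothetical helper: rewrites one CSV line so that commas outside double
    // quotes become newDelimiter and the optional enclosing quotes are removed.
    // Assumes "" inside a quoted value is an escaped quote and that no quoted
    // value spans more than one line.
    public static string RewriteLine(string line, string newDelimiter)
    {
        StringBuilder result = new StringBuilder(line.Length);
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    // doubled quote inside a quoted value: keep one literal quote
                    result.Append('"');
                    i++;
                }
                else
                {
                    // opening or closing qualifier: drop it and flip the state
                    inQuotes = !inQuotes;
                }
            }
            else if (c == ',' && !inQuotes)
            {
                // a real field separator
                result.Append(newDelimiter);
            }
            else
            {
                // commas inside quotes and all other characters are data
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

For example, RewriteLine("1,\"12 Main St, Apt 4\",Boston", "|") gives 1|12 Main St, Apt 4|Boston. Each line could be read with File.ReadAllLines, rewritten, and saved with File.WriteAllLines before calling BULK INSERT with the matching FIELDTERMINATOR. The new delimiter must never occur in the data. If | already does, a longer sequence is safer.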

You say you have no control over the format, but what you have is unusable...

gbn
Thanks for the quick help, friends. Is there any other character that is safe to use as a delimiter? (This CSV already contains |, *, $ and ^ as part of the data.) I was thinking about using a regular expression to replace the delimiters. What is the optimal way in C# to process a CSV file of around 8 MB?
nano
You can choose any character, or even a doubled character or a longer sequence. And sorry, I'm a SQL guy, not a C# one.
gbn
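Building on that suggestion, one way to find a safe choice is to check candidate delimiters against the actual file contents before rewriting the file. Here is a minimal C# sketch. DelimiterPicker and PickUnusedDelimiter are hypothetical names, and the candidate list is whatever you want to try. The data already contains |, *, $ and ^, so a multi-character sequence is the more likely survivor. A delimiter that is absent from today's file could still show up in a later one, so the check would be repeated for each incoming file.

```csharp
using System;

static class DelimiterPicker
{
    // Hypothetical helper: returns the first candidate that never occurs in the
    // file contents, or null when every candidate already appears in the data.
    public static string PickUnusedDelimiter(string fileContents, string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // ordinal search: compare raw characters, ignoring culture rules
            if (fileContents.IndexOf(candidate, StringComparison.Ordinal) < 0)
            {
                return candidate;
            }
        }
        return null;
    }
}
```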
A: 

Hi, the BULK INSERT statement really sucks because it doesn't handle optional text qualifiers.

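The TextFieldParser class (Microsoft.VisualBasic.FileIO.TextFieldParser) can help us clean up the file. It lives in the Microsoft.VisualBasic assembly, so a C# project needs a reference to Microsoft.VisualBasic.dll.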

I have pasted in a function that uses the TextFieldParser class to clean up a delimited file so you can use it in a Bulk Insert statement.

    // verbatim string (@) so that "\t" in the path is not read as a tab escape
    String newDel = CleanDelimitedFile(@"c:\temp.csv", new String[] { "," }, "\t,\t");

Here is a function that will clean up your delimited file.

    /// <summary>
    /// Opens a delimited file and removes the optional text qualifiers (double quotes),
    /// re-joining the fields with a new delimiter so BULK INSERT can split them.
    /// Requires a project reference to Microsoft.VisualBasic.dll.
    /// </summary>
    /// <param name="FileFullPath">Full path of the delimited file</param>
    /// <param name="CurrentDelimiter">The string(s) / character(s) the file currently uses as the delimiter</param>
    /// <param name="NewDelimiter">The new delimiter string to use; it must not occur in the data</param>
    /// <returns>The contents of the cleaned file, or String.Empty if the file does not exist</returns>
    private static String CleanDelimitedFile(String FileFullPath, String[] CurrentDelimiter, String NewDelimiter) {
        //-- nothing to clean if the file does not exist
        if (!System.IO.File.Exists(FileFullPath)) {
            return String.Empty;
        }

        System.Text.StringBuilder parseResults = new System.Text.StringBuilder();
        // 'using' disposes (and closes) the parser even if parsing throws
        using (Microsoft.VisualBasic.FileIO.TextFieldParser csvParser =
                   new Microsoft.VisualBasic.FileIO.TextFieldParser(FileFullPath)) {
            // some fields are enclosed in quotes; the parser strips them
            csvParser.HasFieldsEnclosedInQuotes = true;
            // the current delimiter
            csvParser.Delimiters = CurrentDelimiter;
            // keep leading/trailing spaces inside fields (the parser trims them by default)
            csvParser.TrimWhiteSpace = false;

            Boolean FirstLine = true;
            // iterate through all the records of the file
            while (!csvParser.EndOfData) {
                if (!FirstLine) {
                    // CRLF: BULK INSERT's ROWTERMINATOR '\n' expects a carriage return + line feed
                    parseResults.Append("\r\n");
                }
                FirstLine = false;
                // re-join the parsed fields with the new delimiter
                parseResults.Append(String.Join(NewDelimiter, csvParser.ReadFields()));
            }
        }
        return parseResults.ToString();
    }
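Note that CleanDelimitedFile returns a string, while BULK INSERT reads from a file. The result therefore has to be written to disk first, for example with System.IO.File.WriteAllText, and the command pointed at that file. Here is a minimal sketch of building that statement in C#. BulkInsertSql and Build are hypothetical names. The table name is assumed to be trusted, because it is concatenated into the SQL rather than passed as a parameter. The path and terminators are embedded as T-SQL string literals, so any single quote is doubled.

```csharp
using System;

static class BulkInsertSql
{
    // Wraps a value in a T-SQL string literal, doubling embedded single quotes.
    private static string Literal(string value)
    {
        return "'" + value.Replace("'", "''") + "'";
    }

    // Hypothetical helper: builds a BULK INSERT statement for a file that has
    // already been cleaned and written to disk. tableName is assumed to be a
    // trusted identifier because it is concatenated into the SQL as-is.
    public static string Build(string tableName, string dataFilePath,
                               string fieldTerminator, string rowTerminator)
    {
        return "BULK INSERT " + tableName +
               " FROM " + Literal(dataFilePath) +
               " WITH (FIELDTERMINATOR = " + Literal(fieldTerminator) +
               ", ROWTERMINATOR = " + Literal(rowTerminator) + ")";
    }
}
```

For example, Build("Test_Imported", @"C:\test_clean.csv", "|~|", @"\n") produces BULK INSERT Test_Imported FROM 'C:\test_clean.csv' WITH (FIELDTERMINATOR = '|~|', ROWTERMINATOR = '\n'). That statement could then be run with SqlCommand.ExecuteNonQuery. The row terminator has to match the line endings actually written to the file: with ROWTERMINATOR = '\n', BULK INSERT expects a carriage return/line feed pair. The path is resolved by the SQL Server service, not by the C# client.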

Victor http://www.eyecode.ca

A: 

Sadly, SQL Server 2005 and 2008 import XLS files much more smoothly than CSV files. I've never been anti-Microsoft, but unless the ANSI standards of database management are changing dramatically and the concept of a text qualifier is being abandoned (which I highly doubt), this is probably a proprietary move by MS. SQL 2000 handled text qualifiers just fine (I'm not sure about the BULK INSERT command, as I've always used the Import Wizards). Imagine my surprise when we migrated to 2005 and I had to rework all of my processes to import XLS files instead of flat files. It took me 16 hours (yes, TWO work days) to reach that conclusion, and I actually lost sleep that week because I was so frustrated with MS for not allowing the use of text qualifiers (I even went into my boss's office to apologize for spending so much time on what should have been a 10-minute task). Ironically, you can't tell Excel to export anything withOUT including a double-quoted text qualifier (nor, for that matter, virtually any other software's exporter). GRRRRRR.

The most frustrating part of all of this is that the SQL 2005 Import Wizard has a place to define the text qualifier!

...dare I say I'm starting to understand all the anti-M$ rhetoric after this experience!

Thisisfutile