views:

622

answers:

2

Hi!,

I´m having trouble with bulk insert a file that contains a field in data type text, and it holds enters and the delimiter for the fields are pipes "|" and the row terminator is "|\n"

I get an error of truncation for the fields next to the text field. I think that the bulk insert thinks that the enters in the second field are the next rows.

DECLARE @sql varchar(2000)
PRINT 'xyz table'
SET @sql = 
'BULK INSERT xyz 
   FROM ''\\' + @@servername + '\Import\xyz.txt''
   WITH 
      (
                DATAFILETYPE   = ''char'',
                FIELDTERMINATOR         = ''|'',
                ROWTERMINATOR           = ''|\n'',
                TABLOCK,
                ROWS_PER_BATCH         = 2000,
                MAXERRORS      = 1000

                )
'
PRINT @sql
EXECUTE (@sql)
PRINT ''
GO

This is the data example

467507**|**08-20-09\
[8:55:03 AM] *** miked@wkaiser-6793 is Not Available [Ext. away]\(CR)
[9:00:57 AM] *** miked@wkaiser-6793 is Online [Online]\(CR)
[9:01:00 AM] <jeffp> Howdy.  Time slip update game!  Update TS#467493 & 467507 with a (CR)comment and see if you win! [9:01:30 AM] *** miked@wkaiser-6793 is Away [Away]\
\(CR)
08-18-09\(CR)
[10:13:15 AM] *** miked@wkaiser-6793 is Online [Online]\(CR)
[10:13:59 AM] <jeffp> Howdy; welcome back from lunch..  Just pinging you for an update in TS#467493 & 467507.  Since 467493 is pri9, want a stock e-mail to go out to the customer to get them moving?\(CR)
[10:14:47 AM] <miked@wkaiser-6793> thats ok i got it\(CR)
[10:14:53 AM] <jeffp> Aiight.**|**2009-08-18 00:00:00**|**2009-08-20 00:00:00**|**JDP**|**JDP**|**

The table schema is:

create table xyz
  (
    xyz_id VARCHAR(200), --INT TO VARCHAR
    notes text,
    create_date varchar(32), --DATETIME
    create_user varchar(12),
    modify_date varchar(32), --DATETIME
    modify_user varchar(12)
  )

As you can see (well, not can see) there are enters (marked with CR) and the bulk insert is confusing them with a new row.

Any help will be appreciated.

Thanks

+1  A: 

The data is inconsistent. You have some rows ending \(cr) with one field, other (cr) ending with | column separators

Either that or you have inconsistent column separators, some \(cr), some |. If so, then you'll need a format file to deal with each separate "column".

gbn
+1  A: 

EDIT: you don't need a format file:

IF OBJECT_ID('tempdb..#rawdata') IS NOT NULL DROP TABLE #rawdata
create table #rawdata (
  xyz_id VARCHAR(200)
, notes text
, create_date varchar(32)
, modify_date varchar(32)
, create_user varchar(12)
, modify_user varchar(12)
)

BULK INSERT #rawdata 
FROM 'C:\temp\file.txt'
WITH (
    FIELDTERMINATOR = '**|**'
  , ROWTERMINATOR = '**|**\n'
  , TABLOCK
  )

SELECT * FROM #rawdata

You may need to play with **|**\n (LF) vs **|**\r\n (CRLF) vs **|**\r (CR), depending on whether the file is Unix, DOS or MAC-like.

/EDIT

It can't be done without a format file. So, create a format file:

8.0
7
1 SQLCHAR 0 0 "**|**" 1 xyz_id        ""
2 SQLCHAR 0 0 "**|**" 2 notes         ""
3 SQLCHAR 0 0 "**|**" 3 create_date   ""
4 SQLCHAR 0 0 "**|**" 4 modify_date   ""
5 SQLCHAR 0 0 "**|**" 5 create_user   ""
6 SQLCHAR 0 0 "**|**" 6 modify_user   ""
7 SQLCHAR 0 0 "\n"    0 omitted       ""

Then:

IF OBJECT_ID('tempdb..#rawdata') IS NOT NULL DROP TABLE #rawdata
create table #rawdata (
  xyz_id VARCHAR(200)
, notes text
, create_date varchar(32)
, modify_date varchar(32)
, create_user varchar(12)
, modify_user varchar(12)
)

BULK INSERT #rawdata 
FROM '\\folder\file.txt'
WITH (
    FORMATFILE = '\\folder\file.fmt'
  , TABLOCK
  )

Or, in SQL 2005+:

SELECT * FROM OPENROWSET(
    BULK '\\folder\file.txt'
  , FORMATFILE = '\\folder\file.fmt'
  ) a
Peter
Thanks Peter!!, I will try it right away. you example is more complete and accurate than the documentation I reviewed.
Arturo Caballero
Here's my usual reference: http://msdn.microsoft.com/en-us/library/ms191479.aspx
Peter
Peter, I tried the solution but I get the same error.I modified the "\n" to "|\n" in the format file.I get this error:Server: Msg 4866, Level 17, State 66, Line 1Bulk Insert fails. Column is too long in the data file for row 1, column 1. Make sure the field terminator and row terminator are specified correctly.Server: Msg 7399, Level 16, State 1, Line 1OLE DB provider 'STREAM' reported an error. The provider did not give any information about the error.OLE DB error trace [OLE/DB Provider 'STREAM' IRowset::GetNextRows returned 0x80004005:
Arturo Caballero
That means it can't find a delimiter and just keeps reading till it fills the buffer. What was the error message before you changed the format file? I'd try a few things: 1. copy a hundred rows out into a separate file and try to make that work first. Also, use SELECT FROM OPENROWSET(....) to test the format file in isolation. If you can get that working, you have narrowed down the issue considerably.
Peter
I used your sample text for testing, copy and paste into a text editor, multiply the rows, etc. That *did* work, using all the methods shown above.
Peter
What kind of system generated the file? CR usually means carriage return, decimal 13, 0x0D, or "\r", not "\n". "\n" means line feed, decimal 10, 0x0A. Windows systems generate \r\n line endings, while Unix systems generate \n. If you have a hex editor, you can open the file up and see exactly what is there. \n would be 0A, \r would be 0D, and \r\n would be 0D 0A. If it is inconsistent between rows, you will have to find some other way to identify new lines.
Peter
It was generated by an Informix DataBase on Unix. I have no controll on that. :( I can ask to change the delimiters only.Thanks a lot Peter
Arturo Caballero
With your process the import worked fine.Thanks a lot budy!
Arturo Caballero