I generated a complete set of scripts for the stored procedures in a database. When I created a Mercurial repository and added these files, they were all added as binary. I still get the benefits of versioning, but I lose much of what makes text files convenient, such as meaningful diffs. I verified that these files are indeed plain text.

Why is it doing this?

What can I do to avoid it?

Is there a way to get Hg to change its mind about these files?

Here is a snippet of the changeset log:

   Binary file SQL/SfiData/Stored Procedures/dbo.pFindCustomerByMatchCode.StoredProcedure.sql has changed
   Binary file SQL/SfiData/Stored Procedures/dbo.pFindUnreconcilableChecks.StoredProcedure.sql has changed
   Binary file SQL/SfiData/Stored Procedures/dbo.pFixBadLabelSelected.StoredProcedure.sql has changed
   Binary file SQL/SfiData/Stored Procedures/dbo.pFixCCOPL.StoredProcedure.sql has changed
   Binary file SQL/SfiData/Stored Procedures/dbo.pFixCCOrderMoneyError.StoredProcedure.sql has changed

Thanks in advance for your help,
Jim

A: 

Is your clone on a networked filesystem? If so, try a local filesystem.

Are people editing or committing with different line endings? If so, choose one format and stick to it.

http://selenic.com/pipermail/mercurial/2010-April/031242.html

http://selenic.com/pipermail/mercurial/2010-April/031248.html
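If mixed line endings are the suspect, counting them in the raw bytes shows which conventions a file actually uses. Here is a minimal Python sketch; the function names are hypothetical and not part of Mercurial. It assumes a byte-oriented encoding such as ASCII, ANSI, or UTF-8, because in UTF-16 the CR and LF bytes are interleaved with zero bytes and these simple byte counts would miss them.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes.

    Assumes a byte-oriented encoding (ASCII/ANSI/UTF-8), not UTF-16/32.
    """
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
    }


def has_mixed_line_endings(data: bytes) -> bool:
    """True if more than one line-ending style occurs in the data."""
    return sum(1 for n in line_ending_counts(data).values() if n) > 1


if __name__ == "__main__":
    print(line_ending_counts(b"a\r\nb\nc\rd"))
    print(has_mixed_line_endings(b"a\r\nb\r\n"))
```

Running it over every `.sql` file (for example with `pathlib.Path.rglob`) before committing would show whether a single convention is in use.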

ptomli
The files are on a local file system (Windows XP).
Jim Reineri
So far I am the only user. These are the original files extracted from SQL Server, and no changes have been committed yet.
Jim Reineri
+3  A: 

In keeping with Mercurial's views on binary files, it does not actually track file types, which means there is no way for a user to mark a file as binary or non-binary.

As tonfa and Rudi mentioned, Mercurial decides whether a file is binary by checking for a NUL byte anywhere in the file. In UTF-16 and UTF-32 files, a NUL byte is practically guaranteed, because every ASCII character is encoded with at least one zero byte.
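To illustrate, here is a standalone Python sketch of the NUL-byte heuristic described above. It is not Mercurial's source code, and `looks_binary` is a hypothetical name; it only shows why the same SQL text is "text" in UTF-8 or ANSI but "binary" in UTF-16.

```python
def looks_binary(data: bytes) -> bool:
    """NUL-byte heuristic: any zero byte marks the content as binary."""
    return b"\0" in data


if __name__ == "__main__":
    sql = "CREATE PROCEDURE dbo.pFixCCOPL AS SELECT 1\r\n"
    # UTF-8 and a Windows "ANSI" code page encode ASCII text without zero bytes.
    print(looks_binary(sql.encode("utf-8")))
    print(looks_binary(sql.encode("cp1252")))
    # UTF-16 pairs every ASCII character with a zero byte.
    print(looks_binary(sql.encode("utf-16")))
```

Only the encoding changes between the three calls; the characters are identical.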

To "fix" this, you would have to ensure that the files are encoded as UTF-8 instead of UTF-16. Ideally, your database would have a setting for the Unicode encoding used by the export. If it doesn't, another option would be to write a precommit hook that converts them (see "How to convert a file to UTF-8 in Python" for a start), but you would have to be very careful about which files you convert.
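A minimal Python sketch of such a conversion follows, written as a plain function rather than a Mercurial hook. It assumes the exported files start with a byte order mark (BOM), which is how it tells UTF-16/UTF-32 apart from other files; files without a BOM are left untouched rather than guessed at. The function name is hypothetical.

```python
import codecs
from pathlib import Path


def reencode_to_utf8(path: Path) -> bool:
    """Rewrite a BOM-marked UTF-16/UTF-32 file as UTF-8 (no BOM).

    Returns True if the file was converted, False if it was left alone.
    """
    raw = path.read_bytes()
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        text = raw.decode("utf-32")  # codec reads and strips the BOM
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")  # codec reads and strips the BOM
    else:
        return False  # no BOM: do not guess the encoding
    # Writing bytes keeps the original line endings unchanged.
    path.write_bytes(text.encode("utf-8"))
    return True


if __name__ == "__main__":
    # Hypothetical usage: convert every exported script under SQL/.
    for p in Path("SQL").rglob("*.sql"):
        if reencode_to_utf8(p):
            print("converted", p)
```

Running this once before `hg add` sidesteps the hook entirely; converting inside a hook is possible but, as noted, needs care about which files it touches.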

tghw
tghw has the right answer, but it's worth pointing out explicitly that "binary" and "text" files are handled identically by Mercurial internally. They differ only in which merge tools are launched for them (which is easily configured) and in what diff/incoming/outgoing show to users. The actual storage and merging are the same.
Ry4an
The problem was indeed the Unicode encoding. The database export only offers Unicode or ANSI; it does not give any more specific choices for Unicode. I changed the output to ANSI and got the behavior I wanted.
Jim Reineri
Thank you all for your assistance.
Jim Reineri