views: 357
answers: 8

Imagine you have a library for working with some sort of XML file or configuration file. The library reads the whole file into memory and provides methods for editing the content. When you are done manipulating the content, you can call a write method to save the content back to the file. The question is how to do this in a safe way.

Overwriting the existing file (starting to write to the original file) is obviously not safe. If the write method fails before it is done, you end up with a half-written file and you have lost data.

A better option would be to write to a temporary file somewhere, and when the write method has finished, you copy the temporary file to the original file.

Now, if the copy somehow fails, you still have correctly saved data in the temporary file. And if the copy succeeds, you can remove the temporary file.
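
To make this concrete, here is roughly the scheme I have in mind as a sketch (the function and variable names are made up):

import os
import shutil
import tempfile

def save(content, original_path):
    # Write everything to a temporary file first.
    with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
        tmp.write(content)
        tmp_path = tmp.name
    # Only touch the original once the temporary file is complete.
    shutil.copyfile(tmp_path, original_path)
    # The copy succeeded, so the temporary file can go.
    os.remove(tmp_path)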

On POSIX systems I guess you can use the rename system call, which is an atomic operation. But how would you best do this on a Windows system? In particular, how do you handle this best using Python?

Also, is there another scheme for safely writing to files?

+2  A: 

In the Win API I found the quite nice function ReplaceFile, which does what its name suggests, even with an optional backup. There is always the DeleteFile/MoveFile combo as well.

In general, what you want to do is really good, and I cannot think of a better write scheme.
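
A rough sketch of what the DeleteFile/MoveFile combo amounts to when done from Python with just the os module (the names here are made up):

import os

def replace_with_backup(new_path, target_path, backup_path):
    # Move the current target aside first so nothing is lost if a later
    # step fails (the DeleteFile/MoveFile idea).
    if os.path.exists(target_path):
        if os.path.exists(backup_path):
            os.remove(backup_path)           # DeleteFile
        os.rename(target_path, backup_path)  # MoveFile: target -> backup
    os.rename(new_path, target_path)         # MoveFile: new -> target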

Michal Sznajder
It would be even better if you illustrated that with the proper Python code that calls the MS library API.
RedGlyph
I didn't realize ReplaceFile existed. Reading the docs, it appears to do a lot more than just rename. It maintains many of the attributes of the replaced file, so it seems designed specifically for this purpose.
Jason R. Coombs
A: 

Perhaps this helps http://stackoverflow.com/questions/489861/locking-a-file-in-python

mtvee
I am not really concerned with multiple processes writing to the same file. But thanks for the link.
Rickard Lindberg
+3  A: 

A simplistic solution: use tempfile to create a temporary file, and if writing succeeds, just rename the file to your original configuration file.

For locking a file, see portalocker.
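
A minimal sketch of the tempfile-plus-rename idea, assuming the temporary file is created in the same directory as the target so the rename does not cross filesystems (the function name is made up):

import os
import tempfile

def atomic_write(path, data):
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(data)
    except Exception:
        os.remove(tmp_path)
        raise
    # On POSIX this atomically replaces the target; note that on Windows
    # os.rename() fails if the target already exists.
    os.rename(tmp_path, path)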

The MYYN
If the temporary file is created on a different filesystem than the target, then the final rename either won't work or won't be atomic.
ΤΖΩΤΖΙΟΥ
But unless the rename is atomic, I risk losing data, right?
Rickard Lindberg
+2  A: 

The standard solution is this.

  1. Write a new file with a similar name. X.ext# for example.

  2. When that file has been closed (and perhaps even read and checksummed), then you do two renames:

    • X.ext (the original) to X.ext~

    • X.ext# (the new one) to X.ext

  3. (Only for the crazy paranoids) call the OS sync function to force dirty buffer writes.

At no time is anything lost or corruptible. The only glitch can happen during the renames. But you haven't lost anything or corrupted anything. The original is recoverable right up until the final rename.
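
A sketch of those steps in Python, leaving out the optional read-back/checksum and the sync call (the names are made up):

import os

def write_with_backup(path, data):
    tmp = path + '#'                   # X.ext#: the new file
    bak = path + '~'                   # X.ext~: the previous version
    with open(tmp, 'w') as f:          # step 1: write the new file
        f.write(data)
    if os.path.exists(bak):
        os.remove(bak)                 # Windows: rename won't overwrite
    if os.path.exists(path):
        os.rename(path, bak)           # step 2: X.ext  -> X.ext~
    os.rename(tmp, path)               #         X.ext# -> X.ext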

S.Lott
But if you do not rename twice (creating a backup as you did) you risk losing data if the rename is not atomic, right?
Rickard Lindberg
Rename *is* atomic in many OS's. However, the backup is more important than hand-wringing about the atomicity of the operation. Remember: the odds of a crash (except on Windows) are very small. The odds of a crash in the middle of a rename (which is only a few instructions plus a sync) are very, very small.
S.Lott
+3  A: 

If you want to be POSIXly correct and safe, you have to do the following (sketched below the list):

  1. Write to temporary file
  2. Flush and fsync the file (or fdatasync)
  3. Rename over the original file
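
A sketch of those three steps (the temporary-file suffix is made up; on Windows the final rename would fail if the target already exists):

import os

def posix_safe_write(path, data):
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        f.write(data)
        f.flush()                 # flush Python's own buffers
        os.fsync(f.fileno())      # or os.fdatasync(f.fileno())
    os.rename(tmp, path)          # atomically replaces the original on POSIX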

Note that calling fsync has unpredictable effects on performance -- Linux on ext3 may stall on disk I/O for whole seconds as a result, depending on other outstanding I/O.

Notice that rename is not an atomic operation in POSIX -- at least not in relation to the file data, as you expect. However, most operating systems and filesystems will work this way. But it seems you missed the very large Linux discussion about Ext4 and filesystem guarantees about atomicity. I don't know exactly where to link, but here is a start: ext4 and data loss.

Notice, however, that on many systems rename will be as safe in practice as you expect. Still, it is in a way not possible to get both performance and reliability across all possible Linux configurations!

With a write to a temporary file followed by a rename of the temporary file, one would expect the operations to be dependent and to be executed in order.

The issue however is that most, if not all, filesystems separate metadata and data. A rename is only metadata. It may sound horrible to you, but filesystems value metadata over data (take journaling in HFS+ or Ext3/4 for example)! The reason is that metadata is lighter, and if the metadata is corrupt, the whole filesystem is corrupt -- the filesystem must of course preserve itself, then preserve the user's data, in that order.

Ext4 did break the rename expectation when it first came out; however, heuristics were added to resolve it. The issue is not a failed rename, but a successful rename. Ext4 might successfully register the rename but fail to write out the file data if a crash comes shortly thereafter. The result is then a 0-length file and neither the original nor the new data.

So in short, POSIX makes no such guarantee. Read the linked Ext4 article for more information!

kaizer.se
Maybe I misunderstood. But if you do a rename on a POSIX system, aren't you guaranteed that the destination is unmodified if the rename fails? "If the rename() function fails for any reason other than [EIO], any file named by new shall be unaffected." I guess it can still leave corrupt data then.
Rickard Lindberg
The issue is a successful rename, and the fact that a rename alone does not guarantee atomicity of the whole operation.
kaizer.se
What you mean is that rename() doesn't checkpoint. It most certainly is atomic, see: http://www.opengroup.org/onlinepubs/009695399/functions/rename.html
geocar
+2  A: 

If you look at Python's documentation, it clearly mentions that os.rename() is an atomic operation. So in your case, writing the data to a temporary file and then renaming it to the original file would be quite safe.

Another way could work like this:

  • let original file be abc.xml
  • create abc.xml.tmp and write new data to it
  • rename abc.xml to abc.xml.bak
  • rename abc.xml.tmp to abc.xml
  • after new abc.xml is properly put in place, remove abc.xml.bak

As you can see, you still have abc.xml.bak, which you can use to restore the original if there are any issues with the tmp file or with putting it in place.
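
A sketch of those steps (abc.xml stands for whatever the target file is; the function name is made up):

import os

def replace_config(path, data):            # path would be 'abc.xml'
    tmp = path + '.tmp'
    bak = path + '.bak'
    with open(tmp, 'w') as f:              # create abc.xml.tmp, write new data
        f.write(data)
    if os.path.exists(bak):
        os.remove(bak)                     # clear any stale backup (Windows)
    os.rename(path, bak)                   # abc.xml     -> abc.xml.bak
    os.rename(tmp, path)                   # abc.xml.tmp -> abc.xml
    os.remove(bak)                         # new abc.xml is in place, drop backup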

Shailesh Kumar
This is similar to S.Lott's answer, with the addition of deleting the backup file. It seems like this is the best way to do it. Thanks.
Rickard Lindberg
I actually saw this implemented in the way ZODB (Zope Object Database) packs its database file (Data.fs), i.e. removes the unused space of older transactions from the database file. The code is regular Python code: packing is done in a temporary file, and then steps similar to the above are carried out. ZODB has been around for many years and works well on both Windows and POSIX platforms, so I believe this approach should work.
Shailesh Kumar
Python cannot enforce the guarantee that rename is atomic. As far as I know, it just calls the OS's system call. The procedure you give works well, though.
Tommy McGuire
A: 

Per RedGlyph's suggestion, I've added an implementation of ReplaceFile that uses ctypes to access the Windows APIs. I first added this to jaraco.windows.api.filesystem.

from ctypes import windll
from ctypes.wintypes import BOOL, DWORD, LPWSTR, LPVOID

ReplaceFile = windll.kernel32.ReplaceFileW
ReplaceFile.restype = BOOL
ReplaceFile.argtypes = [
 LPWSTR,
 LPWSTR,
 LPWSTR,
 DWORD,
 LPVOID,
 LPVOID,
 ]

REPLACEFILE_WRITE_THROUGH = 0x1
REPLACEFILE_IGNORE_MERGE_ERRORS = 0x2
REPLACEFILE_IGNORE_ACL_ERRORS = 0x4

I then tested the behavior using this script.

from jaraco.windows.api.filesystem import ReplaceFile
import os

open('orig-file', 'w').write('some content')
open('replacing-file', 'w').write('new content')
ReplaceFile('orig-file', 'replacing-file', 'orig-backup', 0, 0, 0)
assert open('orig-file').read() == 'new content'
assert open('orig-backup').read() == 'some content'
assert not os.path.exists('replacing-file')

While this only works on Windows, it appears to have a lot of nice features that other replace routines would lack. See the API docs for details.

Jason R. Coombs
A: 

You could use the fileinput module to handle the backing-up and in-place writing for you:

import fileinput
for line in fileinput.input(filename, inplace=True, backup='.bak'):
    # inplace=True causes the original file to be moved to a backup;
    # standard output is redirected to the original file.
    # backup='.bak' specifies the extension for the backup file.

    # manipulate line (note: line keeps its trailing newline and print()
    # adds another, so strip it inside process())
    newline = process(line)
    print(newline)

If you need to read in the entire contents before you can write the new lines, then you can read it all first and then print the entire new contents with:

newcontents = process(contents)
for line in fileinput.input(filename, inplace=True, backup='.bak'):
    print(newcontents)
    break

If the script ends abruptly, you will still have the backup.

unutbu