Robocopy to back up pictures

Started by Polarigel, November 22, 2014, 02:10:37 PM


Polarigel

To back up my data I use a Robocopy script that mirrors my hard disk to one of three different external disks, which I use in turn.
I keep the file date when writing metadata into the image file. The excellent iMatch help file mentions that this might be a problem for backup software.

Does anybody know if Robocopy can recognize image files with altered metadata?

I know I could just try it, but maybe somebody has done that before and can save me the work :)

Ferdinand

Why?


ubacher

I just ran RoboCopy today and noticed that it copied files where I had written back metadata just before.
So the answer is yes - however I do nothing with the file date.

Mario

Quote from: ubacher
So the answer is yes - however I do nothing with the file date.

Writing metadata to files (or updating files in any other way) will change the "last modified" timestamp and the archive flag in the file system. Robocopy looks at the timestamp to determine if a file in the source is newer. All backup programs work that way.

ColinIM

You added your (typically concise!) reply, Mario, while I was compiling my book-sized reply! I think our answers differ only in that I didn't know for sure whether Robocopy always uses Last Modified dates+times (I do know that it's an extremely versatile program). So I cautiously presumed that it could, feasibly, be given parameters telling it to rely only on the Archive attribute ... as I know some backup programs can be.

Quote from: Polarigel on November 22, 2014, 02:10:37 PM
(....) The excellent iMatch help file mentions that this might be a problem for backup software.
Does anybody know if Robocopy can recognize image files with altered metadata?
Different backup programs differ in how they decide whether a file is "due to be backed up" or "due to be RE-backed up".

       
  1. Some rely on the Last Modified date+time of a file - and whether the Last Modified date+time has changed since the last backup was made.
  2. Some rely on the state of the file's 'Archive' attribute. The file's 'Archive' attribute can (often optionally) be 'cleared' during a backup but a file's 'Archive' attribute will always be 'set' whenever a file is modified in any way.
  3. Other backup programs can do a slow-but-certain byte-for-byte comparison, or a full CRC-check (or other 'hashing' comparison) between the previously backed-up file and the current version of the file, to decide whether it needs to be backed up again.
Many backup programs allow us to combine options 1, 2 and 3 in any way we like.

Whenever option 3. is employed as the test for "is the file due for backup", then the other two options are unnecessary. They'd be redundant.
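
To make the difference between option 1 and option 3 concrete, here is a minimal Python sketch. It is not how Robocopy or SyncBackSE work internally; the function names are made up for illustration, and comparing timestamps at whole-second resolution is an assumption.

```python
import hashlib
import os


def changed_by_timestamp(src: str, dst: str) -> bool:
    """Option 1: flag the file if size or Last Modified time differs."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Assumption: timestamps are compared at whole-second resolution.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_by_content(src: str, dst: str) -> bool:
    """Option 3: flag the file if a hash of its contents differs."""
    if not os.path.exists(dst):
        return True
    return file_digest(src) != file_digest(dst)
```

If a metadata write-back keeps both the file size and the Last Modified time, changed_by_timestamp() reports nothing to do while changed_by_content() still catches the edit. That is the price and the benefit of option 3.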

I don't use Robocopy, but I'd predict with some confidence that, even though you choose not to adjust the date+time on your files when you use IMatch to alter their metadata, Robocopy could be told to heed the status of the files' 'Archive' attributes, and this could guarantee that they'd be backed up OK ...

... in other words, your Robocopy backup could ignore option 1. above, and could rely instead on option 2. (Windows will always set our files' Archive attributes when we change their contents, even when we retain their previous Last Modified dates+times.)

However - relying on the Archive attribute scheme alone will only work if:

       
  • a. During every previous backup, the Archive attributes on all of those same files were cleared (to their 'not set' state).
  • b. And no other program or backup procedure cleared the Archive attributes on those files at any time between your alteration of their metadata and your Robocopy backup!
I'd suggest that these tight requirements (for keeping a careful watch on our Archive attributes whenever we later rely on them for backups) are one of the main reasons why most people allow the files' Last Modified dates+times to reflect any actual content changes - and it's why most backup programs presume by default that we'll include the Last Modified dates+times among the criteria to be checked when deciding whether backups are 'due'.
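
For anyone who wants to check the Archive attribute before trusting it, the following Python sketch reads it from the file system. The function names are hypothetical, and the sketch assumes Windows, where os.stat() exposes st_file_attributes; on other systems it simply returns None.

```python
import os
import stat

# 0x20 is the Windows FILE_ATTRIBUTE_ARCHIVE bit; fall back to the literal
# value in case the constant is missing on this Python build.
FILE_ATTRIBUTE_ARCHIVE = getattr(stat, "FILE_ATTRIBUTE_ARCHIVE", 0x20)


def archive_bit_is_set(attributes: int) -> bool:
    """True if the Archive bit is present in a Windows attribute mask."""
    return bool(attributes & FILE_ATTRIBUTE_ARCHIVE)


def due_by_archive_attribute(path: str):
    """Option 2: True/False on Windows, None where the attribute is unavailable."""
    attributes = getattr(os.stat(path), "st_file_attributes", None)
    if attributes is None:
        return None
    return archive_bit_is_set(attributes)
```

Running due_by_archive_attribute() over a folder straight after a backup is a quick way to confirm condition a. above: every file should come back False.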

I use SyncBackSE instead of Robocopy for my own file-based photo backups (distinct from my whole-disk backups, which are a separate process), and I always use option 3. from that list above (a CRC-type, byte-for-byte comparison). It's slower - sometimes awfully slow - but it's the most reliable IMO.

Colin P.

Ferdinand

I thought that sector-based programs like Trueimage looked at whether a sector has been modified, and therefore were independent of file properties. They don't back up changed files, only changed sectors.

The OP didn't answer my why question.  My view is that EXIF date original and date digitised record when a file was taken, and if this information is so important then you can include it in your file naming schema.  The file date and time stamp tells you when a file was modified.  I think it's a misuse and misleading to try to maintain it as the time taken.

ColinIM

Quote from: Ferdinand on November 22, 2014, 02:22:28 PM
Why?

I thought it equally likely that Polarigel was either aiming to keep the files' original file creation dates+times intact, or that yes, maybe Polarigel was aiming to use those file dates+times as proxies for the photos' taken dates+times (which was something I'd attempted to do with my image files for some years). Surely it's technically 100% feasible to adopt this approach if that's what Polarigel chooses to do?  (Pending Polarigel's actual reply to your 'Why?' question ...)

Quote from: Ferdinand on November 22, 2014, 11:46:49 PM
I thought that sector-based programs (....) don't backup changed files, only changed sectors.
Yes, indeed this is true. But (if you're alluding to my "Option 3" above?) it's not only sector based backup programs which offer CRC or hash-based source-versus-destination comparisons.  (or have I misunderstood your point ?!  :P :-[ )

Quote from: Ferdinand on November 22, 2014, 11:46:49 PM
I think it's a misuse and misleading to try to maintain [the file date and time stamp] as the time taken.
I wouldn't go so far as saying it's a misuse or it's misleading. However, I will say that based on my own attempts to do a similar thing to what Polarigel appears to be doing (I had matched all my photos 'file dates' with their 'taken' dates, and worked hard for a few years to guard them against being changed afterwards by any other of my suite of software programs), Polarigel might decide - eventually - that this battle with dates+times is just not worthwhile!

I gave up just a few years ago, and I now let those Last Modified dates+times fulfil their primary purpose - as per their "last modified" label!!  But it took me a long while to develop the trust that I now have in getting ready-access to all my metadata using IMatch, and no longer to rely on a Windows Explorer column-sort to reveal those much-valued 'taken' dates on my photos. So I'd understand if that is what Polarigel is aiming to do with his file dates.

Colin P.

Carlo Didier

I am definitely on Ferdinand's side concerning the misuse. As an IT professional, not changing the "File modified" attribute when a file has actually been modified is complete nonsense to me. A contradiction in itself and potentially very dangerous.

Polarigel

I didn't have time to visit the forum the last few days - and I'm stunned by the number and quality of the answers I got. Thank you for that!

The only reason for wanting to keep the file date is that it works for me. I'm sure there are better and more logical ways to handle image files. But it will cost time and energy to find, test and optimize them - so why not stick to something that does the job?

Quote:
Polarigel might decide - eventually - that this battle with dates+times is just not worthwhile!

Maybe some day. But for now I have my hands full organizing 20,000 photographs on my hard disk (and I haven't even started scanning my slides).

The iMatch help file states:

Quote:
Preserve date/time of original file
If this option is enabled, ExifTool will retain the original last modified file system timestamp when updating files.
This option should only be enabled under very specific reasons because it may confuse backup applications and other software which uses the timestamp to determine when a file was last updated.

I enabled that option. Mario's post and Colin's excellent explanation helped me a lot to understand how to work around the consequences: if I want to use Robocopy, I have to rely on the archive attribute, which I have never used. So I'll have to do some test runs anyway. Until then I'll just make a completely new copy of all processed images as a backup.

cg

I know this is an old thread but I've spent quite a bit of time working on what I hope is a simple and reliable backup system for my photos that's run as a batch file and doesn't use commercial tools.

I ended up using robocopy and keeping careful track of the Archive bit for changed files. Through a lot of testing and looking through various forums, I found that changing metadata in small amounts (not enough to change the file size) may not trigger robocopy to back up the file, which is dangerous when dealing with the kinds of file changes iMatch makes.

I also discovered that, as of this writing, the most recent version of robocopy from Microsoft has a bug in it that flags the file as changed but does not actually copy the file (!). So I had to track down the XP027 version of robocopy which is old enough to not have the bug, but new enough to have the feature to preserve folder timestamps, which I think is important.

I first copied all the files from the source to destination and then turned Archive bits off for both source and destination. Subsequently, each new file or file with new metadata write-back will have the Archive bit turned on, which triggers those files to be copied next time the backup is run. I set robocopy to reset the Archive bit when the file is backed up.
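
The post does not give the exact command line, so the following is only a sketch of how such a run could be wrapped in Python. The switches are standard Robocopy options (/E for subfolders, /M to copy only files with the Archive attribute set and then reset it, /DCOPY:T to keep folder timestamps), but whether these match the batch file described above is an assumption, and the function names and paths are made up.

```python
import subprocess


def build_robocopy_command(source: str, destination: str) -> list:
    """Assemble a Robocopy call that relies on the Archive attribute."""
    return [
        "robocopy", source, destination,
        "/E",        # include subfolders, even empty ones
        "/M",        # copy only files with Archive set, then reset the bit
        "/DCOPY:T",  # keep folder timestamps
    ]


def robocopy_succeeded(return_code: int) -> bool:
    """Robocopy return codes of 8 or higher mean at least one copy failed."""
    return return_code < 8


def run_backup(source: str, destination: str) -> bool:
    """Run the backup (Windows only) and report whether it completed cleanly."""
    result = subprocess.run(build_robocopy_command(source, destination))
    return robocopy_succeeded(result.returncode)
```

A call such as run_backup(r"D:\Photos", r"\\nas\photos") (hypothetical paths) would then copy only the files whose Archive bit was set since the last run. Note that Robocopy returns non-zero codes such as 1 even on a successful copy, which is why the check is "below 8" rather than "equals 0".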

It may be a bit arcane, but I'm fairly secure in knowing that any file that iMatch writes from then on will be backed up when necessary, and I can monitor it as it's running to make sure.

axel.hennig

I also used robocopy for my backups years ago. My current opinion is not to use robocopy as a backup tool. Why?

I would use a tool which is designed to do backups. If you want to have it for free, look at my post here: https://www.photools.com/community/index.php?topic=12020.msg85615#msg85615. Other software-solutions are also mentioned in this discussion.

Mario

Copying files around is really not a real backup solution. Too much work, too much can go wrong, verification of the backup is hard, and it does not run automatically.

The best backup is one you setup once and then forget about it.
A backup that uses the Windows snapshot service to safely backup open files (many important files are always open and cannot be copied by normal means).
A backup that can verify the backup automatically.
Schedule different backups for different times (daily, weekly, monthly, with automatic retention and rotation etc.).
Something that can backup (even rotate) on external disks, NAS, cloud. With local encryption for data security.

I understand that spending 60 bucks on something like Macrium Reflect (I have 3 licenses) or TrueImage may be too much for many users (it's only backup, right?)
But Macrium has saved my butt several times over the past years (SSD died all of a sudden, hard disk died, Mario deleting the wrong files and noticing it two weeks later! etc.)
The price is fair, I have one license running on each of my PCs.

cg

Thank you for the suggestions.

I should clarify, I do also use Macrium for automatic full hard drive backups, with local and cloud storage of backups, etc. The local mirroring of the photo library from an SSD to a NAS is but one part of my backup strategy. I feel the local mirroring combined with NAS snapshots is a convenient way to revert to previous versions of files if something goes wrong somewhere (like if I mess up a bunch of metadata).

I think this has been mentioned in other posts, but, as I think people are saying, I try to keep different backup schemes to guard against different scenarios. Accidentally deleting or overwriting a file vs a wild fire vs ransomware vs being hit by a bus (to use Mario's example) may all call for different ways to store and restore data. So I do more than one.

Jingo

Quote from: cg on March 25, 2022, 05:39:41 PM
(....) I do also use Macrium for automatic full hard drive backups, with local and cloud storage of backups, etc. (....)

Using Macrium's differential backup strategy allows you to grab individual backup files for as many days as you have differentials. I keep 30 days of diffs before doing another full, so I have 30 days to grab an overwritten file from a previous differential.