What is a Unicode character that is least likely to be used by cameras?

Started by GrantRobertson, March 24, 2020, 11:46:19 PM


GrantRobertson

I want to use that character as the separator between the segments of my filenames so I can guarantee the Relation Definition regular expressions will never confuse an original camera filename with my standard filename scheme which I will use to rename files.

Background:
I am designing the standard filename scheme that I want to use for all my files. I plan to rename all the files that come off my cameras to follow this standard scheme. As part of designing this scheme, I have been learning about the workings and limitations of things like the File Relations features in IMatch. I have established that I will need one set of Buddy/Version Relation Definitions for my files as they come off the cameras, and another set for the files after I have renamed them.

I have realized that, if I am not careful, it is possible for some camera that I may buy in the future to have files that just so happen to match the "Master Expression" in the Relation Definition for my renamed files. So, I realized, if I use a character in my file naming scheme that will never appear in filenames that come from the camera, then I can guarantee there will never be confusion.

Question:
Therefore, I want to choose a Unicode character to use as a separator between the segments of my filename scheme that would never appear in a file as it comes off a camera. This character should be something that can be typed from the keyboard so as to not be impossible to use manually. It should not be disallowed in any file system. It should preferably look like a separator. So, my first guesses are one of the following:  ~ ! # % + = . Does anyone have any other suggestions?


What I am NOT asking:
I am not asking for help in using any feature of IMatch.
I am not asking for help in designing my file naming scheme. How my file naming scheme is designed is irrelevant to this question.

I am just asking for opinions as to what would be a good character to use that will never appear in the filenames of files as they come off the camera from any manufacturer. What specific camera I have is irrelevant. I have no idea what cameras I may use in the future. I know that you cannot know what filename schemes camera manufacturers may use in the future. However, I am assuming you know what characters you have seen in the past. Usually, camera manufacturers do not get too creative in this regard. I simply have not seen all possible schemes used by all the manufacturers.

GrantRobertson

Someone on Reddit led me to the following document: http://www.cipa.jp/english/hyoujunka/kikaku/pdf/DC-009-2010_E.pdf, where on page 8 it specifies that the only character that should be used in camera file systems, other than letters and digits, is the underscore.

So, I guess I could use any of the characters I listed above.
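
A quick way to sanity-check that conclusion is to test each candidate against the permitted character set. The Python sketch below assumes the allowed set is exactly ASCII letters, digits, and the underscore, as summarized from the CIPA document; the names DCF_ALLOWED and is_outside_dcf are made up for illustration.

```python
import string

# Assumption: the permitted set is ASCII letters, digits, and underscore,
# as summarized from page 8 of the CIPA document linked above.
DCF_ALLOWED = set(string.ascii_letters + string.digits + "_")

# The candidate separators listed in the question.
CANDIDATES = ["~", "!", "#", "%", "+", "="]


def is_outside_dcf(ch: str) -> bool:
    """Return True if ch is not in the assumed camera character set."""
    return ch not in DCF_ALLOWED


# Keep only the candidates a compliant camera should never emit.
safe = [ch for ch in CANDIDATES if is_outside_dcf(ch)]
print(safe)
```

This only says what a camera following that document should produce; it says nothing about cameras or phones that ignore it.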

Mario

There is no standard for file names and nobody knows what the folks in Japan can come up with.
The DSC prefix (or _DSC for Adobe RGB files) has been in use for 20 years. But maybe they have another idea tomorrow.

Why don't you just use a unique file naming schema, with a prefix like GRS- or GP- or something? Unlikely that a camera vendor will ever use that.
Using arbitrary Unicode characters can break things badly. Not all file systems and platforms (Linux, Web) can deal with complex or non-ASCII file names. All pros use simple alphanumerical file names in the ASCII character set for very good reasons.

If your master files have a prefix like GRS, they can easily be identified with a regexp.
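
As a rough illustration of the prefix idea, here is a minimal Python sketch. The "GRS-" prefix comes from the suggestion above; everything after the prefix in the pattern, and the name is_renamed_master, are assumptions for the example rather than anything IMatch prescribes.

```python
import re

# Hypothetical scheme: renamed masters start with the literal prefix "GRS-"
# and end with a file extension. Only the prefix comes from the post.
MASTER = re.compile(r"^GRS-.+\.[A-Za-z0-9]+$")


def is_renamed_master(filename: str) -> bool:
    """Return True if filename carries the GRS- prefix and an extension."""
    return MASTER.match(filename) is not None


for name in ["GRS-20200324-0001.nef", "DSC_0001.JPG", "_DSC0001.JPG"]:
    print(name, is_renamed_master(name))
```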
-- Mario
IMatch Developer
Forum Administrator

GrantRobertson

For anyone else who read the full question and understood it: I have done further research (I did this research once, 16 years ago, but the available resources weren't as good as they are now) and eliminated a few of the characters in my original list.

Eliminated '+' character because that is a special character in regular expressions and is a reserved character in FAT12.

Eliminated '=' because it is a reserved character in FAT12.

Eliminated '%' because it is a wildcard in some situations.

Eliminated '#' because some Unix shells require it to be quoted or escaped. Besides, the regular public has "assigned" their own special meaning to "hashtags" that might confuse casual users of my files.


Which leaves only:

~ !

I know '!' is not allowed in very rare instances of old operating systems. Stanford's Best Practices does not recommend using them. However, I have been using them for decades in Windows with no problem, and they are allowed in all modern versions of MacOS and Linux as well.

'~' is not mentioned as a reserved character in any operating system and is only mentioned in Stanford's Best Practices as not recommended. However, I have serious suspicions that all the author of the article did was hold the shift key down and type all the numbers on the keyboard. We must remember that a very large proportion of all content on the internet is written simply to have some content there, without any real research. Even content coming from Stanford. Sometimes, especially content coming from Stanford.

Many references also discourage the use of spaces in file names. However, I have seen them used regularly in Windows, MacOS, and Linux. So I will continue to use them in my filenames. Specifically to separate words in the section of my filename used for the description.

So, I have decided to use the '!' as a separator in at least one part of my filenames. I will use '_' as a separator in other parts of the filename. I will use '~' to replace the digits that are unknown within dates. Yes, I know, this means I will need a slightly more complicated regex to reliably match these filenames.
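
To give a sense of what that slightly more complicated regex might look like, here is a minimal Python sketch. The layout (date, description, sequence number, extension) is purely hypothetical and is not the actual scheme; only the '!' separator and the '~' placeholder for unknown date digits come from the paragraph above, and the '_' separator is left out for brevity.

```python
import re

# Hypothetical layout: <date>!<description>!<sequence>.<ext>
# Date digits may be replaced by '~' when unknown, e.g. 2020-03-~~.
RENAMED = re.compile(
    r"^(?P<date>[0-9~]{4}-[0-9~]{2}-[0-9~]{2})"  # date, '~' for unknown digits
    r"!(?P<desc>[^!]+)"                           # description, spaces allowed
    r"!(?P<seq>[0-9]+)"                           # sequence number
    r"\.(?P<ext>[A-Za-z0-9]+)$"                   # file extension
)


def parse_renamed(filename: str):
    """Return the named segments as a dict, or None if the name does not match."""
    m = RENAMED.match(filename)
    return m.groupdict() if m else None


print(parse_renamed("2020-03-~~!Beach trip!0001.jpg"))
print(parse_renamed("DSC_0001.JPG"))
```

Neither '!' nor '~' is a metacharacter in this regex syntax, so both can appear in the pattern unescaped. A camera-style name such as DSC_0001.JPG fails to match because it contains no '!'.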

Please, I specifically stated that I am not looking for advice in designing my file naming scheme. I have been naming files since about 1985 and have been doing so professionally since 1988. I was a network manager for 12 years and have designed dozens of file naming schemes for different purposes with no problems. In addition, I specifically stated that I wanted the characters to be allowed in major file systems, and that they be typable from the normal keyboard (which includes only ASCII characters) and are therefore not "arbitrary Unicode characters." As I stated multiple times, I only wanted to know if any of the characters I listed were likely to be used in file names by camera manufacturers. I even stated that I know we cannot know what the manufacturers will do in the future.

References:
https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations
https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming

hro

Not sure what you are actually asking of this community.
But as you are obviously such an expert and specialist in file naming schemes I doubt that anyone in this community can match this. Good luck. Enjoy the fantastic IMatch.

thrinn

Quote: I have realized that, if I am not careful, it is possible for some camera that I may buy in the future to have files that just so happen to match the "Master Expression" in the Relation Definition for my renamed files. So, I realized, if I use a character in my file naming scheme that will never appear in filenames that come from the camera, then I can guarantee there will never be confusion.
Naturally, it is difficult to tell what characters will appear in filenames in the future. Whatever you choose, you will not get any "guarantee" that this character will never be used. Up to now, and this includes the move to "computerized" photography (smartphones), the filenames I have seen only contain ASCII letters, digits, and underscores.

I understand that you are planning to rename all your images according to your personal file naming scheme. So the original name coming from the camera does not really matter, does it? If your concern is only that your version rule may accidentally identify some camera files as masters for existing files somewhere else in the database: Keep in mind that you can fine-tune rules to only look in the same (or specified) folder; or to match only file names with a specific length; or to match only file names with a specific prefix or suffix (as Mario suggested). Many users use a workflow where new files from the camera are first put into some kind of "Import" directory before they are renamed and moved into the target folder structure. This would guarantee that only files satisfying your naming scheme are part of your "real" folders.


Thorsten
Win 10 / 64, IMatch 2018, IMA

GrantRobertson

Quote from: thrinn on March 25, 2020, 08:54:00 AM
If your concern is only that your version rule may accidentally identify some camera files as masters for existing files somewhere else in the database: Keep in mind that you can fine-tune rules to only look in the same (or specified) folder;
...
Many users use a workflow where new files from the camera are first put into some kind of "Import" directory before they are renamed and moved into the target folder structure. This would guarantee that only files satisfying your naming scheme are part of your "real" folders.

I do, in fact, use a folder structure similar to what you described. I have an "Ingest" folder with subfolders for specific stages of ingesting files. I still use subfolders instead of IMatch categories because other programs don't recognize IMatch categories, so it is simpler to just stick to legacy methods for separating files. I had read the section on limiting matches to specified folders, but came away believing one could only limit searches to folders above and below the current folder. So I hadn't realized I could set my Relation Definitions to only look in my "Ingest\Rename" folder for pre-renamed files and to look in folders other than "Ingest\Rename" for post-renamed files.

So, this is helpful.

I had already intended to make my regular expressions as specific as possible, to avoid mismatches, and wanted to add the "special" characters to reduce the probability even further. By limiting the searches by folder, it will completely guarantee no mismatches. That is, unless I have accidentally moved some file where it shouldn't be. So, I'm going with a belt, suspenders, and staple approach.  :)
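
For the "accidentally moved some file" case, a small check outside IMatch could flag stray files. The Python sketch below is only an illustration under stated assumptions: it treats any file outside a top-level "Ingest" folder whose name lacks a '!' as unrenamed, and the function name find_strays and the example root path are hypothetical.

```python
from pathlib import Path


def find_strays(root: Path, ingest_name: str = "Ingest"):
    """Yield files outside the ingest tree whose names lack the '!' separator."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel_parts = path.relative_to(root).parts
        if rel_parts[0] == ingest_name:
            continue  # files still being ingested are expected to be unrenamed
        if "!" not in path.stem:
            yield path  # assumed marker of a renamed file is missing


if __name__ == "__main__":
    # Hypothetical root folder; replace with the real top of the photo tree.
    for stray in find_strays(Path("Photos")):
        print(stray)
```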