Quicker search with a clever naming system

Started by sinus, June 29, 2016, 09:24:20 AM


sinus

Hello,
I'm posting this because I just had to search for various images in IMatch.
While IMatch offers a lot of search and filtering options, I think the quickest way really is "Search only for file names".

On my system, with about 210,000 files, I find a simple word in 1-3 seconds when searching ALL files in the database.
If I search only Description, Title and Keywords, for example, the search takes about 20 seconds.
Searching all metadata takes about 45 seconds (which is still quick).

This means I can search my whole database for a (unique) file number, a date, a time, a client, an event or a person, and it takes about 1-3 seconds.
In the same short time I can also search for all versions, all masters, all files with "heavy editing" in Photoshop, all web-sized images, all images with a white background, and so on.

This is really great, I think.

I'm writing this only as a short suggestion: think about a clever file-naming system.
Do you want to include client or private names in the file name?
Do you want a word for the main event in the file name?
Do you want special letters for versions, for web-sized images and so on?

In other words: do you want short or long file names, or something in between?

Years ago, I decided on long file names.
Sometimes this is a drawback, because some dialog boxes then cut off the file name.
But it also has a lot of advantages, including that you can identify files outside of IMatch too.

My file names are quite long, with all the advantages I mentioned and some minor disadvantages.
They look like this:

20110616-1541-151223-s-kun-elux-bauerei_a.nef
20100521-1433-130533-s-coo-biobauernhof_m_v1.jpg
20060815-0942-062160-s-ref-stephan-gruber_o.jpg
20080212-2333-082312-f-pri-schachuhr_o.jpg

So the only point of this post is to encourage you to think about a clever, good file-naming system for yourself.
And keep it consistent: stubbornly use the same system every time.
If you do, you can easily change the system later when you want to change something in your workflow (which I have also done a few times, with minor changes).

(I didn't really know where to put this short post, so I wrote it here.)

Best wishes from Switzerland! :-)
Markus

zematima

Hi,
Could you please tell us what the file names mean?
The first 8 digits are YYYYMMDD (I guess), but what about the others?
Thanks in advance,
JRosa.


Mario

Quote
If I search only Description, Title and Keywords, for example, the search takes about 20 seconds.
Searching all metadata takes about 45 seconds (which is still quick).

It will be faster when you search a second time, because then these parts of the database will be in the file system cache. Still, ...


I could bring these search times down to less than one second with a special kind of index (a full-text index).

We had this search option in the early beta versions of IMatch 5 but dropped it for two reasons:

1. Users did not accept the results, or were confused by them.
2. This kind of index is super-fast when searching, but slow to update.

1.

The full-text index can search the database blazingly fast (maybe 1 second to search all 100 frequently used tags in 200,000 files), but it can only perform prefix searches.
This means it can find complete words like "bar" or word beginnings like "bartender", "bart", "bartleby" or "bar fly" when searching for "bar*".
It will not find "beachbar", "foobar" or "wunderbar" when searching for "bar*", because in these texts the search pattern "bar" is not at the beginning of a word.

Most users expect to find all these occurrences when searching for "bar".
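The difference can be shown with a small Python sketch. This is not how IMatch's index works internally; it only contrasts word-prefix matching (what a word-based full-text index answers quickly) with plain substring matching (what most users expect). The tokenizer, which simply splits text into runs of letters, is an assumption for illustration.

```python
import re


def words(text: str) -> list[str]:
    # Simplified tokenizer: lowercase runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def prefix_match(text: str, prefix: str) -> bool:
    # "bar*": true if any word in the text starts with the prefix.
    return any(w.startswith(prefix) for w in words(text))


def substring_match(text: str, needle: str) -> bool:
    # "bar" anywhere in the text, also in the middle or at the end of a word.
    return needle in text.lower()


for sample in ["bartender", "bartleby", "bar fly", "beachbar", "foobar", "wunderbar"]:
    print(sample, prefix_match(sample, "bar"), substring_match(sample, "bar"))
```

Only the first three samples pass the prefix test, while all six contain the substring; that gap is exactly the kind of result users found confusing.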

2.

Full-text indexes are designed to be super fast to search. They are rather slow to build or update.
This is not a problem for a search engine which updates its index once a week by revisiting all sites.
But it is a problem for an application like IMatch, where every rating or label update, keyword change, metadata change etc. invalidates the index and requires a partial or complete rebuild. If you add a new keyword to a file, you expect (correctly) that you can search and find it immediately - without waiting 10 seconds for a full-text index to update...while IMatch is unusable because the database load is at 100%.
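Why updates hurt can be seen in a toy inverted index, sketched here in Python. It only models the bookkeeping: every keyword change must be removed from and added to per-word posting lists before the new keyword becomes searchable. IMatch's actual index and database are not modeled; in a real database this bookkeeping happens on disk, for potentially many words and files, which is one reason such updates are expensive.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: maps each keyword to the set of file ids that have it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.keywords_of: dict[int, set[str]] = {}

    def set_keywords(self, file_id: int, keywords: list[str]) -> int:
        """Replace a file's keywords and return how many posting entries changed."""
        old = self.keywords_of.get(file_id, set())
        new = {k.lower() for k in keywords}
        changed = 0
        for word in old - new:  # keywords that were removed from the file
            self.postings[word].discard(file_id)
            changed += 1
        for word in new - old:  # keywords that were added to the file
            self.postings[word].add(file_id)
            changed += 1
        self.keywords_of[file_id] = new
        return changed

    def search_prefix(self, prefix: str) -> set[int]:
        # Toy lookup that scans every indexed word; a real index keeps its terms
        # sorted so it can jump straight to the words starting with the prefix.
        result: set[int] = set()
        for word, ids in self.postings.items():
            if word.startswith(prefix.lower()):
                result |= ids
        return result
```

Replacing "bar" with "bartender" on one file touches two posting lists; multiplied by thousands of files and words, the update lag Mario describes becomes plausible.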

I have made a number of experiments to overcome both the 'update lag' and the inability to search partial words. I have ideas for partial words (although they would create a much larger and thus slower index), but none for the update problem.

Maybe this is something that could be used in IMatch Anywhere, where databases are read-only and hence the index, once built, does not need to be updated all the time...

sinus

Quote from: zematima on June 29, 2016, 09:31:47 AM
Hi,
Could you please tell us what the file names mean?
The first 8 digits are YYYYMMDD (I guess), but what about the others?
Thanks in advance,
JRosa.

Of course, JRosa,

Here is what the parts mean, using this example:
20080212-2333-082312-f-pri-schachuhr_o.jpg

20080212-2333 is the date and time: 12 February 2008, 23:33.
082312 is a unique 6-digit number, created by an IMatch variable that counts upward automatically.
f indicates where the file comes from, e.g. the internet, my camera, other people...
pri or kun indicates what the file is for, e.g. private (pri) or which client, and so on.
schachuhr or stephan-gruber is the short description of the event, like x-mas or birthday and so on.

The last part (_a, _m, _o and so on) has special meanings, for example:
_m master
_v versions
_v_f version with white background

and so on.
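Because every name follows the same layout, it can be split into its fields mechanically. The Python sketch below is a parser inferred from the examples in this thread, not Markus' own code: the field names (taken, number, source, purpose, event, suffixes) are hypothetical labels, and the allowed characters per field are guesses. It returns None for names that do not follow the scheme.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed layout: DATE-TIME-NUMBER-SOURCE-PURPOSE-EVENT_SUFFIXES[.ext]
NAME_RE = re.compile(
    r"^(?P<date>\d{8})-(?P<time>\d{4})-(?P<number>\d{6})"
    r"-(?P<source>[a-z]+)-(?P<purpose>[a-z]+)-(?P<event>[a-z0-9-]+)"
    r"(?P<suffixes>(?:_[a-z0-9]+)+)"
    r"(?:\.(?P<ext>[a-z0-9]+))?$"
)


@dataclass
class ParsedName:
    taken: datetime
    number: str
    source: str
    purpose: str
    event: str
    suffixes: list[str]
    ext: Optional[str]


def parse_name(name: str) -> Optional[ParsedName]:
    """Split a file name into its fields, or return None if it does not match."""
    m = NAME_RE.match(name.lower())
    if m is None:
        return None
    return ParsedName(
        taken=datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M"),
        number=m["number"],
        source=m["source"],
        purpose=m["purpose"],
        event=m["event"],  # may itself contain hyphens, e.g. "stephan-gruber"
        suffixes=m["suffixes"].split("_")[1:],  # "_m_v1" -> ["m", "v1"]
        ext=m["ext"],
    )
```

For example, parse_name("20100521-1433-130533-s-coo-biobauernhof_m_v1.jpg") yields the event "biobauernhof" and the suffixes ["m", "v1"], while a camera name such as IMG_20150810_200856 returns None.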
I thought a lot about this last part, but I think it now works well for me.
For example, a raw file (NEF in our case) has two possible endings:
_a (not used)
_m (master)

If I then use a previously unused file (_a), I change its ending to _m.
This may not be strictly necessary, but because such an ending is easy to change with IMatch, it gives us a lot of possibilities to search for and distinguish files.

As a last remark, the characters - and _ are also important: the endings always start with an underscore _, while the other parts are separated only by a hyphen -.
This makes certain combinations of letters or numbers unique.
For example, the combination _m is used only as the ending for masters (which have versions).
A version of such a master looks like this:

20100521-1433-130533-s-coo-biobauernhof_m.nef (master with versions)
20100521-1433-130533-s-coo-biobauernhof_m_v1.jpg (version 1 of the master)
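Since a version carries its master's name plus a _v suffix, the master can be found from the name alone. This Python sketch assumes that a version's stem is always the master's stem (ending in _m) followed by _v, an optional number and optional further suffixes such as _f; master_stem and group_by_master are hypothetical helpers, not IMatch functions.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumption: a version's stem is its master's stem (ending in "_m") plus "_v",
# an optional number and optional extra suffixes such as "_f".
VERSION_RE = re.compile(r"^(?P<master>.+_m)_v\d*(?:_[a-z0-9]+)*$")


def master_stem(file_name: str) -> str:
    """Return the stem of the master a file belongs to (its own stem if it is not a version)."""
    stem = PurePath(file_name).stem
    match = VERSION_RE.match(stem)
    return match["master"] if match else stem


def group_by_master(file_names: list[str]) -> dict[str, list[str]]:
    """Group files under the stem of their master."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in file_names:
        groups[master_stem(name)].append(name)
    return dict(groups)
```

Applied to the two names above, group_by_master returns a single group keyed by "20100521-1433-130533-s-coo-biobauernhof_m" that holds both the NEF master and the JPG version.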

Maybe this all sounds complicated, but when all files follow the same logic, it quickly becomes clear, and the big advantages come into play when we search, copy or do whatever else.

I hope you understand my long answer  :)


Best wishes from Switzerland! :-)
Markus

sinus

Quote from: Mario on June 29, 2016, 10:49:57 AM
Quote
If I search only Description, Title and Keywords, for example, the search takes about 20 seconds.
Searching all metadata takes about 45 seconds (which is still quick).

It will be faster when you search for the 2nd time, because then these parts of the database will be in the file system cache. Still, ...

Right, the times I gave were measured on the second run; the first run takes longer.

Quote
1.
The full-text index can search the database blazingly fast (maybe 1 second to search all 100 frequently used tags in 200,000 files), but it can only perform prefix searches.
This means it can find complete words like "bar" or word beginnings like "bartender", "bart", "bartleby" or "bar fly" when searching for "bar*".
It will not find "beachbar", "foobar" or "wunderbar" when searching for "bar*", because in these texts the search pattern "bar" is not at the beginning of a word.

Most users expect to find all these occurrences when searching for "bar".

Yes, I agree with this; like most users, I would expect that too.

Quote
2.
Full-text indexes are designed to be super fast to search. They are rather slow to build or update.
This is not a problem for a search engine which updates its index once a week by revisiting all sites.
But it is a problem for an application like IMatch, where every rating or label update, keyword change, metadata change etc. invalidates the index and requires a partial or complete rebuild. If you add a new keyword to a file, you expect (correctly) that you can search and find it immediately - without waiting 10 seconds for a full-text index to update...while IMatch is unusable because the database load is at 100%.

I agree here too; I would also expect to see the result immediately.

Maybe IMatch will become even quicker one day, but for me its speed is already very good, especially for file-name searches, which I use a lot.
Thanks to my long file-naming system.  ;D
Best wishes from Switzerland! :-)
Markus

sinus

Quote from: Mario on June 29, 2016, 10:49:57 AM
I could bring down these search times to less than one second with a special kind of index (full-text index).

BTW, Mario:
I just ran a file-name search across all 230,775 files for the term "_f" (f as in German "freigestellt", i.e. cut out or on a white background), and the result was

3,236 files in 1.1 seconds (out of 230,775)

I think, this is fantastic!  :)

Best wishes from Switzerland! :-)
Markus

Mario

Quote from: sinus on June 29, 2016, 12:05:09 PM
BTW, Mario:
I just ran a file-name search across all 230,775 files for the term "_f" (f as in German "freigestellt", i.e. cut out or on a white background), and the result was
3,236 files in 1.1 seconds (out of 230,775)
I think this is fantastic!  :)

IMatch keeps the file names in memory and hence can scan them very quickly. This won't work with metadata, which may require hundreds of megabytes of RAM (plus a long time for loading).
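A minimal Python sketch of the idea Mario describes: keep the file names in a list in memory and scan them with a case-insensitive substring test. NameCache is a hypothetical name, and nothing here reflects how IMatch actually stores names. The third sample name is a hypothetical white-background version built to follow the scheme from this thread; note that "_f" does not match the "-f-" source field, which is why the underscore/hyphen convention keeps such searches unambiguous.

```python
class NameCache:
    """Keeps file names in memory and answers substring searches by scanning them."""

    def __init__(self, names: list[str]) -> None:
        self.names = list(names)
        # Lowercase once up front so each search is a plain scan.
        self._lowered = [n.lower() for n in self.names]

    def search(self, term: str) -> list[str]:
        """Return all names that contain term, ignoring case."""
        needle = term.lower()
        return [name for name, low in zip(self.names, self._lowered) if needle in low]


cache = NameCache([
    "20080212-2333-082312-f-pri-schachuhr_o.jpg",
    "20100521-1433-130533-s-coo-biobauernhof_m_v1.jpg",
    "20100521-1433-130533-s-coo-biobauernhof_m_v_f.jpg",  # hypothetical _f version
])
print(cache.search("_f"))
```

Searching for "_f" returns only the third name, while "-f-" finds only the file whose source field is f.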

sinus

Quote from: medgeek on June 29, 2016, 01:28:23 PM
Markus' post reminded me of Peter Krogh's recommendations:

http://www.dpbestflow.org/file-management/file-naming

I agree with what Peter Krogh wrote,
except for this point:

Avoid incorporating job names or descriptions in file names

Although you can do this, it is easy to run into an overly long file name using this approach. Another consideration is that if you do a lot of shoots for a particular client or at a particular location, you'll have to use some other naming string to differentiate the shoots from one another, so the descriptive component of the name is not particularly helpful.


The question is what counts as a long file name. I use file names like these:

20110112-1617-221650-s-pri-buero-lux_a
20160603-0914-287170-s-ref-kzimmermann_m
20160613-1456-289217-s-coo-crevetten_m
20160625-1331-292838-s-coo-turnfest_m_v1


These file names are quite long, but not too long (for me).
And distinguishing events for the same client is very easy.
And finally, the point this post was meant to make, which Peter left out of his consideration: the search is very quick, and especially in these cases a description of the event is very helpful.

And I prefer such file names to, for example:

IMG_20150810_200856
DSC5919
f-2016-45129

But as everywhere: every user can do it the way she/he wants.  ;D

Best wishes from Switzerland! :-)
Markus

ubacher

Using file names to identify the content of images is the way to go when you do not use IMatch or the like.
For identifying the source of an image, a category seems much cleaner. The same goes for distinguishing private from business.
By putting the client's name into the file name, sinus avoids having a (sub)category for each client.


Having the date in the file name is what I recommend if you organize the physical storage of your files
in a hierarchical manner. This way you can find where a file is stored without a computer search.
Adding postfixes for versions is of course necessary to distinguish versions while still linking them to the master through the root of the file name. This is probably the most common approach, since it lets you make the distinction when the file is created, i.e. when writing it out of Photoshop.
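As a small illustration of date-based storage, this Python sketch derives a folder path from the leading YYYYMMDD part of a file name. The year/month folder layout and the "Photos" root are assumptions chosen for the example, not a layout prescribed in this thread.

```python
from pathlib import PurePosixPath


def storage_path(file_name: str, root: str = "Photos") -> PurePosixPath:
    """Place a file in root/YYYY/MM/ based on the YYYYMMDD date at the start of its name."""
    date = file_name[:8]
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"name does not start with YYYYMMDD: {file_name!r}")
    return PurePosixPath(root, date[:4], date[4:6], file_name)
```

With this rule, 20160613-1456-289217-s-coo-crevetten_m.nef lands in Photos/2016/06/, so the folder can be found from the name alone.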

As to finding/identifying a photo-shoot I use the following system:

I have a category called "Index to photo shoot", which I assign to a key image of each shoot.
I can then select the client's category and filter for the category "Index to photo shoot" to get a quick pictorial overview of the various shoots for that client. Since each photo shoot is in its own folder, it just takes a Ctrl-G
to go to the files of the selected shoot.

It all comes down to finding the system that best fits your mode of operation. The flexibility of IMatch makes
it possible.
Unfortunately, legacy systems (how it was done in the past, how one is used to operating...) often constrain the choices.


sinus

Quote from: ubacher on June 30, 2016, 07:09:41 AM
Using file names to identify the content of images is the way to go when you do not use IMatch or the like.
For identifying the source of an image, a category seems much cleaner. The same goes for distinguishing private from business.
By putting the client's name into the file name, sinus avoids having a (sub)category for each client.

I mostly agree.
But using a short "key name" for the event in the file name of course does not prevent me from having categories too.
And I can, for example, use the file name for data-driven categories, which is very handy and automatic.

But the main reason for me is that I know what a file is when I copy, move or back it up, inside and outside IMatch.
And, as the title of this post says, I can search VERY quickly for an event, a person or whatever. I think that with 230,000 files, nothing finds your images as quickly as a file-name search.
Although I also use a lot of categories.

Once I have found the images with the IMatch search, I press a key (a script), and all images are put into several categories, distinguishing between masters, versions, info masters, delivered images, unused images, interesting images and so on.
I never thought this would be possible, but IMatch makes it possible.  :)

And another option I use quite often: when I browse through files and see an interesting one, I press a key (a script) and get a dialog where I can search for:

- duplicates
- files from the same day
- from the same hour (or minutes)
- from the same client
- from the same event
- all versions (from this event)
- all masters

... and others.
IMatch has some of these built in.
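Several of these searches can be derived directly from the file name. The Python sketch below builds a search string for a given criterion and filters an in-memory list of names. It assumes the layout DATE-TIME-NUMBER-SOURCE-PURPOSE-EVENT_SUFFIXES and treats the purpose code (e.g. coo) as identifying the client; search_key and find_related are hypothetical helpers, not part of IMatch or of Markus' script.

```python
from pathlib import PurePath


def search_key(file_name: str, criterion: str) -> str:
    """Build the text to search for, assuming DATE-TIME-NUMBER-SOURCE-PURPOSE-EVENT_SUFFIXES."""
    stem = PurePath(file_name).stem
    date, time, _number, _source, purpose, rest = stem.split("-", 5)
    event = rest.split("_", 1)[0]  # the event ends where the first _ suffix starts
    keys = {
        "same day": f"{date}-",
        "same hour": f"{date}-{time[:2]}",
        "same client": f"-{purpose}-",
        "same event": f"-{purpose}-{event}_",
    }
    return keys[criterion]


def find_related(names: list[str], file_name: str, criterion: str) -> list[str]:
    """Return all names (including file_name itself) that share the criterion."""
    key = search_key(file_name, criterion)
    return [n for n in names if key in n]
```

Because the event key ends with the underscore that starts the suffixes, "-coo-crevetten_" matches the master and all its versions but not a different event whose name merely begins with "crevetten".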

But I especially often search for an event, and for this I really like having this information in the file name.

But of course you are right, ubacher, we have a lot of possibilities, using categories and all the fine tools of IMatch to manage our files.

After all, managing files and finding the desired image is the main task of a DAM, and every user can choose how to do it.

My way of managing files might be too "fine-grained" for other users.
One user doesn't want or need all the bells and whistles, while another loves them.  ;D
Best wishes from Switzerland! :-)
Markus

sinus

For me, my file-naming system works really well.
As a reminder, it looks like this:

20110112-1617-221650-s-pri-buero-lux_a
20160603-0914-287170-s-ref-kzimmermann_m
20160613-1456-289217-s-coo-crevetten_m
20160625-1331-292838-s-coo-turnfest_m_v1


I have now created a script that does the following, based on the file name, the relation (master/version), the suffix at the end (_xx) and the extension.
When I select some images, it automatically does this:

- add a label for the client
- add a category for the client
- add pins, depending on whether the file was delivered or not
- add other pins for masters or versions
- add a dot if it is private
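The rules in this list can be sketched as a function that turns a file name into a list of planned actions. This is not Markus' actual script and it does not call IMatch: the action strings, the rule "purpose code pri means private" and the delivered flag (passed in, because the post does not say how delivery is detected) are assumptions for illustration.

```python
from pathlib import PurePath


def plan_actions(file_name: str, delivered: bool) -> list[str]:
    """Return hypothetical actions for a file, derived from its name."""
    stem = PurePath(file_name).stem
    _date, _time, _number, _source, purpose, rest = stem.split("-", 5)
    suffixes = rest.split("_")[1:]  # e.g. ["m", "v1"]
    actions: list[str] = []
    if purpose == "pri":
        actions.append("dot: private")
    else:
        actions.append(f"label: {purpose}")
        actions.append(f"category: client {purpose}")
    actions.append("pin: delivered" if delivered else "pin: not delivered")
    if any(s.startswith("v") for s in suffixes):
        actions.append("pin: version")
    elif "m" in suffixes:
        actions.append("pin: master")
    return actions
```

A real script would map each action string to the corresponding IMatch operation; keeping the decision logic separate like this makes the rules easy to check before anything is changed in the database.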

Such things are mostly only possible if we have a consistent file-naming system. And, of course, if we use IMatch!  ;D

BTW: uh, I'm a bit afraid that one day Visual Basic might be dropped from IMatch
:o
Best wishes from Switzerland! :-)
Markus

Mario

Quote
BTW: uh, I'm a bit afraid that one day Visual Basic might be dropped from IMatch

This day will come, and not in the too distant future.
Basic is an ancient programming language, and the version of WinWrap (not Visual) Basic I use in IMatch is also getting very old. Upgrading it to a newer version would be financially ruinous.


The technology I have developed for IMatch Anywhere™ allows access to IMatch DAM services from any programming language that can use RESTful web services, e.g. JavaScript and HTML in a web browser.

When I make the switch from Basic to the open IMatch WebService interface, programming will become richer and more powerful. And easier, because you can rely on the powerful debugging and test tools available in all web browsers. You can immediately use any of the many powerful JavaScript toolkits and frameworks, and there are thousands of easy tutorials on JavaScript programming.

A workflow script that looks at a file name and then performs operations like setting a label, a dot, adding some categories etc. is as easy to write in JavaScript as it is in Basic. Probably easier. IMatch WebServices then do the rest.

sinus

Quote from: Mario on July 06, 2016, 08:58:03 AM
...
A workflow script that looks at a file name and then performs operations like setting a label, a dot, adding some categories etc. is as easy to write in JavaScript as it is in Basic. Probably easier. IMatch WebServices then do the rest.

Thanks, Mario, I'm learning to "code" a bit of JavaScript in parallel.
And your sentence above gives me hope.  :D

But I also have to say that without the examples in your help file and your help here on the forum (like a few days ago, on how to change a file name), I would never have been able to write the "good" scripts I use.
Thanks to your "basic" help, a lot is possible.

Sometimes I find complicated, sophisticated programming examples on the net. But often I (and other users too) first need some basics, like how to change a file name, how to add a category and so on.
Once I have these basics, I can build on them and write somewhat better scripts.

And that is what I love about you and IMatch: there are also simple examples.

It is a bit like learning to drive a car.
If a very good driver shows me how to drive loops, make great-looking moves, stop and start and so on, I will be amazed, but it is not that helpful.
But if a driver slowly shows me where to put my feet to accelerate or brake, that helps me most. Getting better and better takes time, but for that I need the basics.

So I am looking forward to your help for users when switching to a new language, as you have always given it, and as I'm sure you will.
Thanks for this. Really.  :)
Best wishes from Switzerland! :-)
Markus

Mario

When I replace Basic with IMWS in IMatch at some point in the future, I will of course document it.

Tip: When you select the Discover IMatch WebServices™ command in IMatch WebViewer, it opens the built-in documentation. This documentation explains every available method with all its parameters, and for most methods there is even a live sample you can click to try the command with your currently loaded database.

The results are always JSON objects you can immediately access and use in JavaScript - or just look at them in your web browser. Give it a try. You can play with the examples by changing the parameters in the address bar in your web browser.

Jingo

Quote from: Mario on July 06, 2016, 11:08:22 AM
. Give it a try.

I assume this is still only available for select Beta testers?   8)

Mario

Quote from: Jingo on July 09, 2016, 07:23:45 PM
Quote from: Mario on July 06, 2016, 11:08:22 AM
. Give it a try.

I assume this is still only available for select Beta testers?   8)

You are right. A couple of weeks ago I posted a message here in the community and then formed a closed tester group from the IMatch users who replied.
Things look sparkly, and the server-side infrastructure I need to handle the licenses and downloads for IMatch Anywhere has been in place since yesterday.

As soon as I've migrated everything to the new server farm (end of August), I can start shipping IMatch Anywhere™ and IMatch WebServices™.
Currently I'm working again on the next regular IMatch for Windows update.
