Search function above Filewindow

Started by BenAW, January 28, 2014, 07:37:11 PM


BenAW

At present, Search only finds whole words or parts of words starting from the beginning.
In a document database, for example, a search on part of the file name is very useful.

Propose to have a new group, "group:filename".
When this group is used, IM will search exclusively in the filename, but will also match any part of the filename.

( I assume that a search for parts of all other metadata will have a huge performance penalty)

Mario

The syntax of the search bar is fixed by the capabilities of the third party database engine product I use.
A group only for the file name would be possible, but since there is already a file name filter which does exactly that (and with lots of options) I don't think I will add this.

The search bar is like a search engine or the search bar in Windows Explorer. It searches the current scope for all data containing the search term you enter. This usually brings good results very fast, even for large scopes with 1000 or 100,000 files. For more detailed searches or more control, use the Filter panel.

BenAW

Quote from: Mario on January 28, 2014, 08:45:44 PM
A group only for the file name would be possible, but since there is already a file name filter which does exactly that (and with lots of options) I don't think I will add this.
I am aware of the filter capabilities, but I believe that for a document management system this would be a useful option.
Documents usually have a filename that says a lot about the content of the document.
To quickly find one document setting up a filter is a bit cumbersome.
Just typing any part of the filename is much simpler.

No high priority of course  ;)



Mario

We can search for file names using the search bar. I search for file names with the search bar all the time... my file names are numerical or use year/month/day schemes and this works great, even using Boolean operations.

Your request would just be useful in the case where:

a) You have used the same terms (words) in both the file name and the metadata and
b) You only want to find files which have the term in the file name, but not in the metadata

There will be no performance gain here, just a way to restrict the search to the file names only. I'm not sure how many users would benefit from that. Maybe we can get some comments here.






BenAW

Quote from: Mario on January 29, 2014, 08:27:00 AM
There will be no performance gain here, just a way to restrict the search to the file names only.
My request is about the possibility to search for filenames using the "contains" functionality as in the filter.
So yes, restrict the search to filenames only, but in that case only use the "contains" search.
I assume doing searches in all metadata using "contains" will slow down things too much.
If not, my request would be to use "contains" for all searches from the search bar.

example: filename "NikonE5700usermanual.pdf"
no keywords in the file.

This file can not be found using "e5700" as search term.
You have to use at least "nikone5700*" to find the file.


Mario

This I cannot change. The search syntax is fixed by the search engine. It is amazingly fast (searching 100,000 files in about one second) but it comes with a strict syntax.

You can find your file by just searching for

Nikon*

or

NikonE5700*

The search engine works on words and it splits the incoming data (what IMatch feeds into it) into words using blanks and other characters. In your case, the file name consists of only one word:

NikonE5700usermanual

and hence the search engine cannot find usermanual or E5700. If the file name were

Nikon E5700 usermanual

The search engine would index it with three words:

Nikon
E5700
usermanual

and can find it using all these words. This procedure is the same for all data indexed by the search engine.
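
The word-based matching described above can be sketched in a few lines of Python. This is only an illustration, not the actual search engine code: the names `tokenize` and `matches` are made up, and the separator set (blank, line feed, carriage return, hyphen) and the case-insensitive comparison are assumptions.

```python
import re

# Assumed separators for illustration: blank, line feed, carriage return, hyphen.
SEPARATORS = r"[ \n\r\-]+"


def tokenize(text):
    """Split text into lowercase words at the assumed separator characters."""
    return [w.lower() for w in re.split(SEPARATORS, text) if w]


def matches(term, text):
    """Return True if the search term matches one of the words in text.

    A trailing * means "word starts with"; otherwise the whole word must match.
    There is deliberately no way to match the middle or end of a word.
    """
    term = term.lower()
    words = tokenize(text)
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words)
    return term in words


# One long word: only searches from the beginning of the word succeed.
print(matches("Nikon*", "NikonE5700usermanual"))   # True
print(matches("E5700", "NikonE5700usermanual"))    # False

# Three separate words: each one can be found on its own.
print(matches("E5700", "Nikon E5700 usermanual"))  # True
```

Because only word beginnings are compared, a search like this stays cheap, but a term in the middle of a word can never match.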

But the search engine is a work in progress and who knows what features will be added over time...

BenAW

I DO understand how the search works presently.

I also think that being able to find the file "NikonE5700usermanual.pdf" and other files related to the E5700 by just searching for "e5700" would be useful.
Suggest to keep the request open for later consideration.

ColinIM

I'll add a +1 here ... and hope for a future enhancement to the search engine.

I'm also a user of long descriptive filenames for my images at various stages in their workflow, and the 'interim' words or phrase-fragments that I employ in those extended filenames are rarely imported into any metadata field.  So I've made good use in the past of IM3's prowess in finding fragments of whole words among any portion of my files' names (as I presume all readers here will know).

Although it's off topic, I'll use the example of Windows Explorer's otherwise awful Search Bar (IMO) to hint at the functionality I'd like to see in a future IM5 search engine:

In Windows Explorer's Search Bar I can add a '*' prefix to my search term to fish out all instances of that search term in any part of the name of a file (or a folder) in the folders below the selected folder.  (And this works OK even with Windows indexing turned off.)  This "preceding wildcard" functionality is probably the only part of Windows Explorer's Search function that I like!!  But it's very useful to me.

Colin P.

Mario

I think the File Name Filter should handle this nicely. It supports all advanced search modes like is (precise match), contains, starts with, ends with and the inverse modes as well. It supports regular expressions to handle special cases, can search in folder names, ignore extensions, ...

I would not hold my breath for the database vendor to change the syntax supported by their full text index any time soon - which would be required to support the OP's or your feature request. Instead, use the File Name Filter when you want to find files with specific file names, and your search syntax is not supported by the search bar in the file window.

Ferdinand

The problem I have with the search bar is not its inability to do partial matches, but rather its insistence on splitting.  For example, it's often the case that I want to search for aaaa_bbbb and the search bar results include some files that contain aaaa and bbbb separately but not aaaa_bbbb.

We've discussed this before, but it remains an obstacle to me using the search bar.  My recollection is that this was a decision by you rather than a limitation of the database syntax.  I hope that one day you will reconsider this.

Mario

The search engine splits phrases at certain characters: <blank> <line feed> <carriage return> and <->. I think the underscore _ is not used as a word boundary. I use the search engine tokenizer as is. No changes from my side.

Ferdinand

Is this behaviour something that could be changed, e.g. by a database syntax configuration setting?

Mario

I could write my own tokenizer and replace the default tokenizer used by the database system.
That's on my long-term to-do list because it has a lot of strings attached.

sinus

I am not sure, if I could follow here all postings (English-problems from my side).

I can only say, I use a LOT!!! the "Search for filenames" in IMatch3.

There I can simply write in a box, say "Hund" (dog) and IM3 finds VERY quickly in the WHOLE database (190'000 files) files like:

20060610-1517-059068-s-sin-hund-a_1_e.jpg
20110710-2223-089813-f-sin-bären-hundwil-hugentobler.sla
08 - Moore - Thunder Rising.mp3

or I can search for "_v1" (versions) and IM3 finds in the whole db very quickly files like:

20100216-1904-127190-s-sin-schachuhr-_v1c1.jpg
20120328-1744-183995-s-sin-weihermatt-aussen-u_v1k.jpg
20121105-1133-189822-s-gou-u_v1.jpg

So since IM5 is the "better" IM3, I think simply, that such searches I can do also with IM5.

If not, this would be a pity and I would support such a proposal.

Best wishes from Switzerland! :-)
Markus

Richard

So since IM5 is the "better" IM3, I think simply, that such searches I can do also with IM5.

Hi Markus,

As I read posts in this thread and many others, I get the impression that testers want IMatch 5 to work the same as IMatch 3 because that is what they are accustomed to. If "new" is to be the same as "old", then IMatch 5 would be a waste of time. Instead users need to get used to new ways of doing things. If you can use a File Name Filter to find all files with "Hund" in their name, will that not work for you?

sinus

Quote from: Richard on February 03, 2014, 12:09:41 PM
So since IM5 is the "better" IM3, I think simply, that such searches I can do also with IM5.

Hi Markus,

If you can use a File Name Filter to find all files with "Hund" in their name, will that not work for you?

Thanks, Richard, for your answer.
Well, I have about 200'000 images now, of course in different folders and on different drives.
As I pointed out, in IM3 I could simply write "Hund" in the search box, and say, after 5 seconds (GREAT) I have the results!

If this is possible with IM5 (which I think it is), then I am happy. If I must learn a new route to get the same results, then it is fine, we always have to learn new things.

It would only be worse IF this would not be possible with IM5 OR if it would be more complicated (with several steps).

To be honest, I have to try this with IM5, but at the moment I have there only about 50 images (to test the version-stuff), hence it is not that good to test it.

But I will do this as soon as possible. I am sure, I can get the results with IM5, because with IMatch I could always get, what I wanted.

If this is not possible so easy in IM5, well, no problem, we have scripts  :D

So for me it is not really an issue.
Best wishes from Switzerland! :-)
Markus

Mario

Markus,

the search box above the file window searches 200,000 files in less than a second. It searches file names plus all metadata, unless you use a group to restrict the search to certain metadata groups. The file name filter is also very fast, and can even do a lot more than what IMatch 3 could do.

I'm not sure if you are asking a question ("How do I search 200,000 files for all files with a file name containing _v1?") or if you just posted a comment.

If you want to find all files in your database with a file name containing _V1

1. Select the database node to put all files in your "scope"
2. In the file name filter, select the contains mode and enter _V1 in the edit field.

That's it. You can also search for file names starting with _V1, ending in _V1 that way. Or find all files not containing _V1.
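
For readers who think in code, the filter modes mentioned here behave roughly like the following Python sketch. It is not how IMatch implements the File Name Filter; `filename_matches` is a hypothetical helper, and ignoring upper and lower case is an assumption made for this example.

```python
def filename_matches(name, text, mode="contains", invert=False):
    """Hypothetical helper: compare a file name against text, ignoring case.

    mode is one of "is", "contains", "starts with", "ends with".
    invert=True turns the test into its opposite (e.g. "does not contain").
    """
    name, text = name.lower(), text.lower()
    if mode == "is":
        hit = name == text
    elif mode == "contains":
        hit = text in name
    elif mode == "starts with":
        hit = name.startswith(text)
    elif mode == "ends with":
        hit = name.endswith(text)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return hit != invert


files = [
    "20100216-1904-127190-s-sin-schachuhr-_v1c1.jpg",
    "20121105-1133-189822-s-gou-u_v1.jpg",
    "08 - Moore - Thunder Rising.mp3",
]

# "contains" looks at every position in the name, not only word beginnings.
found = [f for f in files if filename_matches(f, "_V1")]
print(found)  # the two .jpg files
```

Unlike the word index behind the search bar, a "contains" test has to look inside each file name, which is why it can find fragments such as _V1 anywhere in the name.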

sinus

Quote from: Mario on February 03, 2014, 02:10:22 PM
Markus,

the search box above the file window searches 200,000 files in less than a second. It searches file names plus all metadata, unless you use a group to restrict the search to certain metadata groups. The file name filter is also very fast, and can even do a lot more than what IMatch 3 could do.

Cool!  :D

Quote from: Mario on February 03, 2014, 02:10:22 PM
I'm not sure if you are asking a question ("How do I search 200,000 files for all files with a file name containing _v1?") or if you just posted a comment.

Sorry for not making it clear: I was only posting a comment.  :)

Quote from: Mario on February 03, 2014, 02:10:22 PM
If you want to find all files in your database with a file name containing _V1

1. Select the database node to put all files in your "scope"
2. In the file name filter, select the contains mode and enter _V1 in the edit field.

That's it. You can also search for file names starting with _V1, ending in _V1 that way. Or find all files not containing _V1.

But thanks anyway for your help, good to know, and yes, I am searching often for "_v1" or so, but I guess, in IM5 I will have to search not that often, because there I will have the fine version-system, natively!   :D :)
Best wishes from Switzerland! :-)
Markus

Ferdinand

A point of clarification. 
My objective is the same as others - I want to do a partial search on the file name.
In V3.6 this is fast and easy.
Richard is right that V5 is more powerful, which means different.
Different and more powerful however can sometimes be a little slower to use.
My workflow provides a solution - I already import the portion of the file name I want to search on into a property / attribute.
So this should work from the search bar.
The problem in V5 is that it contains an _ and so doesn't guarantee exact results.
So my solution will be to edit the property / attribute and change this character.
A change to the tokeniser would be preferable, but is a low priority.

@Mario - do you know the complete list of characters that the current tokeniser uses as a word delimiter?

Mario

Quote@Mario - do you know the complete list of characters that the current tokeniser uses as a word delimiter?
No, never looked that up. Can do if I have a minute.

BenAW

Quote from: BenAW on January 28, 2014, 07:37:11 PM
Present Search only finds whole words or part of words starting from the beginning.
In eg a Document dbase a Search on part of the filename is very useful.

Propose to have a new group, "group:filename".
When this group is used, IM will search exclusively in the filename, but will also match any part of the filename.
Back to the original feature request:
I understand that for the time being the WAY the search works isn't going to change.
I still find it useful to have a "group:filename" where the search is restricted to ONLY the filename.

My feature request thus becomes to add such a group  ::)

Mario

Try the photools.com group. It contains some "made up" tags IMatch maintains, including the file name and folder name. The other tags are mostly for ID3 files. This will limit your search to "file names mostly".

Mario

Quote@Mario - do you know the complete list of characters that the current tokeniser uses as a word delimiter?

I made up file names of two words:

baby atom
baby-atom
baby_atom

All file names were parsed into the two words "baby" and "atom" and I can find them by searching for either word.

A file name

babyatom

can only be found by baby but not by atom

Mario

PS.: Incidentally, I'm working on the search engine for the next update. I try to make it faster, and may even (likely) offer users a choice of what to index. Currently IMatch indexes a lot of data that is never queried and this blows up the size of the search engine data. I always meant to come up with a solution during the Beta but never found the time. I had this already in mind when implementing the Rebuild Search Engine command under Database > Tools.

Since we're now more towards the end of the Beta  :o ??? ::) it's about time.
As part of this change I'm switching to the latest version of the search engine which allows me to use an ICU tokenizer. Much better. ICU is a project once started by IBM which deals with all kinds of language- and location-specific issues.

The new tokenizer supports Unicode which is a big win for all non-English users. As a neat side effect, it allows me to specify which characters are to be used as separators to detect word boundaries. I'm not sure yet of what I can actually use (there is a document on the ICU site which explains about word boundaries).

But I guess for IMatch purposes it should be sufficient to use <blank> , . _ - ()[]{}<carriage return> <line feed> as separators.

This will not change the general behavior of the search engine, meaning we still cannot search for word fragments. All searches consider word beginnings so we can find "motorbike" by searching for "motor*" but not by searching for "bike". There are no "*bike" searches, unfortunately. We can find such data via the Filter Panel of course.
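
To illustrate how the choice of separator characters changes what ends up in the index, here is a small Python sketch. It is neither the ICU tokenizer nor IMatch code; `tokenize` and the two separator strings are made up for this example, and the second string simply drops _ and - from the list proposed above.

```python
def tokenize(text, separators):
    """Split text into words at any character contained in separators."""
    words, current = [], []
    for ch in text:
        if ch in separators:
            # A separator ends the current word (if there is one).
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


# Separator set as proposed above (assumed spelling of the characters).
PROPOSED = " ,._-()[]{}\r\n"
# Same set without underscore and hyphen, for comparison only.
WITHOUT_UNDERSCORE_HYPHEN = " ,.()[]{}\r\n"

print(tokenize("aaa_bbb", PROPOSED))                   # ['aaa', 'bbb']
print(tokenize("aaa_bbb", WITHOUT_UNDERSCORE_HYPHEN))  # ['aaa_bbb']

# No separator inside "motorbike", so "bike" never becomes a word of its own.
print(tokenize("motorbike.jpg", PROPOSED))             # ['motorbike', 'jpg']
```

With _ in the separator set, aaa_bbb is indexed as two words and can be found via either one; without it, the whole string is one word and only searches starting with aaa can match.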

sinus

Quote from: Mario on February 04, 2014, 06:27:25 PM

The new tokenizer supports Unicode which is a big win for all non-English users. As a neat side effect, it allows me to specify which characters are to be used as separators to detect word boundaries. I'm not sure yet of what I can actually use (there is a document on the ICU site which explains about word boundaries).

That is very good!

Quote from: Mario on February 04, 2014, 06:27:25 PM

This will not change the general behavior of the search engine, meaning we still cannot search for word fragments. All searches consider word beginnings so we can find "motorbike" by searching for "motor*" but not by searching for "bike". There are no "*bike" searches, unfortunately. We can find such data via the Filter Panel of course.

So IM3 was different in this. Because IM3 finds very quickly fragments like "bike" in the word "motorbike".
But as I - and others - say, IM5 offers other good solutions.
Best wishes from Switzerland! :-)
Markus

Ferdinand

Quote from: Mario on February 04, 2014, 06:27:25 PM
But I guess for IMatch purposes it should be sufficient to use <blank> , . _ - ()[]{}<carriage return> <line feed> as separators.

Do we really need to have _- as separators?

Mario

Quote from: sinus on February 05, 2014, 09:53:45 AM
So IM3 was different in this. Because IM3 finds very quickly fragments like "bike" in the word "motorbike".
But as I - and others - say, IM5 offers other good solutions.
Sigh. No. IMatch 5 does the same in the File Name Filter. Just use IMatch the way it's intended to be used.


Mario

Quote from: Ferdinand on February 05, 2014, 10:42:17 AM
Do we really need to have _- as separators?

I think so. Many users separate words in their file names with _ or -. And at least the - is also a frequent separator in languages like German: we often write things like "Filter-Panel" and I want the search engine to pick up the two words "filter" and "panel". What speaks against using - and _ ? These are used already in the current version.

Ferdinand

Quote from: Mario on February 05, 2014, 10:49:25 AM
What speaks against using - and _ ? These are used already in the current version.

I've raised this several times before, but you were either unwilling or unable to change this.

I use _- when I want two strings to be viewed as one word rather than two, and so I don't use a space.  If you insist on _- as separators, how are we supposed to write "aaa_bbb" such that it is viewed as one word?  If people wanted "aaa_bbb" to be viewed as two words, why wouldn't they just write "aaa bbb"?  This is not Windows 3.1 anymore - spaces are allowed and so should be used if that's what you really mean.

sinus

Quote from: Mario on February 05, 2014, 10:46:57 AM
Quote from: sinus on February 05, 2014, 09:53:45 AM
So IM3 was different in this. Because IM3 finds very quickly fragments like "bike" in the word "motorbike".
But as I - and others - say, IM5 offers other good solutions.
Sigh. No. IMatch 5 does the same in the File Name Filter. Just use IMatch the way it's intended to be used.

I will do so. Thanks! And now I checked it  :-[

What in IM3 was the "Search for File Names", is in IM5 the "File Name Filter". Fine. And it works!
Best wishes from Switzerland! :-)
Markus

sinus

Quote from: Ferdinand on February 05, 2014, 11:05:43 AM
Quote from: Mario on February 05, 2014, 10:49:25 AM
What speaks against using - and _ ? These are used already in the current version.

I've raised this several times before, but you were either unwilling or unable to change this.

I use _- when I want two strings to be viewed as one word rather than two, and so I don't use a space.  If you insist on _- as separators, how are we supposed to write "aaa_bbb" such that it is viewed as one word?  If people wanted "aaa_bbb" to be viewed as two words, why wouldn't they just write "aaa bbb"?  This is not Windows 3.1 anymore - spaces are allowed and so should be used if that's what you really mean.

Sorry, Ferdinand, if I do not understand.
But why do you not write "aaabbb" then, if you want one word? Sorry, I ask only out of curiosity. Since you mention that this is no longer Win 3.1, nowadays we can use:

Hotdog
hot dog
hot-dog
hot_dog

or not? And if I now search for these words, I have simply to add the exact same term to find them.

For example, if I wrote "hot_dog", a search for "hot" or a search for "dog" should find it, but a search for "hotdog" should NOT find it.

I am sure I do not understand this correctly, but I feel it could be important for me. Hence I want to understand it  :)
Best wishes from Switzerland! :-)
Markus

Richard

What about a "No-Break Space" (Alt+0160) as in "aaa bbb"? I have not tested this but it should work.

BenAW

Just played a bit with the new search in build 140.
It more than fulfils this feature request.
Imo the request can be considered solved.

Mario

Very well. That's what I've thought  :)
I'll move this to solved.