Counting tagged people to create categories depending on number of people

Started by blackhead2, November 25, 2017, 12:41:51 PM

blackhead2

Hi all,

As mentioned in this thread (https://www.photools.com/community/index.php?topic=7269.msg50492#msg50492), I use "@CatDistinct[Max;@All|@Keywords|People]" to find all pictures that show only Max, without any other tagged person in the picture. Now I want to go a step further and also find pictures showing exactly two specific people.

In Lightroom I used a workaround to achieve that: a smart collection with the rule "contains Max AND contains Moritz AND NOT contains ListOfAllOtherPersons". The list contained all other persons, exported from the keywords. The disadvantage of this workaround is that I had to change the list for every smart collection, and whenever I tagged a completely new person I had to add that person to all existing smart collections to get correct results.

So my first question is: Does IMatch have a "built-in" solution to find pictures containing exactly two specific persons? I guess CatDistinct is not made for this use case.

If there is no built-in solution, is there some possibility to count the people tagged on a picture and write the value to the metadata? I assume that's only possible with scripting, right? With this additional information it would be easy to create the collections I need.

Thank you in advance!

Regards,

Jens

Mario

I'm not sure what you are trying to achieve.

Do you want to find images with exactly any two persons?
Or images of Jill and Joe? This can be done easily by the category formula "Jill" AND "Joe" of course. Or by drag and drop using the Category @Builder.
The typical solution for this kind of challenge is to maintain a separate category hierarchy with counts, for example:

Counts
  |- Persons
    |- Single
    |- 2
    |- 3
    |- ...
    |- Group
    |- Many


If you assign files to the appropriate groups you can later use that as a criterion in category formulas, data-driven categories, the filter panel etc.
Customers also use this pattern to count cars, boats or other objects on images. Depends on your needs.

IMatch does currently not do face detection and thus cannot 'know' that an image shows two people or maybe just a dog and a duck.

If you need this in metadata, you can pick one of the free text input XMP tags. And then maybe make a data-driven category from that tag so you can quickly filter on it or use it in category formulas.

blackhead2

Thank you for your quick reply.

Quote from: Mario
This can be done easily by the category formula "Jill" AND "Joe" of course.

Yes, of course, but this also returns all pictures with Jill, Joe and all other persons that are on the pictures. But I want only to see the pictures that contain Jill and Joe exclusively.
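
The difference between the two queries can be sketched outside IMatch. The following Python is only an illustration of the logic, not IMatch code; the `files` dictionary and both helper functions are made up for this example.

```python
# Hypothetical data: file name -> set of person keywords tagged on it.
files = {
    "a.jpg": {"Jill", "Joe"},
    "b.jpg": {"Jill", "Joe", "Max"},
    "c.jpg": {"Jill"},
}


def contains_all(tagged, wanted):
    """'Jill AND Joe': every wanted person is present, others are allowed."""
    return wanted <= tagged


def contains_exactly(tagged, wanted):
    """Only the wanted persons are present, nobody else."""
    return tagged == wanted


wanted = {"Jill", "Joe"}
with_others = sorted(n for n, tagged in files.items() if contains_all(tagged, wanted))
exclusive = sorted(n for n, tagged in files.items() if contains_exactly(tagged, wanted))

print(with_others)  # ['a.jpg', 'b.jpg']
print(exclusive)    # ['a.jpg']
```

The AND formula corresponds to the subset test, which is why it also returns pictures with additional persons; the exclusive case needs set equality, or equivalently the subset test plus a person count of exactly 2.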

So I guess I have to create a script that counts the number of entries in the hierarchical subject tag that contain "people" and add this information to the metadata. Then I can create the hierarchy you mentioned:

Counts
  |- Persons
    |- Single
    |- 2
    |- 3
    |- ...
    |- Group
    |- Many

As I already tag the persons manually my goal is to add the "number of persons" automatically once I finished tagging. Also for the pictures I tagged over the last years I don't want to add the additional info manually...

Mario

No automatic object or person detection included in IMatch.
Not even the Google cloud or Microsoft's Visual Science tools can do this reliably (yet).

You can combine multiple @CatDistinct formulas via AND...?

blackhead2

Just to clarify: I don't need any automatic people detection. I tag all persons manually. What I want to achieve is to count the number of persons I have tagged in each picture.

So I want to count the entries in the "hierarchical subject" tag that start with "people|..." and add the number as additional keyword to the file. I will try to write a script for that.

Quote from: Mario on November 25, 2017, 02:00:11 PM
You can combine multiple @CatDistinct formulas via AND...?

Have you ever used that? From a logical point of view that should always return no files, shouldn't it?

Mario

Ah, counting the numbers of categories below a given parent for external purposes.

You can use the {File.Categories} variable with a filter and count for that:

{File.Categories|filter:^@Keywords\|Persons;count:true}

This assumes that your people keywords share a common parent (the filter above uses Persons; if yours follow the People|<Name> schema, use People instead). Adapt as needed.

This variable returns the number of @Keywords|People categories (aka keywords) the selected file is assigned to.
A file with the keywords

People|Mary
People|Paul
green
Family
Nikon

thus returns 2.
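
As a rough illustration of what the filter and count steps do, here is a Python sketch. It is not how IMatch implements the variable: the function name `count_matching` is hypothetical, and it is an assumption that the keywords appear as category paths starting with `@Keywords|`. The pattern uses People to match the example keywords above.

```python
import re


def count_matching(categories, pattern):
    """Keep the categories matching the regular expression, then count them."""
    regex = re.compile(pattern)
    return sum(1 for category in categories if regex.search(category))


# Assumed category paths for the example file above.
categories = [
    "@Keywords|People|Mary",
    "@Keywords|People|Paul",
    "@Keywords|green",
    "@Keywords|Family",
    "@Keywords|Nikon",
]

people_count = count_matching(categories, r"^@Keywords\|People")
print(people_count)  # 2
```

As in the variable, the pipe has to be escaped in the pattern because an unescaped `|` means "or" in a regular expression.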

You can use that in a Metadata Template as the source and write the result into an XMP tag.
Or, use a data-driven category based on this variable to build a hierarchy based on the number of persons per file.

Neat. Must remember that  :D

blackhead2

Wow!!!!

Great. This is exactly what I want! I had already checked the count feature, but I didn't know that there was an additional filter feature.

Thank you!

Mario

You can combine any number of functions in a variable. They are evaluated from left to right.

Where there is IMatch, there is a way!

Tell your Friends!

mastodon

How can this be done with face tags? Has anybody done this? Thank you

Mario

If you use the IMatch defaults, "face tags" will become keywords under a common parent. The same solution works.

blackhead2

Hi mastodon,

have you already checked whether it is working with face tags? How is the performance?

The variable {File.Categories|filter:^@Keywords\|Persons;count:true} does what I want, but it is pretty slow. It takes around 2 minutes to evaluate it for 70 pictures (I tried it with a data-driven category and a category filter to reduce the number of processed pictures). Also, when I use varToy I can see a 1s to 2s delay for evaluating the variable for a single picture, while other variables are shown without any delay.

For me it seems that {File.Categories} already has a significant delay and the regex filter increases the delay while the count:true has no additional delay. I'm just wondering why the evaluation takes soooo much time as I only have a few categories.

But if there is no possibility to increase the speed I will use a metadata template to write the number to all my files. Even if I have to wait a few hours it is of course faster than doing it manually  ;)

Mario

I see no delay at all.
Database with ~ 100,000 files, 120 child-categories under 'Persons'.

blackhead2

OK, for me it is not usable as a category at the moment. Today I created a new category using the same variable ({File.Categories|filter:^@Keywords\|Menschen;count:true}) and a "Category Filter" so that it contains only 13 pictures.

I created the category and then refreshed it. According to the log file, that takes 23057ms. I see a lot of CIMCatalog::RegExGroup entries that take some time, but I don't know what the root cause of the bad performance is.

See the log file for details.
(Starting at 11.27 16:21:59+    0 [10AC] 10  I>    RefreshGroup-Begin: 'CountPersons'.)

Do you need any information other than the log?

Mario

You are using this for a data-driven category?

This means that IMatch has to calculate the variable once for each file in your database to get the base values.
I don't know off the top of my head when the category filter can be applied in the context of variables.
This can of course be slow. Complex variable times number of files in database => millions of operations to perform.

But your database is so small (less than 4K files) this should just fly.
But it reports that parsing the variable takes between 12 and 20 seconds on your system.

I've tested this with my database and it takes almost 20 seconds. That's because determining all categories for each file, filtering them via a regexp and then counting them takes a lot of processing power. This is probably the most complex variable one can use.

mastodon

blackhead2, could you please show your "Edit data-driven Category" window, to see all your settings?

blackhead2

@mastodon: I've figured out that the bad performance does not depend on the data-driven category itself but on the number of other data-driven categories in my database. Deleting most of the other categories improved the performance dramatically.

So my assumption is that the File.Categories variable does not just check (like a look-up-table) in the database to which categories a picture belongs to but also has to do some calculation to check whether it belongs to a data-driven or formula based category.
@Mario: Is that correct? Otherwise I cannot understand the extreme performance increase after deleting some data-driven categories.

To come back to my issue:

As mentioned here (https://www.photools.com/community/index.php?topic=7403.msg51884#new) I copied the @All|@Keywords|People category to a new one (@All|People) so that I can use {File.Categories.Direct|filter:@All|People;count:true} and exclude all data-driven and formula based categories which takes around 2 seconds for around 10.000 files. (Remember, with {File.Categories|filter:^@Keywords\|People;count:true} and all my categories it took 2s per file!)

So now I tried three different approaches:

1. {File.Categories|filter:^@Keywords\|People;count:true}
--> Performance depends (as far as I can see) on number of categories (especially data-driven and formula based) and therefore becomes slower and slower with every additional category added to the db.

2. {File.Categories.Direct|filter:@All|People;count:true}
--> Performance much better than "approach 1" but every time a person is tagged the category has to be updated manually by creating a new copy.

3. {File.MD.XMP::Lightroom\hierarchicalSubject\HierarchicalSubject\0|count:true}
--> This feels as fast as "approach 2" and does not depend on any category but unfortunately the filter is not working on it :(

So from my point of view I would prefer "approach 3" with an additional filter.

@Mario: Is there a technical reason why the filter is not available for metadata variables? Otherwise I'm going to add a feature request for it.


Mario

Quote from: blackhead2 on November 27, 2017, 09:33:21 PM
So my assumption is that the File.Categories variable does not just check (like a look-up-table) in the database to which categories a picture belongs to but also has to do some calculation to check whether it belongs to a data-driven or formula based category.
@Mario: Is that correct? Otherwise I cannot understand the extreme performance increase after deleting some data-driven categories.

You are using the variable {File.Categories}. This variable determines all categories a file belongs to.
The performance of this naturally depends on the number of categories to check. And also if the categories need to be re-calculated first (formula-based, data-driven).
Now you filter the resulting categories via a regular expression.
Then the variable counts the number of remaining categories.
And this has to be done for each file in your database, or, in your case, for each file not filtered out by your category expression.

This is all very expensive in terms of computing resources required. Using a variable for a data-driven category should be the exception (I explain that in detail in the help). And using such a highly-complex variable in this context is basically a no-go.

See the Variables help topic for details about the filter function and other variable functions available.
The filter function was added for categories so that users can create variables which show only a section of the category hierarchy a file belongs to, for example in the Design & Print module, or in the file window to list only selected categories below each thumbnail. It was never designed to drive a data-driven category; it is much too complex and too slow for that. Filter is not available for regular string variables like the ones you mention. You can use the is or contains functions to filter regular metadata tags.

Note: Data-driven categories are designed to be used with metadata. That's efficient and fast. The option to use a variable is for exceptional cases and should be used sparingly.

blackhead2

So slowly I'm getting confused... :o ;)

First you suggested using the variable for a data-driven category:

Quote from: Mario on November 25, 2017, 02:24:01 PM
You can use the {File.Categories} variable with a filter and count for that:

{File.Categories|filter:^@Keywords\|Persons;count:true}

.....

Or, use a data-driven category based on this variable to build a hierarchy based on the number of persons per file.

Neat. Must remember that  :D

And after I wondered about the performance, you wrote:

Quote from: Mario on November 28, 2017, 07:59:54 AM
This is all very expensive in terms of computing resources required. Using a variable for a data-driven category should be the exception (I explain that in detail in the help). And using such a highly-complex variable in this context is basically a no-go.

So I understand why complex variables in data-driven categories should be an exception. I will find a solution that works for me, but it looks like there is no "perfect" solution at the moment.

You also write:

Quote from: Mario on November 28, 2017, 07:59:54 AM
Filter is not available for regular string variables like the ones you mention. You can use the is or contains functions to filter regular metadata tags.

But I don't know how this should help me with my use case, as File.MD.XMP::Lightroom\hierarchicalSubject\HierarchicalSubject\0 is a repeatable tag and I can either count all of its entries or none. So if the content is:

People|Max
People|Moritz
Animals|Dog

There is no chance to get "2" as result without an additional filter. At least I see no possibility....
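
What is being asked for here amounts to a prefix filter applied before counting. A sketch in Python, purely to illustrate the logic: according to this thread IMatch does not offer a filter for metadata variables, and `count_with_prefix` is a hypothetical helper.

```python
def count_with_prefix(values, prefix):
    """Count the entries of a repeatable tag that start with the given prefix."""
    return sum(1 for value in values if value.startswith(prefix))


# Entries of the hierarchical subject tag from the example above.
hierarchical_subject = ["People|Max", "People|Moritz", "Animals|Dog"]

total = len(hierarchical_subject)                            # plain count
people = count_with_prefix(hierarchical_subject, "People|")  # filtered count

print(total)   # 3
print(people)  # 2
```

Because this only looks at the values stored in the file's own metadata, the cost per file does not depend on how many categories exist in the database.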

Mario

You asked for a possible solution for your very peculiar problem. I tried to find one.
Sometimes there is no solution and sometimes the solution is not fast. Not even IMatch can do everything every user may want.

sinus

blackhead2, are you talking about a lot of pics?
If you are talking about only a few hundred images, you could think about adding a keyword to each pic.

I ask because for years, whenever I tag persons, I have simply added one of these keywords:
p-alone
p-two
p-three
p-four
p-five
p-groups

So I have no problem finding images with only Max and Julia, just these two persons.
I learned in the past to add clever keywords, and I have done this ever since.
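
For anyone who wants to derive such a keyword from a person count instead of typing it, the mapping could look like the Python sketch below. The function name is hypothetical, and the rule that anything above five persons becomes p-groups is an assumption, since the list above does not state where groups start.

```python
def person_count_keyword(count):
    """Map a number of tagged persons to one of the keywords listed above."""
    names = {1: "p-alone", 2: "p-two", 3: "p-three", 4: "p-four", 5: "p-five"}
    if count < 1:
        return None  # no persons tagged, so no keyword
    # Assumption: more than five persons counts as a group.
    return names.get(count, "p-groups")


print(person_count_keyword(2))  # p-two
print(person_count_keyword(9))  # p-groups
```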

But if you are talking about several thousand pics, OK, then you must use what IMatch offers.
Best wishes from Switzerland! :-)
Markus

mastodon

I am planning to do this, but there are about 20.000 pictures, and it would be nice to have an automated method. I think counting face tags is the only good solution (if you use them), but as Mario said, that will be a harder one.

Mario

The general intention was to do it once automatically via a dd category (even if this takes longer). Then convert the dd categories into regular categories, similar to what sinus suggests above. This will spare you the time to manually do the initial batch of 20K files. From then on either run the DD cat when needed or manually maintain your "number of person cats".

blackhead2

Quote from: sinus on November 29, 2017, 08:14:05 AM
But if you are talking about several thousand pics, OK, then you must use what IMatch offers.

Unfortunately, I'm talking about around 10,000 files.

Quote from: Mario on November 29, 2017, 10:05:58 AM
From then on either run the DD cat when needed or manually maintain your "number of person cats".

Yes of course, both are possible. I am already quite happy with what IMatch can do. Running the DD cat with a new database is quite fast for me, but as it relies on the file.categories variable it slows down with every additional category I add to the db. This could be bypassed by working directly with the metadata. Then the performance would depend only on the number of files, not on the number of categories.

I added a feature request. Maybe this is useful for someone else. (https://www.photools.com/community/index.php?topic=7454.0)