Count of File.Persons.OID and File.Faces.OID not returning correct values

Started by Tveloso, June 16, 2023, 04:33:56 AM


Tveloso

I have a Data Driven Category called People Counts, which is configured like this:

    Screenshot_2023-06-15_212452.png

It previously contained many child nodes, with counts all the way into the 30s (and not many skipped count values), but it now contains only two nodes:

    Screenshot 2023-06-15 212633.png

When I select a file from the "01" node that shows three people, these variables in VarToy:

People/Face Variables
=====================
File.Persons.OID . . . . . . . . . . . . . : {File.Persons.OID}
Count of File.Persons.OID  . . . . . . . . : {File.Persons.OID|count:true}
File.Faces.OID . . . . . . . . . . . . . . : {File.Faces.OID}
Count of File.Faces.OID  . . . . . . . . . : {File.Faces.OID|count:true}

...return this:

People/Face Variables
=====================
File.Persons.OID . . . . . . . . . . . . . : 54,30,28
Count of File.Persons.OID  . . . . . . . . : 1
File.Faces.OID . . . . . . . . . . . . . . : 35871,35870,35869
Count of File.Faces.OID  . . . . . . . . . : 1

Could this possibly be related to this topic?:

https://www.photools.com/community/index.php/topic,13218.0.html

There, the concern is with the File.Categories variable, and while my issue does not (directly) involve Categories, the behavior seems to be the same (only 0 and 1 are being returned). 

In the post from Axel in that topic (Post #3), the VarToy screenshot shows that the category names are separated with a semicolon, but as shown above, my OIDs are separated with a comma. I thought this could possibly be a list separator issue, as has come up before, and sure enough, when I changed the List Separator in Windows from a comma to a semicolon (closing IMatch first, then re-opening it), my count values were correct:

People/Face Variables
=====================
File.Persons.OID . . . . . . . . . . . . . : 54;30;28
Count of File.Persons.OID  . . . . . . . . : 3
File.Faces.OID . . . . . . . . . . . . . . : 35871;35870;35869
Count of File.Faces.OID  . . . . . . . . . : 3

...(presumably because both variables were now emitting a semi-colon-separated list).

I'm not sure of all the ramifications of keeping that list separator config change, so I put it back (to a comma) for now...

It's been a while since I've used that "People Counts" Category (not sure if I used it in IMatch 2021.18.4, or only in the prior release), so I can't be certain that it stopped working only with IMatch 2023, but I think this may be the case (that it worked ok in all releases of IMatch 2021, and is now not working in IMatch 2023).
--Tony

Mario

Variables should output results from repeatable values using the Windows list separator (usually ; or ,).
It seems that count expects a semicolon. I'll check that.

Does it work when you add a replace:,==~; before the count?
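The ~; in this suggestion is needed because ; also separates the formatting functions in a chain (as in replace:...;count:true), so a literal semicolon inside a function argument has to be escaped. A minimal Python sketch of how such a chain could be split, assuming (this is not IMatch source) that ~ escapes exactly the next character and that an unescaped ; is the only function separator; split_functions is a hypothetical name:

```python
def split_functions(chain: str) -> list:
    """Split a formatting-function chain on ';', treating '~' as an escape.

    Assumption: '~' escapes exactly one following character, which is kept
    literally (without the '~') in the function's argument text.
    """
    parts = []
    current = []
    i = 0
    while i < len(chain):
        ch = chain[i]
        if ch == "~" and i + 1 < len(chain):
            current.append(chain[i + 1])  # escaped char is literal text
            i += 2
            continue
        if ch == ";":
            parts.append("".join(current))  # unescaped ';' ends a function
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        parts.append("".join(current))
    return parts


for fn in split_functions("replace:,==~;;count:true"):
    name, _, args = fn.partition(":")
    print(name, args.split("=="))
# replace [',', ';']
# count ['true']
```

Under these assumptions the chain yields two functions: a replace of "," with ";" followed by count.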

Tveloso

Using replace:

    {File.Persons.OID|repace:,==~;;count:true}

...still returns 1 for my example file with 3 persons.
--Tony

sinus

you forgot here simply the l in replace ... in reality you did it of course correctly, I think.
Best wishes from Switzerland! :-)
Markus

Mario

I did look at this, but I see no error.
When a variable has multiple values for a file (like OID in this case), IMatch separates them with what it internally manages as the default list separator. This list separator is retrieved from Windows when IMatch starts.
This is the list separator shown in Windows 11's convoluted and very hidden settings dialog:

Image1.jpg

START > Settings > Time & Language > Language & Region > Administrative language settings > Formats tab > Additional settings.


The only problem I see that could cause this on your side is when the 1000 group separator and the list separator are identical. For example, both are commas. Can you check that please?

IMatch has a special code branch that checks whether the list separator and the 1000 group separator are identical, or whether the variable value contains no list separators. In both cases, it returns 1.
If the variable value has no list separators, it is obviously no list and 1 is correct.

If the 1000 group separator and the list separator are identical, IMatch has no way to tell what the variable contains when it is asked for count or sum or other aggregate functions.
A variable value of 100,200 could be one hundred thousand and two hundred or it could be a list consisting of 100 and 200.
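The rule described here can be written down as a small function. This is a minimal Python sketch of the described behavior, not IMatch's actual code; count_values and its parameters are hypothetical names:

```python
def count_values(value: str, list_sep: str, group_sep: str) -> int:
    """Count list items as described in the thread (a sketch, not IMatch code)."""
    # Ambiguous settings: "100,200" could be one number or two items,
    # so the special branch gives up and reports a single value.
    if list_sep == group_sep:
        return 1
    # No list separator in the value: it is not a list, so one value.
    if list_sep not in value:
        return 1
    return len(value.split(list_sep))


print(count_values("54,30,28", list_sep=",", group_sep=","))  # 1
print(count_values("54;30;28", list_sep=";", group_sep=","))  # 3
```

The first call corresponds to Tony's setup (comma used for both), the second to a semicolon list separator.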

The content of the variable which is submitted to count can be produced by many means, including intermediary results from metadata or other calculations.

Tveloso

Quote from: sinus on June 16, 2023, 01:38:25 PM
you forgot here simply the l in replace ... in reality you did it of course correctly, I think.
Oh brother! What a knucklehead I am. Thank you for spotting that, Markus.

But unfortunately, even with the correct spelling, the replace function does not correct the issue.

Quote from: Mario on June 16, 2023, 03:15:34 PM
The only problem I see that could cause this on your side is when the 1000 group separator and the list separator are identical. For example, both are commas. Can you check that please?
Yes, both are identical (set to a comma):

    Screenshot_2023-06-16_150119.png

This appears to be the default for the configured Region Format (set to English (United States)), because when I click the Reset button (after having changed the List Separator to a semicolon and verified that IMatch then returns the correct counts), the List Separator reverts to a comma.

--Tony

Tveloso

Mario, I tried a few things to see if I could get the count function to return the correct values again when Windows has the comma as the List Separator: using pereplace instead of (or in addition to) the replace function, or a second nested reference to File.Persons.OID with a hasvalue on the outer reference. In all cases, the counts were still incorrect.

So we have the following:

List Separator set to semicolon in Windows
The count function works great...the Persons.OID variable returns the list, correctly separated with semicolons:

File.Persons.OID . . . . . . . . . . . . . . . : {File.Persons.OID}
File.Persons.OID . . . . . . . . . . . . . . . : 54;30;28

...and the count function for that variable returns the correct value:

Count of Persons.OID . . . . . . . . . . . . . : {File.Persons.OID|count:true}
Count of Persons.OID . . . . . . . . . . . . . : 3

List Separator set to comma in Windows
The count function does not work...the Persons.OID variable returns the list, correctly separated with commas:

File.Persons.OID . . . . . . . . . . . . . . . : {File.Persons.OID}
File.Persons.OID . . . . . . . . . . . . . . . : 54,30,28

...but the count function returns an incorrect value:

Count of Persons.OID . . . . . . . . . . . . . : {File.Persons.OID|count:true}
Count of Persons.OID . . . . . . . . . . . . . : 1


Perhaps ironically, when the Windows List Separator is a semicolon, and we replace it with a comma:

Persons.OID (with semicolons "converted")  . . : {File.Persons.OID|replace:~;==,}
Persons.OID (with semicolons "converted")  . . : 54,30,28

...this acts to influence the count function - it now "correctly returns the incorrect value" (because count no longer sees the variable's value as a list - since the List Separator in Windows is a semicolon, but the "list" is now separated with commas):

Count of Persons.OID ("converted" semicolons ) : {File.Persons.OID|replace:~;==,;count:true}
Count of Persons.OID ("converted" semicolons ) : 1

But when the Windows List Separator is a comma, and we replace it with a semicolon:
Persons.OID (with commas "converted")  . . . . : {File.Persons.OID|replace:,==~;}
Persons.OID (with commas "converted")  . . . . : 54;30;28

...this does not influence the count function (it returns the incorrect value whether or not the separator is replaced):

Count of Persons.OID ("converted" commas ) . . : {File.Persons.OID|replace:,==~;;count:true}
Count of Persons.OID ("converted" commas ) . . : 1
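All four results follow from the rule Mario described. A small Python sketch that replays them, assuming (as the thread suggests) that the variable emits the Windows list separator, that count splits on the Windows list separator, and that the thousands separator is a comma; scenario and count_values are hypothetical helpers:

```python
from typing import Optional, Tuple

OIDS = ["54", "30", "28"]
GROUP_SEP = ","  # thousands separator for English (United States)


def count_values(value: str, list_sep: str) -> int:
    # Rule described by Mario: bail out when the separators clash
    # or when the value contains no list separator at all.
    if list_sep == GROUP_SEP or list_sep not in value:
        return 1
    return len(value.split(list_sep))


def scenario(windows_sep: str, replace_with: Optional[str] = None) -> Tuple[str, int]:
    # The variable emits its values joined with the Windows list separator.
    emitted = windows_sep.join(OIDS)
    # Optional replace:<windows_sep>==<replace_with> step before count.
    if replace_with is not None:
        emitted = emitted.replace(windows_sep, replace_with)
    # count always splits on the Windows list separator.
    return emitted, count_values(emitted, windows_sep)


for sep, repl in [(";", None), (";", ","), (",", None), (",", ";")]:
    value, n = scenario(sep, repl)
    print(f"Windows sep {sep!r}, replace with {repl!r}: {value} -> count {n}")
```

The last case shows why the replace workaround cannot help here: the guard looks at the Windows settings, not at the characters actually present in the value.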

In thinking about it more, I believe that I did use my "People Counts" Data Driven Category under IMatch 2021.18.4, and it worked there, so I'm pretty sure that this new "List Separator behavior" started in IMatch 2023 (perhaps Persons.OID returned a semicolon-separated list in IMatch 2021 regardless of the Windows List Separator config?)

I understand the problem that having the same value configured for the 1000 Group Separator and the List Separator causes.  Maybe IMatch should not use the Windows List Separator at all?...(and use its own List Separator exclusively - in Edit->Preferences->Metadata)
--Tony

Mario


Quote:
Mario, I tried a few things to see if I could get the count function to return the correct values again when windows has the comma as the List Separator...doing things like using pereplace instead of / in addition to, the replace function, or a second nested reference to File.Persons.OID, with a hasvalue on the outer reference...but in all cases, the incorrect values were returned for the counts.
This will not do anything, I'm afraid.
IMatch has a check to see if the list separator and 1000 group separator are identical and then disables count and other functions which must deal with lists and numerical values - this was added when users tried to count repeatable numerical values > 1000 and they were split wrongly by count and sum and avg. No way to tell what 112,456,778 really is.

This change was added to IMatch on November 26, 2021 and included in the 2021.14.2 release.

Quote:
Maybe IMatch should not use the Windows List Separator at all?...(and use its own List Separator exclusively
The purpose of the system-wide list separator is to tell applications how the user wants to separate items in a list. This depends on the country and IMatch adapts to it automatically. Like it adapts to number formats, the decimal separator, 1000 groupings and whatnot.

It does not make much sense to me to use the same character for grouping numbers and for separating lists. It is similar to using the same character for the decimal point and the list separator.

I will give this some thought and maybe I can come up with something. Alternatively, you could switch to ; as the list separator and keep the , for grouping numbers into thousands.



Tveloso

Thank you Mario.

I went ahead and set the List Separator in Windows to a semicolon.

I'm still thinking that it might be good for IMatch to use its own List Separator exclusively, and not use the one configured in Windows.  In fact, I'm wondering if, based upon the label for that control in Preferences:

    Screenshot_2023-06-17_092658.png

...if that might have been your intention initially.
--Tony

Mario

Quote:
...wondering if, based upon the label for that control in Preferences:

This does not extend to locale-specific list, date, time and numeric formats.

IMatch uses many built-in Windows formatting functions which are based on the locale of the current user. Trying to implement this myself or maintaining an "IMatch locale" would be far too complicated and error-prone. IMatch has a huge code base, and code that formats values based on the user's locale is all over it.

Much easier would be an "Ignore the fact that the user has configured the same character for 1000 groups and list separators and just use the list separator as set" variable formatting function. That's what I had in mind, at least, after thinking about this for some time.

In your case, this might look like:

{File.Persons.OID| forcelistsep:true; count:true}

and this would disable the logic IMatch applies by default.
I have not yet determined how to implement this, since variables have no state while being evaluated (so I don't know where to store the forcelistsep setting).
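One generic way around "no state while being evaluated" is to make the state explicit: each formatting function receives the options set by earlier functions in the chain and returns possibly updated options alongside the value. A hypothetical Python sketch of that idea; EvalOptions, evaluate and the function table are invented for illustration, and forcelistsep is only the name proposed above:

```python
from dataclasses import dataclass, replace as dc_replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EvalOptions:
    # Hypothetical per-evaluation settings, passed along the chain.
    force_list_sep: bool = False


Fn = Callable[[str, EvalOptions, str, str], Tuple[str, EvalOptions]]


def forcelistsep_fn(value, opts, list_sep, group_sep):
    # Leaves the value untouched; only switches the option on for later functions.
    return value, dc_replace(opts, force_list_sep=True)


def count_fn(value, opts, list_sep, group_sep):
    clash = list_sep == group_sep and not opts.force_list_sep
    if clash or list_sep not in value:
        return "1", opts
    return str(len(value.split(list_sep))), opts


FUNCTIONS: Dict[str, Fn] = {"forcelistsep": forcelistsep_fn, "count": count_fn}


def evaluate(value: str, chain: List[str], list_sep: str = ",", group_sep: str = ",") -> str:
    opts = EvalOptions()  # fresh state for every evaluation, nothing global
    for name in chain:
        value, opts = FUNCTIONS[name](value, opts, list_sep, group_sep)
    return value


print(evaluate("54,30,28", ["count"]))                  # 1
print(evaluate("54,30,28", ["forcelistsep", "count"]))  # 3
print(evaluate("54,30,28", ["count", "forcelistsep"]))  # 1
```

Because options flow left to right, such a function only affects the functions that come after it, so it would have to appear before count, sum or avg.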


Tveloso

Thank you Mario.

As usual, much more going on in IMatch than we realize.

The forcelistsep mechanism looks like a great idea!  For me, this is resolved, since I switched to semicolon as the list separator, but having something to allow the count, sum, etc. functions to work correctly for variables, even in the face of having the same Thousands Grouping and List Separator configured in Windows (which will be true for US users), is important for IMatch to provide I think...(those are "central" to the power of IMatch Variables).
--Tony

Mario

Quote:
As usual, much more going on in IMatch than we realize.
Most people have no idea how complex IMatch is ;)

What about this new variable formatting function?

Image1.jpg

It allows you to override the default processing when the list separator and thousands separator are set to the same character in the numeric settings in Windows. This function should be added first, before count, sum or avg.

I still wonder how Windows or other applications format lists of numbers containing thousand separators when the list separator is the same?

100,222,333,444,555

Is that

100
222
333
444
555

or maybe

100,222
333
444,555

???





Tveloso

Excellent!  Thank you Mario.

This will be an important addition for US users (where the Thousands Grouping and List Separator are both a comma).

Incidentally, I noticed my first "ill effects" of having changed the List Separator to a semicolon. I'm not a big Excel user at all, but I do periodically enter a particular formula into a temporary spreadsheet (otherwise I use Excel almost exclusively for report presentation, and don't do much else with it). When I went to enter (actually paste) my formula, I received the syntax error dialog ("there's a problem with this formula").

At first I didn't understand what could be wrong with my formula, then I noticed that the tooltip was showing semicolons, where I had commas. 

So I was used to entering =RIGHT(A1,4) for example, but now needed to enter =RIGHT(A1;4).  This was not enough to make me consider undoing the List Separator Change I had made (I thought, "I'll just start using semicolons, for the 1 or 2 times a month that I must enter a formula in Excel")...but I can see where for another user with lots of SpreadSheets containing lots of Formulas, this would require a bit of a conversion effort (since all of their formulas would have been rendered invalid by that config change), so changing that List Separator might not be an option for them.

I can definitely see where "a list" of edited numbers containing commas (when the comma is both the Thousands Grouping and List Separator) is completely ambiguous. In the case of a CSV, the edited numbers must be quoted (since they contain commas, and so are really strings), and there are standards for CSVs (even an RFC, I think) that require that. In other contexts, a list of edited numbers immediately becomes indeterminate.
--Tony

Mario

I pull this suggestion. Unexpected side effects and an inconsistency detected.

Apparently that other Mario (not related, fired long ago ;) ) changed his mind while adding more and more variables over time. Some variables use the metadata separator configured for the IMatch engine (Edit > Preferences > Metadata) and others use the list separator configured for the current Windows user account.

Using both is inconsistent and thus, per my definition, evil.

Tveloso hinted at that in his post above:
Quote:
Maybe IMatch should not use the Windows List Separator at all?...(and use its own List Separator exclusively - in Edit->Preferences->Metadata)

I first thought about introducing the forcelistsep function as shown above in my prototype.
But then I dug deeper and found that some lists are produced using the database separator, especially lists like categories or persons, which are pre-calculated in batch to keep variables fast.

Using the same character for all lists is desirable. But changing this, how many existing variables out there will it break?

One of the most common examples in the help is separating the keyword list of a file to make them break into individual rows. And these variables always assume that the keywords are separated by ; which is the default database separator character. It is also used for categories, persons and suchlike.

After some pondering, I've decided to make a breaking change and use the database separator character for all variables which return or split multiple values.

This means that for users who have a comma (,) as the list separator in Windows settings but use the default ; for the database (the "IMatch list separator"), the results of a variable like {File.Persons.OID} will change from

Tom,Peter,Mario to Tom;Peter;Mario

Keywords would have been returned as beach;sun;vacation on these computers anyway, so it's much more consistent now.

Lists are emitted using ; and the count, sum, avg functions split variable values using ; in the future.
This avoids dependencies on the numeric formatting the user has set, especially for countries where the thousand group separator is the same as the list separator - usually a comma.
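With the list separator fixed to the database separator, splitting no longer depends on the Windows settings, so even values containing thousands separators split unambiguously. A minimal Python sketch of the described behavior, assuming ; as the IMatch database separator, a comma as the thousands separator, and integer values; emit_list, count_values and sum_values are hypothetical names, not IMatch functions:

```python
from typing import List

DB_SEP = ";"     # default IMatch database separator (Edit > Preferences > Metadata)
GROUP_SEP = ","  # assumed thousands separator, e.g. English (United States)


def emit_list(values: List[str]) -> str:
    # Lists are always emitted with the database separator.
    return DB_SEP.join(values)


def count_values(value: str) -> int:
    # Split on ';' only; Windows list/thousands settings play no part.
    return len(value.split(DB_SEP))


def sum_values(value: str) -> int:
    # After splitting on ';', a comma can only be a thousands separator.
    return sum(int(item.replace(GROUP_SEP, "")) for item in value.split(DB_SEP))


print(emit_list(["Tom", "Peter", "Mario"]))  # Tom;Peter;Mario
print(count_values("54;30;28"))              # 3
print(sum_values("1,000;2,500"))             # 3500
```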

What do you think?

Mario

I will implement this as explained above.
It's now consistent, and the issue with the thousands grouping separator and the list separator being identical no longer applies (unless you use ;).
I've updated the variables documentation to explain about Variables Returning Multiple Values in a separate section of the Variables topic.

This change should not affect many users, if at all.

Tveloso

Mario, I'm pretty sure that this was working when IMatch 2023.1.10 came out (which included Breaking Change #0190), but perhaps a subsequent Release has "reverted" that change?...

Variables that return lists are once again using the comma (instead of the semicolon) as the list separator, so the count function is not working.  For example, for a file containing two faces, the Persons list shows both faces separated by a comma (and not a semicolon):
File.Persons.OID . . . . . . . . . . . . . : {File.Persons.OID}
File.Persons.OID . . . . . . . . . . . . . : 265,35

...so the count is incorrect:
Count of File.Persons.OID  . . . . . . . . : {File.Persons.OID|count:true}
Count of File.Persons.OID  . . . . . . . . : 1

And the same is true for the Faces list, for that file:
File.Faces.OID . . . . . . . . . . . . . . : {File.Faces.OID}
File.Faces.OID . . . . . . . . . . . . . . : 102041,102089

Count of File.Faces.OID  . . . . . . . . . : {File.Faces.OID|count:true}
Count of File.Faces.OID  . . . . . . . . . : 1

If I change the Windows List Separator to a semicolon, the count function returns the correct value:
File.Faces.OID . . . . . . . . . . . . . . : 102041;102089
Count of File.Faces.OID  . . . . . . . . . : 2
--Tony

Mario

A regression. Sorry.
Works in the next release. Checked with the system list separator set to comma, semicolon, or #.