Data Driven Category Detect Hierarchies buggy?

Started by Darius1968, October 21, 2017, 11:07:06 AM

Darius1968

I think this could be a bug in the latest version of IMatch, but for the time being, I'm posting here, in case I'm wrong: 
A specific subset of my database (45,000 files) has a certain value assigned to an attribute. I use that value to restrict the scope to that portion of my database when I enumerate my folder structure via a data-driven category. 
All is fine if I don't enable Detect Hierarchies (with \ as the separator): I get my 45,000 files enumerated, as expected.  However, if I enable Detect Hierarchies, I get 283,211 files enumerated! 
With respect to the said data driven category, Level 1 is based on the attribute (via the variable), Level 2 is based on the IMatch tag for folder. 
What should I do now? 

Mario

I have no idea. You are not providing anything that would allow us to understand what you are doing.

What does your data-driven category setup look like?
What does the data in your Attributes look like?
What does the preview show?

Do you know what the detect hierarchies feature is for, how it is to be used?
This can be really complex to understand and I surely have not tried to cover all fringe cases.
Detect hierarchies is supposed to be used to detect intrinsic hierarchies in data and then create additional sub-levels based on that data. Tricky, though.

Darius1968

Would a log file help you troubleshoot what's getting hung up? 

Darius1968

The 1st two attachments are how I have Level 1 & Level 2 of my data driven category set up.  The 3rd attachment shows the enumeration, without detect hierarchies on Level 2.  The 4th attachment shows the enumeration, with detect hierarchies on Level 2, as well as the deviation of the scope of what is displayed, compared to the 3rd attachment. 

Mario

The 3rd attachment looks correct? Your folder names have been split into sub-levels?

Darius1968

Yes, the 3rd attachment indeed, shows that IMatch's output is entirely correct!  But, the 4th attachment shows how IMatch has gone haywire, after I modified Level 2 of my category, so that "Detect Hierarchies" is active, with "\" as the separator. 

Mario

3 looks like the unprocessed folder names.
4 shows hierarchies produced from the folder names splitting them using \

For me this looks correct?

Darius1968

I can demonstrate here, with my 5th file attachment, that I can get the "Detect Hierarchies" to work with the "\" separator.  I did this by first, bookmarking all of the original 45,000 files.  Then I set up a formula category that reflects these bookmarked files.  I finally went to the data driven category, disabled Level 2, then modified Level 1, such that its scope is restricted to that formula category.  I then set the output of Level 1 to be the IMatch Folder Tag, with "Detect Hierarchies" enabled, with "\" as separator, and it works! 

Darius1968

"3 looks like the unprocessed folder names.
4 shows hierarchies produced from the folder names splitting them using \

For me this looks correct?"

It's half correct, because the output is completely off in 4! 

Mario

What do you mean by "off"?

A folder name

c:\images\beach\daytona

should produce

c:\
  |- images
    |- beach
      |- daytona


and it does. Just checked with the folders in my test database.
What is the exact problem? Maybe just describe it, I don't see anything wrong in your screen shots.
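
The split described above can be illustrated with a minimal Python sketch. It only mirrors the example in this post; `split_levels` and `render` are hypothetical helper names and are not part of IMatch.

```python
def split_levels(folder, sep="\\"):
    """Split a folder name into hierarchy levels, top level first."""
    parts = [p for p in folder.split(sep) if p]
    if not parts:
        return []
    # Keep the separator on the drive so the top level reads "c:\" as in the example.
    return [parts[0] + sep] + parts[1:]


def render(levels):
    """Render the levels as an indented tree, one level per line."""
    if not levels:
        return ""
    lines = [levels[0]]
    for depth, name in enumerate(levels[1:]):
        lines.append("  " * (depth + 1) + "|- " + name)
    return "\n".join(lines)


print(render(split_levels(r"c:\images\beach\daytona")))
```

Each folder name contributes one node per path component, so every file on the same drive shares the same top-level node.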

Darius1968

A problem can clearly be seen on examination of a comparison between my 4th & 5th (last) file attachments: 

First of all, the result in Screenshot 4 is quite different from that in Screenshot 5.  Screenshot 4 has, as I said, an output that is "completely off"!  I say this because it references folders in my database in which NONE of my files (those that fall under the attribute node "Darius Feet") reside!  Contrast this with Screenshot 5.  Here, all my 45,000 files have one thing in common: they are all at or beneath this path:  E:\_01\_Backups\_201309 (C Drive)\00\_Family\Darius\_Interests\_Fetish\_Feet.  Screenshot 4 yields output that is outside of this path, and hence outside of the range for which the attribute has the value "Darius Feet". 

Screenshot 4 has Level 1 set to Variable, with output [File.AT.Ino.FolderKeywords], as in Screenshot 1.  Level 2 of this screenshot is as in Screenshot 2, with the exception that Detect Hierarchies IS enabled. 

P.S.
To prove further, that the output is off in the case of Level 1 being an attribute to limit the scope, and level 2 being the folder output, if Level 2's output is unprocessed (no Detect Hierarchies), then I only get 45,284 files in the result set, whereas if Level 2 is processed (Detect Hierarchies, "\" as separator), then I end up with 283,211 files!  This difference is played out in the screenshots. 

Mario

I don't think data-driven categories can do what you want when you use the split hierarchies that way.

I guess what you want to tell me is

1. You group by Attribute (or whatever) on Level 1.
2. You group by folder name below.

This gives you the folders for each Attribute on level 2. Which is correct.

3. Now you enable the split hierarchies for the folder level, and you suddenly get all folders, instead of just the folders 'linked' to the Attribute as before.
And this is not what you want or expect.

Is this what you want to tell me with all the screen shots and notes?
In that case, this may be a limitation of how data-driven categories are implemented. Maybe I did not bother to implement it, or maybe it was too complex and would have ruined performance for all users and all data-driven categories. I cannot tell, and I currently don't have the time to dig into the very complex data-driven category code. Not for some time at least.

I think this is a very specific issue, because by definition the levels

c:
c:\images
c:\images\beach

cause IMatch to think that it cannot remove c:\ from the child level because c:\ for sure contains the images of the level above. And this causes all child nodes of c: to be retained, causing the result you see. I think this is behavior by design, not truly a bug.
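
A toy model of this effect, as a rough Python sketch. This is not IMatch's code: the file list, the attribute values, and all function names are made up for illustration, and the "keep a root, keep everything below it" rule is only an assumption based on the description above.

```python
# Hypothetical mini database: file path -> attribute value (illustration only).
FILES = {
    r"c:\images\beach\a.jpg": "Feet",
    r"c:\images\beach\b.jpg": "Feet",
    r"c:\images\city\c.jpg": "Other",
    r"c:\docs\d.jpg": "Other",
}


def folder_of(path):
    """Return the folder part of a file path."""
    return path.rsplit("\\", 1)[0]


def levels(folder):
    """Return every hierarchy level of a folder, top level first."""
    parts = folder.split("\\")
    return ["\\".join(parts[: i + 1]) for i in range(len(parts))]


def files_under(nodes):
    """Files whose folder is one of the given nodes."""
    return {p for p in FILES if folder_of(p) in nodes}


def flat_level2(attribute):
    """Level 2 without hierarchy detection: one node per folder with a matching file."""
    return {folder_of(p) for p, a in FILES.items() if a == attribute}


def hierarchical_level2_naive(attribute):
    """Level 2 with hierarchy detection, where only the top-level nodes are filtered."""
    wanted = {p for p, a in FILES.items() if a == attribute}
    all_nodes = {lvl for p in FILES for lvl in levels(folder_of(p))}
    roots = {n for n in all_nodes if "\\" not in n}
    # A root is kept when it contains at least one file from the level above.
    kept = {r for r in roots if any(p.startswith(r + "\\") for p in wanted)}
    # Assumed behavior: every descendant of a kept root is retained unfiltered.
    return {n for n in all_nodes if any(n == r or n.startswith(r + "\\") for r in kept)}


print(len(files_under(flat_level2("Feet"))))                # 2
print(len(files_under(hierarchical_level2_naive("Feet"))))  # 4
```

In this model the root c: always passes the filter, so the folders c:\images\city and c:\docs come along with it even though none of their files carry the attribute value.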


Feel free to add a bug report and link to this thread, and I will look into this when I have a time slot. It will probably take until January, though.


Darius1968

"Is this what you want to tell me with all the screen shots and notes?"
Yes!  We are now, at least on the same wavelength about what the problem is. 

Question:  This problem is manifesting if I set up the said 2-level data driven category.  To be sure, if I bookmark those 45,284 files, and then set Level 1 to be those bookmarked files, via the variables option, I will then still, continue to run into trouble with output, should I choose to implement "Detect Hierarchies".  However, if I implement just a 1-Level data driven category, and limit my scope (to those 45,284 files) by specifying category (a formula category, based on bookmarked files) in Level-1's "Category Filter", I then can have correct output, even after the processing of "Detect Hierarchies".  Why is this? 

Mario

This is because of what you create when you split database folder names into levels.
If IMatch splits c:\images\beach

it first gets c:\, and this includes all files in the database. This breaks the logic which rolls the data-driven categories from the top to the bottom. Special case. As I said, I currently have no time to look into this. This is complex and it will cost a day even to understand again how all this works. Hence my request about a bug report and the timeline above.

Mario

I looked into this and it was exactly as I thought.
Adding additional levels like this prevents the downstream filtering process implemented by data-driven categories.
This is a very, very complex logic.

Although there were (as far as I recall) never any reports about this issue, I've implemented a separate filtering phase just for this case. This phase removes empty levels (without files or sub-elements) created by detecting hierarchies in input data.
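
The idea of such a cleanup pass can be sketched in Python as a bottom-up prune. This is only an illustration under assumptions, not the actual IMatch implementation: the `Node` type and `prune_empty` are hypothetical, and `files` is assumed to hold only the files that passed the filter of the level above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    files: set = field(default_factory=set)      # files that passed the parent filter
    children: list = field(default_factory=list)


def prune_empty(node):
    """Drop child levels that have neither files nor surviving sub-levels."""
    # Prune each child first, then keep it only if something is left in it.
    pruned = (prune_empty(child) for child in node.children)
    node.children = [c for c in pruned if c.files or c.children]
    return node


root = Node("c:", children=[
    Node("images", children=[
        Node("beach", files={"a.jpg", "b.jpg"}),
        Node("city"),   # no matching files: removed
    ]),
    Node("docs"),       # no matching files: removed
])
prune_empty(root)
print([c.name for c in root.children])              # ['images']
print([c.name for c in root.children[0].children])  # ['beach']
```

Working from the leaves upward means an intermediate level such as "images" survives only because one of its sub-levels still holds files.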

Darius1968

Thanks and appreciation for your good work! 

Mario

Remember that when the next fee-based upgrade is due  ;)