Can't get rid of Database Diagnostic warning

Started by joel23, December 10, 2014, 08:59:08 PM

Previous topic - Next topic

joel23

Hi Mario,
I continously get the following warning when I run DB Diagnostic:

Checking Metabase:
      Warning: 49 superfluous entries in rel_stack for non-masters.
Completed.

I rolled back to an older version of this DB and got "53 superfluous entries in rel_stack for non-masters" and after deleting some files which are offline for this DB, I got "54 superfluous entries in rel_stack for non-masters"
How to get rid of this warning or the problem, if there is one?
regards,
Joerg


joel23

regards,
Joerg

joel23

The warning seems not to matter ... but sommething is pretty wrong.

I added three files and ingesting them wasn't finished after 20+ minutes. I stopped Imatch and started it again (see 1st attachment)
After that the files were ingested but now, changing a folder, takes a similar amount of time. (see 2nd attachment)
Note the huge time gaps in the first

12.11 20:02:37+    0 [1568] 10  M>                 < 16 [8252ms #sl] PTMetabase::ImportFiles
12.11 20:16:06+808475 [1568] 10  I>                ##WAIT: InnerBeginTrans Thread 5480 (0x1568) Engine.UPDQDelegate.Meta waited in BeginTrans for 808444ms. Trans held by 1172 (0x494) Main Thread

and second attachment:
12.11 20:41:05+    0 [0894] 10  M>                   < 18 PTFileWnd::Clear
12.11 20:54:22+797165 [0894] 50  I>                      0 files, 0 in scope


[attachment deleted by admin]
regards,
Joerg

Mario

Your database has only about 25,000 files, which is pretty small. Nothing should take that long. Not even in a database 20 times as large as yours. But loading the database takes a whopping 40 seconds, which is way too slow already. And almost 60 seconds to bring up the user interface (load the database and loading all the views).

This is way too slow. Was your system very busy at the time doing other things?
Your database is on a Z: drive. What is this? External disk? Remote NAS? This would explain the slow processing.

Processing only 3 JPEG files from the M: drive takes almost 10 seconds, just for ExifTool. Also much to slow.

Several of the background tasks in IMatch were blocking for unusual long times because other IMatch tasks blocked the database for a long time. For example, one time the user interface (!) blocked the database for 80 seconds, something that should never happen. But I don't see any unusual operations in the UI, what did you do? I see that IMatch is scanning one or more folders. And I also see some error messages while processing relations. Please run a database diagnosis!

I also see unusual long wait times (over 60s) in background metadata write-back to drive M: What is that for a drive type?

joel23

It seems that IMatch in meantime (the gaps in the log file) is rereading parts of the DB. See the extraction of process monitors log.
The offset is never smaller than 1.800.000.000 bytes. I wonder what IMatch is doing there between 1.7Gb and the end of the DB at 2.43Gb.


30 mins later: okay, I believe I got it. This seems to happen after a DB was moved or copied to another location without changing its ID. When IMatch is done with 'rereading' the DB, it does not happen anymore.
Intentional changing the Database ID takes the same amount of time and the background tasks are the same like above.



Can you please reopen this thread The problem is not fixed ( I used different parameters the other day) but the thread is set to read only. Now the way how metadata (keywords) is propagated is somehow faulty.

[attachment deleted by admin]
regards,
Joerg

Mario

#6
Copying a database to another folder or device without changing the database ID does not cause any further action, folder rescans or whatnot.
The databases will share settings (bad) and cache files (also bad) but that's all explained in the help.

The process monitor does not tell me anything. IMatch is reading and updating sections of the database all the time, even when you just look at a file, or when background processes update categories, collections and other objects. Frequent read and write access to the database file is perfectly normal.

Updating a data-driven category or even apparently simple things as updating the filter panel may require IMatch to read hundreds of thousands of records, transferring several megabytes from and to the database. The database system is also shuffling data around, in order to combine adjacent spaces etc.

You did not answer my questions about the nature of M: and Z:

Why do you link to a thread about versioning from this? The other thread was closed months ago (5.1.14). If you experience a new problem, please be so kind as to open a new bug report.

joel23

Quote from: Mario on December 12, 2014, 02:42:46 PM
Copying a database to another folder or device without changing the database ID does not cause any further action, folder rescans or whatnot.
Well, here it does. And this is to reproduce. I did it eight times already and it always last about 13 mins.
I don't provide any further logs 'cause they all will be like the one I posted before.

It has nothing to do with the DB ID nor with the metabase warning as I first thought. I was able to get rid of the metabase warning today, by running diagnostic twice with V5.1.18
And it has nothing to do with the actions I mentioned before, like adding images or adding metadata.

Any action like changing a folder or clicking another image enforces this behavior. In fact ProcMon tells it immediately constantly does something with the DB when a moved or copied DB was loaded, but the GUI gets inaccessible when I interact with it (like changing a folder and or switch from one to another file)

QuoteThe process monitor does not tell me anything.
Too bad 'cause its a powerful and very useful tool. Sysinternals. It monitors any activity of a process in real time, on the file system, the registry, network, etc. pp.
QuoteIMatch is reading and updating sections of the database all the time, even when you just look at a file, or when background processes update categories, collections and other objects. Frequent read and write access to the database file is perfectly normal.
I know that, but this is different. For the records and FYI what IMatch does and ProcMon shows plz check the first attachment: this is what IMatch reads/writes when changing a folder, clicking around or looking at files: it scans the folders, read the files, etc. pp.

Attachment two shows what is done when metadata is written to a CR2. Aside other files, the log, *.imd5-wal, the image and of course the DB are touched. Fine so far.

And see the 3rd attachment: there are some other entries but almost only constantly parts of the DB are read for 13 minutes and those parts are all behind 1.7GB (the last third of the DB which has 2.43GB)  This is also where the gap in IMatchs log file appears respectively nothing is written to the log.
And no, no other actions were done, simply because I can't do anything in IMatch while this is happening.

I did some more tests with V5.1.18 - it's the same here:
12.13 12:58:14+ 4524 [1CB4] 10  M>                 > 16 PTMetabase::GetRating  'PTMetabase.cpp(5776)'
12.13 13:11:58+823920 [1CB4] 10  M>                 < 16 [823920ms #sl] PTMetabase::GetRating

Again a gap in the log of 13 minutes.  Also look at the values in my former post: it always takes around 13min (between 750.000-850.000ms) Well, the 13mins are related to my system and my DB. But it's pretty constant.

QuoteYou did not answer my questions about the nature of M: and Z:
Yes, because 1st here it IMHO does not matter and 2nd because as an IT-Pro (you might not remember) who works not only with DMS but also with enterprise storage systems like e.g. NetApp MetroClusters and other storages systems, a question like this insults me a bit. Well, not really ;)

It's not a matter of hardware or other software. The disk are SATA3 HDDs, but it happens as well when the DB was copied to a SSD. On which it was located when I made attachment #3.
To prevent other questions: IMatch DBs are excluded from the virus scanner, as the whole M: Drive is.  M: is used for storing images only.

That ingesting of three files (my previous post with the logs) took that long has to do with what IMatch is doing in the background while ingesting those files. So plz forget about the time it took to ingest the files. Usually it takes not that long when I add files.
As said: I copy or move the DB and after it's loaded and I clicked twice, the GUI is blocked for 13mins. When the process once finished I am able to work normally with all copies of this DB (this also rules out hardware or software problems).

Since this happens with one of my DB only so far (my other DBs are only <400MB), I will not open a bug report yet - maybe you already have an idea what and why this happens. 
IMHO the question should be: what kind of process (is it similar to a validation like in diagnostic?) is triggered when a (copied/moved) DB is loaded. Or maybe the question "what kind of process does not write a single line to the log file" will lead to a solution.



QuoteWhy do you link to a thread about versioning from this?
Because you moved it after I reported it as solved two day ago, but it is not solved. And I linked to it from here 'cause the thread is r/o now, otherwise I would have posted there.
QuoteThe other thread was closed months ago (5.1.14). If you experience a new problem, please be so kind as to open a new bug report.
Months ago?  Sorry, I still was able to reply to it two days ago and you moved this thread yesterday morning.

I believe its easier for you or the mods to reopen it, instead forcing me to start a new bug report from scratch and applying all the same information needed for the same problem.

Even when this one here might be a local/DB related problem (others might be affected as well): I invested minimum eight times 13min for making tests about the blocked GUI already.
I didn't do that for me - usually I don't have the need to move/copy the database. 
So I believe I can ask you (in your own interest) doing a simple click and reopen the bug report - of course only, when the forum software allows it.

[attachment deleted by admin]
regards,
Joerg

Mario

I'm aware of the functionality of procmon or the other tools from sysinternals.

When IMatch takes 13 minutes to read a few rows of data from the database (GetRating), something is really wrong. Either with the database, the disk or the system.  This is an atomic database operation. IMatch just asks the database to deliver the data, IMatch does not perform the read and write operations from and to the database itself.

In the log file you have attached are only two calls to GetRating, and both are immediate (less than 10ms). You did not attach the log file from the session where GetRating() takes 13 minutes?

QuoteYes, because 1st here it IMHO does not matter and 2nd because as an IT-Pro (you might not remember) who works not only with DMS but also with enterprise storage systems like e.g. NetApp MetroClusters and other storages systems, a question like this insults me a bit. Well, not really ;)

For this diagnosis it would be really useful to know if the database and files are on a local disk/SSD/USB drive or on a remote Windows server or a NAS running SAMBA...this can truly have an effect.

Did you try to copy the database to a local disk to see if the same effect happens?
Did you try to Compact the database?

The typical reason for a "The database system tries to read a few bytes of data but suddenly stalls for very long" is a virus checker scanning the file and blocking the access to it while it does this, a disk problem, a network problem, a badly fragmented disk / database file, a NAS running SAMBA which cannot deliver the data fast enough or is currently busy otherwise - this is why I asked what kind of device the DB is on or similar reasons.

joel23

Quote from: Mario on December 14, 2014, 09:10:29 AM
I'm aware of the functionality of procmon or the other tools from sysinternals.

When IMatch takes 13 minutes to read a few rows of data from the database (GetRating), something is really wrong. Either with the database, the disk or the system.  This is an atomic database operation. IMatch just asks the database to deliver the data, IMatch does not perform the read and write operations from and to the database itself.

In the log file you have attached are only two calls to GetRating, and both are immediate (less than 10ms). You did not attach the log file from the session where GetRating() takes 13 minutes?
No problem, 'cause I can reproduce it any time. I started 10:38, it's 10:50 now. One more minute.

Quote
QuoteYes, because 1st here it IMHO does not matter and 2nd because as an IT-Pro (you might not remember) who works not only with DMS but also with enterprise storage systems like e.g. NetApp MetroClusters and other storages systems, a question like this insults me a bit. Well, not really ;)

For this diagnosis it would be really useful to know if the database and files are on a local disk/SSD/USB drive or on a remote Windows server or a NAS running SAMBA...this can truly have an effect.

Did you try to copy the database to a local disk to see if the same effect happens?
Mario - as I wrote M: and Z: are local SATA-3 HDDs and C: is a local SDD ;) The attached log is from the firsts start of the DB after it was copied from Z: to C:
QuoteDid you try to Compact the database?
Yes, it is compacted.

QuoteThe typical reason for a "The database system tries to read a few bytes of data but suddenly stalls for very long" is a virus checker scanning the file and blocking the access to it while it does this, a disk problem, a network problem, a badly fragmented disk / database file, a NAS running SAMBA which cannot deliver the data fast enough or is currently busy otherwise - this is why I asked what kind of device the DB is on or similar reasons.
No general problems. Disks fragmentation <10%. 
Once again: it only happens on the first open after the DB was copied/moved.

[attachment deleted by admin]
regards,
Joerg

Mario

QuoteOnce again: it only happens on the first open after the DB was copied/moved.

This is a typical situation for when a virus checker is scanning the file, Windows index services is scanning it or something. If this happens only after you copied the database file, surely there is some external influence at work. Especially if it happens only with one database, or a database exceeding a certain size.

The log file shows very slow performance for basic operations like calculating a collection ('Reject' which takes 23s for only 24K files). Or a sudden 734s to retrieve the rating information for one or more files, while the same operation only 3 seconds before took no time at all (maybe for other files). I don't see why the DB should take 700 second to retrieve the rating for one file, and less than 0.01 seconds to retrieve the rating for other files, just three seconds before. The database operation is always the same.


joel23

Quote from: Mario on December 14, 2014, 12:19:18 PM
QuoteOnce again: it only happens on the first open after the DB was copied/moved.

This is a typical situation for when a virus checker is scanning the file, Windows index services is scanning it or something. If this happens only after you copied the database file, surely there is some external influence at work. Especially if it happens only with one database, or a database exceeding a certain size.
I was away for a while so I couldn't do any tests.
The virus checker is configured to exclude images, the Databases, ExifTool and IMatch process. In procmon I see no other process working with/on the database.

Quote
The log file shows very slow performance for basic operations like calculating a collection ('Reject' which takes 23s for only 24K files). Or a sudden 734s to retrieve the rating information for one or more files, while the same operation only 3 seconds before took no time at all (maybe for other files). I don't see why the DB should take 700 second to retrieve the rating for one file, and less than 0.01 seconds to retrieve the rating for other files, just three seconds before. The database operation is always the same.
Okay, I don't know what all that means, I attached a log from a "normal" session where I opened the same database when it was not copied/moved and doing the same actions:  I just changed the folders or clicked different images.


BTW: this latencies/gaps are not only to encounter when a database was copied/moved, but also when a PSD was updated and a new derived JPG was ingested.

12.26 20:02:04+    0 [01DC] 10  I>                    PTMetabase::ImportFiles ET-Extract for 1 files in 1045 ms
12.26 20:02:04+    0 [01DC] 02  I>                    [59250] 393 tags 'M:\1_SmugMug\_in\_MG_14328.jpg'
12.26 20:02:04+    0 [01DC] 10  M>                < 15 [1045ms] PTMetabase::ImportFiles
12.26 20:02:06+ 1732 [1D50] 50  M>                > 15 CViewWndClientFolders::ReloadFileWnd  'ViewWndClientFolders.cpp(624)'
12.26 20:02:06+    0 [1D50] 10  M>                 > 16 CIMEngineUpdateQueue::Enable  'IMEngineUpdateQueue.cpp(1994)'
12.26 20:02:06+    0 [1D50] 10  M>                 < 16 CIMEngineUpdateQueue::Enable
12.26 20:02:06+    0 [1D50] 10  M>                 > 16 PTFileSystemMonitor::Pause  'PTFileSystemMonitor.cpp(784)'
12.26 20:02:06+    0 [1D50] 10  M>                 < 16 PTFileSystemMonitor::Pause
12.26 20:02:06+    0 [1D50] 10  M>                 > 16 PTFileWnd::LoadGroup  'PTFileWnd.cpp(5340)'
12.26 20:02:06+   93 [1D50] 10  M>                  > 17 PTFileWnd::Clear  'PTFileWnd.cpp(1869)'
12.26 20:02:06+   16 [1D50] 10  M>                  < 17 [16ms] PTFileWnd::Clear
12.26 20:13:35+688698 [01DC] 10  I>                 ##WAIT: InnerBeginTrans Thread 476 (0x1DC) Engine.UPDQDelegate.Meta waited in BeginTrans for 690539ms. Trans held by 11292 (0x2C1C) CIMFileHistLazyWriterDelegate

12.26 20:13:35+  483 [01DC] 10  M>                  > 17 PTMetabase::ImportFiles  'PTMetabase.cpp(2382)'
[...]
12.26 20:13:39+    0 [01DC] 50  M>                    > 19 CIMCatalog::UpdateFile  'catalog\imcatalog.cpp(1440)'
12.26 20:13:39+   46 [30E4] 10  I>                    EUQH: Calculate CRC for file M:\1_SmugMug\_in\_MG_14328.jpg in 78ms


[attachment deleted by admin]
regards,
Joerg

Mario

This looks like you did so many changes before that the background task which updates the file history data in your database took over 60 seconds. Either that, or the database was blocked by some 'external' software or service. The history process is a low-prio process which kicks in when the database is idle for several seconds and then writes small amounts of data in small batches. Nothing that should take more than 2 or 3 seconds, or 60 as in this case...

Please elaborate what you did in that session. The attached log file does not contain any warnings or errors, it seems to be from another session?

joel23

Quote from: Mario on December 27, 2014, 07:47:02 AM

Please elaborate what you did in that session. The attached log file does not contain any warnings or errors, it seems to be from another session?
The attached log is from a session where IMatch behaves normal after the DB was opened. I attached it for comparing it with the attached log where IMatch hangs or freezes and took 13 minutes until I can work on it.

The in-post extract of the log is from a session where I change a PSD with PS, saved it and derived and saved a JPG. Within the nine minutes it took for reading metadata and ingesting the JPG, IMatch showed normal activity like "reading/writing metadata" but was not usable anymore.

But I solved my problems with the long hangers/freeze.
I noticed that everything runs normal when IMatch was started by its Desktop icon.

So I checked and encountered that IMatch and the ExifTool process - which is started shortly after IMatch is launched -, both were running at low I/O and CPU priority.
This happened when IMatch was launched via an icon in the filemanager (DoPus), which inherits its low priority to them. But IMatch's priorities were at normal when launched via the Desktop icon.
Woah. Strange that DoPus itself runs at low priority, which I was able to change
regards,
Joerg

Mario

Now that's a strange thing. Never heard about DoPlus before. The CPU priority, and especially the internal distribution of CPU priorities in IMatch are very important for a seamless cooperation between the different background tasks. IMatch uses background tasks which run with the same prio as the IMatch process, and others which run on a more or less lower priority. If you other application the IMatch process (or any other process) to run on a low-prio, this might have unwanted consequences, especially if the application misses certain system events or takes minutes to respond to them instead of seconds...

Good that you found this yourself. I would have never come up with the question if you use anything else than desktop / taskbar shortcuts or the START menu to run IMatch. Or if you somehow force IMatch to run on a low priority.

joel23

Quote from: Mario on December 28, 2014, 06:45:41 PM
...run on a low-prio, this might have unwanted consequences...
definitive ;) This was a nasty one and not to notice with other applications, which also run at low-prio when launched via DOpus.


DOpus is short for Directory Opus, an otherwise pretty neat and powerful file manager, I almost use it seven years now. Its initial release was 1990 for Amiga ;)
It has preview-handlers for almost all file formats (so no need to open txt, pdf, doc, xls, etc. with their associated applications, for just sneak a peek) inbuilt FTP and also was able to generate thumbnails/previews for most RAW formats and PSD already in 2006 under XP.

It should be said that something must have changed DOpus' priority some months ago, 'cause normally it of course does not run at low priority.
Since "Desktop" still is 'treated' as a system folder by DOpus, the negative effect did not appear here.
regards,
Joerg