problem with multiple WriteText calls

bnewman · June 18, 2018, 11:24:18 PM

(I tried to send this earlier, but the post just didn't seem to work. Sorry if this post shows up twice.)

I am writing my first script in IMatch2017. The entire migration at this point hinges on my being able to convert one script that I use to generate HTML fragments that I need to use on my website (www.bernienewman.com). However, I've hit a snag at this point and need some help.

The script is still a work in progress, but if it works, it contains the basics of what I'll need. However, I've hit what may be a timing issue when writing text, and I need help.

To explain, let me digress and describe what the database looks like. In IMatch I have a category "Web" with child categories for particular types of photographs that appear on my website (e.g. "Manmade", "Patterns", etc.). Photographs are assigned to these child categories.

The script tries to obtain all the files associated with the parent category and with each of the subcategories. It will eventually write an HTML fragment for the parent category and one for each child category, containing lines of HTML code related to each image file, its title and its date. Right now, the script only generates lines based on filename and title, but that is sufficiently representative of what the final script will do; only the content of each text line will change.

I attach two variants of the script (index5 and index6) and the console output. With reference to the script, the problem is that the call to WriteText at line 160 (index5; line 162 of index6) occasionally fails. The files all get written, but some of the lines are missing because of the error. The same error is reported every time, and it is handled at line 167 (index5; line 169 of index6). The console log shows the same error every time (with, of course, a different line to be written being reported). This is what it looks like for the run of index6.html:

error:

{
"readyState": 4,
"responseText": "{\r\n \"error\":{\r\n \"code\":1402,\r\n \"message\":\"Error writing file. File does not exist or is not writable.\"\r\n }\r\n}",
"responseJSON": {
"error": {
"code": 1402,
"message": "Error writing file. File does not exist or is not writable."
}
},
"status": 500,
"statusText": "Internal Error"
}

data:

cat id:134,filename:P20160731-050BW_SQ.tif,title:Untitled

Based on this error, I can only assume that the file is somehow occasionally not writable. I suspected that the writeText calls were happening so fast that occasionally one started before the previous one had finished. So before making this post I modified lines 160-168 of index5 by wrapping the writeText call in a 50 ms timeout (lines 161-171 of index6). I thought that a delay between successive writes might help, as much as I dislike such a non-deterministic approach.

However, this didn't fix the problem either. I have attached the IMatch output window showing the console output of the index6 run. All the occurrences of "" are the cases where the writeText worked.
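A plausible reason the 50 ms timeout did not help, assuming each writeText call was wrapped in its own timeout with the same delay: timers created in quick succession with the same delay are all scheduled at almost the same moment, so their callbacks fire together about 50 ms later rather than 50 ms apart. The sketch below is plain JavaScript; `scheduleWithSameDelay` is a hypothetical helper, not part of the original script, and it only records when each callback runs.

```javascript
// Hypothetical demo (not from the original script): schedule several callbacks
// with the same 50 ms delay, as happens when each writeText call made inside a
// loop is wrapped in its own setTimeout.
function scheduleWithSameDelay(count, delayMs) {
  const start = Date.now();
  const firedAt = [];
  return new Promise((resolve) => {
    for (let i = 0; i < count; i++) {
      setTimeout(() => {
        // Record how many milliseconds after the loop this callback ran.
        firedAt.push(Date.now() - start);
        if (firedAt.length === count) resolve(firedAt);
      }, delayMs);
    }
  });
}

scheduleWithSameDelay(5, 50).then((times) => {
  // The timestamps are all close to 50, not spread out to 50, 100, 150, ...
  console.log(times);
});
```

Spacing the writes out would need delays like `i * delayMs`, and even then a slow write could still overlap the next one; waiting for each write's promise to settle before starting the next is the deterministic alternative.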

I really need this script to work in order to complete my migration to IMatch2017. Please advise.

ubacher · June 19, 2018, 08:39:40 AM

Had a look at index5.
Not sure if it is kosher to get the categories the way you do; I would put it in the $(document).ready block.

When you call createHtmlFragment(cat); you forgot that this executes asynchronously. Execution continues immediately.
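A minimal sketch of this point, using a hypothetical `simulatedImwsCall` in place of a real IMWS request: the function returns a promise immediately, and the statement after the call runs before the request has finished.

```javascript
// Hypothetical stand-in for an IMWS request: the promise resolves ~10 ms later.
function simulatedImwsCall(value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), 10));
}

const log = [];

// Like createHtmlFragment(cat), this starts an asynchronous request and returns at once.
function startFragment(cat) {
  return simulatedImwsCall(cat).then((result) => {
    log.push("request finished: " + result);
  });
}

startFragment("Web");
log.push("line after the call");
// At this point log is ["line after the call"]; the request's entry is added ~10 ms later.
```

Because `startFragment` returns its promise, a caller that needs to wait for it can write `startFragment("Web").then(...)` instead of relying on the next statement.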

NB: I am a novice in Javascript and, although I have written quite a few scripts for myself, I still struggle with it.

Mario · June 19, 2018, 08:42:01 AM

Without looking at your script too closely or running it in the debugger I would guess that you are creating race conditions by doing too many things at the same time.

For example, your statement

cat.childrenIds.forEach(function(c) {

iterates over all children of a given cat, gets the child categories and then does stuff, e.g. writing a file.
Note that all this happens in parallel. The loop runs very fast, and IMWS responds very fast. Basically you are processing dozens of categories simultaneously, all operations firing requests, getting data, writing files in other requests.
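To make this concrete, the following sketch counts how many requests are open at once when a `forEach` loop starts one asynchronous call per child. `trackedRequest` is a hypothetical stand-in for an IMWS call, not the real API.

```javascript
// Hypothetical stand-in for IMWS: counts how many requests are open at once.
let inFlight = 0;
let peakInFlight = 0;

function trackedRequest(id) {
  inFlight++;
  peakInFlight = Math.max(peakInFlight, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight--;
      resolve(id);
    }, 5);
  });
}

// Same shape as cat.childrenIds.forEach(...): forEach does not wait for promises,
// so every request is started before the first one finishes.
function processAllAtOnce(childrenIds) {
  const pending = [];
  childrenIds.forEach((c) => {
    pending.push(trackedRequest(c));
  });
  return Promise.all(pending);
}

const ids = Array.from({ length: 100 }, (_, i) => i);
processAllAtOnce(ids).then(() => console.log("peak in flight:", peakInFlight)); // logs "peak in flight: 100"
```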

Put a console.log into writeFragmentLine and output the name of the file being written, too. You should see many calls at basically the same time.
Do the same for writeFragmentEmptyLine.

In theory, as long as no two file names are the same, it should work. Unless perhaps your app tries to write hundreds of files at the same time, exceeding some resource limit...

Or your app runs writeFragmentEmptyLine and writeFragmentLine at the same time for the same file, thus causing a "file is open already" problem.

Probably it would be best to restructure it to process one category at a time, not all categories at once, simultaneously.
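One way to process one category at a time, assuming the browser running the app supports `async`/`await` and using a hypothetical `simulatedRequest` in place of the real IMWS calls:

```javascript
// Hypothetical stand-in for an IMWS request that tracks how many are open at once.
let openRequests = 0;
let maxOpenRequests = 0;

function simulatedRequest(id) {
  openRequests++;
  maxOpenRequests = Math.max(maxOpenRequests, openRequests);
  return new Promise((resolve) => {
    setTimeout(() => {
      openRequests--;
      resolve(id);
    }, 5);
  });
}

// One category at a time: `await` inside for...of waits for each request
// to finish before the next one is started.
async function processOneAtATime(categoryIds) {
  const results = [];
  for (const id of categoryIds) {
    results.push(await simulatedRequest(id));
  }
  return results;
}
```

Calling `processOneAtATime([134, 135, 136])` resolves with the IDs in order, and `maxOpenRequests` never rises above 1.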

bnewman · June 19, 2018, 06:51:08 PM

Thank you both for the replies and advice. However, it still didn't work.

First, I am aware that the script causes the parent category and all child categories to start writing files. Generally, this worked and they all started firing. I added console.log as Mario suggested and saw that different files were being written, with no apparent reason why a write fails. I also temporarily commented out the call to writeFragmentEmptyLine to see if it had an effect, and it didn't.

So, to see whether processing just one category would succeed, I made a test revision of the script (attached) where I only write an HTML fragment for the parent category and not for the children. This still failed in the same way.

When I look at the console in Chrome (remotely connected), I can see more information than what shows up in the console log of the IMatch output. Unfortunately I cannot save Chrome's console output for some reason, but I attached the start and end of it. The console_start output shows that there were 31 errors (the number of errors varies from one run to another). The IMatch database currently has 583 files associated with the parent category, and the resulting fragment file had exactly 31 fewer lines than expected, corresponding to the errors. After the initial empty line was written, another 54 lines were written before the first error was reported. Then 2 lines were written, then a failure, then 45 lines. Chrome reports a "Failed to load resource" error associated with "writetext". I don't know what that means.

There is another interesting thing that appears in the Chrome debugger console but not in the IMatch console, and that is what appears at the end. The console_end output shows a different kind of error message, "POST http://127...etc", associated with "jquery.min.js:4". It's just a different error report, but I don't know if it's telling me something.

So, I'm still stuck, even when writing the 583 lines of a single category. That is a lot of rapid-fire requests, but this is a computer after all, so I don't think it's such a huge burst. If it were in the millions, I could understand. But it seems odd to hit a performance barrier that produces a race condition after merely 583 requests.

I thank you for your previous help, and hope you can offer another suggestion. I'm stumped.

Mario · June 19, 2018, 08:00:10 PM

I suggest you revise your script so that it processes one category after the other. You are firing so many requests in parallel that it is very likely that your app consumes all available file handles or writes to the same file from multiple threads at the same time. The error message is pretty clear. Since the file is created during the write operation, the only explanation is that your app is trying to write to it from several functions at the same time...

Remember that every call you make to IMWS returns immediately with a promise. This means that your loop over 500 categories opens 500 requests at the same time...

Did you add the console.log to the Write functions so you can see the problem (see my initial post)?

Maybe you can explain in a few simple words what you are actually trying to accomplish? I don't understand what your script is doing. It looks quite complicated...

bnewman · June 19, 2018, 09:23:33 PM

Thanks for responding so quickly.

I did add the console.log calls in the Write functions to show me which file was being written. It didn't tell me very much except that, as expected, the writes were going to a mix of different files. There are cases where the same file is written to over and over again, sometimes 45 times and sometimes more, and then a write gets an error. So, it does look like a "not ready" type of error.

In my last post, I did run the script on only a single category, and still got the error.

The nature of the script is as follows. Given a category with child categories, find all the files associated with each category, and, in a separate file ("fragment") for each category, write a line of text that identifies the file and an attribute representing the file's title. In my specific case, I have a category "Web" and it contains six child categories.

With reference to index5.txt posted yesterday, the script works as follows. It looks for the category "Web" and finds its child IDs (lines 78-82). Then it writes a file containing the lines for each of the image files associated with category "Web" (line 85). After that, the script iterates over the child IDs and, for each one, finds its category and associated image files (lines 86-90). For each such category, it writes a file containing lines for each associated image file (line 92).

The actual writing of lines into a file happens in createHtmlFragment (lines 100-133). This function processes the files associated with a given category. Because of the asynchronous nature of scripting now, it could be writing a line into the file of the parent category or of any child category, depending on the order in which the calls actually complete. Before writing a line, the script first has to get the title of the image file. In my database, I created an attribute set "Image" that contains "Title" and "Description", and the title for a given file is obtained in lines 117-126. Again, because of the asynchronous nature of scripting now, only when the script actually has the title of a file can it use that title in the line written into the fragment, by calling writeFragmentLine (line 127).

The writeFragmentLine function (lines 156-170) simply forms the name of the fragment file from the category, builds the line to be written, and appends that line to the existing file.
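If overlapping appends to the same fragment file are what trigger error 1402 (Mario's suspicion in this thread), one option is to give each file its own promise chain so that the next append starts only after the previous one has settled. Sketch only: `simulatedAppend` is a hypothetical stand-in that rejects when a file is already being written; it simulates the problem and is not IMWS's WriteText.

```javascript
// Hypothetical writer: rejects if the same file is already being written.
const busyFiles = new Set();
const fileContents = {};

function simulatedAppend(filename, line) {
  if (busyFiles.has(filename)) {
    return Promise.reject(new Error("File is not writable: " + filename));
  }
  busyFiles.add(filename);
  return new Promise((resolve) => {
    setTimeout(() => {
      fileContents[filename] = (fileContents[filename] || "") + line + "\n";
      busyFiles.delete(filename);
      resolve();
    }, 2);
  });
}

// One promise chain per file: each append waits for the previous append to the
// same file. The .catch keeps one failed write from blocking later ones.
const writeQueues = new Map();

function queuedAppend(filename, line) {
  const previous = writeQueues.get(filename) || Promise.resolve();
  const next = previous.catch(() => {}).then(() => simulatedAppend(filename, line));
  writeQueues.set(filename, next);
  return next;
}
```

Appends to different files still run in parallel; only appends to the same file are serialized, and they land in the order they were queued.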

In the script that I sent earlier today (image5_1.html.txt), the operation is done only on the parent category "Web". The database has 583 files associated with that category, and the script cannot write those lines without encountering errors.

I hope you can advise what my next step should be. Even just writing the lines of one category at a time doesn't work.

bnewman · June 20, 2018, 01:20:26 AM

As a follow on, I can now declare victory... kind of.

I revised the script so that, instead of calling writeText for every line, it concatenates each line into a string, e.g. line += "more stuff". I also revised the script to get the file count and use it to count down the iterations in the loop. When the count hits zero, the script issues a single writeText with the concatenated string. This actually works!
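The countdown approach described here can be sketched as follows. `fetchTitle` is a hypothetical stand-in for the IMWS attribute request, and `writeText` is passed in so the example runs without IMatch; storing each line by index keeps the output in file order even though the titles arrive in random order.

```javascript
// Hypothetical stand-in for the request that reads a file's title;
// titles arrive in random order, as real responses can.
function fetchTitle(fileName) {
  return new Promise((resolve) =>
    setTimeout(() => resolve("Title of " + fileName), Math.random() * 10)
  );
}

// Count down the outstanding requests and write once when the last one arrives.
function buildFragment(fileNames, writeText) {
  if (fileNames.length === 0) {
    writeText("");
    return;
  }
  const lines = new Array(fileNames.length);
  let remaining = fileNames.length;

  fileNames.forEach((fileName, index) => {
    fetchTitle(fileName).then((title) => {
      // Store by index so the output keeps the input order.
      lines[index] = "filename:" + fileName + ",title:" + title;
      remaining--;
      if (remaining === 0) {
        writeText(lines.join("\n")); // a single write for the whole fragment
      }
    });
  });
}
```

One caveat: if a title request can fail, `remaining` never reaches zero and nothing is written, so a real version would also need to handle rejections (or use `Promise.all`).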

I was concerned about the uncontrolled growth of the concatenated string. What I provided in the example scripts was just a representative line of text; the actual one is much longer. Working this way, the concatenated string for the 583 files currently associated with the parent category reached a whopping 221,933 bytes, but this could be accommodated. Do you know what the maximum allowed string length is?

I could refine this so that if the concatenation goes too high, the script could issue a writeText after reaching somewhere near a maximum length. That would still keep the number of writeText calls way down.
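A sketch of that refinement, assuming a hypothetical `flush` callback that performs the actual write and an arbitrary `maxChars` threshold:

```javascript
// Hypothetical buffered writer: collects lines and calls flush(text) whenever
// adding the next line would push the buffer past maxChars, plus once at the end.
function createBufferedWriter(flush, maxChars) {
  let buffer = "";
  return {
    add(line) {
      // line + "\n" adds line.length + 1 characters.
      if (buffer.length > 0 && buffer.length + line.length + 1 > maxChars) {
        flush(buffer);
        buffer = "";
      }
      buffer += line + "\n";
    },
    finish() {
      if (buffer.length > 0) flush(buffer);
      buffer = "";
    },
  };
}
```

In the real script each flush would be an asynchronous write to the same file, so the flushes would still need to run one after another (for example, by chaining each one on the previous write's promise) to avoid overlapping appends.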

By the way, the script is able to exploit the asynchronous nature of scripting to work in parallel on the parent category and the six child categories, and it all worked out fine. Another concern I had was that, when counting the image files processed and concatenating lines, the script would start to mix lines from different categories. But somehow there must be a local copy of the count and the concatenated string for each category, because the result was fine.
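The separate copies described here are standard JavaScript closure behaviour: every call to a function gets its own local variables, and callbacks created during that call keep referring to them. Lines could only get mixed if the counter or string were declared outside the function (for example, as globals) and shared between categories. A sketch with a hypothetical `fetchLine` request:

```javascript
// Hypothetical stand-in for the per-file request; responses arrive in random order.
function fetchLine(category, file) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(category + ":" + file), Math.random() * 10)
  );
}

// Each call gets its own `lines` and `remaining`; the .then callbacks created
// in that call close over those copies, not over another category's.
function collectCategory(category, files) {
  const lines = [];
  let remaining = files.length;
  return new Promise((resolve) => {
    if (remaining === 0) {
      resolve(lines);
      return;
    }
    files.forEach((file) => {
      fetchLine(category, file).then((line) => {
        lines.push(line);
        remaining--;
        if (remaining === 0) resolve(lines);
      });
    });
  });
}
```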

Thanks for your help. I think I've crossed the major hurdle that was stopping me. If you know the maximum string length, I may make a refinement.

ubacher · June 20, 2018, 08:15:57 AM

Quote: "...script would start to mix the lines from different categories"

I think this is definitely a possibility. It might happen only occasionally, so you cannot rely on it not happening.
Async programming is not really well suited for such a job; it means you have to jump through hoops to achieve synchronous behaviour.

Mario · June 20, 2018, 09:05:22 AM

Quote: "Do you know what is the maximum string length allowed?"

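JavaScript strings can be very long. The ECMAScript specification allows up to 2^53 - 1 characters, and browser engines such as Chrome's V8 set a practical limit of roughly 2^29 to 2^30 characters (about half a billion to a billion, depending on the version). I have never run into that limit, and a string of a few hundred KB is no problem at all.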

Caching everything in memory and doing one WriteText call is the only sensible solution.

Still, from your description I have the impression that your app works correctly just by luck. I have written many scripts now, and usually it is easiest to use a standard pattern like:

1. Collect all files/categories you need to process into an array. This is your to-do list => elements[]

2. Write a processNext() function which processes the next element in the array.

3. Set up a nextElementIndex variable with the value 0.

4. Call processNext().
It processes elements[nextElementIndex], doing all the async calls and work (nesting them or using async/await).
When finished, it checks whether there is another element to process (nextElementIndex + 1 < elements.length). If so, it calls itself via

window.setTimeout(() => {
++nextElementIndex;
processNext();
},0);

which avoids recursion (processNext ends and is then immediately called by the timer).
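The steps above can be sketched as a small, self-contained function. `processAll`, `processElement` and `onDone` are hypothetical names: `processElement` does all the work for one element and returns a promise, and `onDone` is called once, at the end or on the first error. Plain `setTimeout` is used, which is the same function as `window.setTimeout` in a browser.

```javascript
// Sketch of the processNext pattern; processElement(element) must return a promise.
function processAll(elements, processElement, onDone) {
  let nextElementIndex = 0;

  function processNext() {
    processElement(elements[nextElementIndex]).then(
      () => {
        if (nextElementIndex + 1 < elements.length) {
          // Start the next element from a timer: processNext has already
          // returned by then, so the call stack does not keep growing.
          setTimeout(() => {
            ++nextElementIndex;
            processNext();
          }, 0);
        } else {
          onDone(null);
        }
      },
      (error) => onDone(error) // stop at the first failure
    );
  }

  if (elements.length === 0) onDone(null);
  else processNext();
}
```

Because only one element is in progress at any time, a progress bar update or an abort check fits naturally right before the `setTimeout`.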

Processing one element at a time also has the advantage that you can easily display a progress indicator, allow the user to abort, etc.
See, for example, the ProcessFiles or Progress Bar sample apps.

Trying to do hundreds of asynchronous calls simultaneously (which you do) is asking for trouble, because if you have 1000 categories instead of 500, IMWS may run out of threads and start returning HTTP status 500 or 503 when too many requests come in too quickly. You can do it to optimize speed, but it requires a lot of extra effort and a clear understanding of how asynchronous calls work, how to deal with errors, etc.
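If you do want some parallelism for speed, one common technique is to cap the number of requests in flight. The sketch below runs at most `limit` tasks at once; the tasks are hypothetical functions returning promises (each could wrap one IMWS call), and the limit is whatever value you choose, not an IMWS setting.

```javascript
// Run promise-returning tasks with at most `limit` of them in flight at once.
// Results come back in the same order as the tasks.
function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let nextIndex = 0;

  // Each worker keeps taking the next unclaimed task until none are left.
  // nextIndex++ runs synchronously, so two workers never claim the same task.
  async function worker() {
    while (nextIndex < tasks.length) {
      const index = nextIndex++;
      results[index] = await tasks[index]();
    }
  }

  const workers = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i++) workers.push(worker());
  return Promise.all(workers).then(() => results);
}
```

If any task rejects, the returned promise rejects too while the remaining workers keep going; a real app would decide whether to stop or collect the errors.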

Retrieving new categories to process while doing WriteText for others is OK, even if you do this for a few dozen elements. But you need to keep an eye on resource utilization. IMWS never runs out of gas under normal conditions, but doing hundreds of simultaneous operations can cause problems. IMatch limits the number of parallel connections (ports) to 500 (which is a lot!), and when more requests come in, they are queued until the internal queue is full. Then the integrated web server returns 503 (Service Unavailable).
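When a server answers 503 because its queue is full, a common client-side approach is to retry after a growing delay. This is a sketch under assumptions: `withRetry` is a hypothetical helper, `request` is any function returning a promise that rejects with an object carrying an HTTP `status` (as the jQuery error object shown earlier in this thread does), and the retry count and delays are arbitrary. It does not address the root cause; sending fewer simultaneous requests does.

```javascript
// Retry only on 503, waiting baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ... between attempts.
// Any other error (e.g. status 500 with code 1402) is passed straight to the caller.
function withRetry(request, maxRetries, baseDelayMs) {
  return new Promise((resolve, reject) => {
    function attempt(n) {
      request().then(resolve, (err) => {
        if (err && err.status === 503 && n < maxRetries) {
          setTimeout(() => attempt(n + 1), baseDelayMs * 2 ** n);
        } else {
          reject(err);
        }
      });
    }
    attempt(0);
  });
}
```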

But IMWS/IMatch is so fast that it usually is not worth the effort to write a complex parallel app or to use web workers. Keeping the app simple and doing one thing at a time is usually much easier to manage and maintain.

bnewman · June 26, 2018, 10:32:40 PM

Thanks for your advice. I have finally got the thing working correctly. Mario, after going with your suggestion to split the effort into a data collection stage followed by processing of the collected data, the entire script looks much easier to handle. Doing this also showed me that the earlier version, which I thought worked, had a bug. I couldn't use the recursive-function approach you suggested, partly because, similar to what other people posted, I needed to "get" more than one kind of item: first a parent category and the files associated with it, then all the child categories and their files, and finally an attribute ("Title") for all of the files. I also dislike recursion for fear of inadvertently getting into an unbounded recursion and draining all the resources. However, it was not difficult to simply acquire this information in a loop.

Like other posters, I found this new way of working asynchronously challenging. In the end, it came down to figuring out how to reliably identify when the last element being fetched had been reached. Since the script was dealing with a situation where the number of elements could be determined, this was ultimately not too difficult to figure out. The solution also meant cascading the calls from the various stages of the data collection functions to the data processing function. I dislike this approach, but could not see any reasonable way around it.

Late in the script revision, I came across posts debating the merits of using async/await. I actually rewrote the script to try this out. It did make the data collection read like sequential, synchronous code, and the functions actually looked simpler. However, I couldn't find a clean way to pass control from the data collection functions to the data generation one ("CreateHtmlFragment" in the attachment) without creating a "promise" to denote completion of the data gathering. I'm sure I'd have to study more for that, and I already had something that worked, with asynchronous operation under control. My guess is that controlling the flow with a new Promise would produce code that looks more complex than what I have now, so I abandoned that idea. I kind of agree with Mario's position that it's preferable to accept this as the new way of working rather than pounding a square peg into a round hole to make things work sequentially like the good old days.
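For comparison, here is one way to hand over from an `async` data-collection function to the generation step without writing `new Promise` yourself: an `async` function already returns a promise, so a single `.then` connects the two stages. `getFiles`, `getTitle`, `collectData` and `buildFragments` are hypothetical stand-ins, not the IMWS API or the attached `CreateHtmlFragment`.

```javascript
// Hypothetical stand-ins for the IMWS requests; replace them with real wrappers.
const fakeDb = { Web: ["a.tif", "b.tif"], Patterns: ["c.tif"] };
function getFiles(category) {
  return Promise.resolve(fakeDb[category] || []);
}
function getTitle(file) {
  return Promise.resolve("Title of " + file);
}

// An async function always returns a promise, which resolves with `data`
// once every await below has completed.
async function collectData(categories) {
  const data = [];
  for (const category of categories) {
    const files = await getFiles(category);
    // Fetch one category's titles in parallel, then wait for all of them.
    const titles = await Promise.all(files.map((file) => getTitle(file)));
    data.push({ category, files: files.map((file, i) => ({ file, title: titles[i] })) });
  }
  return data;
}

// Hypothetical generation step: one text fragment per category.
function buildFragments(data) {
  return data.map((entry) =>
    entry.category + "\n" + entry.files.map((f) => f.file + "," + f.title).join("\n")
  );
}

// The hand-over from collection to generation is a single .then:
collectData(["Web", "Patterns"]).then(buildFragments).then((fragments) => console.log(fragments));
```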

I heavily commented the script because, while I don't write many scripts, at least I'll have my own personal example to use in the future if I have to make more. The last thing I added was a progress bar based on the IMatch sample apps. In my case, the script uses the IMatch calls as the unit of progress. It was really easy to calculate the total number of such calls and then just count each "then" reached to drive the progress bar. I've attached the current script to show what it finally looks like, and as gratitude to those who helped me out. Maybe someone else can use another example as well.
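The progress counting described here can be wrapped in a small helper. `createProgressTracker` and `onProgress` are hypothetical names; `onProgress` would update whatever progress bar the sample apps provide.

```javascript
// The total number of calls is known up front; each settled call advances the counter.
function createProgressTracker(totalCalls, onProgress) {
  let completed = 0;
  return function track(promise) {
    return promise.then((value) => {
      completed++;
      onProgress(Math.round((completed / totalCalls) * 100)); // percent done
      return value; // pass the result through unchanged
    });
  };
}
```

Wrapping each IMWS call as `track(call)` keeps the counting in one place instead of in every `then`.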

If you've read this far, one weird thing of note is lines 274-275. After the script finished writing a file, I wanted it to write the filename to the console. Because of the asynchronous nature of the script, the variable containing the filename that was set at line 255 no longer held that file's name by the time the script got to the "then" part of "writeTextFile". Using the Chrome debugger, I could see that each of the 7 writeTextFile calls was made, and when the "then"s started running, the filename was the one used by the last preceding "writeTextFile" request. I don't know how JavaScript handles these asynchronous operations under the hood. I had suspected that it was possibly interrupt driven, with the state of the local variables kept with each enqueued request. But it now seems, at least in the case of line 274, that you cannot use even local variables in this way. If anyone knows otherwise, let me know.
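The behaviour described here is what JavaScript does when the variable is declared with `var` (or outside the loop): there is only one variable, and every `then` callback reads it after the loop has set it to the last name. Declaring it with `let` or `const` inside the loop gives each iteration its own binding. Whether line 255 uses `var` is an assumption, since the attached script is not shown here; `fakeWriteTextFile` is a hypothetical stand-in for the real write call.

```javascript
// Hypothetical stand-in for writeTextFile: resolves on a later tick.
function fakeWriteTextFile(fileName) {
  return new Promise((resolve) => setTimeout(resolve, 5));
}

// With `var`, fileName belongs to the whole function, so every callback
// reads the single variable after the loop has set it to the last name.
function logWithVar(names, log) {
  const pending = [];
  for (var i = 0; i < names.length; i++) {
    var fileName = names[i] + ".html";
    pending.push(fakeWriteTextFile(fileName).then(() => log.push(fileName)));
  }
  return Promise.all(pending);
}

// With `const` inside the loop body, each iteration gets its own binding,
// so each callback sees the name from its own iteration.
function logWithLet(names, log) {
  const pending = [];
  for (let i = 0; i < names.length; i++) {
    const fileName = names[i] + ".html";
    pending.push(fakeWriteTextFile(fileName).then(() => log.push(fileName)));
  }
  return Promise.all(pending);
}
```

So local variables do survive until the callbacks run; the catch is only that a `var` declared in a loop is one variable shared by all iterations.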

Anyway, any suggestions on how to improve the script are welcome, and thanks for the help. Now, back to photography for me.

Mario · June 26, 2018, 11:55:10 PM

Did you know that you can get categories recursively by using the "recursive" option?

From the docs: "If this option (recursive) is specified as true, child categories are returned recursively."

Maybe using that can simplify your script.
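If the recursive result is used as a to-do list, the tree can be flattened without recursion. Assumption, not checked against the IMWS documentation: each category object has `id`, `name`, and a `children` array of nested category objects. `flattenCategories` is a hypothetical helper; adjust the property names to what the endpoint actually returns.

```javascript
// Flatten a category tree into a list, parents before children, using an
// explicit stack instead of recursion. Property names are assumptions.
function flattenCategories(categories) {
  const todo = [];
  const stack = [...categories].reverse();
  while (stack.length > 0) {
    const cat = stack.pop();
    todo.push({ id: cat.id, name: cat.name });
    // Push children in reverse so they come off the stack in their original order.
    const children = cat.children || [];
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return todo;
}
```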

bnewman · June 27, 2018, 12:32:43 AM

I had not noticed that one. It will definitely simplify things a bit. The script was doing what the IMatch function already does.

By the way, the IMatch Webservices Documentation app has clickable examples of how the get endpoint works. I have set up debugging using port 49777. However, in Chrome, when I try, for example, to browse to "http://127.0.0.1:50519/v1/categories?{auth}&id=3&fields=id,name,parentid,level,children&recursive=true", I get an error message. When I changed the port number to 49777, the browser reported that localhost could not be found. Is there some way for me to issue these commands remotely from the Chrome browser?