
fitting_deamon

fitting_deamon(deamon_courtesy_timeout) is a static method that is started up as a separate compiled instance and communicates with the master instance via files in its run folder. It responds to a few simple commands: read new data and start calculations, stop calculations, report whether it is alive, and exit altogether. When no calculation is running, the daemon simply checks the contents of the run folder every check_interval_sec seconds. A daemon may be left running for a long time and will process any kind of dataset submitted to it (it takes about 100 MB of RAM and 0.01% CPU in the idle state). However, if the code of the TotalFit or Dataset-related classes has been changed, the daemon must be recompiled! The monitoring module issues an error if the daemon is the wrong version.
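The idle behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the TotalFit code: the command file names (exit.cmd, stop.cmd and so on), the order in which they are checked, and the max_checks argument are all assumptions made for the example. Only check_interval_sec comes from the text.

```python
import time
from pathlib import Path

# Hypothetical command names and check order; the real TotalFit
# signal file names are not given in this document.
COMMANDS = ("exit", "stop", "status", "start")


def pending_command(run_folder):
    """Return the first command whose signal file exists in run_folder, or None."""
    folder = Path(run_folder)
    for name in COMMANDS:
        if (folder / f"{name}.cmd").exists():
            return name
    return None


def idle_loop(run_folder, check_interval_sec, max_checks):
    """Poll the run folder until a command appears or max_checks polls have passed.

    max_checks is an assumption added so that the sketch always terminates.
    """
    for _ in range(max_checks):
        command = pending_command(run_folder)
        if command is not None:
            return command
        time.sleep(check_interval_sec)
    return None
```

A real daemon would loop until its timeout rather than for a fixed number of checks, and would act on the returned command instead of just reporting it.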

The daemon and the monitoring module try to avoid collisions when accessing the data files. They use 'lock' files that indicate that the daemon is in the process of writing data or that the monitor is reading. If a lock file exists, the other party waits and retries shortly after. If a lock is left in place for longer than the dead_lock_timeout interval, the daemon quits, while the monitor simply skips this run folder and reports the deadlock (because other CPUs will be able to finish the job).
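A minimal sketch of this kind of lock-file handshake, written in Python for illustration only. The function names, the retry_sec argument and the DeadLockError exception are hypothetical, and creating the lock atomically with O_EXCL is my assumption; this page does not say how TotalFit creates its lock files.

```python
import os
import time


class DeadLockError(Exception):
    """Raised when a lock file stays in place longer than dead_lock_timeout."""


def acquire_lock(lock_path, dead_lock_timeout, retry_sec=0.1):
    """Create lock_path, waiting and retrying while another party holds it."""
    deadline = time.monotonic() + dead_lock_timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so only one party can create the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise DeadLockError(lock_path)
            time.sleep(retry_sec)


def release_lock(lock_path):
    """Remove the lock file so that the other party can proceed."""
    os.remove(lock_path)
```

In terms of the text, the daemon would quit on DeadLockError, while the monitor would catch it, report the deadlock and move on to the next run folder.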

The daemon is started with a command-line parameter, deamon_courtesy_timeout, which is the time (in hours) after which the daemon quits, counted from when the last calculation started or the last status request was made, whichever is later, and including any time spent idle. This is a courtesy measure to protect against program crashes and run-away daemons. Technically, this means the daemon will also quit if a calculation runs longer than the specified timeout period. If your fitting runs are very long and you need daemons to live longer, increase deamon_courtesy_timeout!

The timeout timer starts at daemon startup and is reset every time a new calculation starts or you make a daemon status request. A convenient value is 36 hours, which keeps daemons up as long as you use them at least once a day.
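The timer logic can be pictured with the short Python sketch below. The CourtesyTimer class and its injectable clock argument are made up for the illustration and are not part of TotalFit; the only things taken from the text are that the timeout is given in hours and that the timer is reset on a new calculation or a status request.

```python
import time


class CourtesyTimer:
    """Tracks the time elapsed since startup or since the last reset."""

    def __init__(self, timeout_hours, clock=time.monotonic):
        # The timeout is given in hours, as deamon_courtesy_timeout is.
        self._timeout_sec = timeout_hours * 3600.0
        self._clock = clock  # injectable so the timer can be tested without waiting
        self._last_reset = clock()

    def reset(self):
        """Call when a new calculation starts or a status request arrives."""
        self._last_reset = self._clock()

    def expired(self):
        """True once the timeout has passed since the last reset."""
        return self._clock() - self._last_reset >= self._timeout_sec
```

With a 36-hour timeout, a daemon that receives a status request or a new run at least once a day never sees expired() return True.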

Daemons derive a unique ID for themselves and incorporate it into their answers so that you can catch 'run-away' daemons, which keep running after the main program crashes. Such a situation degrades performance because the core is split between two processes writing into the same file, so the speed drops about 2x. The monitoring module detects this situation by counting how many answers it gets after a status request (each daemon responds with a unique file name containing its numeric ID). To prevent run-away daemons from being left over, the stop and exit signals are not removed until a new run starts, allowing any other daemons to see them and stop or quit. Whatever measures I put in to control daemons, there may be non-standard situations that were not foreseen. You need to check the run-folder contents for answer files with two different IDs, and periodically check your computers for leftover processes with 'ps -a | grep totalfit_deamon'.
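The check for answer files with different IDs could look something like the Python sketch below. The answer file naming (alive_<ID>.txt) is purely an assumption for the example, since this page does not give the real file names, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical answer file pattern; the real TotalFit naming is not given here.
ANSWER_PATTERN = re.compile(r"^alive_(\d+)\.txt$")


def responding_ids(run_folder):
    """Collect the numeric daemon IDs found in answer file names in run_folder."""
    ids = set()
    for entry in Path(run_folder).iterdir():
        match = ANSWER_PATTERN.match(entry.name)
        if match:
            ids.add(int(match.group(1)))
    return ids


def has_runaway(run_folder):
    """True if more than one daemon ID answered from the same run folder."""
    return len(responding_ids(run_folder)) > 1
```

One run folder should be served by one daemon, so two or more distinct IDs after a single status request point to a run-away daemon.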

The simplest clean-up procedure is to run the order_deamons_quit() method after you are done, then (after a minute) check the RAM of your computers and kill any 'totalfit_deamon' processes that failed to quit.

The ping_deamons() method lets you check the status of the daemons.

NOTE ON THE 'DEAMONS THAT DO NOT RESPOND' MESSAGE. You may need to wait a little after stop or exit signals are sent to daemons, because a daemon only checks for signals AFTER the fitting run with the requested number of trials is finished. The monitoring function does not know how long it takes to fit this number of trials and simply waits about 10 sec. Therefore, after your fitting run is done you may still see a warning about daemons that did not respond to the signal. In this situation, simply wait longer and issue ping_deamons() again.
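The "wait longer and ping again" advice amounts to a simple retry loop, sketched here in Python. The ping argument stands in for whatever performs the status request; I assume it returns the number of daemons that answered, which is not necessarily what ping_deamons() returns. The function name, wait_sec and max_attempts are hypothetical.

```python
import time


def wait_for_responses(ping, expected, wait_sec=10.0, max_attempts=5):
    """Call ping() until it reports `expected` responders or attempts run out.

    ping is a hypothetical callable returning how many daemons answered.
    """
    for attempt in range(max_attempts):
        if ping() >= expected:
            return True
        # Sleep only between attempts, not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(wait_sec)
    return False
```

If this still returns False after several attempts, the daemons are probably in a long fitting run or have died, and a manual check of the processes is in order.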

 

For a specific usage see TotalFit.m

For compiling and installing daemons, see creating_fitting_deamons