Back to TotalFit Methods

Implementation details

The fitting routine has three separate parts: (1) preparation of fitting session, (2) fitting with a number of Monte-Carlo simulations, (3) monitoring accumulation of results of Monte-Carlo runs and evaluation whether confidence intervals converged. More specifically:

PREPARATION OF MONTE-CARLO SESSION: prepare_fitting_run()

1. perform best-fit

2. compute ideal values

3. save an object, which (multiple) run modules would load and perform simulations on.

 

RUN MONTE-CARLO SIMULATIONS: run_fitting_session()

1. Run a requested number of simulations

2. Save the best-fit results in the specific folder.

3. This module is re-run multiple times until confidence intervals computed in the next module converged (a signal to stop is received).

This module may be run directly by the current instance or by a remote deamon having access to the working folder. (Say, remote deamon is working on a different computer having been started from the "run folder" shared with main computer using NFS). The simplest way is to have a multi-core computer where multiple deamons may be started on different cores (from different terminal sessions with the working folder set to "run folders"). The most complicated way is to have folders on remote computers synchronized via some sort of SSH/SFTP-based utility. Then deamons don't need access to your real working folders because their local run folder on a remote computer will be automatically synchronized with your main run folders. This way of operation is cumbersome but allows infinite scaling-up of your calculations using services such as Amazon Cloud Computing.

 

MONITOR CONVERGENCE: monitor_fitting_session()

1. Read new best-fit data files from run folders, where run modules saved their results.

2. Compile them to the 'best-fit' pool if the algorithm converged (exitflag>0).

3. Analyze confidence intervals and return a verdict about convergence.

 

NOTES ON THE ALGORITHMS

We have two algorithms available: simplex and Newton. Formally, simplex algorithm is most appropriate for the least squares minimization and is most robust. However, it works 8x slower than the Newton, which is formally NOT optimal for least squares minimizations. However, it arrives at the identical values of the best fit parameters. There is some sort of glitch in Newton's diagnostics that even if it arrives at accurate best-fit values it does not think that and reports non-convergence with exitflag=-2. The option ignore_nonconvergence_flag may be used (if set to 1) to force the the fitting routine ignore these 'failed' Monte-Carlo runs and use results of ALL fitting runs for error determination, even with exitflags indicating failure to converge.

Confidence intervals of fitting parameters are computed by taking 2.5-97.5 % percentiles of the best-fit parameters of randomized datasets. The minimum number of trials min_number_of_trials has to be relatively large for accurate confidence interval estimate (with less than 50 trials the taking percentiles results in very discrete values and a convergence test is easily fooled by seeing unchanged percentile (while it did not change simply due to small number of added trials). If you need to reduce accuracy of conf. intervals you shouldn't change this min_number_of_trials but rather increase your convergence tolerance: fraction_conf. Value of 0.05 (fractional change in confidence intervals computed with and without last min_number_of_trials/2 runs) usually gives good estimate. However fraction_conf up to 0.25 also works well.

As the fitting progresses the confidence intervals are dynamically shown on the graph. Since the ranges of the intervals may be dramatically different for different parameters I show values normalized by last point. Confidence intervals are recalculated as new trials are added and this is what the curves are (normalized intervals for each parameter). You will visually see smaller fluctuations of normalized confidence intervals as more and more trials are added to analysis. The parameter fraction_conf is essentially your measure whether the intervals are at a plateau or not yet.

 

For a specific usage see TotalFit.m