
Concepts of data fitting

 


 


 

General ideas



 

Monte-Carlo analysis of optimization parameters

Introduction

Every functional dependence (equation) has specific properties: some parameters are strongly constrained by the shape of the experimental data, while others are defined only weakly. Additionally, the specific distribution of experimental points in the dataset has a great impact on how well particular parameters are determined by fitting that dataset, and noise in the experimental data affects these parameters to varying degrees. Perturbing the experimental data and independent variables, followed by refitting in multiple trials, is called Monte-Carlo analysis; it provides a measure of confidence in the best-fit values. Multiple independent fitting runs allow building distributions for each fitting parameter as well as revealing their interdependency (correlation), such that a deviation in one parameter value may be compensated by changes in the others without a significant effect on the sum of squares.
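
As a generic illustration of the procedure (a standalone base-MATLAB sketch, not TotalFit code), the following fragment fits a toy exponential model with fminsearch, perturbs synthetic data with noise estimated from the residuals, and refits in multiple trials to build parameter distributions:

    model = @(p, x) p(1) * exp(-p(2) * x);             % toy model: amplitude, rate
    ss    = @(p, x, y) sum((y - model(p, x)).^2);      % sum of squares
    x = linspace(0, 5, 50);
    y = model([2.0, 1.3], x) + 0.05 * randn(size(x));  % synthetic 'experiment'

    p_best = fminsearch(@(p) ss(p, x, y), [1, 1]);     % best fit to the data
    sigma  = std(y - model(p_best, x));                % noise level from residuals

    n_trials = 200;
    p_trials = zeros(n_trials, 2);
    for k = 1:n_trials
        y_sim = model(p_best, x) + sigma * randn(size(x));       % perturbed dataset
        p_trials(k, :) = fminsearch(@(p) ss(p, x, y_sim), p_best);
    end
    p_sorted = sort(p_trials);                         % sort each parameter column
    ci = p_sorted(round([0.025; 0.975] * n_trials), :);    % ~95% confidence intervals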

 

Diagnostics of convergence in fitting runs

The TotalFit session property vary_starting_parameters controls whether the standard randomizer (or other randomizing modules) varies the starting parameters for the Monte-Carlo runs. Ideally, it should be 'yes' at all times, but this setting entails very slow convergence. In addition, the Newton-based algorithms ('Newton-active-set' and 'Newton-interior-point') sometimes finish without improving on the initial guess. If this happens when vary_starting_parameters='yes', the problem is very difficult to detect because you still obtain a very broad random distribution of 'best-fit' results. The only way to detect this lack of convergence is to compare the properties all_raw_starting_parameters and all_raw_final_parameters to see whether the fitting process led to changes in the parameter values. The properties all_filtered_starting_parameters and all_filtered_final_parameters contain the same information but include only the successful runs.
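
A minimal sketch of this comparison, assuming the session object is called t and these properties are numeric arrays with one row per Monte-Carlo run (both assumptions should be verified against TotalFit.m):

    start_p = t.all_raw_starting_parameters;        % assumed: one row per run
    final_p = t.all_raw_final_parameters;
    shift   = max(abs(final_p - start_p), [], 2);   % largest parameter change per run
    stuck   = find(shift < 1e-12);                  % runs where fitting did not move
    fprintf('%d of %d runs returned their starting guess essentially unchanged\n', ...
            numel(stuck), size(start_p, 1));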

One way to make this lack of convergence easier to detect is to set vary_starting_parameters to 'no': the simulated data are still perturbed with random noise, but fitting always starts from the best-fit result to the experimental data. In this case, when the fitting algorithm is not doing anything, one sees confidence intervals that coincide with the best-fit parameters (zero-width intervals). The solution in this situation is to change the minimizer or alter its options. I chose the 'no' setting as a default (switchable by the user).

Sums of squares from all fitting runs are also collected in the all_raw_SS and all_filtered_SS arrays, as are the exit flags from the fitting routine (all_raw_exitflags and all_filtered_exitflags). Inspect these arrays when troubleshooting your Monte-Carlo runs.
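
For example, a quick screen of the collected diagnostics might look as follows (again assuming a session object t with these properties stored as numeric vectors; MATLAB solvers return positive exit flags on success):

    disp(unique(t.all_raw_exitflags));     % which exit codes occurred at all
    bad = (t.all_raw_exitflags <= 0);      % nonpositive flags indicate failed runs
    fprintf('%d failed runs out of %d; SS range of the rest: [%g, %g]\n', ...
            nnz(bad), numel(bad), min(t.all_raw_SS(~bad)), max(t.all_raw_SS(~bad)));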

You may observe the graphic result of every Monte-Carlo fitting run by setting the TotalFit session property monte_carlo_monitoring. The options are 'OFF', 'BRIEF', and 'SEE-ALL-FIGURES'.

 

See TotalFit.m for more properties controlling the Monte-Carlo runs.

 

Setting up the Monte-Carlo fitting

Monte-Carlo analysis is typically started by setting setup_parameters.fit_mode='fit_with_error_analysis'. NOTE: uncertainties of the variable parameters are only meaningful if your model fits the data well. This mode generally requires a lot of time, so see the Parallelization section below for improving performance.
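
A typical setup might look like the following sketch (assuming the session object is called t and its properties are assigned directly; verify the names against TotalFit.m):

    setup_parameters.fit_mode = 'fit_with_error_analysis';   % switch on Monte-Carlo
    t.vary_starting_parameters = 'no';       % see Diagnostics section above
    t.monte_carlo_monitoring   = 'BRIEF';    % 'OFF' / 'BRIEF' / 'SEE-ALL-FIGURES'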

Several important aspects of the Monte-Carlo analysis are described below.

Display of the Monte-Carlo results

The Monte-Carlo algorithm computes percentiles of the optimization results from the fitting runs and reports confidence intervals. However, the 'best-fit' result is only one of the fitting results in a virtual experiment with fitting synthetic noisy data. Therefore, observing histograms of the parameters is more informative: I prefer to take the confidence intervals as calculated but extract the most probable value directly from the distribution histograms. For more on plotting histograms and parameter correlations, see docs/TotalFit/methods_index: FITTING WITH DETERMINATION OF CONFIDENCE INTERVALS.
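
For example, a parameter histogram and a pairwise correlation plot can be built directly from the filtered results (assuming all_filtered_final_parameters stores one row per successful run and one column per parameter):

    final_p = t.all_filtered_final_parameters;        % rows: runs, columns: parameters
    figure; histogram(final_p(:, 1), 30);             % distribution of the first parameter
    xlabel('parameter 1'); ylabel('count');
    figure; plot(final_p(:, 1), final_p(:, 2), '.');  % scatter plot reveals correlation
    xlabel('parameter 1'); ylabel('parameter 2');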

 



 

 

Fitting algorithms

Algorithms for optimization of the variable parameters of the models to minimize deviation from experimental data are based on standard MATLAB optimization functions. They come in two flavors: local and global solvers. Generally, the difference is that local solvers find a single minimum and do not attempt to verify whether it is global or not, while global solvers perform multiple local-solver runs in an attempt to find the best local minimum. For descriptions of these algorithms, see the Optimization and Global Optimization toolboxes, respectively. All solvers return the parameters at the minimum and save their detailed output in the optimization_output property of the TotalFit session. The settings that invoke these algorithms and their general features are listed below; a sketch of the underlying MATLAB calls follows the list.

'Newton-active-set' and 'Newton-interior-point'

'simplex'

 

'GlobalSearch'

 

'MultiStart'

 

'DirectSearch'
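
The settings above map onto standard MATLAB solvers. The standalone sketch below (not TotalFit code) shows the calls I believe they correspond to; in particular, the pairing of 'DirectSearch' with patternsearch is my assumption, so verify against TotalFit.m:

    model = @(p, x) p(1) * exp(-p(2) * x);                 % toy model for illustration
    x = linspace(0, 5, 50);
    y = model([2, 1.3], x) + 0.05 * randn(size(x));
    obj = @(p) sum((y - model(p, x)).^2);                  % sum-of-squares objective
    p0 = [1, 1];

    opts = optimoptions('fmincon', 'Algorithm', 'interior-point');   % or 'active-set'
    p1 = fmincon(obj, p0, [], [], [], [], [0 0], [10 10], [], opts); % Newton-based

    p2 = fminsearch(obj, p0);                              % 'simplex', derivative-free

    prob = createOptimProblem('fmincon', 'objective', obj, 'x0', p0, ...
                              'lb', [0 0], 'ub', [10 10]);
    p3 = run(GlobalSearch, prob);                          % 'GlobalSearch'
    p4 = run(MultiStart, prob, 25);                        % 'MultiStart', 25 local runs

    p5 = patternsearch(obj, p0);                           % 'DirectSearch' (assumed)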



Parallelization

 

Introduction

There are two separate concepts in parallelizing fitting runs. First, multiple local cores or a cluster may be used through the Parallel Computing Toolbox of MATLAB. Second, multiple machines that are not part of a cluster may be employed for a distributed run. For clarity, in what follows I use 'cluster' to refer to a machine with multiple cpus/cores or to an actual cluster, while I use 'core' for a single cpu/core of a multicore workstation.
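
A minimal sketch of the first route, assuming the Parallel Computing Toolbox is available (run_one_fit is a hypothetical placeholder for a single Monte-Carlo fitting run):

    n_trials = 200;
    n_params = 2;                          % number of fitted parameters
    results  = zeros(n_trials, n_params);
    parpool('local');                      % open a pool on the local cores
    parfor k = 1:n_trials
        results(k, :) = run_one_fit(k);    % hypothetical: one Monte-Carlo fit
    end
    delete(gcp('nocreate'));               % shut the pool down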

For a detailed description of the fitting methods implemented in TotalFit, see FITTING WITH DETERMINATION OF CONFIDENCE INTERVALS.


Using parallelization tools of MATLAB

Using TotalFit daemons

If you do not have a cluster but have network access to multiple workstations, TotalFit implements MATLAB-independent distributed-computing functionality through ssh and disk sharing.

 
