Tutorial 5. Global simulation and fitting

by Evgenii Kovrigin, 05/19/2011

Global fitting in IDAP involves simultaneous fitting of multiple datasets with shared parameters (such that some parameter is optimized 'globally' with respect to all datasets it is relevant to). Strictly speaking, we already did global fitting in Tutorial 4 where we had multiple spectral datasets corresponding to different concentrations of the receptor and ligand but all sharing the same equilibrium and thermodynamic constants. For clarity of the concept we will perform another 'global' fitting excercise in this tutorial.

Here we will assume that we observed NMR signals of two nuclei that report on the same binding event. Therefore, they share thermodynamic equilibrium binding constant and the kinetic rate constants but have individual resonance frequencies and line widths.

NOTE: The synthetic noisy data will be simulated only using random variation of spectral intensity (using a noise standard deviation figure). However, a very important source of experimental uncertainty in titration series is value of L/R that is uncertainty of amount of added ligand. The effect of this uncertainty that different traces may appear at shifted L/R values, which strongly impacts success of fitting. It is not trivial to take this uncertainty into account in Monte-Carlo fitting therefore we skip it here for simplicity and assume our L/R data absolutely precise. See Titration class for introduction of this significant uncertainty into Monte-Carlo analysis of confidence intervals.

Contents

Clean up workspace (to avoid unwanted effects of old variable assignments)

close all
clear all

CHOOSE PARAMETERS OF THE TITRATION SERIES

NOTE: L/R cannot be ZERO. Set some very small number instead!

Rtotal = 1e-3;
LRratio_array = [1e-6 0.2 0.4 0.6 0.8 1.0 1.2];
Y_RMSD=5e-7; % real line shape
Y_offset_percent=10;  % to offset plots in the figure vertically for easier viewing

CREATE TWO SERIES OF DATASETS FOR SIMULATION OF LINE SHAPES IN TITRATION

We will use a simple model U: R+L<=>RL

session_name='Tutorial_5';
session=TotalFit(session_name);

Common Parameters

dataset_class_name='NMRLineShapes1D'; % class name
model_name='U-model';
model_numeric_solver='analytical';
model_numeric_options='no options';
number_of_datasets=length(LRratio_array);

Create a simulation series 1

series_name1='Series_1';
X_min1=-1000;
X_max1=1500;
number_of_points_to_simulate=60;

X_column=linspace(X_min1, X_max1, number_of_points_to_simulate)'; % use ' to transpose to a column-vector

session.create_simulation_series(series_name1, dataset_class_name, model_name, ...
                 model_numeric_solver, model_numeric_options,...
                 number_of_datasets, X_column);

Create a simulation series 2

series_name2='Series_2';
X_min2=1500;
X_max2=2000;
number_of_points_to_simulate=20;

X_column=linspace(X_min2, X_max2, number_of_points_to_simulate)'; % use ' to transpose to a column-vector

session.create_simulation_series(series_name2, dataset_class_name, model_name, ...
                 model_numeric_solver, model_numeric_options,...
                 number_of_datasets, X_column);

Initialize Relation Matrix

session.initialize_relation_matrix(sprintf('%s.all_datasets.txt', session_name))
Dataset listing is written to "Tutorial_5.all_datasets.txt".

Link Parameters Within Each Series. Datasets inside each series share equilbirium and kinetic constants as well as frequencies and linewidths (because all datasets correspond to line shapes of the same nucleus).

To view parameter numbering---print out the active model description

dataset_index=session.dataset_index(series_name1,1);
session.dataset_show_active_model(dataset_index)
ans =

Active model:
Model 2:  "U-model"
Model description
"1D NMR line shape for the U model, no temperature/field dependence, analytical"
Model handle: line_shape_equations_1D.U_model_1D
Current solver: analytical
Model parameters: 
1: Rtotal     2: LRratio     3: log10(K_A)     4: k_2_A     5: w0_1     6: w0_2     7: FWHH_1     8: FWHH_2     9: ScaleFactor     



Set the indices of parameter you want to link

parameter_number_array=[ 3 4 5 6 7 8 9 ];
session.batch_link_parameters_in_series(series_name1, parameter_number_array);
session.batch_link_parameters_in_series(series_name2, parameter_number_array);

Link Parameters Between Series Both series report on the same binding event so they have to have common binding affinity and kinetic constants

It is sufficient to link the first dataset of one series with the first dataset to another to have all links established (because they are already linked within the series). The rules of parameter linkage follow transitive logic: if A links to B, and B links to C, then A will link to C.

First, we obtain indices of the first datasets from both series (we need internal indexing of datasets in the 'session' object for this task).

series1_first_dataset_index=session.dataset_index(series_name1,1);
series2_first_dataset_index=session.dataset_index(series_name2,1);

To view internal indices for the datasets in any series use session.show_series_datasets(series_name1)

Link Ka

dataset1_parameter_index=3;
dataset2_parameter_index=3;
session.link_parameters(series1_first_dataset_index, dataset1_parameter_index,...
                        series2_first_dataset_index, dataset2_parameter_index);

Link k2

dataset1_parameter_index=4;
dataset2_parameter_index=4;
session.link_parameters(series1_first_dataset_index, dataset1_parameter_index,...
                        series2_first_dataset_index, dataset2_parameter_index);

Propagate information about links across all datasets

session.generate_fitting_environment(sprintf('%s.fitting_environment.txt', session_name))
Fitting environment saved in Tutorial_5.fitting_environment.txt.

ans =

New parameter relations (indexed as in all_parameters vector):

[1]     1:Rtotal
[2]     1:LRratio
[3]     1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)
[4]     1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A
[5]     1:w0_1 | 2:w0_1 | 3:w0_1 | 4:w0_1 | 5:w0_1 | 6:w0_1 | 7:w0_1
[6]     1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2
[7]     1:FWHH_1 | 2:FWHH_1 | 3:FWHH_1 | 4:FWHH_1 | 5:FWHH_1 | 6:FWHH_1 | 7:FWHH_1
[8]     1:FWHH_2 | 2:FWHH_2 | 3:FWHH_2 | 4:FWHH_2 | 5:FWHH_2 | 6:FWHH_2 | 7:FWHH_2
[9]     1:ScaleFactor | 2:ScaleFactor | 3:ScaleFactor | 4:ScaleFactor | 5:ScaleFactor | 6:ScaleFactor | 7:ScaleFactor
[10]     2:Rtotal
[11]     2:LRratio
[12]   n/a
[13]   n/a
[14]   n/a
[15]   n/a
[16]   n/a
[17]   n/a
[18]   n/a
[19]     3:Rtotal
[20]     3:LRratio
[21]   n/a
[22]   n/a
[23]   n/a
[24]   n/a
[25]   n/a
[26]   n/a
[27]   n/a
[28]     4:Rtotal
[29]     4:LRratio
[30]   n/a
[31]   n/a
[32]   n/a
[33]   n/a
[34]   n/a
[35]   n/a
[36]   n/a
[37]     5:Rtotal
[38]     5:LRratio
[39]   n/a
[40]   n/a
[41]   n/a
[42]   n/a
[43]   n/a
[44]   n/a
[45]   n/a
[46]     6:Rtotal
[47]     6:LRratio
[48]   n/a
[49]   n/a
[50]   n/a
[51]   n/a
[52]   n/a
[53]   n/a
[54]   n/a
[55]     7:Rtotal
[56]     7:LRratio
[57]   n/a
[58]   n/a
[59]   n/a
[60]   n/a
[61]   n/a
[62]   n/a
[63]   n/a
[64]     8:Rtotal
[65]     8:LRratio
[66]   n/a
[67]   n/a
[68]     8:w0_1 | 9:w0_1 | 10:w0_1 | 11:w0_1 | 12:w0_1 | 13:w0_1 | 14:w0_1
[69]     8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2
[70]     8:FWHH_1 | 9:FWHH_1 | 10:FWHH_1 | 11:FWHH_1 | 12:FWHH_1 | 13:FWHH_1 | 14:FWHH_1
[71]     8:FWHH_2 | 9:FWHH_2 | 10:FWHH_2 | 11:FWHH_2 | 12:FWHH_2 | 13:FWHH_2 | 14:FWHH_2
[72]     8:ScaleFactor | 9:ScaleFactor | 10:ScaleFactor | 11:ScaleFactor | 12:ScaleFactor | 13:ScaleFactor | 14:ScaleFactor
[73]     9:Rtotal
[74]     9:LRratio
[75]   n/a
[76]   n/a
[77]   n/a
[78]   n/a
[79]   n/a
[80]   n/a
[81]   n/a
[82]     10:Rtotal
[83]     10:LRratio
[84]   n/a
[85]   n/a
[86]   n/a
[87]   n/a
[88]   n/a
[89]   n/a
[90]   n/a
[91]     11:Rtotal
[92]     11:LRratio
[93]   n/a
[94]   n/a
[95]   n/a
[96]   n/a
[97]   n/a
[98]   n/a
[99]   n/a
[100]     12:Rtotal
[101]     12:LRratio
[102]   n/a
[103]   n/a
[104]   n/a
[105]   n/a
[106]   n/a
[107]   n/a
[108]   n/a
[109]     13:Rtotal
[110]     13:LRratio
[111]   n/a
[112]   n/a
[113]   n/a
[114]   n/a
[115]   n/a
[116]   n/a
[117]   n/a
[118]     14:Rtotal
[119]     14:LRratio
[120]   n/a
[121]   n/a
[122]   n/a
[123]   n/a
[124]   n/a
[125]   n/a
[126]   n/a

Summary of datasets with lookup vectors (position of each parameter in all_parameter vector)
--------
Dataset 1/Series_1-1,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   1   2   3   4   5   6   7   8   9

Dataset 2/Series_1-2,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   10   11   3   4   5   6   7   8   9

Dataset 3/Series_1-3,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   19   20   3   4   5   6   7   8   9

Dataset 4/Series_1-4,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   28   29   3   4   5   6   7   8   9

Dataset 5/Series_1-5,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   37   38   3   4   5   6   7   8   9

Dataset 6/Series_1-6,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   46   47   3   4   5   6   7   8   9

Dataset 7/Series_1-7,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   55   56   3   4   5   6   7   8   9

Dataset 8/Series_2-1,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   64   65   3   4   68   69   70   71   72

Dataset 9/Series_2-2,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   73   74   3   4   68   69   70   71   72

Dataset 10/Series_2-3,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   82   83   3   4   68   69   70   71   72

Dataset 11/Series_2-4,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   91   92   3   4   68   69   70   71   72

Dataset 12/Series_2-5,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   100   101   3   4   68   69   70   71   72

Dataset 13/Series_2-6,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   109   110   3   4   68   69   70   71   72

Dataset 14/Series_2-7,  model "U-model"
Parameters          :    Rtotal    LRratio    log10(K_A)    k_2_A    w0_1    w0_2    FWHH_1    FWHH_2    ScaleFactor
Lookup vector       :   118   119   3   4   68   69   70   71   72



Now we have all parameters linked properly: Ka and k2 are global for all
datasets, frequencies and line widths are global within each series but
unlinked between series.

ASSIGN STARTING PARAMETERS

Define common parameter values NOTE: Technically, we do not need meaningful limits set for a simulation run. However, we will do fitting in the same TotalFit session therefore it is convenient to define limits now.

Binding constant

Log10_K_A=log10(1e6);
Log10_K_A_AbsMin=log10(1e1);
Log10_K_A_AbsMax=log10(1e10);

Dissociation rate constant

k_2_A=200;
k_2_A_AbsMin=0.1;
k_2_A_AbsMax=3000;

ScaleFactor

ScaleFactor=1;
ScaleFactor_AbsMin=0.1;
ScaleFactor_AbsMax=10;

FAKE_VALUE=1e-8; % A 'place-holder' value to be substituted later

First series (slow exchange)

Frequency of R

w0_R=-300;
w0_R_AbsMin=X_min1; % set these limits to the spectral window
w0_R_AbsMax=X_max1;

Frequency of RL

w0_RL=800;
w0_RL_AbsMin=X_min1; % set these limits to the spectral window
w0_RL_AbsMax=X_max1;

Line width of R

FWHH_R=100;
FWHH_R_AbsMin=10;
FWHH_R_AbsMax=500;

Line width of RL

FWHH_RL=100;
FWHH_RL_AbsMin=10;
FWHH_RL_AbsMax=500;

ScaleFactor

ScaleFactor=1;
ScaleFactor_AbsMin=0.1;
ScaleFactor_AbsMax=10;

Create a vector of parameter values

simulation_parameters1=[ FAKE_VALUE     FAKE_VALUE   ...
                        Log10_K_A    k_2_A  ...
                       w0_R     w0_RL      FWHH_R     FWHH_RL    ScaleFactor ];

Second series

Frequency of R

w0_R=1700;
w0_R_AbsMin=X_min2; % set these limits to the spectral window
w0_R_AbsMax=X_max2;

Frequency of RL

w0_RL=1800;
w0_RL_AbsMin=X_min2; % set these limits to the spectral window
w0_RL_AbsMax=X_max2;

Line width of R

FWHH_R=100;
FWHH_R_AbsMin=10;
FWHH_R_AbsMax=500;

Line width of RL

FWHH_RL=100;
FWHH_RL_AbsMin=10;
FWHH_RL_AbsMax=500;

Create a vector of parameter values

simulation_parameters2=[ FAKE_VALUE     FAKE_VALUE   ...
                        Log10_K_A    k_2_A  ...
                       w0_R     w0_RL      FWHH_R     FWHH_RL    ScaleFactor ];

Assign linked parameters (I keep names of function arguments for information)

fake_limits=zeros(1,length(simulation_parameters1));
Monte_Carlo_range_min=fake_limits;
Monte_Carlo_range_max=fake_limits;
parameter_limits_min=fake_limits;
parameter_limits_max=fake_limits;

session.assign_parameter_values(series1_first_dataset_index, simulation_parameters1, ...
        Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);

session.assign_parameter_values(series2_first_dataset_index, simulation_parameters2, ...
        Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);

Assign UNlinked parameters

Rtotal_vector=ones(1,length(LRratio_array))*Rtotal; % this one is constant for all datasets

Create vectors of ranges for formal assignment (NOTE: they are different length from the "fake_limits" above).

unlinked_fake_ranges=zeros(1,number_of_datasets);

MonteCarlo_range_min_array=unlinked_fake_ranges;
MonteCarlo_range_max_array=unlinked_fake_ranges;
parameter_limits_min_array=unlinked_fake_ranges;
parameter_limits_max_array=unlinked_fake_ranges;

Assign Rtotal

parameter_number=1;
values_array=Rtotal_vector;

series_name=series_name1;
session.assign_unlinked_parameter_in_series(...
             series_name, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

series_name=series_name2;
session.assign_unlinked_parameter_in_series(...
             series_name, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

Assign LRratio

parameter_number=2;
values_array=LRratio_array;

series_name=series_name1;
session.assign_unlinked_parameter_in_series(...
             series_name, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

series_name=series_name2;
session.assign_unlinked_parameter_in_series(...
             series_name, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

CALCULATE MODEL VALUES

Run a simulation

X_standard_dev=0; % no need to have errors of Y
session.simulate_series(series_name1, X_standard_dev, Y_RMSD);
session.simulate_series(series_name2, X_standard_dev, Y_RMSD);

Plot the simulation result

figure_handle1=session.plot_1D_series(series_name1, 'Simulation 4', Y_offset_percent);
figure_handle2=session.plot_1D_series(series_name2, 'Simulation 4', Y_offset_percent);

Nice 'experimentally-looking' result.

Save initial state of fitting session on a disk for future inspection and reuse

session.export_results(sprintf('%s.simulated_data', session_name), {figure_handle1 figure_handle2})
Output current session state, statistics and graphs into "Tutorial_5.simulated_data/".
Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.simulated_data/adjustable_parameters_only.txt".
Statistics on per-dataset basis is written into "Tutorial_5.simulated_data/all_datasets.txt".
Writing figures...
Figure 1: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 2: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Done

SELECT AND ASSIGN STARTING PARAMETERS FOR FIT

Define parameter values (common first)

Binding constant

Log10_K_A=log10(1e7);
Log10_K_A_AbsMin=log10(1e1);
Log10_K_A_AbsMax=log10(1e10);

Dissociation rate constant

k_2_A=10;
k_2_A_AbsMin=0.1;
k_2_A_AbsMax=3000;

ScaleFactor

ScaleFactor=1;
ScaleFactor_AbsMin=0.1;
ScaleFactor_AbsMax=10;

FAKE_VALUE=1e-8; % A 'place-holder' value

First series (slow exchange)

Frequency of R

w0_R=-300;    % is known exactly from spectral traces in slow exchange
w0_R_AbsMin=X_min1;
w0_R_AbsMax=X_max1;

Frequency of RL

w0_RL=800;       % is known exactly from spectral traces in slow exchange
w0_RL_AbsMin=X_min1;
w0_RL_AbsMax=X_max1;

Line width of R

FWHH_R=100;         % is known exactly from spectral traces in slow exchange
FWHH_R_AbsMin=10;
FWHH_R_AbsMax=500;

Line width of RL

FWHH_RL=100;       % well estimated from nearly-saturated spectral trace
FWHH_RL_AbsMin=10;
FWHH_RL_AbsMax=500;

ScaleFactor

ScaleFactor=1;
ScaleFactor_AbsMin=0.1;
ScaleFactor_AbsMax=10;

Create a vector of parameter values and absolute limits

starting_parameters1=[ FAKE_VALUE     FAKE_VALUE   ...
                        Log10_K_A    k_2_A  ...
                       w0_R     w0_RL      FWHH_R     FWHH_RL    ScaleFactor ];
min_limits1=[ 0 0 Log10_K_A_AbsMin k_2_A_AbsMin w0_R_AbsMin w0_RL_AbsMin  FWHH_R_AbsMin  FWHH_RL_AbsMin ScaleFactor_AbsMin ];
max_limits1=[ 0 0 Log10_K_A_AbsMax k_2_A_AbsMax w0_R_AbsMax w0_RL_AbsMax  FWHH_R_AbsMax  FWHH_RL_AbsMax ScaleFactor_AbsMax ];

Mode for model parameters: fixed/variable = 0/1

parameter_modes1=[ 0 0 1 1 0 1 0 0 0];

Second series

Frequency of R

w0_R=1700;        % is known exactly from spectral trace of free R
w0_R_AbsMin=X_min2;
w0_R_AbsMax=X_max2;

Frequency of RL

w0_RL=1750;       % uncertain due to incomplete saturation of RL
w0_RL_AbsMin=X_min2;
w0_RL_AbsMax=X_max2;

Line width of R

FWHH_R=100;        % is known exactly from spectral trace of free R
FWHH_R_AbsMin=10;
FWHH_R_AbsMax=500;

Line width of RL

FWHH_RL=100;        % well estimated from a nearly-saturated spectral trace
FWHH_RL_AbsMin=10;
FWHH_RL_AbsMax=500;

Create a vector of parameter values and absolute limits

starting_parameters2=[ FAKE_VALUE     FAKE_VALUE   ...
                        Log10_K_A    k_2_A  ...
                       w0_R     w0_RL      FWHH_R     FWHH_RL    ScaleFactor ];

min_limits2=[ 0 0 Log10_K_A_AbsMin k_2_A_AbsMin w0_R_AbsMin w0_RL_AbsMin  FWHH_R_AbsMin  FWHH_RL_AbsMin ScaleFactor_AbsMin ];
max_limits2=[ 0 0 Log10_K_A_AbsMax k_2_A_AbsMax w0_R_AbsMax w0_RL_AbsMax  FWHH_R_AbsMax  FWHH_RL_AbsMax ScaleFactor_AbsMax ];

Mode for model parameters: fixed/variable = 0/1

parameter_modes2=[ 0 0 1 1 0 1 0 0 0];

Assign linked parameters

First series

first_dataset_index=session.dataset_index(series_name1,1);

Monte_Carlo_range_min=min_limits1;
Monte_Carlo_range_max=max_limits1;
parameter_limits_min=min_limits1;
parameter_limits_max=max_limits1;

session.assign_parameter_values(first_dataset_index, starting_parameters1, ...
        Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);

Second series

first_dataset_index=session.dataset_index(series_name2,1);

Monte_Carlo_range_min=min_limits2;
Monte_Carlo_range_max=max_limits2;
parameter_limits_min=min_limits2;
parameter_limits_max=max_limits2;

session.assign_parameter_values(first_dataset_index, starting_parameters2, ...
        Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);

Assign unlinked parameters Create vectors of ranges for formal assignment because these are constant in fitting.

unlinked_fake_ranges=zeros(1,number_of_datasets);

Assign Rtotal --- this will not be fit so no need for ranges

parameter_number=1;
values_array=Rtotal_vector;  % this remains the same
MonteCarlo_range_min_array=unlinked_fake_ranges;
MonteCarlo_range_max_array=unlinked_fake_ranges;
parameter_limits_min_array=unlinked_fake_ranges;
parameter_limits_max_array=unlinked_fake_ranges;

session.assign_unlinked_parameter_in_series(...
             series_name1, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

 session.assign_unlinked_parameter_in_series(...
             series_name2, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

Assign LRratio

parameter_number=2;
values_array=LRratio_array;

session.assign_unlinked_parameter_in_series(...
             series_name1, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

session.assign_unlinked_parameter_in_series(...
             series_name2, parameter_number, values_array, ...
             MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
             parameter_limits_min_array, parameter_limits_max_array);

Inspect initial guess

figure_handle1=session.plot_1D_series(series_name1, 'Initial', Y_offset_percent);
figure_handle2=session.plot_1D_series(series_name2, 'Initial', Y_offset_percent);

Initial guess is appropriately bad to begin fitting.

Save initial state of fitting session on a disk for future inspection and reuse

session.export_results(sprintf('%s.initial_guess', session_name), {figure_handle1 figure_handle2})
Output current session state, statistics and graphs into "Tutorial_5.initial_guess/".
Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.initial_guess/adjustable_parameters_only.txt".
Statistics on per-dataset basis is written into "Tutorial_5.initial_guess/all_datasets.txt".
Writing figures...
Figure 1: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 2: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Done

PERFORM FITTING

Set which parameters of the model will be fixed and variable

% first series
first_dataset_index=session.dataset_index(series_name1,1);
session.set_parameters_modes(first_dataset_index, parameter_modes1);
% second series
first_dataset_index=session.dataset_index(series_name2,1);
session.set_parameters_modes(first_dataset_index, parameter_modes2);

Select fitting algorithm. Choices: 'simplex', 'MultiStart' 'GlobalSearch', 'DirectSearch'. See MATLAB documentation for their description and options.

Before starting this fitting: start parallel processing framework by issuing 'matlabpool open N' command, where N is the number of your CPUs. The IDAP algorithm will then farm Monte-Carlo runs on multiple CPUs. If you do not have multiple CPU---do not do anything and it will still work (slower). The simplest command to start your parallel processors: matlabpool open

fit_only_algorithm='GlobalSearch';

Report on start-up and show variable parameter names

fprintf(1,'\n\n==============\nFitting Algorithm: %s\nVariable parameters:', fit_only_algorithm);
session.fit_parameter_names{1:end}

==============
Fitting Algorithm: GlobalSearch
Variable parameters:
ans =

  1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)


ans =

  1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A


ans =

  1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2


ans =

  8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2

Fit without error determination

tic
session.fit_only(fit_only_algorithm);
toc
GlobalSearch stopped because it analyzed all the trial points.

All 108 local solver runs converged with a positive local solver exit flag.
Elapsed time is 94.369409 seconds.

Make sure you read output of the fitting routine concerning convergence (or lack of it)!

Report

session.show_fit_statistics()
session.transfer_all_parameters_to_individual_datasets();
ans =



Fitting results for "Tutorial_5"
==========================

Fitting method: GlobalSearch
Number of active datasets (non-zero priority): 14

Reporting only on variable parameters (dataset-#:parameter-name):
(# in the fit-vector)            [min value, max value] for 95% confidence interval
--------------------------
(1) name:  "  1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)"
(1) value=  4.976519e+00      [ 0.000000e+00       0.000000e+00 ]

(2) name:  "  1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A"
(2) value=  2.242349e+02      [ 0.000000e+00       0.000000e+00 ]

(3) name:  "  1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2"
(3) value=  8.561573e+02      [ 0.000000e+00       0.000000e+00 ]

(4) name:  "  8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2"
(4) value=  1.804079e+03      [ 0.000000e+00       0.000000e+00 ]

Total sum of squares (SS) = 1.07119e-05
Statistics per dataset:
------------
(1) "Series_1-1":     Priority=1,     R^2=0.977,    4.7% of total SS,   Model= "U-model" 
(2) "Series_1-2":     Priority=1,     R^2=0.908,    5.5% of total SS,   Model= "U-model" 
(3) "Series_1-3":     Priority=1,     R^2=0.798,    3.9% of total SS,   Model= "U-model" 
(4) "Series_1-4":     Priority=1,     R^2=0.672,    4.9% of total SS,   Model= "U-model" 
(5) "Series_1-5":     Priority=1,     R^2=0.894,    4.6% of total SS,   Model= "U-model" 
(6) "Series_1-6":     Priority=1,     R^2=0.866,    25% of total SS,   Model= "U-model" 
(7) "Series_1-7":     Priority=1,     R^2=0.920,    15% of total SS,   Model= "U-model" 
(8) "Series_2-1":     Priority=1,     R^2=0.993,    4.3% of total SS,   Model= "U-model" 
(9) "Series_2-2":     Priority=1,     R^2=0.991,    4.9% of total SS,   Model= "U-model" 
(10) "Series_2-3":     Priority=1,     R^2=0.980,    9.9% of total SS,   Model= "U-model" 
(11) "Series_2-4":     Priority=1,     R^2=0.993,    3.3% of total SS,   Model= "U-model" 
(12) "Series_2-5":     Priority=1,     R^2=0.996,    2.3% of total SS,   Model= "U-model" 
(13) "Series_2-6":     Priority=1,     R^2=0.989,    6.9% of total SS,   Model= "U-model" 
(14) "Series_2-7":     Priority=1,     R^2=0.993,    4.4% of total SS,   Model= "U-model" 


Inspect final result

figure_handle1=session.plot_1D_series(series_name1, 'Final', Y_offset_percent);
figure_handle2=session.plot_1D_series(series_name2, 'Final', Y_offset_percent);

Export final result

session.export_results(sprintf('%s.Fit_Only.%s', session_name, fit_only_algorithm), {figure_handle1 figure_handle2})
Output current session state, statistics and graphs into "Tutorial_5.Fit_Only.GlobalSearch/".
Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.Fit_Only.GlobalSearch/adjustable_parameters_only.txt".
Statistics on per-dataset basis is written into "Tutorial_5.Fit_Only.GlobalSearch/all_datasets.txt".
Writing figures...
Figure 1: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 2: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Done

DETERMINE CONFIDENCE INTERVALS USING MONTE-CARLO APPROACH

IMPORTANT NOTE: Confidence intervals are ONLY meaningful if the model fits data well that is a curve goes trough the points! Confidence intervals and parameter correlation plots then show how variation of the curves passing WITHIN errorbars may be accounted for by variation in parameters. If the curve DOES NOT pass through the error bars---Monte-Carlo run only wastes your time because the randomizer varies the synthetic data AROUND THE IDEAL SIMULATED CURVE. However, if this curve IS NOT GOING THROUGH the experimental data to begin with---entire Monte-Carlo run is meaningless because its random variation does reproduce deviation of the data from the model curve.

NOTE: Starting parameter values are the ones that resulted from previous fit.

fit_algorithm='simplex';  % fastest robust algorithm
% fit_algorithm='GlobalSearch';  % would be too slow

Report on start-up and show variable parameter names

fprintf(1,'\n\n==============\nMonte-Carlo Fitting Algorithm: %s\nVariable parameters:', fit_algorithm);
session.fit_parameter_names{1:end}

==============
Monte-Carlo Fitting Algorithm: simplex
Variable parameters:
ans =

  1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)


ans =

  1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A


ans =

  1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2


ans =

  8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2

Start fitting

session.fit(fit_algorithm, 'single_cluster');
session.show_fit_statistics()
session.transfer_all_parameters_to_individual_datasets();
Starting fitting session using simplex algorithm on a single cluster.
Working folder: "single_cluster_fit"
Variable parameter:
  Columns 1 through 3

    [1x214 char]    [1x144 char]    [1x62 char]

  Column 4

    [1x67 char]

Total dataset is saved into "single_cluster_fit".
Checking on "single_cluster_fit":  10 trials.

Checking on "single_cluster_fit":  20 trials.

Checking on "single_cluster_fit":  30 trials.

Checking on "single_cluster_fit":  40 trials.

Checking on "single_cluster_fit":  50 trials.

Using ALL 50 fitting runs ignoring the exit flags indicating that 0 runs failed to converge).
Checking on "single_cluster_fit":  60 trials.

Using ALL 60 fitting runs ignoring the exit flags indicating that 0 runs failed to converge).
Checking on "single_cluster_fit":  70 trials.

Using ALL 70 fitting runs ignoring the exit flags indicating that 0 runs failed to converge).
Elapsed time is 147.651146 seconds.

ans =



Fitting results for "Tutorial_5"
==========================

Fitting method: simplex
Number of active datasets (non-zero priority): 14

Reporting only on variable parameters (dataset-#:parameter-name):
(# in the fit-vector)            [min value, max value] for 95% confidence interval
--------------------------
(1) name:  "  1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)"
(1) value=  6.007956e+00      [ 5.884858e+00       6.160755e+00 ]

(2) name:  "  1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A"
(2) value=  1.928793e+02      [ 1.753941e+02       2.069492e+02 ]

(3) name:  "  1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2"
(3) value=  8.002024e+02      [ 7.976611e+02       8.034264e+02 ]

(4) name:  "  8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2"
(4) value=  1.799248e+03      [ 1.798598e+03       1.800418e+03 ]

Total sum of squares (SS) = 6.80906e-06
Statistics per dataset:
------------
(1) "Series_1-1":     Priority=1,     R^2=0.977,    7.4% of total SS,   Model= "U-model" 
(2) "Series_1-2":     Priority=1,     R^2=0.913,    8.2% of total SS,   Model= "U-model" 
(3) "Series_1-3":     Priority=1,     R^2=0.813,    5.6% of total SS,   Model= "U-model" 
(4) "Series_1-4":     Priority=1,     R^2=0.684,    7.4% of total SS,   Model= "U-model" 
(5) "Series_1-5":     Priority=1,     R^2=0.905,    6.4% of total SS,   Model= "U-model" 
(6) "Series_1-6":     Priority=1,     R^2=0.980,    5.8% of total SS,   Model= "U-model" 
(7) "Series_1-7":     Priority=1,     R^2=0.975,    7.7% of total SS,   Model= "U-model" 
(8) "Series_2-1":     Priority=1,     R^2=0.993,    6.8% of total SS,   Model= "U-model" 
(9) "Series_2-2":     Priority=1,     R^2=0.992,    6.9% of total SS,   Model= "U-model" 
(10) "Series_2-3":     Priority=1,     R^2=0.980,    16% of total SS,   Model= "U-model" 
(11) "Series_2-4":     Priority=1,     R^2=0.994,    4.6% of total SS,   Model= "U-model" 
(12) "Series_2-5":     Priority=1,     R^2=0.996,    3.3% of total SS,   Model= "U-model" 
(13) "Series_2-6":     Priority=1,     R^2=0.992,    7.9% of total SS,   Model= "U-model" 
(14) "Series_2-7":     Priority=1,     R^2=0.993,    6.5% of total SS,   Model= "U-model" 


SAVE AND PLOT DATA IN MEMORY-CONSERVING WAY

Key principle is to plot a few figures, save them and close them. MATLAB can handle about 25 figures, then slows down and at about 40-70 locks up!

*Create place for storage*
project_path=sprintf('%s.Fit_Monte_Carlo.%s', session_name, fit_algorithm);
session.export_results(project_path, {});
Output current session state, statistics and graphs into "Tutorial_5.Fit_Monte_Carlo.simplex/".
Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.Fit_Monte_Carlo.simplex/adjustable_parameters_only.txt".
Statistics on per-dataset basis is written into "Tutorial_5.Fit_Monte_Carlo.simplex/all_datasets.txt".
Writing figures...
Done

Inspect and save final results

figure_handle1=session.plot_1D_series(series_name1, 'Monte Carlo', Y_offset_percent);
figure_handle2=session.plot_1D_series(series_name2, 'Monte Carlo', Y_offset_percent);
session.save_figures(project_path, 'Fits', {figure_handle1 figure_handle2});
close all
Output current graphics for "Fits" into "Tutorial_5.Fit_Monte_Carlo.simplex/Fits/".
Figure 1: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 2: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written

BUILD HISTOGRAMS OF DISTRIBUTIONS OF FITTING PARAMETERS

Histograms are plotted using full distribution of parameters from fitting of synthetic noisy datasets. The bottom panel of histogram graphs gives normalized sum of squares for all runs (red dot - the sum of squares from the best-fit to experimental data).

NOTE: If fitting parameters ended up at or beyond parameter limits they are thrown out of these distributions

NOTE 2: Scaling of these graphs is done using min/max limits of the parameters. This may be sub-optimal for viewing and analysis. If you need to adjust a graph---load FIG file from the folder with graphic output and edit in MATLAB.

NOTE 3: The previous note has to be generalized: IDAP has not been optimized for 'pretty' graphic output because any graphs may be adjusted using standard MATLAB tools. User is invited to modify the code to his/her liking to satisfy their taste.

histogram_bins=20;

To plot histograms for selected parameters use the following: NOTE: the index is NOT from the model but from the vector of variable parameters. variable_parameters_indices=[1 2]; % All parameters you want to see parameter_histograms_handles=session.plot_specified_histograms(variable_parameters_indices, histogram_bins);

To automatically plot histograms for ALL variable parameters use:

parameter_histograms_handles=session.plot_all_histograms(histogram_bins);

save

session.save_figures(project_path, 'Histograms', parameter_histograms_handles);
close all
Output current graphics for "Histograms" into "Tutorial_5.Fit_Monte_Carlo.simplex/Histograms/".
Figure 1: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 2: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 3: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 4: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written

PLOT PARAMETER CORRELATIONS

When fitting with complex models we observe that parameters of the model may be correlated. It means that for a given signal/noise ratio in the data it is possible to compensate variation in one parameter by corresponding adjustment of the correlated parameter. This correlation happens, in the first place, when models are overparameterized that is have more parameters than necessary to describe the data. Second situation is when you have an adequate model but increased noise in the data 'hides' specific data feature thus making parameters less defined. It is important to know about parameter correlation in the model to understand chances of fitting out accurate values of these parameters.

First, you specify pairs of parameters you want to see correlations. To view parameter number in the vector of variable parameters issue:

session.fit_parameter_names{1:end}
ans =

  1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)


ans =

  1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A


ans =

  1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2


ans =

  8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2

Choose parameter pairs to view

parameter_pairs=[
    1 2
    1 3
    2 3
    1 4
    2 4
    3 4
    ];

Plot these specific pairs.

figure_handles = session.plot_specified_correlations(parameter_pairs);

Alternatively, you may plot all of them but that may be massive number of graphs therefore is not advisable for multiple datasets. Yet, if you want to do it use: session.plot_all_correlations()

save

session.save_figures(project_path, 'Correlations', figure_handles);
close all
Output current graphics for "Correlations" into "Tutorial_5.Fit_Monte_Carlo.simplex/Correlations/".
Figure 1: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 2: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 3: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 4: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 5: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written
Figure 6: small PNG image,  large PNG image,  EPS image,  MATLAB figure file - written

IMPORTANT NOTE ON DATA SAVED FROM FITTING SESSIONS

The output folder contains current statistics of the data analysis session along with some graphs PNG, EPS and FIG formats. The current state of IDAP fitting session is saved in session.mat. You may load it back in MATLAB using load() command and then examine any graphs, make any changes and continue fitting your data further.