Tutorial 5. Global simulation and fitting
by Evgenii Kovrigin, 05/19/2011
Global fitting in IDAP involves simultaneous fitting of multiple datasets with shared parameters (such that some parameter is optimized 'globally' with respect to all datasets it is relevant to). Strictly speaking, we already did global fitting in Tutorial 4 where we had multiple spectral datasets corresponding to different concentrations of the receptor and ligand but all sharing the same equilibrium and thermodynamic constants. For clarity of the concept we will perform another 'global' fitting excercise in this tutorial.
Here we will assume that we observed NMR signals of two nuclei that report on the same binding event. Therefore, they share thermodynamic equilibrium binding constant and the kinetic rate constants but have individual resonance frequencies and line widths.
NOTE: The synthetic noisy data will be simulated only using random variation of spectral intensity (using a noise standard deviation figure). However, a very important source of experimental uncertainty in titration series is value of L/R that is uncertainty of amount of added ligand. The effect of this uncertainty that different traces may appear at shifted L/R values, which strongly impacts success of fitting. It is not trivial to take this uncertainty into account in Monte-Carlo fitting therefore we skip it here for simplicity and assume our L/R data absolutely precise. See Titration class for introduction of this significant uncertainty into Monte-Carlo analysis of confidence intervals.
Contents
- CHOOSE PARAMETERS OF THE TITRATION SERIES
- CREATE TWO SERIES OF DATASETS FOR SIMULATION OF LINE SHAPES IN TITRATION
- ASSIGN STARTING PARAMETERS
- CALCULATE MODEL VALUES
- SELECT AND ASSIGN STARTING PARAMETERS FOR FIT
- PERFORM FITTING
- DETERMINE CONFIDENCE INTERVALS USING MONTE-CARLO APPROACH
- SAVE AND PLOT DATA IN MEMORY-CONSERVING WAY
- BUILD HISTOGRAMS OF DISTRIBUTIONS OF FITTING PARAMETERS
- PLOT PARAMETER CORRELATIONS
- IMPORTANT NOTE ON DATA SAVED FROM FITTING SESSIONS
Clean up workspace (to avoid unwanted effects of old variable assignments)
close all clear all
CHOOSE PARAMETERS OF THE TITRATION SERIES
NOTE: L/R cannot be ZERO. Set some very small number instead!
Rtotal = 1e-3; LRratio_array = [1e-6 0.2 0.4 0.6 0.8 1.0 1.2]; Y_RMSD=5e-7; % real line shape Y_offset_percent=10; % to offset plots in the figure vertically for easier viewing
CREATE TWO SERIES OF DATASETS FOR SIMULATION OF LINE SHAPES IN TITRATION
We will use a simple model U: R+L<=>RL
session_name='Tutorial_5';
session=TotalFit(session_name);
Common Parameters
dataset_class_name='NMRLineShapes1D'; % class name model_name='U-model'; model_numeric_solver='analytical'; model_numeric_options='no options'; number_of_datasets=length(LRratio_array);
Create a simulation series 1
series_name1='Series_1'; X_min1=-1000; X_max1=1500; number_of_points_to_simulate=60; X_column=linspace(X_min1, X_max1, number_of_points_to_simulate)'; % use ' to transpose to a column-vector session.create_simulation_series(series_name1, dataset_class_name, model_name, ... model_numeric_solver, model_numeric_options,... number_of_datasets, X_column);
Create a simulation series 2
series_name2='Series_2'; X_min2=1500; X_max2=2000; number_of_points_to_simulate=20; X_column=linspace(X_min2, X_max2, number_of_points_to_simulate)'; % use ' to transpose to a column-vector session.create_simulation_series(series_name2, dataset_class_name, model_name, ... model_numeric_solver, model_numeric_options,... number_of_datasets, X_column);
Initialize Relation Matrix
session.initialize_relation_matrix(sprintf('%s.all_datasets.txt', session_name))
Dataset listing is written to "Tutorial_5.all_datasets.txt".
Link Parameters Within Each Series. Datasets inside each series share equilbirium and kinetic constants as well as frequencies and linewidths (because all datasets correspond to line shapes of the same nucleus).
To view parameter numbering---print out the active model description
dataset_index=session.dataset_index(series_name1,1); session.dataset_show_active_model(dataset_index)
ans = Active model: Model 2: "U-model" Model description "1D NMR line shape for the U model, no temperature/field dependence, analytical" Model handle: line_shape_equations_1D.U_model_1D Current solver: analytical Model parameters: 1: Rtotal 2: LRratio 3: log10(K_A) 4: k_2_A 5: w0_1 6: w0_2 7: FWHH_1 8: FWHH_2 9: ScaleFactor
Set the indices of parameter you want to link
parameter_number_array=[ 3 4 5 6 7 8 9 ]; session.batch_link_parameters_in_series(series_name1, parameter_number_array); session.batch_link_parameters_in_series(series_name2, parameter_number_array);
Link Parameters Between Series Both series report on the same binding event so they have to have common binding affinity and kinetic constants
It is sufficient to link the first dataset of one series with the first dataset to another to have all links established (because they are already linked within the series). The rules of parameter linkage follow transitive logic: if A links to B, and B links to C, then A will link to C.
First, we obtain indices of the first datasets from both series (we need internal indexing of datasets in the 'session' object for this task).
series1_first_dataset_index=session.dataset_index(series_name1,1); series2_first_dataset_index=session.dataset_index(series_name2,1);
To view internal indices for the datasets in any series use session.show_series_datasets(series_name1)
Link Ka
dataset1_parameter_index=3;
dataset2_parameter_index=3;
session.link_parameters(series1_first_dataset_index, dataset1_parameter_index,...
series2_first_dataset_index, dataset2_parameter_index);
Link k2
dataset1_parameter_index=4;
dataset2_parameter_index=4;
session.link_parameters(series1_first_dataset_index, dataset1_parameter_index,...
series2_first_dataset_index, dataset2_parameter_index);
Propagate information about links across all datasets
session.generate_fitting_environment(sprintf('%s.fitting_environment.txt', session_name))
Fitting environment saved in Tutorial_5.fitting_environment.txt. ans = New parameter relations (indexed as in all_parameters vector): [1] 1:Rtotal [2] 1:LRratio [3] 1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A) [4] 1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A [5] 1:w0_1 | 2:w0_1 | 3:w0_1 | 4:w0_1 | 5:w0_1 | 6:w0_1 | 7:w0_1 [6] 1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2 [7] 1:FWHH_1 | 2:FWHH_1 | 3:FWHH_1 | 4:FWHH_1 | 5:FWHH_1 | 6:FWHH_1 | 7:FWHH_1 [8] 1:FWHH_2 | 2:FWHH_2 | 3:FWHH_2 | 4:FWHH_2 | 5:FWHH_2 | 6:FWHH_2 | 7:FWHH_2 [9] 1:ScaleFactor | 2:ScaleFactor | 3:ScaleFactor | 4:ScaleFactor | 5:ScaleFactor | 6:ScaleFactor | 7:ScaleFactor [10] 2:Rtotal [11] 2:LRratio [12] n/a [13] n/a [14] n/a [15] n/a [16] n/a [17] n/a [18] n/a [19] 3:Rtotal [20] 3:LRratio [21] n/a [22] n/a [23] n/a [24] n/a [25] n/a [26] n/a [27] n/a [28] 4:Rtotal [29] 4:LRratio [30] n/a [31] n/a [32] n/a [33] n/a [34] n/a [35] n/a [36] n/a [37] 5:Rtotal [38] 5:LRratio [39] n/a [40] n/a [41] n/a [42] n/a [43] n/a [44] n/a [45] n/a [46] 6:Rtotal [47] 6:LRratio [48] n/a [49] n/a [50] n/a [51] n/a [52] n/a [53] n/a [54] n/a [55] 7:Rtotal [56] 7:LRratio [57] n/a [58] n/a [59] n/a [60] n/a [61] n/a [62] n/a [63] n/a [64] 8:Rtotal [65] 8:LRratio [66] n/a [67] n/a [68] 8:w0_1 | 9:w0_1 | 10:w0_1 | 11:w0_1 | 12:w0_1 | 13:w0_1 | 14:w0_1 [69] 8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2 [70] 8:FWHH_1 | 9:FWHH_1 | 10:FWHH_1 | 11:FWHH_1 | 12:FWHH_1 | 13:FWHH_1 | 14:FWHH_1 [71] 8:FWHH_2 | 9:FWHH_2 | 10:FWHH_2 | 11:FWHH_2 | 12:FWHH_2 | 13:FWHH_2 | 14:FWHH_2 [72] 8:ScaleFactor | 9:ScaleFactor | 10:ScaleFactor | 11:ScaleFactor | 12:ScaleFactor | 13:ScaleFactor | 14:ScaleFactor [73] 9:Rtotal [74] 9:LRratio [75] n/a [76] n/a [77] n/a [78] n/a [79] n/a [80] n/a [81] n/a [82] 10:Rtotal [83] 10:LRratio [84] n/a [85] n/a [86] n/a [87] n/a [88] n/a [89] n/a [90] n/a [91] 11:Rtotal [92] 11:LRratio [93] n/a [94] n/a [95] n/a [96] n/a [97] n/a [98] n/a [99] n/a [100] 12:Rtotal [101] 12:LRratio [102] n/a [103] n/a [104] n/a [105] n/a [106] n/a [107] n/a [108] n/a [109] 13:Rtotal [110] 13:LRratio [111] n/a [112] n/a [113] n/a [114] n/a [115] n/a [116] n/a [117] n/a [118] 14:Rtotal [119] 14:LRratio [120] n/a [121] n/a [122] n/a [123] n/a [124] n/a [125] n/a [126] n/a Summary of datasets with lookup vectors (position of each parameter in all_parameter vector) -------- Dataset 1/Series_1-1, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 1 2 3 4 5 6 7 8 9 Dataset 2/Series_1-2, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 10 11 3 4 5 6 7 8 9 Dataset 3/Series_1-3, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 19 20 3 4 5 6 7 8 9 Dataset 4/Series_1-4, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 28 29 3 4 5 6 7 8 9 Dataset 5/Series_1-5, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 37 38 3 4 5 6 7 8 9 Dataset 6/Series_1-6, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 46 47 3 4 5 6 7 8 9 Dataset 7/Series_1-7, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 55 56 3 4 5 6 7 8 9 Dataset 8/Series_2-1, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 64 65 3 4 68 69 70 71 72 Dataset 9/Series_2-2, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 73 74 3 4 68 69 70 71 72 Dataset 10/Series_2-3, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 82 83 3 4 68 69 70 71 72 Dataset 11/Series_2-4, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 91 92 3 4 68 69 70 71 72 Dataset 12/Series_2-5, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 100 101 3 4 68 69 70 71 72 Dataset 13/Series_2-6, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 109 110 3 4 68 69 70 71 72 Dataset 14/Series_2-7, model "U-model" Parameters : Rtotal LRratio log10(K_A) k_2_A w0_1 w0_2 FWHH_1 FWHH_2 ScaleFactor Lookup vector : 118 119 3 4 68 69 70 71 72
Now we have all parameters linked properly: Ka and k2 are global for all datasets, frequencies and line widths are global within each series but unlinked between series.
ASSIGN STARTING PARAMETERS
Define common parameter values NOTE: Technically, we do not need meaningful limits set for a simulation run. However, we will do fitting in the same TotalFit session therefore it is convenient to define limits now.
Binding constant
Log10_K_A=log10(1e6); Log10_K_A_AbsMin=log10(1e1); Log10_K_A_AbsMax=log10(1e10);
Dissociation rate constant
k_2_A=200; k_2_A_AbsMin=0.1; k_2_A_AbsMax=3000;
ScaleFactor
ScaleFactor=1;
ScaleFactor_AbsMin=0.1;
ScaleFactor_AbsMax=10;
FAKE_VALUE=1e-8; % A 'place-holder' value to be substituted later
First series (slow exchange)
Frequency of R
w0_R=-300;
w0_R_AbsMin=X_min1; % set these limits to the spectral window
w0_R_AbsMax=X_max1;
Frequency of RL
w0_RL=800;
w0_RL_AbsMin=X_min1; % set these limits to the spectral window
w0_RL_AbsMax=X_max1;
Line width of R
FWHH_R=100; FWHH_R_AbsMin=10; FWHH_R_AbsMax=500;
Line width of RL
FWHH_RL=100; FWHH_RL_AbsMin=10; FWHH_RL_AbsMax=500;
ScaleFactor
ScaleFactor=1; ScaleFactor_AbsMin=0.1; ScaleFactor_AbsMax=10;
Create a vector of parameter values
simulation_parameters1=[ FAKE_VALUE FAKE_VALUE ... Log10_K_A k_2_A ... w0_R w0_RL FWHH_R FWHH_RL ScaleFactor ];
Second series
Frequency of R
w0_R=1700;
w0_R_AbsMin=X_min2; % set these limits to the spectral window
w0_R_AbsMax=X_max2;
Frequency of RL
w0_RL=1800;
w0_RL_AbsMin=X_min2; % set these limits to the spectral window
w0_RL_AbsMax=X_max2;
Line width of R
FWHH_R=100; FWHH_R_AbsMin=10; FWHH_R_AbsMax=500;
Line width of RL
FWHH_RL=100; FWHH_RL_AbsMin=10; FWHH_RL_AbsMax=500;
Create a vector of parameter values
simulation_parameters2=[ FAKE_VALUE FAKE_VALUE ... Log10_K_A k_2_A ... w0_R w0_RL FWHH_R FWHH_RL ScaleFactor ];
Assign linked parameters (I keep names of function arguments for information)
fake_limits=zeros(1,length(simulation_parameters1)); Monte_Carlo_range_min=fake_limits; Monte_Carlo_range_max=fake_limits; parameter_limits_min=fake_limits; parameter_limits_max=fake_limits; session.assign_parameter_values(series1_first_dataset_index, simulation_parameters1, ... Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max); session.assign_parameter_values(series2_first_dataset_index, simulation_parameters2, ... Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);
Assign UNlinked parameters
Rtotal_vector=ones(1,length(LRratio_array))*Rtotal; % this one is constant for all datasets
Create vectors of ranges for formal assignment (NOTE: they are different length from the "fake_limits" above).
unlinked_fake_ranges=zeros(1,number_of_datasets); MonteCarlo_range_min_array=unlinked_fake_ranges; MonteCarlo_range_max_array=unlinked_fake_ranges; parameter_limits_min_array=unlinked_fake_ranges; parameter_limits_max_array=unlinked_fake_ranges;
Assign Rtotal
parameter_number=1; values_array=Rtotal_vector; series_name=series_name1; session.assign_unlinked_parameter_in_series(... series_name, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array); series_name=series_name2; session.assign_unlinked_parameter_in_series(... series_name, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array);
Assign LRratio
parameter_number=2; values_array=LRratio_array; series_name=series_name1; session.assign_unlinked_parameter_in_series(... series_name, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array); series_name=series_name2; session.assign_unlinked_parameter_in_series(... series_name, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array);
CALCULATE MODEL VALUES
Run a simulation
X_standard_dev=0; % no need to have errors of Y
session.simulate_series(series_name1, X_standard_dev, Y_RMSD);
session.simulate_series(series_name2, X_standard_dev, Y_RMSD);
Plot the simulation result
figure_handle1=session.plot_1D_series(series_name1, 'Simulation 4', Y_offset_percent); figure_handle2=session.plot_1D_series(series_name2, 'Simulation 4', Y_offset_percent);
Nice 'experimentally-looking' result.
Save initial state of fitting session on a disk for future inspection and reuse
session.export_results(sprintf('%s.simulated_data', session_name), {figure_handle1 figure_handle2})
Output current session state, statistics and graphs into "Tutorial_5.simulated_data/". Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.simulated_data/adjustable_parameters_only.txt". Statistics on per-dataset basis is written into "Tutorial_5.simulated_data/all_datasets.txt". Writing figures... Figure 1: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 2: small PNG image, large PNG image, EPS image, MATLAB figure file - written Done
SELECT AND ASSIGN STARTING PARAMETERS FOR FIT
Define parameter values (common first)
Binding constant
Log10_K_A=log10(1e7); Log10_K_A_AbsMin=log10(1e1); Log10_K_A_AbsMax=log10(1e10);
Dissociation rate constant
k_2_A=10; k_2_A_AbsMin=0.1; k_2_A_AbsMax=3000;
ScaleFactor
ScaleFactor=1;
ScaleFactor_AbsMin=0.1;
ScaleFactor_AbsMax=10;
FAKE_VALUE=1e-8; % A 'place-holder' value
First series (slow exchange)
Frequency of R
w0_R=-300; % is known exactly from spectral traces in slow exchange
w0_R_AbsMin=X_min1;
w0_R_AbsMax=X_max1;
Frequency of RL
w0_RL=800; % is known exactly from spectral traces in slow exchange
w0_RL_AbsMin=X_min1;
w0_RL_AbsMax=X_max1;
Line width of R
FWHH_R=100; % is known exactly from spectral traces in slow exchange
FWHH_R_AbsMin=10;
FWHH_R_AbsMax=500;
Line width of RL
FWHH_RL=100; % well estimated from nearly-saturated spectral trace
FWHH_RL_AbsMin=10;
FWHH_RL_AbsMax=500;
ScaleFactor
ScaleFactor=1; ScaleFactor_AbsMin=0.1; ScaleFactor_AbsMax=10;
Create a vector of parameter values and absolute limits
starting_parameters1=[ FAKE_VALUE FAKE_VALUE ... Log10_K_A k_2_A ... w0_R w0_RL FWHH_R FWHH_RL ScaleFactor ]; min_limits1=[ 0 0 Log10_K_A_AbsMin k_2_A_AbsMin w0_R_AbsMin w0_RL_AbsMin FWHH_R_AbsMin FWHH_RL_AbsMin ScaleFactor_AbsMin ]; max_limits1=[ 0 0 Log10_K_A_AbsMax k_2_A_AbsMax w0_R_AbsMax w0_RL_AbsMax FWHH_R_AbsMax FWHH_RL_AbsMax ScaleFactor_AbsMax ];
Mode for model parameters: fixed/variable = 0/1
parameter_modes1=[ 0 0 1 1 0 1 0 0 0];
Second series
Frequency of R
w0_R=1700; % is known exactly from spectral trace of free R
w0_R_AbsMin=X_min2;
w0_R_AbsMax=X_max2;
Frequency of RL
w0_RL=1750; % uncertain due to incomplete saturation of RL
w0_RL_AbsMin=X_min2;
w0_RL_AbsMax=X_max2;
Line width of R
FWHH_R=100; % is known exactly from spectral trace of free R
FWHH_R_AbsMin=10;
FWHH_R_AbsMax=500;
Line width of RL
FWHH_RL=100; % well estimated from a nearly-saturated spectral trace
FWHH_RL_AbsMin=10;
FWHH_RL_AbsMax=500;
Create a vector of parameter values and absolute limits
starting_parameters2=[ FAKE_VALUE FAKE_VALUE ... Log10_K_A k_2_A ... w0_R w0_RL FWHH_R FWHH_RL ScaleFactor ]; min_limits2=[ 0 0 Log10_K_A_AbsMin k_2_A_AbsMin w0_R_AbsMin w0_RL_AbsMin FWHH_R_AbsMin FWHH_RL_AbsMin ScaleFactor_AbsMin ]; max_limits2=[ 0 0 Log10_K_A_AbsMax k_2_A_AbsMax w0_R_AbsMax w0_RL_AbsMax FWHH_R_AbsMax FWHH_RL_AbsMax ScaleFactor_AbsMax ];
Mode for model parameters: fixed/variable = 0/1
parameter_modes2=[ 0 0 1 1 0 1 0 0 0];
Assign linked parameters
First series
first_dataset_index=session.dataset_index(series_name1,1);
Monte_Carlo_range_min=min_limits1;
Monte_Carlo_range_max=max_limits1;
parameter_limits_min=min_limits1;
parameter_limits_max=max_limits1;
session.assign_parameter_values(first_dataset_index, starting_parameters1, ...
Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);
Second series
first_dataset_index=session.dataset_index(series_name2,1);
Monte_Carlo_range_min=min_limits2;
Monte_Carlo_range_max=max_limits2;
parameter_limits_min=min_limits2;
parameter_limits_max=max_limits2;
session.assign_parameter_values(first_dataset_index, starting_parameters2, ...
Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);
Assign unlinked parameters Create vectors of ranges for formal assignment because these are constant in fitting.
unlinked_fake_ranges=zeros(1,number_of_datasets);
Assign Rtotal --- this will not be fit so no need for ranges
parameter_number=1; values_array=Rtotal_vector; % this remains the same MonteCarlo_range_min_array=unlinked_fake_ranges; MonteCarlo_range_max_array=unlinked_fake_ranges; parameter_limits_min_array=unlinked_fake_ranges; parameter_limits_max_array=unlinked_fake_ranges; session.assign_unlinked_parameter_in_series(... series_name1, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array); session.assign_unlinked_parameter_in_series(... series_name2, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array);
Assign LRratio
parameter_number=2; values_array=LRratio_array; session.assign_unlinked_parameter_in_series(... series_name1, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array); session.assign_unlinked_parameter_in_series(... series_name2, parameter_number, values_array, ... MonteCarlo_range_min_array, MonteCarlo_range_max_array, ... parameter_limits_min_array, parameter_limits_max_array);
Inspect initial guess
figure_handle1=session.plot_1D_series(series_name1, 'Initial', Y_offset_percent); figure_handle2=session.plot_1D_series(series_name2, 'Initial', Y_offset_percent);
Initial guess is appropriately bad to begin fitting.
Save initial state of fitting session on a disk for future inspection and reuse
session.export_results(sprintf('%s.initial_guess', session_name), {figure_handle1 figure_handle2})
Output current session state, statistics and graphs into "Tutorial_5.initial_guess/". Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.initial_guess/adjustable_parameters_only.txt". Statistics on per-dataset basis is written into "Tutorial_5.initial_guess/all_datasets.txt". Writing figures... Figure 1: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 2: small PNG image, large PNG image, EPS image, MATLAB figure file - written Done
PERFORM FITTING
Set which parameters of the model will be fixed and variable
% first series first_dataset_index=session.dataset_index(series_name1,1); session.set_parameters_modes(first_dataset_index, parameter_modes1); % second series first_dataset_index=session.dataset_index(series_name2,1); session.set_parameters_modes(first_dataset_index, parameter_modes2);
Select fitting algorithm. Choices: 'simplex', 'MultiStart' 'GlobalSearch', 'DirectSearch'. See MATLAB documentation for their description and options.
Before starting this fitting: start parallel processing framework by issuing 'matlabpool open N' command, where N is the number of your CPUs. The IDAP algorithm will then farm Monte-Carlo runs on multiple CPUs. If you do not have multiple CPU---do not do anything and it will still work (slower). The simplest command to start your parallel processors: matlabpool open
fit_only_algorithm='GlobalSearch';
Report on start-up and show variable parameter names
fprintf(1,'\n\n==============\nFitting Algorithm: %s\nVariable parameters:', fit_only_algorithm);
session.fit_parameter_names{1:end}
============== Fitting Algorithm: GlobalSearch Variable parameters: ans = 1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A) ans = 1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A ans = 1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2 ans = 8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2
Fit without error determination
tic session.fit_only(fit_only_algorithm); toc
GlobalSearch stopped because it analyzed all the trial points. All 108 local solver runs converged with a positive local solver exit flag. Elapsed time is 94.369409 seconds.
Make sure you read output of the fitting routine concerning convergence (or lack of it)!
Report
session.show_fit_statistics() session.transfer_all_parameters_to_individual_datasets();
ans = Fitting results for "Tutorial_5" ========================== Fitting method: GlobalSearch Number of active datasets (non-zero priority): 14 Reporting only on variable parameters (dataset-#:parameter-name): (# in the fit-vector) [min value, max value] for 95% confidence interval -------------------------- (1) name: " 1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)" (1) value= 4.976519e+00 [ 0.000000e+00 0.000000e+00 ] (2) name: " 1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A" (2) value= 2.242349e+02 [ 0.000000e+00 0.000000e+00 ] (3) name: " 1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2" (3) value= 8.561573e+02 [ 0.000000e+00 0.000000e+00 ] (4) name: " 8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2" (4) value= 1.804079e+03 [ 0.000000e+00 0.000000e+00 ] Total sum of squares (SS) = 1.07119e-05 Statistics per dataset: ------------ (1) "Series_1-1": Priority=1, R^2=0.977, 4.7% of total SS, Model= "U-model" (2) "Series_1-2": Priority=1, R^2=0.908, 5.5% of total SS, Model= "U-model" (3) "Series_1-3": Priority=1, R^2=0.798, 3.9% of total SS, Model= "U-model" (4) "Series_1-4": Priority=1, R^2=0.672, 4.9% of total SS, Model= "U-model" (5) "Series_1-5": Priority=1, R^2=0.894, 4.6% of total SS, Model= "U-model" (6) "Series_1-6": Priority=1, R^2=0.866, 25% of total SS, Model= "U-model" (7) "Series_1-7": Priority=1, R^2=0.920, 15% of total SS, Model= "U-model" (8) "Series_2-1": Priority=1, R^2=0.993, 4.3% of total SS, Model= "U-model" (9) "Series_2-2": Priority=1, R^2=0.991, 4.9% of total SS, Model= "U-model" (10) "Series_2-3": Priority=1, R^2=0.980, 9.9% of total SS, Model= "U-model" (11) "Series_2-4": Priority=1, R^2=0.993, 3.3% of total SS, Model= "U-model" (12) "Series_2-5": Priority=1, R^2=0.996, 2.3% of total SS, Model= "U-model" (13) "Series_2-6": Priority=1, R^2=0.989, 6.9% of total SS, Model= "U-model" (14) "Series_2-7": Priority=1, R^2=0.993, 4.4% of total SS, Model= "U-model"
Inspect final result
figure_handle1=session.plot_1D_series(series_name1, 'Final', Y_offset_percent); figure_handle2=session.plot_1D_series(series_name2, 'Final', Y_offset_percent);
Export final result
session.export_results(sprintf('%s.Fit_Only.%s', session_name, fit_only_algorithm), {figure_handle1 figure_handle2})
Output current session state, statistics and graphs into "Tutorial_5.Fit_Only.GlobalSearch/". Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.Fit_Only.GlobalSearch/adjustable_parameters_only.txt". Statistics on per-dataset basis is written into "Tutorial_5.Fit_Only.GlobalSearch/all_datasets.txt". Writing figures... Figure 1: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 2: small PNG image, large PNG image, EPS image, MATLAB figure file - written Done
DETERMINE CONFIDENCE INTERVALS USING MONTE-CARLO APPROACH
IMPORTANT NOTE: Confidence intervals are ONLY meaningful if the model fits data well that is a curve goes trough the points! Confidence intervals and parameter correlation plots then show how variation of the curves passing WITHIN errorbars may be accounted for by variation in parameters. If the curve DOES NOT pass through the error bars---Monte-Carlo run only wastes your time because the randomizer varies the synthetic data AROUND THE IDEAL SIMULATED CURVE. However, if this curve IS NOT GOING THROUGH the experimental data to begin with---entire Monte-Carlo run is meaningless because its random variation does reproduce deviation of the data from the model curve.
NOTE: Starting parameter values are the ones that resulted from previous fit.
fit_algorithm='simplex'; % fastest robust algorithm % fit_algorithm='GlobalSearch'; % would be too slow
Report on start-up and show variable parameter names
fprintf(1,'\n\n==============\nMonte-Carlo Fitting Algorithm: %s\nVariable parameters:', fit_algorithm);
session.fit_parameter_names{1:end}
============== Monte-Carlo Fitting Algorithm: simplex Variable parameters: ans = 1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A) ans = 1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A ans = 1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2 ans = 8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2
Start fitting
session.fit(fit_algorithm, 'single_cluster');
session.show_fit_statistics()
session.transfer_all_parameters_to_individual_datasets();
Starting fitting session using simplex algorithm on a single cluster. Working folder: "single_cluster_fit" Variable parameter: Columns 1 through 3 [1x214 char] [1x144 char] [1x62 char] Column 4 [1x67 char] Total dataset is saved into "single_cluster_fit". Checking on "single_cluster_fit": 10 trials. Checking on "single_cluster_fit": 20 trials. Checking on "single_cluster_fit": 30 trials. Checking on "single_cluster_fit": 40 trials. Checking on "single_cluster_fit": 50 trials. Using ALL 50 fitting runs ignoring the exit flags indicating that 0 runs failed to converge). Checking on "single_cluster_fit": 60 trials. Using ALL 60 fitting runs ignoring the exit flags indicating that 0 runs failed to converge). Checking on "single_cluster_fit": 70 trials. Using ALL 70 fitting runs ignoring the exit flags indicating that 0 runs failed to converge). Elapsed time is 147.651146 seconds. ans = Fitting results for "Tutorial_5" ========================== Fitting method: simplex Number of active datasets (non-zero priority): 14 Reporting only on variable parameters (dataset-#:parameter-name): (# in the fit-vector) [min value, max value] for 95% confidence interval -------------------------- (1) name: " 1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A)" (1) value= 6.007956e+00 [ 5.884858e+00 6.160755e+00 ] (2) name: " 1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A" (2) value= 1.928793e+02 [ 1.753941e+02 2.069492e+02 ] (3) name: " 1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2" (3) value= 8.002024e+02 [ 7.976611e+02 8.034264e+02 ] (4) name: " 8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2" (4) value= 1.799248e+03 [ 1.798598e+03 1.800418e+03 ] Total sum of squares (SS) = 6.80906e-06 Statistics per dataset: ------------ (1) "Series_1-1": Priority=1, R^2=0.977, 7.4% of total SS, Model= "U-model" (2) "Series_1-2": Priority=1, R^2=0.913, 8.2% of total SS, Model= "U-model" (3) "Series_1-3": Priority=1, R^2=0.813, 5.6% of total SS, Model= "U-model" (4) "Series_1-4": Priority=1, R^2=0.684, 7.4% of total SS, Model= "U-model" (5) "Series_1-5": Priority=1, R^2=0.905, 6.4% of total SS, Model= "U-model" (6) "Series_1-6": Priority=1, R^2=0.980, 5.8% of total SS, Model= "U-model" (7) "Series_1-7": Priority=1, R^2=0.975, 7.7% of total SS, Model= "U-model" (8) "Series_2-1": Priority=1, R^2=0.993, 6.8% of total SS, Model= "U-model" (9) "Series_2-2": Priority=1, R^2=0.992, 6.9% of total SS, Model= "U-model" (10) "Series_2-3": Priority=1, R^2=0.980, 16% of total SS, Model= "U-model" (11) "Series_2-4": Priority=1, R^2=0.994, 4.6% of total SS, Model= "U-model" (12) "Series_2-5": Priority=1, R^2=0.996, 3.3% of total SS, Model= "U-model" (13) "Series_2-6": Priority=1, R^2=0.992, 7.9% of total SS, Model= "U-model" (14) "Series_2-7": Priority=1, R^2=0.993, 6.5% of total SS, Model= "U-model"
SAVE AND PLOT DATA IN MEMORY-CONSERVING WAY
Key principle is to plot a few figures, save them and close them. MATLAB can handle about 25 figures, then slows down and at about 40-70 locks up!
*Create place for storage*
project_path=sprintf('%s.Fit_Monte_Carlo.%s', session_name, fit_algorithm);
session.export_results(project_path, {});
Output current session state, statistics and graphs into "Tutorial_5.Fit_Monte_Carlo.simplex/". Adjustable parameters and cumulative dataset statistics is written into "Tutorial_5.Fit_Monte_Carlo.simplex/adjustable_parameters_only.txt". Statistics on per-dataset basis is written into "Tutorial_5.Fit_Monte_Carlo.simplex/all_datasets.txt". Writing figures... Done
Inspect and save final results
figure_handle1=session.plot_1D_series(series_name1, 'Monte Carlo', Y_offset_percent); figure_handle2=session.plot_1D_series(series_name2, 'Monte Carlo', Y_offset_percent);
session.save_figures(project_path, 'Fits', {figure_handle1 figure_handle2}); close all
Output current graphics for "Fits" into "Tutorial_5.Fit_Monte_Carlo.simplex/Fits/". Figure 1: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 2: small PNG image, large PNG image, EPS image, MATLAB figure file - written
BUILD HISTOGRAMS OF DISTRIBUTIONS OF FITTING PARAMETERS
Histograms are plotted using full distribution of parameters from fitting of synthetic noisy datasets. The bottom panel of histogram graphs gives normalized sum of squares for all runs (red dot - the sum of squares from the best-fit to experimental data).
NOTE: If fitting parameters ended up at or beyond parameter limits they are thrown out of these distributions
NOTE 2: Scaling of these graphs is done using min/max limits of the parameters. This may be sub-optimal for viewing and analysis. If you need to adjust a graph---load FIG file from the folder with graphic output and edit in MATLAB.
NOTE 3: The previous note has to be generalized: IDAP has not been optimized for 'pretty' graphic output because any graphs may be adjusted using standard MATLAB tools. User is invited to modify the code to his/her liking to satisfy their taste.
histogram_bins=20;
To plot histograms for selected parameters use the following: NOTE: the index is NOT from the model but from the vector of variable parameters. variable_parameters_indices=[1 2]; % All parameters you want to see parameter_histograms_handles=session.plot_specified_histograms(variable_parameters_indices, histogram_bins);
To automatically plot histograms for ALL variable parameters use:
parameter_histograms_handles=session.plot_all_histograms(histogram_bins);
save
session.save_figures(project_path, 'Histograms', parameter_histograms_handles); close all
Output current graphics for "Histograms" into "Tutorial_5.Fit_Monte_Carlo.simplex/Histograms/". Figure 1: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 2: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 3: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 4: small PNG image, large PNG image, EPS image, MATLAB figure file - written
PLOT PARAMETER CORRELATIONS
When fitting with complex models we observe that parameters of the model may be correlated. It means that for a given signal/noise ratio in the data it is possible to compensate variation in one parameter by corresponding adjustment of the correlated parameter. This correlation happens, in the first place, when models are overparameterized that is have more parameters than necessary to describe the data. Second situation is when you have an adequate model but increased noise in the data 'hides' specific data feature thus making parameters less defined. It is important to know about parameter correlation in the model to understand chances of fitting out accurate values of these parameters.
First, you specify pairs of parameters you want to see correlations. To view parameter number in the vector of variable parameters issue:
session.fit_parameter_names{1:end}
ans = 1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A) | 9:log10(K_A) | 10:log10(K_A) | 11:log10(K_A) | 12:log10(K_A) | 13:log10(K_A) | 14:log10(K_A) ans = 1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A | 9:k_2_A | 10:k_2_A | 11:k_2_A | 12:k_2_A | 13:k_2_A | 14:k_2_A ans = 1:w0_2 | 2:w0_2 | 3:w0_2 | 4:w0_2 | 5:w0_2 | 6:w0_2 | 7:w0_2 ans = 8:w0_2 | 9:w0_2 | 10:w0_2 | 11:w0_2 | 12:w0_2 | 13:w0_2 | 14:w0_2
Choose parameter pairs to view
parameter_pairs=[ 1 2 1 3 2 3 1 4 2 4 3 4 ];
Plot these specific pairs.
figure_handles = session.plot_specified_correlations(parameter_pairs);
Alternatively, you may plot all of them but that may be massive number of graphs therefore is not advisable for multiple datasets. Yet, if you want to do it use: session.plot_all_correlations()
save
session.save_figures(project_path, 'Correlations', figure_handles); close all
Output current graphics for "Correlations" into "Tutorial_5.Fit_Monte_Carlo.simplex/Correlations/". Figure 1: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 2: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 3: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 4: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 5: small PNG image, large PNG image, EPS image, MATLAB figure file - written Figure 6: small PNG image, large PNG image, EPS image, MATLAB figure file - written
IMPORTANT NOTE ON DATA SAVED FROM FITTING SESSIONS
The output folder contains current statistics of the data analysis session along with some graphs PNG, EPS and FIG formats. The current state of IDAP fitting session is saved in session.mat. You may load it back in MATLAB using load() command and then examine any graphs, make any changes and continue fitting your data further.