# Tutorial 6. Testing a new model

by Evgenii Kovrigin, 05/20/2011

IDAP is built to make it easy to introduce new mathematical models. As a result, the models are numerous and come in many versions for all imaginable purposes. A good practice when using a new model is to assume that it contains both mathematical and programmatic glitches, so it is imperative to test its operation before using it in data fitting. Testing whether the model works as expected is straightforward: inspect the simulated signals.

The key concept is that, irrespective of the model's inner workings, it describes a physical process that may be understood qualitatively without any calculations. What we need to do is propose a set of conditions where we can qualitatively (or semi-quantitatively) predict the expected signal. If the model reasonably reproduces these test cases, chances are that it works well. Mathematical problems (incorrect assumptions, derivations, etc.) will manifest themselves as deviations of the modelled signal from the expected features. Programmatic glitches result in a loss of common-sense behavior of the model when parameters are systematically changed.

NOTE: To ensure that my own models are correctly derived, I perform all derivations in MuPAD (the symbolic algebra package of MATLAB), which makes it possible to document derivations very efficiently. The user is invited to examine and rerun these notebooks, located in the Mathematical_models/ folder.

This tutorial is an example of a common-sense-based testing of a standard model I am introducing in IDAP: U-R2, ligand binding coupled to dimerization of the receptor such that dimers do not bind the ligand. When I introduce other complex models I will place the testing documents in their respective folders in Mathematical_models/.

## Contents

- Model derivation
- Test Computation of Equilibrium Concentrations
- Test of NMR line shape calculations
- Create NMR line shape dataset
- Slow exchange in free receptor
- Dilution of free receptor
- Fast exchange in free receptor
- A fully saturated receptor
- Summary of individual plots
- Create a titration series
- Conclusions

## Model derivation

We will not go through the steps of model development here; we only review them briefly and move on to testing. The main steps in model development are:

1. Develop mathematical equations to describe the evolution of molecular species. The result is either an analytical (closed-form) solution or an expression to be solved numerically.

2. Create the model function in either the code/+equilibrium_thermodynamic_equations or the code/+differential_kinetic_equations package.

3. Develop any additional math (such as kinetic matrices for NMR line shapes) to calculate the expected experimental signal.

4. Create a model function that calculates the signal using the molecular composition computed above. This is the function that will be called by a fitting routine to optimize parameters with respect to experimental data.

5. Add a reference to this model in a data-specific subclass of the Dataset superclass.
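For orientation, step 1 for the U-R2 model might start from equations like the ones below. This is only a sketch: the conventions for the two association constants are an assumption made here (they agree with the concentrations printed later in this tutorial), and the notebooks in Mathematical_models/ remain the authoritative derivation.

$$
R + L \rightleftharpoons RL, \qquad K_A = \frac{[RL]}{[R][L]}
$$

$$
2R \rightleftharpoons R_2, \qquad K_B = \frac{[R_2]}{[R]^2}
$$

$$
R_{total} = [R] + 2[R_2] + [RL], \qquad L_{total} = [L] + [RL]
$$

The factor of 2 in the receptor mass balance reflects that each dimer contains two receptor molecules.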

After all this, we are ready to invoke the new model and see it in action.

```
close all
clear all
```

Here you will find all figures

```
figures_folder='U_R2_testing_figures';
```

## Test Computation of Equilibrium Concentrations

Set some meaningful parameters

```
Rtotal=1e-3; % Receptor concentration, M
LRratio_array=[0 : 0.02 : 1.45]; % Array of L/R
K_a_A=1e6; % Binding affinity constant
K_a_B=1e4; % Dimerization constant

% Set appropriate options for the model (see model file for details)
model_numeric_solver='fminbnd' ;
model_numeric_options=optimset('Diagnostics','off', ...
    'Display','off',...
    'TolX',1e-9,...
    'MaxFunEvals', 1e9);
```

An important option here is TolX, which sets the termination tolerance on the free ligand concentration in molar units. With solution concentrations in the 1e-3 M range, TolX should be set to about 1e-9.
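To see what role TolX plays, here is a minimal stand-alone sketch that finds the free ligand concentration with `fminbnd`. It is not the IDAP implementation of `U_R2_model`; the equilibrium conventions in the comments, the helper names `R_of_L` and `balance_error`, and the choice of L/R = 0.5 are assumptions made for illustration only.

```matlab
% Sketch only: NOT the IDAP implementation of U_R2_model.
% Assumed conventions: K_A = [RL]/([R][L]), K_B = [R2]/[R]^2,
% Rtotal = [R] + 2[R2] + [RL], Ltotal = [L] + [RL].
Rtotal = 1e-3;            % total receptor, M
Ltotal = 0.5*Rtotal;      % total ligand at L/R = 0.5, M
K_A = 1e6;                % binding constant, 1/M
K_B = 1e4;                % dimerization constant, 1/M

% Free monomer for a trial free-ligand concentration L:
% positive root of 2*K_B*R^2 + (1 + K_A*L)*R - Rtotal = 0
R_of_L = @(L) (-(1 + K_A*L) + sqrt((1 + K_A*L).^2 + 8*K_B*Rtotal)) ./ (4*K_B);

% Squared, normalized error of the ligand mass balance (to be minimized)
balance_error = @(L) ((L + K_A*R_of_L(L).*L - Ltotal) ./ Ltotal).^2;

% TolX is in the units of the search variable, here molar
options = optimset('Display', 'off', 'TolX', 1e-9);
L_free = fminbnd(balance_error, 0, Ltotal, options);

% Remaining species follow from the equilibrium constants
R_free  = R_of_L(L_free);
R2_free = K_B*R_free^2;
RL      = K_A*R_free*L_free;
fprintf('R = %.3e, R2 = %.3e, RL = %.3e, L = %.3e M\n', R_free, R2_free, RL, L_free);
```

Because the search variable is a concentration of the order of micromolar, a tolerance much looser than 1e-9 would leave a visible relative error in the free ligand and, through it, in the other species.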

**Compute arrays for populations and plot**

```
concentrations_array=[];
for counter=1:length(LRratio_array)
    % compute
    [concentrations species_names] = equilibrium_thermodynamic_equations.U_R2_model(...
        Rtotal, LRratio_array(counter), K_a_A, K_a_B,...
        model_numeric_solver, model_numeric_options);
    % collect
    concentrations_array = [concentrations_array ; concentrations];
end
```

Plot

```
Figure_title= 'U-R2 model';
X_range=[0 max(LRratio_array)+0.1 ]; % extend X just a bit past last point
Y_range=[ ]; % keep automatic scaling for Y

% display figure
figure_handle=equilibrium_thermodynamic_equations.plot_populations(...
    LRratio_array, concentrations_array, species_names, Figure_title, X_range, Y_range);

% save it
results_output.output_figure(figure_handle, figures_folder, 'Concentrations_plot');
```

**Observations** The result is exactly what we expect from this model. In the absence of ligand, the receptor exists in equilibrium between a monomer, R, and a dimer, R2. Since one dimer molecule is made of two monomers, the dimer concentration has to be doubled to obtain the concentration of monomers in the dimer form. The sum of the monomer concentrations in free and dimer form equals the Rtotal we set.

As ligand is added, the RL species is formed, which reduces the concentration of ligand-free receptor (R plus R2). This reduction shifts the R<=>R2 equilibrium towards the monomeric species, such that above a molar ratio of L/R=0.8 the monomer becomes the major ligand-free species in solution. This is the same effect as dilution of the receptor solution: progressive depletion of the R2 population.

Once the receptor approaches saturation, the RL concentration approaches the Rtotal set above. At the same time, the free ligand begins to accumulate linearly with L/R molar ratio.

**Conclusion**

The model **equilibrium_thermodynamic_equations.U_R2_model()** works well.

## Test of NMR line shape calculations

We will generate line shapes under conditions where we can expect specific patterns. We will plot one spectrum for each set of solution conditions (for simplicity, we will not use a Series of datasets).

1. Ligand-free receptor in slow exchange should show two separate peaks with areas proportional to their equilibrium concentrations. This ratio should change upon dilution in favor of the monomer. NOTE: NMR intensity is proportional to concentrations of NMR-active spins so it will track concentration of monomers in monomeric and dimeric structures.

2. Fast exchange in the same setting should reveal a population-weighted average peak. Here, again, what matters is the concentration of monomers in free and dimeric form, not the concentration of dimeric species. Similarly, dilution should shift the peak towards the frequency of the monomeric species.

3. Addition of saturating concentration of ligand should give us a single peak at the RL frequency.
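The first two expectations can be previewed with a generic two-site exchange calculation before touching the IDAP model. The sketch below is not the IDAP kinetic matrix for U-R2. It assumes frequencies in rad/s, a Lorentzian line with FWHH = 2*R2, spin populations of 0.2 and 0.8 for monomer and dimer, and hypothetical helper names (`spectrum`, `exchange_matrix`).

```matlab
% Generic two-site exchange sketch; NOT the IDAP kinetic matrix for U-R2.
% Assumptions: frequencies in rad/s, Lorentzian FWHH = 2*R2, and spin
% populations p = [0.2; 0.8] for monomer (site 1) and dimer (site 2).
w0 = [-200; 200];            % resonance frequencies of R and R2
p  = [0.2; 0.8];             % fraction of spins in each environment
R2 = [10; 10];               % transverse relaxation rates (FWHH = 20)
w  = linspace(-300, 500, 801);

% Spectrum S(w) = Re( [1 1] * inv(R2 - K + 1i*(w - w0)) * p ) for exchange matrix K
spectrum = @(K) arrayfun(@(x) real(ones(1,2) * ...
    ((diag(R2) - K + 1i*(x*eye(2) - diag(w0))) \ p)), w);

% K is built from the dimer-to-monomer rate k_off; the reverse rate follows
% from detailed balance, k_on*p(1) = k_off*p(2)
exchange_matrix = @(k_off) [-k_off*p(2)/p(1),  k_off; ...
                             k_off*p(2)/p(1), -k_off];

S_slow = spectrum(exchange_matrix(1));      % slow exchange: two resolved peaks
S_fast = spectrum(exchange_matrix(1000));   % fast exchange: one averaged peak

[~, i_slow] = max(S_slow);
[~, i_fast] = max(S_fast);
fprintf('Slow exchange: tallest peak at %.0f\n', w(i_slow));
fprintf('Fast exchange: peak at %.0f, weighted mean = %.0f\n', w(i_fast), p.'*w0);
```

Calling `plot(w, S_slow, w, S_fast)` afterwards overlays the two regimes. In slow exchange the peaks should stay near -200 and 200, while in fast exchange a single peak should appear close to the population-weighted mean of the two frequencies.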

## Create NMR line shape dataset

Create data object for 1D NMR line shapes

```
test1=NMRLineShapes1D('Simulation','Test_1');
test1.set_active_model('U_R2-model', model_numeric_solver, model_numeric_options)
test1.show_active_model() % to look up necessary parameters of the model
```

```
ans =

Active model:
Model 7: "U_R2-model"
Model description "1D NMR line shape for the U-R2 model, no temperature/field dependence, fminbnd solver"
Model handle: line_shape_equations_1D.U_R2_model_1D
Current solver: fminbnd
Model parameters:
1: Rtotal
2: LRratio
3: log10(K_A)
4: log10(K_B)
5: k_2_A
6: k_2_B
7: w0_R
8: w0_R2
9: w0_RL
10: FWHH_R
11: FWHH_R2
12: FWHH_RL
13: ScaleFactor
```

Set range for X to extend beyond resonances

```
w_min= -300;
w_max= 500;
datapoints=100; % Does not matter much because smooth curve is anyway calculated
test1.set_X(linspace(w_min, w_max, datapoints));
```

**Set fixed plotting ranges for easier comparison**

If we do not set ranges, they will be chosen automatically for each graph.

```
test1.X_range=[w_min w_max]; % this sets display range in the plots
test1.Y_range=[0 8e-5]; % this sets display range in the plots
```

Calculate ideal data: set noise RMSD to 0 for both X and Y

```
X_RMSD=0;
Y_RMSD=0;
```

## Slow exchange in free receptor

Use the same thermodynamic parameters as above. Add spectral and kinetic parameters:

```
LRratio = 0.01 ; % NOTE: for a numeric model this value has to be >=0.01
Log10_K_A = log10(K_a_A);
Log10_K_B = log10(K_a_B);
k_2_A = 1; % dissociation rate constant of the complex, 1/s
k_2_B = 1; % dissociation rate constant of the receptor dimer, 1/s
w0_R = -200; % NMR frequency of the free receptor R, 1/s
w0_R2 = 200; % NMR frequency of the dimer R2, 1/s
w0_RL = 400; % NMR frequency of the bound complex RL, 1/s
FWHH_R = 20; % line width at half height of the peak of R, 1/s
FWHH_R2 = 20; % line width at half height of the peak of R2, 1/s
FWHH_RL = 20; % line width at half height of the peak of RL, 1/s
ScaleFactor = 1; % a multiplier for spectral amplitude (used only when fitting data)

% compute equilibrium concentrations
[concentrations species_names] = equilibrium_thermodynamic_equations.U_R2_model(...
    Rtotal, LRratio, K_a_A, K_a_B,...
    model_numeric_solver, model_numeric_options)

% plot line shapes
parameters=[ Rtotal LRratio Log10_K_A Log10_K_B k_2_A k_2_B ...
    w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor ];
test1.simulate_noisy_data(parameters, X_RMSD, Y_RMSD);
figure_handle=test1.plot_simulation('Slow: free R');
results_output.output_figure(figure_handle, figures_folder, 'Slow_concentrated_R');
```

```
concentrations =
   1.0e-03 *
    0.1993    0.3973    0.0100    0.0000

species_names =
    'Req'    'R2eq'    'RLeq'    'Leq'
```

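The expected ratio of the R2 peak to the R peak is 4:1, corresponding to the number of spins experiencing the dimeric and monomeric environments: 2 × 0.3973 mM = 0.7946 mM of spins in the dimer versus 0.1993 mM in the monomer.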

**Prepare TotalFit session for easy display of many graphs**

```
storage_session=TotalFit('All_tests');
% set aside for plotting later
storage_session.copy_dataset_into_array(test1);
```

## Dilution of free receptor

```
Rtotal=1e-4;

% compute equilibrium concentrations
[concentrations species_names] = equilibrium_thermodynamic_equations.U_R2_model(...
    Rtotal, LRratio, K_a_A, K_a_B,...
    model_numeric_solver, model_numeric_options)

parameters=[ Rtotal LRratio Log10_K_A Log10_K_B k_2_A k_2_B ...
    w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor ];
test1.simulate_noisy_data(parameters, X_RMSD, Y_RMSD);
figure_handle=test1.plot_simulation('Slow: 1/10 R');
results_output.output_figure(figure_handle, figures_folder, 'Slow_diluted_R');

% set aside for plotting later
storage_session.copy_dataset_into_array(test1);
```

```
concentrations =
   1.0e-04 *
    0.4965    0.2466    0.0098    0.0002

species_names =
    'Req'    'R2eq'    'RLeq'    'Leq'
```

When the receptor is diluted, the population of R2 drops relative to R: the peak areas are now almost equal, reflecting a similar number of spins in each environment (the total concentration of spins, of course, dropped due to dilution).
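The 4:1 and roughly 1:1 spin ratios seen in the two slow-exchange spectra can be cross-checked in closed form for the ligand-free limit (L/R = 0 instead of 0.01). This is a minimal sketch that assumes K_B = [R2]/[R]^2 and Rtotal = [R] + 2[R2]; the variable names are illustrative and not part of IDAP.

```matlab
% Sketch under assumed conventions: K_B = [R2]/[R]^2, Rtotal = [R] + 2[R2], no ligand.
K_B = 1e4;                          % dimerization constant, 1/M
Rtotal_values = [1e-3 1e-4];        % concentrated and 10-fold diluted receptor, M

% Positive root of 2*K_B*R^2 + R - Rtotal = 0
R_monomer = (-1 + sqrt(1 + 8*K_B*Rtotal_values)) ./ (4*K_B);
R_dimer   = K_B * R_monomer.^2;

% Each dimer carries two receptor spins
spin_ratio = 2*R_dimer ./ R_monomer;   % dimer spins : monomer spins

for n = 1:numel(Rtotal_values)
    fprintf('Rtotal = %.0e M: [R] = %.3e M, [R2] = %.3e M, spin ratio R2:R = %.2f\n', ...
        Rtotal_values(n), R_monomer(n), R_dimer(n), spin_ratio(n));
end
```

For these inputs the quadratic has exact roots: [R] = 2e-4 M and [R2] = 4e-4 M at Rtotal = 1e-3 M (ratio 4:1), and [R] = 5e-5 M and [R2] = 2.5e-5 M at Rtotal = 1e-4 M (ratio 1:1). The model output above differs slightly because about 1% of the receptor is bound to ligand at L/R = 0.01.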

## Fast exchange in free receptor

**Diluted**

```
k_2_B=1000;

parameters=[ Rtotal LRratio Log10_K_A Log10_K_B k_2_A k_2_B ...
    w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor ];
test1.simulate_noisy_data(parameters, X_RMSD, Y_RMSD);
figure_handle=test1.plot_simulation('Fast: 1/10 R');
results_output.output_figure(figure_handle, figures_folder, 'Fast_diluted_R');

% set aside for plotting later
storage_session.copy_dataset_into_array(test1);
```

The weighted-average peak is almost in the middle, with a very slight shift towards the monomer frequency, because the monomer holds very slightly more spins than the dimer under these conditions.

**Concentrated:** Return to the original condition

```
Rtotal=1e-3;

% compute equilibrium concentrations
[concentrations species_names] = equilibrium_thermodynamic_equations.U_R2_model(...
    Rtotal, LRratio, K_a_A, K_a_B,...
    model_numeric_solver, model_numeric_options)

parameters=[ Rtotal LRratio Log10_K_A Log10_K_B k_2_A k_2_B ...
    w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor ];
test1.simulate_noisy_data(parameters, X_RMSD, Y_RMSD);
figure_handle=test1.plot_simulation('Fast: 1/1 R');
results_output.output_figure(figure_handle, figures_folder, 'Fast_concentrated_R');

% set aside for plotting later
storage_session.copy_dataset_into_array(test1);
```

```
concentrations =
   1.0e-03 *
    0.1993    0.3973    0.0100    0.0000

species_names =
    'Req'    'R2eq'    'RLeq'    'Leq'
```

The peak sits roughly at the 4:1 population-weighted position, that is, closer to the dimer frequency.
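Both fast-exchange positions can be estimated by hand. As a rough check, assume the peak sits at the spin-population-weighted mean of w0_R = -200 and w0_R2 = 200, and neglect the roughly 1% of receptor bound as RL. The concentrations printed above for Rtotal = 1e-3 then give:

$$
p_{R} = \frac{[R]}{[R] + 2[R_2]} = \frac{0.1993}{0.1993 + 2(0.3973)} \approx 0.20, \qquad p_{R_2} = 1 - p_{R} \approx 0.80
$$

$$
\bar{w} = p_{R}\, w_{0,R} + p_{R_2}\, w_{0,R_2} \approx 0.20\,(-200) + 0.80\,(200) = 120\ \mathrm{s^{-1}}
$$

For the diluted sample (Rtotal = 1e-4), the same estimate gives an almost centered peak:

$$
p_{R} = \frac{0.4965}{0.4965 + 2(0.2466)} \approx 0.502, \qquad \bar{w} \approx 0.502\,(-200) + 0.498\,(200) \approx -0.7\ \mathrm{s^{-1}}
$$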

## A fully saturated receptor

```
LRratio = 10 ;

parameters=[ Rtotal LRratio Log10_K_A Log10_K_B k_2_A k_2_B ...
    w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor ];
test1.simulate_noisy_data(parameters, X_RMSD, Y_RMSD);
figure_handle=test1.plot_simulation('saturated RL');
results_output.output_figure(figure_handle, figures_folder, 'Saturated_RL');

% set aside for plotting later
storage_session.copy_dataset_into_array(test1);
```

We observe a single peak at the frequency of the bound form, RL.
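A back-of-the-envelope estimate shows why no other peaks are expected here. It assumes the association-constant conventions K_A = [RL]/([R][L]) and K_B = [R2]/[R]^2, and that nearly all receptor is bound, so the free ligand at L/R = 10 and Rtotal = 1e-3 M is about 9e-3 M:

$$
\frac{[R]}{[RL]} = \frac{1}{K_A [L]} \approx \frac{1}{(10^{6})(9 \times 10^{-3})} \approx 1.1 \times 10^{-4}
$$

$$
[R] \approx 1.1 \times 10^{-7}\ \mathrm{M}, \qquad [R_2] = K_B [R]^2 \approx (10^{4})(1.1 \times 10^{-7})^2 \approx 1.2 \times 10^{-10}\ \mathrm{M}
$$

Under these assumptions more than 99.9% of the receptor is in the RL form, so the R and R2 resonances should be invisible on this intensity scale.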

## Summary of individual plots

```
storage_session.define_series('All-tests',1:length(storage_session.dataset_array));
figure_handle=storage_session.plot_1D_series('All-tests', 'All tests', 0);
results_output.output_figure(figure_handle, figures_folder, 'All_tests');
```

This graph compares all of the previous plots.

## Create a titration series

(see Tutorial 3 for details)

Choose parameters

```
ZERO_LRratio= 0.01; % This number cannot be too small---the model is numeric and will NOT converge!
LRratio_vector=[ZERO_LRratio 0.2 0.4 0.6 0.8 1.0 1.2 1.4 ];
% choose the same parameters as above

session_name='Series';
session=TotalFit(session_name);
series_name='U-R2_simulation';
dataset_class_name='NMRLineShapes1D';
model_name='U_R2-model';
number_of_datasets=length(LRratio_vector);
X_column=test1.X; % use the same range

% create TotalFit session
session.create_simulation_series(series_name, dataset_class_name, model_name, ...
    model_numeric_solver, model_numeric_options,...
    number_of_datasets, X_column);
session.initialize_relation_matrix(sprintf('%s.all_datasets.txt', session_name));

% link parameters
parameter_number_array=[ 1 3 4 5 6 7 8 9 10 11 12 13];
session.batch_link_parameters_in_series(series_name, parameter_number_array);
session.generate_fitting_environment(sprintf('%s.fitting_environment.txt', session_name))

% prepare parameters
first_dataset_index=session.dataset_index(series_name,1);
FAKE_VALUE=1E-6; % individual (unlinked) parameters: to be assigned later
ideal_parameters=[ Rtotal FAKE_VALUE Log10_K_A Log10_K_B k_2_A k_2_B ...
    w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor ];

% prepare formal limits
fake_limits=zeros(1,length(ideal_parameters));
Monte_Carlo_range_min=fake_limits;
Monte_Carlo_range_max=fake_limits;
parameter_limits_min=fake_limits;
parameter_limits_max=fake_limits;

% assign
session.assign_parameter_values(first_dataset_index, ideal_parameters, ...
    Monte_Carlo_range_min, Monte_Carlo_range_max, parameter_limits_min, parameter_limits_max);

% Assign unlinked LRratio and ranges
parameter_number=2;
values_array=LRratio_vector;
unlinked_fake_ranges=zeros(1,number_of_datasets);
MonteCarlo_range_min_array=unlinked_fake_ranges;
MonteCarlo_range_max_array=unlinked_fake_ranges;
parameter_limits_min_array=unlinked_fake_ranges;
parameter_limits_max_array=unlinked_fake_ranges;
session.assign_unlinked_parameter_in_series(...
    series_name, parameter_number, values_array, ...
    MonteCarlo_range_min_array, MonteCarlo_range_max_array, ...
    parameter_limits_min_array, parameter_limits_max_array);

% Simulate series
X_standard_dev=0;
Y_standard_dev=1e-8; % something really small
session.simulate_series(series_name, X_standard_dev, Y_standard_dev);

% Plot
Y_offset_percent=10; % to offset figures vertically for easier viewing
% set plotting ranges
X_range=test1.X_range; % use old
Y_range=[]; % automatic
session.set_XY_ranges_series(series_name, X_range, Y_range)
figure_handle=session.plot_1D_series(series_name, session.name, Y_offset_percent);
results_output.output_figure(figure_handle, figures_folder, 'Series');
```

```
Dataset listing is written to "Series.all_datasets.txt".
Fitting environment saved in Series.fitting_environment.txt.

ans =

New parameter relations (indexed as in all_parameters vector):
[1] 1:Rtotal | 2:Rtotal | 3:Rtotal | 4:Rtotal | 5:Rtotal | 6:Rtotal | 7:Rtotal | 8:Rtotal
[2] 1:LRratio
[3] 1:log10(K_A) | 2:log10(K_A) | 3:log10(K_A) | 4:log10(K_A) | 5:log10(K_A) | 6:log10(K_A) | 7:log10(K_A) | 8:log10(K_A)
[4] 1:log10(K_B) | 2:log10(K_B) | 3:log10(K_B) | 4:log10(K_B) | 5:log10(K_B) | 6:log10(K_B) | 7:log10(K_B) | 8:log10(K_B)
[5] 1:k_2_A | 2:k_2_A | 3:k_2_A | 4:k_2_A | 5:k_2_A | 6:k_2_A | 7:k_2_A | 8:k_2_A
[6] 1:k_2_B | 2:k_2_B | 3:k_2_B | 4:k_2_B | 5:k_2_B | 6:k_2_B | 7:k_2_B | 8:k_2_B
[7] 1:w0_R | 2:w0_R | 3:w0_R | 4:w0_R | 5:w0_R | 6:w0_R | 7:w0_R | 8:w0_R
[8] 1:w0_R2 | 2:w0_R2 | 3:w0_R2 | 4:w0_R2 | 5:w0_R2 | 6:w0_R2 | 7:w0_R2 | 8:w0_R2
[9] 1:w0_RL | 2:w0_RL | 3:w0_RL | 4:w0_RL | 5:w0_RL | 6:w0_RL | 7:w0_RL | 8:w0_RL
[10] 1:FWHH_R | 2:FWHH_R | 3:FWHH_R | 4:FWHH_R | 5:FWHH_R | 6:FWHH_R | 7:FWHH_R | 8:FWHH_R
[11] 1:FWHH_R2 | 2:FWHH_R2 | 3:FWHH_R2 | 4:FWHH_R2 | 5:FWHH_R2 | 6:FWHH_R2 | 7:FWHH_R2 | 8:FWHH_R2
[12] 1:FWHH_RL | 2:FWHH_RL | 3:FWHH_RL | 4:FWHH_RL | 5:FWHH_RL | 6:FWHH_RL | 7:FWHH_RL | 8:FWHH_RL
[13] 1:ScaleFactor | 2:ScaleFactor | 3:ScaleFactor | 4:ScaleFactor | 5:ScaleFactor | 6:ScaleFactor | 7:ScaleFactor | 8:ScaleFactor
[14] n/a
[15] 2:LRratio
[16] n/a
[17] n/a
[18] n/a
[19] n/a
[20] n/a
[21] n/a
[22] n/a
[23] n/a
[24] n/a
[25] n/a
[26] n/a
[27] n/a
[28] 3:LRratio
[29] n/a
[30] n/a
[31] n/a
[32] n/a
[33] n/a
[34] n/a
[35] n/a
[36] n/a
[37] n/a
[38] n/a
[39] n/a
[40] n/a
[41] 4:LRratio
[42] n/a
[43] n/a
[44] n/a
[45] n/a
[46] n/a
[47] n/a
[48] n/a
[49] n/a
[50] n/a
[51] n/a
[52] n/a
[53] n/a
[54] 5:LRratio
[55] n/a
[56] n/a
[57] n/a
[58] n/a
[59] n/a
[60] n/a
[61] n/a
[62] n/a
[63] n/a
[64] n/a
[65] n/a
[66] n/a
[67] 6:LRratio
[68] n/a
[69] n/a
[70] n/a
[71] n/a
[72] n/a
[73] n/a
[74] n/a
[75] n/a
[76] n/a
[77] n/a
[78] n/a
[79] n/a
[80] 7:LRratio
[81] n/a
[82] n/a
[83] n/a
[84] n/a
[85] n/a
[86] n/a
[87] n/a
[88] n/a
[89] n/a
[90] n/a
[91] n/a
[92] n/a
[93] 8:LRratio
[94] n/a
[95] n/a
[96] n/a
[97] n/a
[98] n/a
[99] n/a
[100] n/a
[101] n/a
[102] n/a
[103] n/a
[104] n/a

Summary of datasets with lookup vectors
(position of each parameter in all_parameter vector)
--------
Dataset 1/U-R2_simulation-1, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 2 3 4 5 6 7 8 9 10 11 12 13

Dataset 2/U-R2_simulation-2, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 15 3 4 5 6 7 8 9 10 11 12 13

Dataset 3/U-R2_simulation-3, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 28 3 4 5 6 7 8 9 10 11 12 13

Dataset 4/U-R2_simulation-4, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 41 3 4 5 6 7 8 9 10 11 12 13

Dataset 5/U-R2_simulation-5, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 54 3 4 5 6 7 8 9 10 11 12 13

Dataset 6/U-R2_simulation-6, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 67 3 4 5 6 7 8 9 10 11 12 13

Dataset 7/U-R2_simulation-7, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 80 3 4 5 6 7 8 9 10 11 12 13

Dataset 8/U-R2_simulation-8, model "U_R2-model"
Parameters : Rtotal LRratio log10(K_A) log10(K_B) k_2_A k_2_B w0_R w0_R2 w0_RL FWHH_R FWHH_R2 FWHH_RL ScaleFactor
Lookup vector : 1 93 3 4 5 6 7 8 9 10 11 12 13
```
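The long listing has a simple structure that can be read off the printout: the all_parameters vector holds 13 slots per dataset for 8 datasets (104 entries), linked parameters collapse onto the slots of the first dataset, and only the unlinked LRratio keeps its own slot in every dataset. The sketch below reproduces the LRratio positions from that pattern. It is an observation about this particular output, not a call into the IDAP API, and the variable names are made up for illustration.

```matlab
% Pattern read off the listing above; not an IDAP function.
n_parameters = 13;                 % parameters per dataset in the U_R2-model
n_datasets   = 8;                  % length of LRratio_vector
LRratio_slot = 2;                  % position of LRratio in the parameter list

% Index of each dataset's own LRratio in the all_parameters vector
LRratio_index = LRratio_slot + n_parameters*(0:n_datasets-1)
```

The result, 2 15 28 41 54 67 80 93, matches the second entry of each lookup vector in the dataset summary, which is a quick way to confirm that linking was set up as intended.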

This is an expected result. Binding in slow exchange leads to appearance of RL resonance, which gradually increases in intensity. Fast exchange between monomer and a dimer leads to a single population-weighted average peak that shifts towards monomer resonance frequency upon titration (as concentration of unliganded R decreases).

## Conclusions

The U-R2 model for NMR 1D line shapes works as expected.