Statistics
Standard Statistics¶
Standard statistics will be performed according to the Goodness of Fit
, which is implemented by default through lmfit
. The Goodness of Fit
contains:
- \(N_{vars}\): the number of degrees of variables
- \(N_{free}\): the number of degrees of freedom
- \(N_{func}\): the number of degrees of functions
- \(N_{data}\): the number of data points
- \(residuals\): residuals
- \(\chi^2\): the sum of the squares of the residuals
- \(\textbf{red}\chi^2\): the reduced \(\chi^2\)
- \(\textnormal{AIC}\): the Akaike Information Criterion
- \(\textnormal{BIC}\): the Bayesian Information Criterion
- \(covariance\): the Covariance Matrix (optional)
According to the provided data below, two types of fits will be compared according to the Goodness of fit
. In example 1, five Gaussian-distributions are used to fit the data; in example 2, three Gaussian-distributions are used to fit the data. The fitted spectra are shown in the figures below.
Reference Data
Energy,Intensity
291.80443675550987,0.11771793109307316
291.5690317863643,0.12724843410900621
291.3122900238649,0.14061739741579737
291.0766791011155,0.14053244155422917
290.7770989890252,0.15388595834073504
290.52056318012956,0.17687038052502746
290.3283260863409,0.2018010642652478
290.1146698177568,0.22672402474532583
289.81599590152325,0.2823855605928377
289.5602015256013,0.33998563473613497
289.389383606608,0.36492404173649806
289.0898446852385,0.38020065029850414
288.70417596675907,0.374292356289437
288.48973707448056,0.3626765730350098
288.2963878312313,0.33568377883672285
288.14703027775414,0.36255300087272857
287.8477385007092,0.38936816008773656
287.63535914446874,0.4739069656083228
287.3586398824005,0.5545764177974809
287.0818382388907,0.6313996864356385
286.8691293568841,0.7005537577522226
286.5914627082384,0.7369920991048744
286.35630488341735,0.7580611527738091
286.1211058678756,0.7772071146672436
285.8225143330836,0.8367148340657561
285.39483107942857,0.8692529290464093
285.0089152166246,0.8518060843843406
284.8785877761387,0.767143706701473
284.59750229766985,0.6439654306876021
284.42277126020434,0.486210119015439
284.14193292606006,0.3745703936545697
283.83942827279583,0.25138439438055615
283.6027875820274,0.2032221441314811
283.34472771646364,0.15505217062226317
283.0229046151628,0.12993612863862114
282.4226322415218,0.10471968427312561
281.88711168091606,0.10260351099406106
281.2445776277749,0.10429490496528415
280.6662187175784,0.10216328516593454
Example 1: five peaks
[fitting.peaks.1.gaussian.amplitude]
max = 4
min = 0
vary = true
value = 0.5
[fitting.peaks.1.gaussian.center]
max = 286
min = 284
vary = true
value = 285
[fitting.peaks.1.gaussian.fwhmg]
max = 1.9
min = 1.7
vary = true
value = 1.8
[fitting.peaks.2.gaussian.amplitude]
expr = "gaussian_amplitude_1 / 3"
[fitting.peaks.2.gaussian.center]
expr = "gaussian_center_1 + 0.8"
[fitting.peaks.2.gaussian.fwhmg]
max = 1.9
min = 1.7
vary = true
value = 1.8
[fitting.peaks.3.gaussian.amplitude]
max = 4
min = 0.1
vary = true
value = 0.5
[fitting.peaks.3.gaussian.center]
expr = "gaussian_center_1 + 1.7"
[fitting.peaks.3.gaussian.fwhmg]
max = 1.9
min = 1.7
vary = true
value = 1.8
[fitting.peaks.4.gaussian.amplitude]
max = 4
min = 0
vary = true
value = 0.5
[fitting.peaks.4.gaussian.center]
expr = "gaussian_center_1 + 1.9"
[fitting.peaks.4.gaussian.fwhmg]
max = 1.9
min = 1.7
vary = true
value = 1.8
[fitting.peaks.5.gaussian.amplitude]
max = 4
min = 0
vary = true
value = 0.5
[fitting.peaks.5.gaussian.center]
max = 288.6
min = 292.2
vary = true
value = 289
[fitting.peaks.5.gaussian.fwhmg]
max = 2.0
min = 1.8
vary = true
value = 1.9
Example 2: three peaks
[fitting.peaks.1.gaussian.amplitude]
max = 4
min = 0
vary = true
value = 0.5
[fitting.peaks.1.gaussian.center]
max = 286
min = 285
vary = true
value = 285.39
[fitting.peaks.1.gaussian.fwhmg]
max = 1.9
min = 1.7
vary = true
value = 1.8
[fitting.peaks.2.gaussian.amplitude]
max = 4
min = 0
vary = true
value = 0.5
[fitting.peaks.2.gaussian.center]
max = 287
min = 286.4
vary = true
value = 286.6
[fitting.peaks.2.gaussian.fwhmg]
max = 1.9
min = 1.7
vary = true
value = 1.8
[fitting.peaks.3.gaussian.amplitude]
max = 4
min = 0
vary = true
value = 0.5
[fitting.peaks.3.gaussian.center]
max = 288
min = 290
vary = true
value = 288.4
[fitting.peaks.3.gaussian.fwhmg]
max = 1.9
min = 1.7
vary = true
value = 1.8
Fitting Results
parameter | Example 1 | Example 2 |
---|---|---|
\(\chi^2\) | 0.15365078580492708 | 0.17338375921457175 |
\(\textbf{red}\chi^2\) | 0.005487528064461682 | 0.005779458640485725 |
\(\textnormal{AIC}\) | -193.9287463695311 | -193.21657151272893 |
\(\textnormal{BIC}\) | -175.62956826210498 | -178.24451669756212 |
Linear Correlation¶
The linear correlation between the the fitting results is calculated using the Pearson's correlation coefficient. The Pearson's correlation coefficient is a measure of the linear dependence between two variables.
Correlation Results
Example 1:
Peak 1 | Peak 2 | Peak 3 | Peak 4 | Peak 5 | |
---|---|---|---|---|---|
Peak 1 | 1.0 | 0.6284 | 0.000 | -0.091 | -0.376 |
Peak 2 | 0.628 | 1.0 | 0.620 | 0.477 | -0.371 |
Peak 3 | 0.000 | 0.620 | 1.0 | 0.978 | -0.274 |
Peak 4 | -0.091 | 0.477 | 0.978 | 1.0, | -0.225 |
Peak 5 | -0.376 | -0.371 | -0.274 | -0.225 | 1.0 |
Example 2:
Peak 1 | Peak 2 | Peak 3 | |
---|---|---|---|
Peak 1 | 1.0 | -0.021 | -0.364 |
Peak 2 | -0.021 | 1.0 | -0.241 |
Peak 3 | -0.364 | -0.241 | 1.0 |
Five peak fit of the
Reference Data
above.
Three peak fit of the
Reference Data
above.
Regression Metrics¶
For the regression metrics, the fitting results are compared to the reference data. The metrics are calculated for each fit individually and based on the sklearn.metrics
module. Currently, the following metrics are implemented:
-
Explained Variance Score
-
R²
-
Max Error
-
Mean Absolute Error
-
Mean Squared Error
-
Mean Squared Log Error
-
Median Absolute Error
-
Mean Absolute Percentage Error
The metrics should provide a better overview of the individual fitting results and the used model then just using the Goodness of Fit
metrics. However, the following metrics are not implemented yet:
-
Mean Percentage Error
1 -
Mean Gamma Deviance
1 -
Mean Tweedie Deviance
1 -
Mean Pinball Loss
1 -
D² tweedie score
2 -
D² pinball score
2 -
D² absolute error score
2
Regression Metrics For Example 6
Metric | Spectra 1 | Spectra 2 | Spectra 3 |
---|---|---|---|
explained variance score | 0.97 | 0.88 | 0.98 |
r2 score | 0.97 | 0.86 | 0.98 |
max error | 0.05 | 0.14 | 0.05 |
mean absolute error | 0.02 | 0.04 | 0.02 |
mean squared error | 0.00 | 0.00 | 0.00 |
mean squared log error | 0.00 | 0.00 | 0.00 |
median absolute error | 0.02 | 0.04 | 0.02 |
mean absolute percentage error | 441.19 | 0.25 | 0.62 |
mean poisson deviance | 0.02 | 0.02 | 0.01 |
Metric Plots¶
In the case of using the Jupyter Notebook interface of the SpectraFit
package, the metrics will be plotted automatically. It uses the
to plot the results for each run and can be also exported as .csv
. The idea is to use multiple runs and metrics together to get a better overview of the fitting performance.
Example 3 and Example 4 show the fit abd metric plots for the fitting results of four runs of the
Reference Data
above.
The user can select any of the metrics 1 and 2 by using the keywords bar_criteria
and line_criteria
.