# benchmarks

Utilities for computing statistics on benchmark data.

Translated from https://github.com/jupyterlab/jupyterlab/blob/82df0b635dae2c1a70a7c41fe7ee7af1c1caefb2/galata/src/benchmarkReporter.ts#L150-L244, which was originally added in https://github.com/jupyterlab/benchmarks/blob/f55db969bf4d988f9d627ba187e28823a50153ba/src/compare.ts#L136-L213.

## `Distribution` dataclass

Statistical description of a distribution

Source code in `lineapy/utils/benchmarks.py`
```python
@dataclass
class Distribution:
    """
    Statistical description of a distribution
    """

    mean: float
    variance: float

    @classmethod
    def from_data(cls, data: List[float]) -> Distribution:
        return cls(mean(data), variance(data))
```
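A minimal usage sketch, assuming `mean` and `variance` inside `from_data` are the functions from Python's `statistics` module, as the signature suggests:

```python
from lineapy.utils.benchmarks import Distribution

# Summarize repeated timings (seconds) as mean and sample variance
dist = Distribution.from_data([1.2, 1.4, 1.3, 1.5])
print(dist)  # approximately Distribution(mean=1.35, variance=0.0167)
```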

## `DistributionChange` dataclass

Change between two distributions

Source code in `lineapy/utils/benchmarks.py`
```python
@dataclass
class DistributionChange:
    """
    Change between two distributions
    """

    # Mean value
    mean: float

    # Spread around the mean value
    confidence_interval: float

    # The confidence interval level, i.e. 0.95 for a 95% confidence interval
    confidence_interval_level: float

    def __str__(self):
        """
        Format a performance change like
        `between 20.1% slower and 30.3% faster (95% CI)`.
        """
        return (
            f"between {format_percent(self.mean + self.confidence_interval)} "
            f"and {format_percent(self.mean - self.confidence_interval)} "
            f"({self.confidence_interval_level * 100}% CI)"
        )
```

### `__str__()`

Format a performance change like `between 20.1% slower and 30.3% faster (95% CI)`.

Source code in `lineapy/utils/benchmarks.py`
```python
def __str__(self):
    """
    Format a performance change like
    `between 20.1% slower and 30.3% faster (95% CI)`.
    """
    return (
        f"between {format_percent(self.mean + self.confidence_interval)} "
        f"and {format_percent(self.mean - self.confidence_interval)} "
        f"({self.confidence_interval_level * 100}% CI)"
    )
```
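For illustration, a hypothetical change whose interval straddles no change; the exact wording of the output depends on `format_percent`:

```python
from lineapy.utils.benchmarks import DistributionChange

change = DistributionChange(
    mean=0.95,
    confidence_interval=0.25,
    confidence_interval_level=0.95,
)
# Prints something shaped like "between ...% and ...% (95.0% CI)"
print(change)
```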

## `distribution_change(old_measures, new_measures, confidence_interval=0.95)`

Compute the performance change based on a number of old and new measurements.

Based on the work by Tomas Kalibera and Richard Jones. See their paper "Quantifying Performance Changes with Effect Size Confidence Intervals", section 6.2, formula "Quantifying Performance Change".
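Concretely, this is a Fieller-style confidence interval for the ratio of the two means, presumably what the `performance_change` helper called in the source below computes. A sketch of the formula, reconstructed from the paper and the linked TypeScript implementation, where `y_O`, `s_O^2` and `y_N`, `s_N^2` are the mean and sample variance of the old and new measurements, `n` is their common length, and `t` is the two-sided Student-t quantile for the requested confidence level with `n - 1` degrees of freedom:

```latex
% Reported change: the central value is `mean`, the +/- term is
% `confidence_interval` (Kalibera & Jones, section 6.2).
\frac{\bar{y}_O \, \bar{y}_N
      \pm \sqrt{\bar{y}_O^2 \, \bar{y}_N^2
                - \left(\bar{y}_O^2 - \frac{t^2 s_O^2}{n}\right)
                  \left(\bar{y}_N^2 - \frac{t^2 s_N^2}{n}\right)}}
     {\bar{y}_O^2 - \frac{t^2 s_O^2}{n}}
```

Plugging the paper's Table V data into this expression reproduces the `68.3 / 74.5` mean and `60.2 / 74.5` interval asserted in the example further down.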

Note: The measurements must have the same length. As a fallback, you could truncate both sets to the smaller of the two sizes, as sketched below.
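A minimal sketch of that fallback (the timing values here are hypothetical):

```python
from lineapy.utils.benchmarks import distribution_change

old_measures = [10.2, 10.5, 9.9, 10.1]
new_measures = [9.1, 9.3, 9.0]

# Truncate both series to the shorter length so distribution_change
# does not raise ValueError for mismatched lengths
n = min(len(old_measures), len(new_measures))
print(distribution_change(old_measures[:n], new_measures[:n]))
```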

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `old_measures` | `List[float]` | The list of timings from the old system | *required* |
| `new_measures` | `List[float]` | The list of timings from the new system | *required* |
| `confidence_interval` | `float` | The confidence interval for the results. The default is a 95% confidence interval (95% of the time the true mean will be between the resulting mean ± the resulting CI) | `0.95` |

Test against the example in the paper, from Table V on pages 18-19:

```python
from math import isclose
from statistics import mean

from lineapy.utils.benchmarks import distribution_change

res = distribution_change(
    old_measures=[
        round(mean([9, 11, 5, 6]), 1),
        round(mean([16, 13, 12, 8]), 1),
        round(mean([15, 7, 10, 14]), 1),
    ],
    new_measures=[
        round(mean([10, 12, 6, 7]), 1),
        round(mean([9, 1, 11, 4]), 1),
        round(mean([8, 5, 3, 2]), 1),
    ],
    confidence_interval=0.95,
)
assert isclose(res.mean, 68.3 / 74.5, rel_tol=0.05)
assert isclose(res.confidence_interval, 60.2 / 74.5, rel_tol=0.05)
```
Source code in `lineapy/utils/benchmarks.py`
````python
def distribution_change(
    old_measures: List[float],
    new_measures: List[float],
    confidence_interval: float = 0.95,
) -> DistributionChange:
    """
    Compute the performance change based on a number of old and new
    measurements.

    Based on the work by Tomas Kalibera and Richard Jones. See their paper
    "Quantifying Performance Changes with Effect Size Confidence Intervals",
    section 6.2, formula "Quantifying Performance Change".

    Note: The measurements must have the same length. As fallback, you could
    use the minimum size of the two measurement sets.

    Parameters
    ----------
    old_measures: List[float]
        The list of timings from the old system
    new_measures: List[float]
        The list of timings from the new system
    confidence_interval: float
        The confidence interval for the results. The default is a 95%
        confidence interval (95% of the time the true mean will be between
        the resulting mean +- the resulting CI)

    Test against the example in the paper, from Table V, on pages 18-19

    ```python
    res = distribution_change(
        old_measures=[
            round(mean([9, 11, 5, 6]), 1),
            round(mean([16, 13, 12, 8]), 1),
            round(mean([15, 7, 10, 14]), 1),
        ],
        new_measures=[
            round(mean([10, 12, 6, 7]), 1),
            round(mean([9, 1, 11, 4]), 1),
            round(mean([8, 5, 3, 2]), 1),
        ],
        confidence_interval=0.95
    )
    from math import isclose
    assert isclose(res.mean, 68.3 / 74.5, rel_tol=0.05)
    assert isclose(res.confidence_interval, 60.2 / 74.5, rel_tol=0.05)
    ```
    """
    n = len(old_measures)
    if n != len(new_measures):
        raise ValueError("Data have different length")
    return performance_change(
        Distribution.from_data(old_measures),
        Distribution.from_data(new_measures),
        n,
        confidence_interval,
    )
````
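An end-to-end sketch on raw (hypothetical) timings, with the result formatted via `DistributionChange.__str__`:

```python
from lineapy.utils.benchmarks import distribution_change

# Hypothetical wall-clock timings (seconds) from repeated runs
old = [1.21, 1.19, 1.25, 1.22, 1.20]
new = [1.02, 0.99, 1.05, 1.01, 1.00]

print(distribution_change(old, new, confidence_interval=0.95))
```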