An important feature of the UPMEM DIMM SDK is the measurement of the power consumption of the UPMEM DIMM at runtime. This is called "power profiling". Power profiling reports three values about the power: the two extrema, min and max, and the mean.
These three estimators are implemented with limited resources in practice, and this has a non-negligible impact on the quality of the estimation. min and max are straightforward to measure, but the mean has a limited precision that depends on several parameters the user needs to understand. The objective of this technical note is to formally evaluate the quality of the "mean" estimation. We will show that the confidence and precision of the mean estimator mostly depend on a single critical parameter, called Ti_min, the shortest possible period of the ina (input current supply) signal, which is the image of p, the instantaneous power supply.
The main limitation of our implementation is the sampling frequency of the "power measurement subsystem", called Fech. Fech = 10 kHz in the current implementation. We will see that this frequency is quite low for our requirements.
(10) power profiling subsystem
[Block diagram: two ranks (rank0 and rank1) carrying the uP0 to uP15 chips, connected to a DC/DC converter; the ina line of the DC/DC converter feeds the ADC inside the MCU.]
The measurement is done inside the UPMEM DIMM MCU, as it is the only component of the DIMM with such hardware capabilities. To measure the DPU rank power consumption, we actually measure only the supply current, under the hypothesis that the input voltage is constant. Thus, the supply current ina carries all the information about the rank power supply.
The instantaneous power signal is given by (0) below:
$$p(t) = \mathrm{vdd} \cdot \mathrm{ina}(t) \qquad (0)$$
with p in watts, ina in amps, vdd in volts.
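The mean power consumption is given by (1) below:
$$\hat{P} = \frac{1}{n} \sum_{k=1}^{n} p_k = \frac{\mathrm{vdd}}{n} \sum_{k=1}^{n} \mathrm{ina}_k \qquad (1)$$
with n being the sample size (number of samples used by the estimator).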
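As a minimal sketch of (0) and (1), assuming a constant supply voltage, the mean power can be computed from a list of sampled ina values. The function names, the 1.2 V supply and the four sample values are hypothetical and only illustrate the arithmetic.

```python
def instantaneous_power(ina_amps, vdd_volts):
    """Instantaneous power in watts for one current sample, vdd being constant."""
    return vdd_volts * ina_amps


def mean_power(ina_samples_amps, vdd_volts):
    """Empirical mean of the instantaneous power over n samples."""
    n = len(ina_samples_amps)
    if n == 0:
        raise ValueError("at least one sample is required")
    return sum(instantaneous_power(i, vdd_volts) for i in ina_samples_amps) / n


# Hypothetical values: four current samples in amps and a 1.2 V supply.
samples = [10.0, 12.0, 11.0, 13.0]
print(mean_power(samples, vdd_volts=1.2))
```

Because vdd is a constant factor, the result equals vdd times the empirical mean of the ina samples.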
The problem here is that we must ensure that (1) gives a good estimation of P, the mean of p.
As we take vdd as constant, estimating the mean of p is equivalent to estimating the mean of ina. The ina signal may vary over time, and we have no deterministic information about how it varies. However, laboratory observations clearly show that ina can be approximated by the simple and generic stochastic model (2) below:
In (2), both Ti and Si are random variables. We cannot make any assumption about Ti but, fortunately, our experiments show that Si can be modeled by a normal law, as it is the sum of a constant value and an AWGN (additive white Gaussian noise) signal:
$$S_i = \mu + w, \quad w \sim \mathcal{N}(0, \sigma^2) \qquad (3)$$
Note that AWGN is a standard noise model, commonly used in the electrical engineering and telecommunication literature.
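The "constant plus AWGN" description of Si can be sketched with independent Gaussian draws. The function name and the seed are hypothetical. The level of 20 A and the standard deviation of 4.12 A are the ina_mu_amps and ina_sigma_amps values used later in this note.

```python
import random
import statistics


def draw_si_samples(mu_amps, sigma_amps, n, rng):
    """Draw n samples of Si = constant level + zero-mean Gaussian noise."""
    return [mu_amps + rng.gauss(0.0, sigma_amps) for _ in range(n)]


# Fixed seed so that the sketch is reproducible.
rng = random.Random(0)
si = draw_si_samples(mu_amps=20.0, sigma_amps=4.12, n=10000, rng=rng)

# The empirical mean and standard deviation should be close to 20 and 4.12.
print(statistics.mean(si), statistics.stdev(si))
```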
In the rest of the study, we treat Si as a random variable. We do not treat Ti as a random variable to be modeled, but as a parameter we want to minimize. More precisely, the objective of this technical note is to show that our estimator is able to estimate the mean of the model (2) with sufficient precision and confidence, with the Ti parameter treated as a lower boundary condition.
The final parameter we actually want to compute is N, the sample size. The parameters Ti, N and Fech are linked by (4) below:
$$T_i = \frac{N}{F_{ech}} \qquad (4)$$
Thus, Ti will be computed with (4), taking a fixed sampling frequency Fech, which is a maximal value that cannot be increased and that depends on the power measurement subsystem.
As explained before, the final parameter we want to compute is N, the minimal number of samples required to ensure that our measurement has the required precision.
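A small sketch of the link between Ti, N and Fech, assuming the relation Ti = N / Fech, which is consistent with the tables of this note (for example N = 66 at 10 kHz gives 6600 us). The function name is hypothetical.

```python
def ti_min_us(n_samples, fech_hz):
    """Time in microseconds needed to acquire n_samples at fech_hz."""
    return 1e6 * n_samples / fech_hz


# 66 samples at the 10 kHz sampling frequency of the current implementation.
print(ti_min_us(66, 10e3))  # prints 6600.0
```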
The estimator (1) of the mean of signal (2) is actually the same as (5) below, which is the basic empirical mean formula.
$$\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i \qquad (5)$$
For the estimation of the expectation of the complete signal (2), we can formally show that it is equivalent to consider the mean estimator of a single Si. In other words, for the parameters we want to compute, we only need to consider one single random variable Si. We will replace (1), the empirical mean of (2), by (5), the empirical mean of Xi, with Xi being Si in our case.
In (5), the critical parameter is N, the sample size. Recall that the MCU has limited sampling and computation capabilities, so we need to verify that we reach our objectives, namely the estimation precision and confidence.
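Formally, in any estimation problem, we want delta to be as small as possible in the formula below, where $\bar{X}_N$ is the empirical mean of N samples and $\mu$ is the true mean of X.
$$|\bar{X}_N - \mu| \le \delta$$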
To evaluate this, we use two-sided confidence interval theory, with Xi in (5) following a normal law. A confidence interval is the statistical interval in which the result of the expectation estimator is most likely to fall, given the parameter alpha, the probability that the estimator result falls outside this interval (formally, alpha is defined as the risk of rejecting a true hypothesis). In this setting, we assume that sigma, the STD (standard deviation) of X, is known. In practice, sigma is taken as the maximal measured STD of ina, with some margin (the STD is measured in the lab with the 6 sigma method, on the real hardware).
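For a normal distribution, the confidence interval follows the formula below, with k defined as the quantile of the normal law.
$$\bar{X}_N - k \frac{\sigma}{\sqrt{N}} \le \mu \le \bar{X}_N + k \frac{\sigma}{\sqrt{N}} \qquad (8)$$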
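We can simplify (8), for one or more fixed, well-known values of k, to:
$$N \ge \frac{\lambda^2 \sigma^2}{\varepsilon^2 \mu^2}$$
with lambda being the critical value of the normal law distribution function (which itself depends on alpha, the risk that sets the degree of confidence 1 - alpha), epsilon the relative error and mu the mean of X.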
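The algebra behind this simplification can be sketched as follows, assuming that the half-width of the interval (8) is $\lambda \sigma / \sqrt{N}$ and that it must not exceed a relative error $\varepsilon$ of the mean $\mu$. The symbol $\varepsilon$ and the intermediate steps are notation introduced here for illustration. The last line uses the values sigma = 4.12 A and mu = 20 A that appear later in this note, with lambda = 1.96 and a 5 % relative error.

```latex
\begin{aligned}
\lambda \frac{\sigma}{\sqrt{N}} &\le \varepsilon \mu \\
\sqrt{N} &\ge \frac{\lambda \sigma}{\varepsilon \mu} \\
N &\ge \frac{\lambda^2 \sigma^2}{\varepsilon^2 \mu^2} \\
N &\ge \frac{1.96^2 \times 4.12^2}{0.05^2 \times 20^2} \approx 65.21
  \quad\Rightarrow\quad N = 66
\end{aligned}
```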
To compute the final sample size N, we consider only two values for alpha (the risk, with 1 - alpha the degree of confidence):
alpha = 0.05 (95 % confidence) gives lambda = 1.96.
alpha = 0.01 (99 % confidence) gives lambda = 2.576.
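These two values are the two-sided critical values of the standard normal law. As a quick check, they can be recomputed with the Python standard library (`statistics.NormalDist`, available since Python 3.8). The helper name is hypothetical.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha):
    """Return lambda such that P(|Z| <= lambda) = 1 - alpha for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for alpha in (0.05, 0.01):
    print(alpha, round(two_sided_critical_value(alpha), 3))
```

Rounded to three decimals, the results are 1.96 and 2.576.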
We also compute Ti_min, the critical ina profile period, for a fixed Fech of 10 kHz, that is, the minimal period of the model (see (2)) for which our estimation is correct in terms of error and precision (see (8)).
alpha = 0.05 (95 % confidence) (9)
relative error | N | Ti_min (us), Fech = 10 kHz |
---|---|---|
0.005 | 6521 | 652100 |
0.01 | 1631 | 163100 |
0.05 | 66 | 6600 |
0.1 | 17 | 1700 |
0.25 | 3 | 300 |
0.5 | 1 | 100 |
0.8 | 1 | 100 |
alpha = 0.01 (99 % confidence) (10)
relative error | N | Ti_min (us), Fech = 10 kHz |
---|---|---|
0.005 | 11264 | 1126400 |
0.01 | 2816 | 281600 |
0.05 | 113 | 11300 |
0.1 | 29 | 2900 |
0.25 | 5 | 500 |
0.5 | 2 | 200 |
0.8 | 1 | 100 |
As an interpretation, consider the case where Ti_min = 6600 us, which is one of the most relevant given the order of magnitude that we observed in the lab (see the labs directory for ina signal captures). This result of 6600 us says that, for the estimation of the mean of p, our estimator has a 5 % error with 95 % confidence, provided that the critical condition holds: the Ti of the ina input signal is never shorter than 6600 us.
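This interpretation can be checked empirically with a small Monte Carlo sketch, assuming independent Gaussian samples with mean 20 A and standard deviation 4.12 A (the ina_mu_amps and ina_sigma_amps values of this note). The function name, the seed and the number of trials are hypothetical.

```python
import random


def coverage(n, mu, sigma, rel_err, trials, rng):
    """Fraction of trials whose empirical mean is within rel_err * mu of mu."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(mean - mu) <= rel_err * mu:
            hits += 1
    return hits / trials


# N = 66 samples, 5 % relative error: the fraction should be close to 0.95.
rng = random.Random(1)
print(coverage(n=66, mu=20.0, sigma=4.12, rel_err=0.05, trials=2000, rng=rng))
```

The printed fraction is an estimate and fluctuates with the seed and the number of trials.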
The ina_sigma_amps parameter (sigma of Si) of model (2) can be measured with good precision with classical electronic measuring instruments such as an oscilloscope, using the 6 sigma graphical method. That is sufficient for the expected precision. Here, the ina_mu_amps parameter is the maximal possible mean of Si in (2). In practice, ina_mu_amps is chosen as the maximal current of the power supply component of the UPMEM PCB.
ina_sigma_amps | ina_mu_amps | Ti us |
---|---|---|
4.12 | 20 | TBC |
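from tabulate import tabulate

# Maximal sampling frequency of the power measurement subsystem, in Hz.
fechMaxHz = 10e3
# Critical values lambda of the normal law, indexed by alpha.
nLawlambdaNormalized = {0.05: 1.96, 0.01: 2.576}
# Relative margins of error, as fractions of inaMuAmps (0.05 means 5 %).
relativeMarginErrPrct = [0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 0.8]
# Standard deviation and maximal mean of ina, in amps.
inaStdAmps = 4.12
inaMuAmps = 20

for alpha, lam in nLawlambdaNormalized.items():
    res = []
    for err in relativeMarginErrPrct:
        # Minimal sample size: (lambda * sigma / (err * mu))**2.
        nHat = ((lam**2 * inaStdAmps**2) /
                (err**2 * inaMuAmps**2))
        # Next integer strictly above nHat.
        n = int(1 + nHat)
        # Ti_min in microseconds: time to take n samples at fechMaxHz.
        tiMinUs = int(1e6 * n / fechMaxHz)
        res.append([err, n, tiMinUs])
    print('confidence level', alpha)
    print(tabulate(res))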