Back to the Top
In BE studies it is customary to calculate within-subject
variability for the purpose of power calculations. We routinely log
transform Cmax and AUCs for calculating the 90% CI. My question is: do we
get different values for within-subject variability (%CV) if we use
log transformed Cmax and AUC values as opposed to non-transformed
values?
Prasad
Prasad NV Tata, Ph.D., FCP
Manager-Pharmacokinetics
Mallinckrodt, Inc.
675 McDonnell Blvd.
Saint Louis, MO 63134
Tel: (314) 654-5325
Fax: (314) 654-9325
e-mail: prasad.tata.at.tycohealthcare.com
Back to the Top
The following message was posted to: PharmPK
Hello,
You certainly get different values, because you use different formulae.
Since we standardly use log transformations in BE studies (because PK
parameters tend to be more lognormally distributed), the within-subject
CV estimate based on log transformed data [calculated as
CV=100*sqrt(exp(MSE)-1)] is a better estimate of the real
within-subject variation than the one based on non-transformed data
[calculated as CV=100*SD/arithmeticmean].
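A minimal Python sketch of the two formulas above. The function names and the input values (MSE, SD, mean) are made up for illustration; MSE is assumed to be the residual mean square from an ANOVA on ln-transformed data.

```python
import math

def cv_from_mse(mse: float) -> float:
    """CV (%) from the residual mean square of an ANOVA on ln-transformed data."""
    return 100.0 * math.sqrt(math.exp(mse) - 1.0)

def cv_untransformed(sd: float, arithmetic_mean: float) -> float:
    """CV (%) computed directly from untransformed data."""
    return 100.0 * sd / arithmetic_mean

# Hypothetical inputs, not taken from any real study
print(f"log-scale formula:    {cv_from_mse(0.0625):.1f}%")
print(f"untransformed formula: {cv_untransformed(25.0, 100.0):.1f}%")
```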
Kind regards,
Michiel van den Heuvel
Organon
Back to the Top
The following message was posted to: PharmPK
Dear Michiel,
You wrote:
> Since we standardly use log transformations in BE studies (because PK
> parameters tend to be more lognormally distributed), the within-subject
> CV estimate based on log transformed data [calculated as
> CV=100*sqrt(exp(MSE)-1)] is a better estimate of the real
> within-subject variation than the one based on non-transformed data
> [calculated as CV=100*SD/arithmeticmean].
I agree, but why do you not use simply
CV=100*sqrt(MSE)
instead of
CV=100*sqrt(exp(MSE)-1)?
The latter equation converts the standard deviation of the log-normal
distribution to the standard deviation of the 'corresponding' normal
distribution. But I do not see any reason for doing so, since for the
statistical comparison of the geometric (logarithmic) means the standard
deviation of the log-normal distribution [ sqrt(MSE) ] is used, so again
there is no need for conversion. Please note that the latter equation
gives higher values for CV than the first equation; at CV = 20% the
difference is small (20.2% versus 20%), but the difference increases
rapidly with CV.
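To put numbers on how the two expressions diverge, here is a small Python sketch. The function names are illustrative; sigma stands for sqrt(MSE) on the natural-log scale.

```python
import math

def cv_simple(sigma: float) -> float:
    """CV (%) taken as 100*sqrt(MSE), i.e. 100*sigma."""
    return 100.0 * sigma

def cv_converted(sigma: float) -> float:
    """CV (%) taken as 100*sqrt(exp(MSE)-1)."""
    return 100.0 * math.sqrt(math.exp(sigma ** 2) - 1.0)

for sigma in (0.1, 0.2, 0.3, 0.5, 1.0):
    print(f"sigma={sigma:.1f}  100*sqrt(MSE)={cv_simple(sigma):6.1f}%  "
          f"100*sqrt(exp(MSE)-1)={cv_converted(sigma):6.1f}%")
```

At sigma = 0.2 the second column is about 20.2%, in line with the figures quoted above; at sigma = 1.0 it is roughly 131%.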
Best regards,
Hans Proost
Johannes H. Proost
Dept. of Pharmacokinetics and Drug Delivery
University Centre for Pharmacy
Antonius Deusinglaan 1
9713 AV Groningen, The Netherlands
tel. 31-50 363 3292
fax 31-50 363 3247
Email: j.h.proost.-at-.rug.nl
Back to the Top
The following message was posted to: PharmPK
Dear Michiel,
You wrote:
> Since we standardly use log transformations in BE studies (because PK
> parameters tend to be more lognormally distributed), the within-subject
> CV estimate based on log transformed data [calculated as
> CV=100*sqrt(exp(MSE)-1)] is a better estimate of the real
> within-subject variation than the one based on non-transformed data
> [calculated as CV=100*SD/arithmeticmean].
If a random variable (e.g. Cmax) follows a log-normal distribution, those
two equations should give you the same answer (not exactly the same
numbers, but both equations are correct). In a BE study,
CV=100*sqrt(exp(MSE)-1) is used because other fixed effects, such as
sequence, period and treatment effects, need to be corrected for first.
In other words, those Cmax values were not just a random sample from a
single log-normal distribution. They are a mixture of many log-normal
distributions with different means. Therefore, CV=100*SD/arithmeticmean
cannot be used directly to estimate the true CV.
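The mixture argument can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: two groups of independent values with different geometric means stand in for a fixed effect (e.g. period or treatment), and no crossover ANOVA is fitted. All numbers are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed, arbitrary
sigma = 0.2              # SD on the ln scale (hypothetical)
true_cv = 100.0 * math.sqrt(math.exp(sigma ** 2) - 1.0)

# Same sigma, different geometric means (100 and 150)
cell_a = [rng.lognormvariate(math.log(100.0), sigma) for _ in range(20000)]
cell_b = [rng.lognormvariate(math.log(150.0), sigma) for _ in range(20000)]

def naive_cv(values):
    """100*SD/arithmetic mean, ignoring any fixed effects."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

print(f"true CV:              {true_cv:.1f}%")
print(f"naive CV, one group:  {naive_cv(cell_a):.1f}%")
print(f"naive CV, pooled:     {naive_cv(cell_a + cell_b):.1f}%")
```

Within a single group the naive CV lands close to the true value, while the pooled CV is inflated by the difference in means.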
Dear Hans:
You wrote:
>I agree, but why do you not use simply
>CV=100*sqrt(MSE)
>instead of
>CV=100*sqrt(exp(MSE)-1)
CV=100*sqrt(MSE) is simply an approximation of CV=100*sqrt(exp(MSE)-1)
(the exact equation to calculate CV) for a log-normally distributed
variable. Reporting the EXACT CV cannot be worse than reporting an
approximation. As you mentioned, when CV increases, the difference
increases. Since the approximation is lower than the exact CV, the exact
CV should be reported.
Yaning Wang, Ph.D.
Senior Pharmacometrician and Clinical Pharmacologist
Office of Clinical Pharmacology and Biopharmaceutics
Center of Drug Evaluation and Research
Food and Drug Administration
Office: 301-796-1624
The contents of this message are mine personally and do not necessarily
reflect any position of the Government or the Food and Drug
Administration.
Back to the Top
The following message was posted to: PharmPK
Dear Yaning,
Thank you for your comments. You wrote:
> CV=100*sqrt(MSE) is simply an approximation of CV=100*sqrt(exp(MSE)-1)
> (the exact equation to calculate CV) for a log-normally distributed
> variable.
I assume that we are talking about different definitions of CV. What is
a CV of a log-normal distribution? We can approach this in two ways:
1) CV is defined as the CV of the data assuming that the distribution is
normal. Then it can be calculated:
a) in the same way as for a normal distribution, i.e. CV = 100 * SD /
arithmeticmean
b) from CV = 100 * sqrt(exp(MSE)-1)
These equations provide different numbers, but for large n the two values
converge asymptotically (as can be concluded from the source code given
below). I agree that equation b) should be preferred.
The problem with this definition is that it calculates CV assuming that
the distribution is normal, in spite of the assumption that the
distribution is log-normal! This makes no sense to me.
2) CV is defined as 100 * SD of the log-normal distribution (only valid
if logarithms with base e are used!). This implies that
CV = 100 * sqrt(MSE).
In this case the distribution is really treated as a log-normal
distribution.
To demonstrate the differences between the methods, I made a small
program in Pascal. In total 1,000,000 numbers were randomly drawn from a
log-normal distribution with mean = 100 and sigma = 0.5.
The arithmetic mean is 113.4 (!), the standard deviation is 60.4, and CV
according to 1a) is 53.3%.
The geometric mean is 100.0, the standard deviation is 0.500, and the CV
according to 2) is 50.0%.
The CV according to method 1b) is 53.3%.
Method 1a) and 1b) provide very similar results, and therefore it can be
concluded that method 1b) indeed gives the CV of the data assuming a
normal distribution. Method 2) gives CV = 50% if sigma = 0.5. This sounds
good to me.
--
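PROGRAM TestDist;
{ Draws N values from a log-normal distribution and prints mean, SD and
  CV (%) according to methods 1a), 2) and 1b) }
FUNCTION NormRandom : DOUBLE;
{ Generates a random value (mean value 0 and standard deviation 1) :
  N(0,1), using the Box-Muller transform }
BEGIN
  { 1-Random lies in (0,1], which avoids Ln(0) }
  NormRandom:=Cos(2*Pi*Random)*Sqrt(-2*Ln(1-Random))
END; {NormRandom}
CONST
  N = 1000000;
  Mean = 100;   { geometric mean of the log-normal distribution }
  Sigma = 0.5;  { standard deviation of Ln(X) }
VAR
  I : LONGINT;
  X, Sx, Sxx, Slx, Slxx, M, S, S2 : DOUBLE;
BEGIN
  Randomize;
  Sx:=0; Sxx:=0; Slx:=0; Slxx:=0;
  FOR I:=1 TO N DO
  BEGIN
    X:=Mean*Exp(Sigma*NormRandom);
    Sx:=Sx+X;
    Sxx:=Sxx+Sqr(X);
    Slx:=Slx+Ln(X);
    Slxx:=Slxx+Sqr(Ln(X))
  END;
  WriteLn;
  { Method 1a: arithmetic mean, SD and CV of the untransformed values }
  M:=Sx/N;
  S2:=(Sxx-Sqr(Sx)/N)/(N-1);
  S:=Sqrt(S2);
  WriteLn('normal 1a',M:20:2,S:20:2,100*S/M:20:2);
  { Method 2: geometric mean, CV = 100 * SD of the logarithms }
  M:=Exp(Slx/N);
  S2:=(Slxx-Sqr(Slx)/N)/(N-1);
  S:=Sqrt(S2);
  WriteLn('log 2    ',M:20:2,S*M:20:2,100*S:20:2);
  { Method 1b: CV = 100 * sqrt(exp(variance of the logarithms)-1) }
  S:=Sqrt(Exp(S2)-1);
  WriteLn('log 1b   ',M:20:2,S*M:20:2,100*S:20:2);
  ReadLn
END.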
--
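For readers without a Pascal compiler, the same experiment can be sketched in Python. This is a re-implementation under assumptions, not the original program: it uses the standard library's random.lognormvariate, a fixed arbitrary seed, and a smaller sample (200,000) to keep the run time short.

```python
import math
import random

def mean_sd(values):
    """Arithmetic mean and sample SD (n-1 denominator)."""
    n = len(values)
    mean = math.fsum(values) / n
    var = math.fsum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)

rng = random.Random(12345)              # fixed seed, arbitrary
geo_mean, sigma, n = 100.0, 0.5, 200_000
x = [rng.lognormvariate(math.log(geo_mean), sigma) for _ in range(n)]

m, s = mean_sd(x)                       # untransformed values
ml, sl = mean_sd([math.log(v) for v in x])   # natural logarithms

cv_1a = 100.0 * s / m                               # method 1a
cv_2 = 100.0 * sl                                   # method 2
cv_1b = 100.0 * math.sqrt(math.exp(sl ** 2) - 1.0)  # method 1b

print(f"1a) mean={m:.1f}  SD={s:.1f}  CV={cv_1a:.1f}%")
print(f"2)  geometric mean={math.exp(ml):.1f}  SD(ln)={sl:.3f}  CV={cv_2:.1f}%")
print(f"1b) CV={cv_1b:.1f}%")
```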
Best regards,
Hans Proost
Johannes H. Proost
Dept. of Pharmacokinetics and Drug Delivery
University Centre for Pharmacy
Antonius Deusinglaan 1
9713 AV Groningen, The Netherlands
tel. 31-50 363 3292
fax 31-50 363 3247
Email: j.h.proost.at.rug.nl
Back to the Top
Dear Hans,
You wrote:
"1) CV is defined as the CV of the data assuming that the distribution
is normal. Then it can be calculated:
b) from CV = 100 * sqrt(exp(MSE)-1)
I agree that equation b) should be preferred. The problem with this
definition is that it calculates CV assuming that the distribution is
normal, in spite of the assumption that the distribution is log-normal!
This makes no sense to me."
Let me explain where this formula comes from. This is just stats theory,
unfortunately I do not have a good reference at hand.
Let's assume X is log-normally distributed, then Y = ln(X) is normally
distributed; let's say with mean MU and variance S2 (ln is natural
logarithm).
Then it can be derived that the expectation of X equals
E(X) = exp(MU+0.5*S2), known as geometric mean
And variance Var(X) = exp(2*MU+S2)*(exp(S2)-1)
So that the relative standard deviation, i.e. coefficient of variation
CV, equals
CV(X) = 100 * sqrt(Var(X)) / E(X) = 100 * sqrt(exp(S2)-1)
Where in ANOVA settings S2 is estimated by MSE from your ANOVA.
As noted in a previous mail, sqrt(exp(MSE)-1) can be approximated by
sqrt(MSE), which is simple mathematical Taylor approximation from
exp(MSE)-1 into MSE for MSE close to zero. This is the case which you
called incorrectly a log-normal distribution in your formula 2). The
only thing we assumed here is that the logvalues of the concentrations
are normally distributed.
Apart from this mathematics: the most obvious difference between normal
and lognormal distributions has to do with this CV. For a log-normal
distribution CV is independent of MU (mean), which means that the
RELATIVE standard deviation is the same for all levels and thus standard
deviation increases with levels, which we observe very much in our PK
data. For a normal distribution one assumes that standard deviation is
constant with mean, which means that CV decreases with increasing mean.
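The statement that CV does not depend on MU while the standard deviation grows with the level can be checked numerically from the E(X) and Var(X) expressions given above. A minimal Python sketch; the function name and the MU and S2 values are chosen only for illustration.

```python
import math

def lognormal_mean_sd(mu: float, s2: float):
    """Mean and SD of X when ln(X) ~ N(mu, s2), from the formulas above."""
    mean = math.exp(mu + 0.5 * s2)
    sd = math.sqrt(math.exp(2.0 * mu + s2) * (math.exp(s2) - 1.0))
    return mean, sd

s2 = 0.25   # hypothetical variance on the ln scale
for mu in (1.0, 3.0, 5.0):
    mean, sd = lognormal_mean_sd(mu, s2)
    print(f"MU={mu:.0f}  E(X)={mean:9.2f}  SD(X)={sd:9.2f}  "
          f"CV={100.0 * sd / mean:.2f}%")
```

SD(X) increases with MU, whereas the CV column stays at 100*sqrt(exp(S2)-1) for every MU.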
With respect to the simulation you made in Pascal:
I think you should calculate X as Exp(Mean+Sigma*NormRandom). Then you
will find other results.
Furthermore you used a rather small SD of 0.5 compared to the mean of
100 and then the difference between a normal and a lognormal
distribution is not so big yet. Maybe try SD of 50 and mean of 100?
Kind regards,
Michiel van den Heuvel
Organon
Netherlands
Back to the Top
The following message was posted to: PharmPK
Dear Hans:
You wrote:
>I assume that we are talking about a different definition of CV.
As far as I know, there is only one kind of definition for CV. That is
SD/Mean irrespective of the distribution. If you want to convert it to a
percentage, then use 100*SD/Mean.
>What is a CV of a log-normal distribution?
See Michiel van den Heuvel's reply or read
http://en.wikipedia.org/wiki/Log-normal_distribution and apply
CV=SD/Mean=sqrt[Var(X)]/E(X). The only thing I want to add is that the
expectation of X, E(X)=exp(MU+0.5*S2), is the arithmetic mean, not the
geometric mean, of X. The geometric mean of X is equal to the median
of X.
(Note: X follows log-normal distribution, i.e. ln(X)~N(MU, S2))
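A quick simulated check of the distinction between the arithmetic mean, the geometric mean and the median. This is a sketch with hypothetical parameters (exp(MU) = 100, S = 0.5) and an arbitrary fixed seed.

```python
import math
import random
import statistics

rng = random.Random(7)          # fixed seed, arbitrary
mu, s = math.log(100.0), 0.5    # ln(X) ~ N(mu, s^2), hypothetical values
x = [rng.lognormvariate(mu, s) for _ in range(100_000)]

arithmetic_mean = statistics.fmean(x)
geometric_mean = math.exp(statistics.fmean([math.log(v) for v in x]))
median = statistics.median(x)

print(f"arithmetic mean = {arithmetic_mean:.1f}  "
      f"(theory exp(MU+0.5*S2) = {math.exp(mu + 0.5 * s * s):.1f})")
print(f"geometric mean  = {geometric_mean:.1f}  (theory exp(MU) = {math.exp(mu):.1f})")
print(f"median          = {median:.1f}")
```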
Yaning Wang, Ph.D.
Senior Pharmacometrician and Clinical Pharmacologist
Office of Clinical Pharmacology and Biopharmaceutics
Center of Drug Evaluation and Research
Food and Drug Administration
Office: 301-796-1624
The contents of this message are mine personally and do not necessarily
reflect any position of the Government or the Food and Drug
Administration.
Back to the Top
The following message was posted to: PharmPK
Dear Michiel,
Thank you for your reply. Your theoretical explanation is
correct and beyond discussion. You wrote:
> As noted in a previous mail, sqrt(exp(MSE)-1) can be
> approximated by sqrt(MSE), which is simple mathematical
> Taylor approximation from exp(MSE)-1 into MSE for MSE
> close to zero.
Yes, stated this way, sqrt(MSE) is an approximation. But
this is not what I meant. Sqrt(MSE) is also exactly equal
to the standard deviation of the lognormal distribution.
> This is the case which you called incorrectly a
> log-normal distribution in your formula 2). The
> only thing we assumed here is that the logvalues of
> the concentrations are normally distributed.
According to my sources, and to your statement given
above, both expressions are equivalent. What did I say
incorrectly?
> Apart from this mathematics: the most obvious
> differences between normal and lognormal distribution
> has to do with this CV. For a log-normal
> distribution CV is independent from MU (mean), which
> means that the RELATIVE standard deviation is the
> same for all levels
Yes, we agree. But a few lines earlier you wrote:
> variance Var(X) = exp(2*MU+S2)*(exp(S2)-1)
This would imply that var(x) is dependent on MU (albeit
only to a limited extent).
> With respect to the simulation you made in Pascal:
> I think you should calculate X as
> Exp(Mean+Sigma*NormRandom).
> Then you will find other results.
No. In my simulations, Mean = 100 refers to the
geometric mean of the assumed log-normal distribution.
Your suggestion would give values around 2.688E+43, which
is not a usual range.
> Furthermore you used a rather small SD of 0.5
> compared to the mean of 100 and then the difference
> between a normal and a lognormal
> distribution is not so big yet. Maybe try SD of 50 and
> mean of 100?
I did, so I don't understand your question. I used the
term Sigma for the standard deviation of the log-normal
distribution.
In my earlier statement I wrote:
>> The problem with this definition is that it calculates
>> CV assuming that the distribution is normal, in spite
>> of the assumption that the distribution is log-normal!
>> This makes no sense to me.
I still do not have a reply to this comment. Perhaps I can
state it in a different way. Assume, we have a series of
data, and we have sufficient evidence that the data are
log-normally distributed, or some authority states that we
have to analyse the data assuming a log-normal
distribution.
Why would one calculate a CV?
a) To report a value. OK, any method is acceptable.
b) To get some idea of the degree of variability. OK, any
method is acceptable.
c) To apply in a statistical test.
Ad c): The only meaningful tests are either a
distribution-free test, or a test based on the assumption
of a log-normal distribution, which is equivalent to a
test based on the assumption of a normal distribution
applied to the logarithms of the values. In such a test
one should use the variance or sd. This is not
variance Var(X) = exp(2*MU+S2)*(exp(S2)-1)
Instead the variance of Y should be used, which is S2, and
is obtained e.g. from ANOVA Var(Y) = MSE = Sigma^2 in my
code example.
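As an illustration of a test that works on the logarithms and uses MSE directly, here is a Python sketch of a 90% confidence interval for the geometric mean ratio. Everything specific in it is an assumption not taken from this thread: the standard-error expression is the usual one for a 2x2 crossover, the critical t value is supplied by the caller (1.7171 is the tabulated 95th percentile for 22 degrees of freedom), and the inputs are hypothetical.

```python
import math

def ratio_ci_90(diff_ln: float, mse: float, n1: int, n2: int, t_crit: float):
    """90% CI (in %) for the test/reference geometric mean ratio.

    diff_ln : difference of treatment means on the ln scale
    mse     : residual mean square from the ANOVA on ln-transformed data
    n1, n2  : subjects per sequence (2x2 crossover assumed)
    t_crit  : 95th percentile of t with n1+n2-2 degrees of freedom
    """
    se = math.sqrt(mse / 2.0 * (1.0 / n1 + 1.0 / n2))
    lower = 100.0 * math.exp(diff_ln - t_crit * se)
    upper = 100.0 * math.exp(diff_ln + t_crit * se)
    return lower, upper

# Hypothetical study: 12 subjects per sequence, ratio 0.95, MSE = 0.04
low, high = ratio_ci_90(math.log(0.95), 0.04, 12, 12, t_crit=1.7171)
print(f"90% CI: {low:.1f}% to {high:.1f}%")
```

Neither 100*sqrt(MSE) nor 100*sqrt(exp(MSE)-1) appears in the calculation; only MSE itself does.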
So we do not need to calculate CV according to 'your'
definition. We need MSE or Sigma as a measure of
variability. And I don't see why one should not interpret
a value Sigma = 0.5 as the 'log-normal equivalent' of a CV
= 50%. This is not an approximation, but a different view.
If data are log-normally distributed, they should be
analysed and interpreted as their logarithms. For
practical purposes, mean(Y) is usually transformed back to
the original units, thus providing a geometric mean as the
measure of central tendency. For Var, SD and CV there is
no reason for back-transformation.
In the example of my previous message: In my view, the
rational value of 'mean' is 100, sigma = 0.5, and CV 50%.
Best regards,
Hans Proost
Johannes H. Proost
Dept. of Pharmacokinetics and Drug Delivery
University Centre for Pharmacy
Antonius Deusinglaan 1
9713 AV Groningen, The Netherlands
tel. 31-50 363 3292
fax 31-50 363 3247
Email: j.h.proost.aaa.rug.nl
Back to the Top
The following message was posted to: PharmPK
Dear Yaning,
Thank you for your reply. You wrote:
> As far as I know, there is only one kind of definition
> for CV. That is SD/Mean irrespective of the distribution.
OK. Perhaps my statement was somewhat provocative, and not
fully to the point. Indeed, an (arithmetic) mean and a
standard deviation can be calculated irrespective of the
distribution. But what I actually meant, is that mean and
sd cannot be interpreted without assuming a distribution.
Yes, they can be interpreted as a measure of central
tendency and a measure of variability, respectively. But
they cannot be used for a statistical test if the
distribution is unknown. And to me it makes no sense to
calculate a statistical parameter that cannot be
interpreted. But if a log-normal distribution is assumed,
we can calculate the geometric mean as a useful measure
of central tendency and the standard deviation of that
distribution as a useful measure of variability. This
standard deviation is 'sqrt(MSE)', in terms of the earlier
messages, and NOT 'sqrt(exp(MSE)-1)', as was demonstrated
in the example of my earlier message.
As explained more extensively in my message to Michiel, I
do not see any use in the equation 'sqrt(exp(MSE)-1)'.
Best regards,
Hans Proost
Johannes H. Proost
Dept. of Pharmacokinetics and Drug Delivery
University Centre for Pharmacy
Antonius Deusinglaan 1
9713 AV Groningen, The Netherlands
tel. 31-50 363 3292
fax 31-50 363 3247
Email: j.h.proost.aaa.rug.nl
Back to the Top
In a BE study, what is the formula to calculate the intersubject CV?
Where in the output of PROC MIXED can the components needed to
calculate the intersubject CV be found?
W
Back to the Top
The following message was posted to: PharmPK
Hi Weining!
You wrote:
>
>In BE study, what is the formula to calculate the intersubject CV?
>
If you have performed your analysis on ln-transformed data (which I hope):
CV = sqrt ( exp (MSE) -1)
where MSE = Sum of squared residuals / DF
DF = degrees of freedom = n1 + n2 -2
n1, n2 = number of subjects in sequences 1 and 2
for a standard 2x2 cross-over design
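The calculation above can be sketched in Python as follows. Note that the MSE here is the residual mean square, so the result is a fraction (not a percentage) describing the residual variability. The function names and the input numbers are hypothetical; the inverse function is added because power calculations usually start from a CV.

```python
import math

def mse_from_residuals(ss_residual: float, n1: int, n2: int) -> float:
    """MSE = sum of squared residuals / DF, with DF = n1 + n2 - 2."""
    return ss_residual / (n1 + n2 - 2)

def cv_from_mse(mse: float) -> float:
    """CV as a fraction: sqrt(exp(MSE) - 1)."""
    return math.sqrt(math.exp(mse) - 1.0)

def mse_from_cv(cv: float) -> float:
    """Inverse of cv_from_mse: MSE = ln(CV^2 + 1)."""
    return math.log(cv ** 2 + 1.0)

# Hypothetical 2x2 crossover: 12 subjects per sequence, SS_residual = 0.88
mse = mse_from_residuals(0.88, 12, 12)
print(f"MSE = {mse:.4f}, CV = {100.0 * cv_from_mse(mse):.1f}%")
```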
best regards,
Helmut
--
Helmut Schuetz
BEBAC
Consultancy Services for Bioequivalence and Bioavailability Studies
Neubaugasse 36/11
1070 Vienna/Austria
tel/fax +43 1 2311746
Web http://BEBAC.at
BE/BA Forum http://forum.bebac.at
Back to the Top
Hi,
I saw the discussion on this interesting topic. I have a few comments
to make:
1) Both STD and CV are statistics used to measure the variability of
the data. The question is "Which one is more appropriate to use?". If
it is appropriate to assume that the data follow a normal
distribution, then STD is independent of the mean and is thus
enough to characterize the variability. If it is more appropriate to
assume that the data follow a log-normal distribution, then STD
depends on the mean and is thus not appropriate on its own to
characterize the variability, while CV is independent of the mean and
is more appropriate as a measure of the variability.
2) If the data follow a log-normal distribution, CV=sqrt(exp(std^2)-1)
from probability theory. In such cases, a linear mixed model with
random subject effects is usually fitted. To calculate the intra-subject
CV, std^2 is estimated by the MSE; to calculate the inter-subject
CV, std^2 is estimated by the variance estimate for the random
subject effect from PROC MIXED (PROC GLM can be used
too but with some more calculations).
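As a sketch of point 2), the same conversion is applied to each variance component. The two variance values below are hypothetical placeholders for the residual variance and the random subject variance that a mixed-model fit on ln-transformed data would report; they do not come from any real output.

```python
import math

def cv_percent(variance_ln: float) -> float:
    """CV (%) from a variance estimated on the natural-log scale."""
    return 100.0 * math.sqrt(math.exp(variance_ln) - 1.0)

# Hypothetical variance component estimates (ln scale)
residual_variance = 0.04   # used for the intra-subject CV
subject_variance = 0.16    # used for the inter-subject CV

print(f"intra-subject CV = {cv_percent(residual_variance):.1f}%")
print(f"inter-subject CV = {cv_percent(subject_variance):.1f}%")
```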
Gerry Li, PhD
Senior Statistician
GlaxoSmithKline
Collegeville, PA, 19426
Back to the Top
Dear Gerry,
Yes, CV is independent of the mean and is more appropriate for
measuring the variability. It is also independent of the unit.
About STD: can we say directly that std is independent of the mean?
std is the sqrt of the variance = 1/n * sum over i (i=1 to n) of
(Xi - mean of X)^2
Yogesh
Back to the Top
Hi Yogesh,
If the measurements are independent and follow the same normal
distribution (this condition may be relaxed further), then the sample
STD from your formula (which is not unbiased in small samples, but
works fine in large samples) is independent of the sample mean. Which one
(CV or STD) is more appropriate to describe the variability really
depends on the data-generating mechanism.
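Both remarks can be checked with a small simulation. This is a sketch with hypothetical settings (samples of size 5, 20,000 replicates, arbitrary seed): it averages the 1/n variance over many standard-normal samples, and it correlates sample mean with sample SD for normal and for log-normal data.

```python
import math
import random

def mean_and_sd(xs, ddof):
    """Sample mean and SD; ddof=0 gives the 1/n form, ddof=1 the 1/(n-1) form."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (n - ddof))

def correlation(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def mean_sd_correlation(draw, reps=20000, n=5):
    """Correlation between sample mean and sample SD over many samples."""
    means, sds = [], []
    for _ in range(reps):
        m, s = mean_and_sd([draw() for _ in range(n)], ddof=1)
        means.append(m)
        sds.append(s)
    return correlation(means, sds)

rng = random.Random(2024)   # fixed seed, arbitrary

# Bias of the 1/n variance for N(0,1) samples of size 5 (true variance is 1)
samples = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20000)]
avg_var_n = sum(mean_and_sd(s, 0)[1] ** 2 for s in samples) / len(samples)

r_normal = mean_sd_correlation(lambda: rng.gauss(100.0, 20.0))
r_lognormal = mean_sd_correlation(lambda: rng.lognormvariate(math.log(100.0), 0.5))

print(f"average 1/n variance (n=5): {avg_var_n:.3f}")
print(f"corr(mean, SD), normal data:     {r_normal:+.3f}")
print(f"corr(mean, SD), log-normal data: {r_lognormal:+.3f}")
```

The average 1/n variance comes out near (n-1)/n = 0.8 rather than 1, the normal-data correlation is close to zero, and the log-normal correlation is clearly positive.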
Gerry
PharmPK Discussion List Archive Index page
Copyright 1995-2010 David W. A. Bourne (david@boomer.org)