Dear Group,
I have antibody titer data, about 7 to 12 subjects per group. We are
reporting the geometric mean, and we would like to estimate the std dev.
I tried taking the anti-log of the 95% CI of the log-transformed data,
but that doesn't work; apparently the data are too skewed and the sample
sizes are small. Any suggestions for estimating a std dev under these
circumstances?
Thanks.
Sincerely,
Sarah
If all you want to convey in your report is a feel for the spread of
the data, is it not reasonable simply to report the range, given that
you think the data are too skewed and the sample size is small?
Joga Gobburu
Pharmacometrics
CDER/FDA
The following message was posted to: PharmPK
Hi,
The following references will help you better understand the
relationship between the mean, standard deviation and confidence
interval of skewed and normal data:
1. Zhou et al. (1997) Statistics in Medicine 16, 783-790.
2. Taylor et al. (2002) Statistics in Medicine 21, 1443-1459.
Kayode Ogungbenro
Ph.D Student
Interquartile range, reflecting the spread in the middle half of the
data, is a good measure of spread for highly skewed distributions (cf.
Schulmann, Statistics in Plain English, VNR, 1992, pp. 29-30).
-Scott
Scott Patterson
GlaxoSmithKline Pharmaceuticals
2301 Renaissance Blvd.
King of Prussia, PA 19406-2772
Email: scott.d.patterson.-at-.gsk.com
Phone: 610-787-3865
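A minimal Python sketch of the two spread summaries suggested so far (Joga's range and Scott's interquartile range), using only the standard library. The quartile convention (`method="inclusive"` in `statistics.quantiles`) is an assumption; other conventions give somewhat different quartiles for samples this small. The titer values are hypothetical, not Sarah's data.

```python
import statistics

def spread_summary(values):
    """Range, median and interquartile range of a small, possibly skewed sample."""
    data = sorted(values)
    # "inclusive" treats the sample as the population of interest
    # (min = 0th percentile, max = 100th percentile); other quartile
    # conventions differ slightly for small n.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "max": data[-1],
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Hypothetical titers for one group (illustration only).
titers = [20, 40, 40, 80, 80, 160, 320, 1280]
summary = spread_summary(titers)
print(summary["min"], summary["max"], summary["iqr"])  # 20 1280 160.0
```

Unlike the range, the interquartile range is not driven by the single largest titer, which is why it is suggested for highly skewed data.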
The following message was posted to: PharmPK
Dear Sarah,
Here is a reference that may be of
interest and use:
Shumway, R.H., Azari, A.S. and Johnson, P. (1989),
“Estimating Mean Concentrations Under Transformation
for Environmental Data with Detection Limits,”
Technometrics, 31(3): 347-356.
I have a set of variability factor programs for
estimating the mean, standard deviation and other
quantities (e.g., 99th percentile) for the lognormal
with and without non-detects [single and/or multiple
detection limits].
Let me know if they would be of use. I can e-mail them
to you.
Paul Johnson
http://www.biostatsoftware.com
The following message was posted to: PharmPK
Hi,
The responses to this question remind me of the parable of the blind
men and the elephant.
http://www.wordfocus.com/word-act-blindmen.html
Various parameters of this small sample have been mentioned: skewness,
geometric mean, standard deviation, range (plus some siblings from
Brian Smith). What is the purpose of estimating and reporting these
parameters?
Is it simply to describe the observations? In that case, why attempt
to summarize them with statistics? You could display all of them
graphically or list them in a table. With a larger sample size,
histograms of the frequency distribution would be nice.
Is it to test some null hypothesis? The uncertainty about the assumed
distribution for the test statistic may make it more sensible to
consider a randomization test comparing a statistic of interest across
the groups.
http://wfn.sourceforge.net/rtmethod.htm
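A minimal sketch of a two-group randomization test in Python, assuming a Monte Carlo version (random reshuffles rather than full enumeration). The test statistic (difference in mean log titer, i.e. the log of the ratio of geometric means), the number of reshuffles, the function names and the titers are all illustrative assumptions, not something specified in this thread.

```python
import math
import random

def log_gm_ratio(a, b):
    """Difference in mean log titer (log of the ratio of geometric means)."""
    return (sum(math.log(x) for x in a) / len(a)
            - sum(math.log(x) for x in b) / len(b))

def randomization_test(group_a, group_b, statistic=log_gm_ratio,
                       reps=10_000, seed=1):
    """Two-sided Monte Carlo randomization test for two groups.

    Under the null hypothesis the group labels are exchangeable, so the
    pooled observations are shuffled and re-split into groups of the
    original sizes; the p-value is the fraction of shuffles whose
    statistic is at least as extreme as the observed one.
    """
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        stat = statistic(pooled[:n_a], pooled[n_a:])
        # Small tolerance so reorderings of the observed split count as ties.
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
    # Counting the observed labelling as one permutation keeps p > 0.
    return observed, (extreme + 1) / (reps + 1)

# Hypothetical titers for two groups (illustration only).
group_a = [20, 40, 40, 80, 80, 160, 320]
group_b = [160, 320, 320, 640, 1280, 1280, 2560, 5120]
diff, p = randomization_test(group_a, group_b)
print(f"log GM ratio = {diff:.2f}, p = {p:.4f}")
```

No distribution is assumed for the statistic; its null distribution is built from the data themselves, which is the appeal when the sampling distribution is uncertain. Any other statistic of interest (median difference, ratio of ranges) can be passed as `statistic`.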
I suggest that Sarah consider whether she wants to show people the
elephants she has discovered, or just tell them about bits (weight?
colour?) of the beast and/or how the beasts differ from each other.
Nick
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford.-a-.auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
Dear Group,
Many thanks to all who responded to my question; your replies are much
appreciated. Please find below a helpful reply from Brian Smith, a
statistician at Lilly, sent to the Clin Phar Stat list. Because of the
interest on the PharmPK list, I thought I would forward it.
BTW, if you are interested in the Clin Phar Stat list, you can register
at the topica.com website. This list includes a mix of Statisticians and
Scientists and everything in between.
Sincerely,
Sarah
-----Original Message-----
From: Brian Smith [mailto:smith_brian_p.-a-.lilly.com]
Sent: Thursday, July 31, 2003 11:01 AM
To: cps.-a-.topica.com
Subject: Re: CLIN PHAR STAT: estimating std deviation for skewed data, small sample size
Sarah,
Let xbar and s2 be the sample mean and variance of the log-transformed
data, and assume that the data have a log-normal distribution.
Let m and t2 be the true mean and variance of the log-transformed data,
which you are estimating with xbar and s2. The following are facts
about the log-normal distribution.
Geometric mean = exp(m)
The expected value (or mean) = exp(m + t2/2)
The variance = (exp(t2) - 1)*exp(2*m + t2)
The standard deviation = sqrt(variance) = exp(m + t2/2)*sqrt(exp(t2) - 1)
The coefficient of variation = (standard deviation)/(expected value) = sqrt(exp(t2) - 1)
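The log-normal facts above can be written as a small Python function, a sketch that treats t2 as the variance of the natural-log values (which is how the formulas use it). The function name and the example parameter values are illustrative assumptions.

```python
import math

def lognormal_summary(m, t2):
    """Summary quantities of a log-normal distribution.

    m and t2 are the mean and variance of log(X) (natural log).
    """
    mean = math.exp(m + t2 / 2)
    variance = (math.exp(t2) - 1) * math.exp(2 * m + t2)
    return {
        "geometric_mean": math.exp(m),      # also the median of X
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),          # = mean * sqrt(exp(t2) - 1)
        "cv": math.sqrt(math.exp(t2) - 1),  # sd / mean; depends on t2 only
    }

# Illustrative parameters: log-scale mean log(100), log-scale variance 0.5.
s = lognormal_summary(math.log(100), 0.5)
print({k: round(v, 2) for k, v in s.items()})
```

Note that the CV does not involve m at all, which is why the geometric mean and CV together pin down the distribution.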
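Thus, estimates of these quantities are obtained by substituting xbar
for m and s2 for t2; by the invariance property of maximum likelihood,
they are maximum likelihood estimates when the variance estimate used
is itself the maximum likelihood estimate (see the note below).
The corresponding estimate of the standard deviation is
exp(xbar + s2/2)*sqrt(exp(s2) - 1)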
Note: One is often told that s2 = (sum of squares)/(n-1), with the sum
of squared deviations taken about xbar, is an unbiased estimate of the
variance. It is, but it is not the maximum likelihood estimate of the
variance; s2' = (sum of squares)/n is. Thus, you probably want the
following to estimate the standard deviation:
exp(xbar + s2'/2)*sqrt(exp(s2') - 1).
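A sketch of the plug-in estimator in Python, with a switch between the n divisor (s2', maximum likelihood) and the n-1 divisor (s2, unbiased for the log-scale variance). The function name, the `ml` flag and the titers are hypothetical. Neither version is unbiased for the original-scale standard deviation; the gap between them shrinks as n grows.

```python
import math
import statistics

def lognormal_estimates(values, ml=True):
    """Plug-in log-normal estimates from positive, untransformed values.

    ml=True uses s2' = (sum of squares)/n (maximum likelihood);
    ml=False uses s2 = (sum of squares)/(n - 1).
    """
    logs = [math.log(v) for v in values]
    xbar = statistics.mean(logs)
    if ml:
        s2 = statistics.pvariance(logs, xbar)  # divides by n
    else:
        s2 = statistics.variance(logs, xbar)   # divides by n - 1
    mean = math.exp(xbar + s2 / 2)
    cv = math.sqrt(math.exp(s2) - 1)
    return {"geometric_mean": math.exp(xbar), "mean": mean,
            "sd": mean * cv, "cv": cv}

# Hypothetical titers (illustration only).
titers = [20, 40, 40, 80, 80, 160, 320, 1280]
est_ml = lognormal_estimates(titers, ml=True)
est_unbiased = lognormal_estimates(titers, ml=False)
print(round(est_ml["sd"], 1), round(est_unbiased["sd"], 1))
```

With only 7 to 12 subjects per group the choice of divisor visibly changes the estimated SD, since s2 enters through exp().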
With all that said, a log-normal distribution can be completely
described by the geometric mean as an estimate of central tendency and
the coefficient of variation as an estimate of variability. In my
opinion, these are the two things that you should report. A maximum
likelihood estimate of the coefficient of variation is given by
sqrt(exp(s2') - 1), and this is what I think the preferred estimate of
this quantity is. Another way to estimate the CV is to take the sample
standard deviation of the values (not log-transformed) and divide it by
the arithmetic mean of the values (not log-transformed). You can check
whether these estimates are similar, if you like, though with small,
skewed samples they may differ noticeably. But, again, I recommend the
maximum likelihood estimate.
Sincerely,
Brian Smith
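The two CV estimates Brian describes can be compared with a short Python sketch: the maximum likelihood log-normal CV, sqrt(exp(s2') - 1), and the empirical CV, sample SD of the untransformed values divided by their arithmetic mean. The simulated data (log-scale SD 0.5, true CV about 0.53), the seed and the small hypothetical sample are illustrative assumptions.

```python
import math
import random
import statistics

def cv_two_ways(values):
    """Return (ML log-normal CV, empirical CV) for positive values."""
    logs = [math.log(v) for v in values]
    s2_ml = statistics.pvariance(logs)  # (sum of squares) / n
    cv_ml = math.sqrt(math.exp(s2_ml) - 1)
    cv_empirical = statistics.stdev(values) / statistics.mean(values)
    return cv_ml, cv_empirical

# Simulated log-normal data with log-scale SD 0.5 (true CV about 0.53).
rng = random.Random(0)
large = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
print(cv_two_ways(large))

# A small hypothetical sample, where the two estimates can disagree more.
print(cv_two_ways([20, 40, 40, 80, 80, 160, 320, 1280]))
```

In a large log-normal sample the two agree closely; in a small skewed sample the empirical CV is dominated by the largest values, which is one argument for the model-based estimate when the log-normal assumption is reasonable.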
I try never to disagree with Nick if I can help it. There are always
pearls of wisdom in what he says, even when he is trying to be
controversial.
One thing I would like to add, however: with pharmacokinetic data there
is often useful information conveyed by a good measure of center and a
good measure of variability. First, an underappreciated fact: if your
data come from a log-normal distribution, then the geometric mean is
the maximum likelihood estimate of the median. With that said, I have
heard pharmacokineticists describe, for instance, how extensively the
drug distributes based on the central tendency estimate of the volume
of distribution. Additionally, the coefficient of variation provides a
way to compare the variability of one drug with that of other drugs. I
see these measures as useful in our ability to describe the drug.
Displaying the values in a table does not immediately convey their
variability. I would imagine that by the time you get to, say, a dozen
observations, the list of values becomes hard to decipher. At least it
does for me.
In theoretical statistics a lot of time is spent on the notion of
sufficient statistics. Given a distributional family, a set of
sufficient statistics carries all the information in the sample about
the parameters of that distribution. The raw observations are always
sufficient, regardless of the distribution. Now, if you are willing to
assume, for instance, that a set of AUCs has a log-normal distribution
(in my experience this is usually a reasonable assumption), then the
geometric mean and coefficient of variation are sufficient. That is, I
can completely estimate the distribution based on these two estimates.
For a log-normal distribution we would say that these estimates are
minimally sufficient, since you cannot completely describe a log-normal
distribution with one statistic.
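The sufficiency point can be checked numerically. Under a log-normal model the log-likelihood depends on the data only through n, the mean of the logs and their sum of squares, because sum((y - m)^2) = SS + n*(xbar - m)^2. The geometric mean exp(xbar) and the ML CV sqrt(exp(s2') - 1) are a one-to-one function of (xbar, s2') for fixed n, so they carry the same information. The sketch below uses two hypothetical data sets chosen so that their logs share n = 3, mean 0 and sum of squares 2; the function name is illustrative.

```python
import math

def lognormal_loglik(values, m, t2):
    """Log-likelihood of positive values under a log-normal(m, t2) model."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    return (-sum(logs)                          # Jacobian of y = log(x)
            - n / 2 * math.log(2 * math.pi * t2)
            - sum((y - m) ** 2 for y in logs) / (2 * t2))

# Two different hypothetical data sets whose logs share n = 3, mean 0
# and sum of squares 2 (hence the same geometric mean and ML CV).
r3 = math.sqrt(3)
data_1 = [math.exp(y) for y in (-1.0, 0.0, 1.0)]
data_2 = [math.exp(y) for y in (1 / r3, 1 / r3, -2 / r3)]

for m, t2 in [(0.0, 1.0), (0.5, 0.3), (-1.0, 2.0)]:
    print(m, t2, lognormal_loglik(data_1, m, t2), lognormal_loglik(data_2, m, t2))
```

The raw values differ, yet every (m, t2) gives the same likelihood for both sets, so under the log-normal assumption nothing beyond these two summaries is needed to estimate the distribution. If the log-normal assumption is wrong, this equivalence no longer holds.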
Practically speaking, it all comes down to usefulness. Tables of raw
values, histograms and estimates of parameters all have a useful place
in helping people understand the data. In the case of the elephant and
the blind men, we see that the set of sufficient statistics may very
well be (a wall, a spear, a snake, a tree, a fan, and a rope). We need
all of these statistics (and probably more) to understand the
distribution of an elephant. On the other hand, if the data are
log-normal, the geometric mean and coefficient of variation will
suffice.
Sincerely,
Brian Smith
The following message was posted to: PharmPK
Brian,
Your point about sufficient statistics is well made. However, in the
specific case that Sarah asked about I think it was clear to her that a
log-normal distribution was not a good description of her data. In the
absence of an adequate description for the distribution there are no
minimally sufficient statistics. That is why I suggested an empirical
approach (show the data itself) for a more honest description.
Sarah did not say what her main goal was for this data. IMHO the
arithmetic mean and standard deviation are as good as anything for
descriptive statistics if there is no clear goal.
Nick
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford.-at-.auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
Copyright 1995-2010 David W. A. Bourne (david@boomer.org)