PharmPK Discussion - Estimating std deviation for skewed data, small sample size

On 31 Jul 2003 at 01:52:30, "Sarah Anne Marston" (smarston.aaa.pharmastatsci.com) sent the message


Dear Group,

I have antibody titer data, about 7 to 12 subjects per group. We are
reporting the geometric mean, and we would like to estimate the std dev.
I tried taking the anti-log of the 95% CI of the log-transformed data, but
that doesn't work - apparently the data are too skewed and the sample
sizes are small. Any suggestions for estimating a std dev under these
circumstances?

Thanks.

Sincerely,

Sarah

On 31 Jul 2003 at 10:39:42, "Gobburu, Jogarao V" (GOBBURUJ.-a-.cder.fda.gov) sent the message


If all you want to convey in your report is a feel for the spread of
the data, is it not reasonable to simply report the range, given that
you think the data are too skewed and the sample size is small?

Joga Gobburu
Pharmacometrics
CDER/FDA

On 31 Jul 2003 at 16:37:20, "Kayode Ogungbenro" (mbpssko3.aaa.man.ac.uk) sent the message


The following message was posted to: PharmPK

Hi,
The following references will help you to better understand the
relationship between the mean, standard deviation and confidence
interval of skewed and normal data:

1. Zhou et al. (1997) Statistics in Medicine 16, 783-790.
2. Taylor et al. (2002) Statistics in Medicine 21, 1443-1459.

Kayode Ogungbenro
Ph.D Student

On 31 Jul 2003 at 11:37:09, (Scott.D.Patterson.aaa.gsk.com) sent the message


Interquartile range - reflecting spread in the middle half of the data
- is a good measure of spread for highly skewed dist'ns (cf. Schulmann,
Statistics in Plain English, VNR, 1992, p. 29-30).
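
A minimal sketch of the two distribution-free summaries suggested so
far (the range and the interquartile range), in Python using only the
standard library. The titer values are made up for illustration and are
not from this thread, the function name spread_summary is hypothetical,
and the "inclusive" quartile rule is just one of several conventions.

```python
import statistics

def spread_summary(values):
    """Return (minimum, maximum, q1, q3, iqr) for a small sample."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], data[-1], q1, q3, q3 - q1

# Hypothetical titers for one group (illustrative numbers only).
titers = [20, 40, 40, 80, 80, 160, 320, 1280]
low, high, q1, q3, iqr = spread_summary(titers)
print(f"range: {low} to {high}; quartiles: {q1} to {q3}; IQR width: {iqr}")
```

With only 7 to 12 values per group the quartiles depend noticeably on
the interpolation rule, so it is worth stating which rule was used.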

-Scott

Scott Patterson
GlaxoSmithKline Pharmaceuticals
2301 Renaissance Blvd.
King of Prussia, PA 19406-2772
Email: scott.d.patterson.-at-.gsk.com
Phone: 610-787-3865

On 31 Jul 2003 at 10:30:10, Paul Johnson (p.johnson.aaa.prodigy.net) sent the message


Dear Sarah,
Here is a reference that may be of interest and use:

Shumway, R.H., Azari, A.S. and Johnson, P. (1989),
“Estimating Mean Concentrations Under Transformation
for Environmental Data With Detection Limits,”
Technometrics, 31(3): 347-356.

I have a set of variability factor programs for
estimating the mean, standard deviation and other
quantities (e.g., 99th percentile) for the lognormal
with and without non-detects [single and/or multiple
detection limits].

Let me know if they would be of use. I can e-mail them
to you.

Paul Johnson
http://www.biostatsoftware.com

On 1 Aug 2003 at 07:56:52, Nick Holford (n.holford.at.auckland.ac.nz) sent the message


Hi,

The responses to this question remind me of the parable of the blind
men and the elephant.
http://www.wordfocus.com/word-act-blindmen.html

Various parameters of this small sample have been mentioned: skewness,
geometric mean, standard deviation, range (plus some siblings from
Brian Smith). What is the purpose of estimating and reporting these
parameters?

Is it to simply describe the observations? In which case why attempt to
summarize them with statistics? You could display all of them
graphically or list them in a table. With a larger sample size,
histograms of the frequency distribution would be nice.

Is it to test some null hypothesis? The uncertainty about the assumed
distribution for the test statistic may make it more sensible to
consider a randomization test comparing a statistic of interest across
the groups.
http://wfn.sourceforge.net/rtmethod.htm
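
A minimal sketch of a two-group randomization test of the kind
mentioned above, in Python using only the standard library. It is not
taken from the linked page. The function names, the choice of statistic
(mean of the log titers, i.e. the log geometric mean), the number of
shuffles and the data are all assumptions made for illustration.

```python
import math
import random

def mean_log(values):
    """Mean of the natural logs (the log of the geometric mean)."""
    return sum(math.log(v) for v in values) / len(values)

def randomization_test(group_a, group_b, statistic, n_shuffles=5000, seed=0):
    """Two-sided randomization p-value for statistic(a) - statistic(b)."""
    rng = random.Random(seed)
    observed = statistic(group_a) - statistic(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])
        # Small tolerance so ties with the observed value are counted.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (extreme + 1) / (n_shuffles + 1)

# Hypothetical titers for two groups (illustrative numbers only).
group_a = [20, 40, 40, 80, 160, 320, 1280]
group_b = [80, 160, 320, 320, 640, 2560, 5120]
observed, p_value = randomization_test(group_a, group_b, mean_log)
print(f"difference in mean log titer: {observed:.3f}, p = {p_value:.4f}")
```

The test makes no assumption about the shape of the titer
distribution; it only assumes the group labels are exchangeable under
the null hypothesis.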

I suggest to Sarah that she consider whether she wants to show people
the elephants she has discovered or just tell them about bits (weight?
colour?) of the beast and/or how the beasts differ from each other.

Nick

Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New
Zealand
email:n.holford.-a-.auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/

On 31 Jul 2003 at 21:40:34, "Sarah Anne Marston" (smarston.-a-.pharmastatsci.com) sent the message


Dear Group,

Many thanks to all who responded to my question; your replies are much
appreciated. Please find below a helpful reply from Brian Smith, a
Statistician at Lilly, sent to the Clin Phar Stat list. Because of the
interest on the PharmPK list, I thought I would forward it.

BTW, if you are interested in the Clin Phar Stat list, you can register
at the topica.com website. This list includes a mix of Statisticians and
Scientists and everything in between.

Sincerely,

Sarah

-----Original Message-----
From: Brian Smith [mailto:smith_brian_p.-a-.lilly.com]
Sent: Thursday, July 31, 2003 11:01 AM
To: cps.-a-.topica.com
Subject: Re: CLIN PHAR STAT: estimating std deviation for skewed data, small sample size

Sarah,

Let xbar and s2 be the sample mean and sample variance of the
log-transformed data, and assume that the data have a log-normal
distribution. Let m and t2 be the true mean and variance of the
log-transformed data that you are estimating with xbar and s2. The
following are facts about the log-normal distribution.

Geometric mean = exp(m)
The expected value (or mean) = exp(m + t2/2)
The variance = (exp(t2) - 1)*exp(2*m + t2)
The standard deviation = sqrt(variance) = exp(m + t2/2)*sqrt(exp(t2) - 1)
The coefficient of variation = (standard deviation)/(the expected value)
= sqrt(exp(t2) - 1)

Thus, maximum likelihood estimates for these values are obtained by
substituting xbar for m and s2 for t2.

A maximum likelihood estimate of the standard deviation is given by

exp(xbar + s2/2)*sqrt(exp(s2) - 1)

Note: One is often told that s2 = (sum of squares)/(n-1) is an unbiased
estimate of the variance. It is, but it is not the maximum likelihood
estimate of the variance. s2' = (sum of squares)/n is the maximum
likelihood estimate of the variance. Thus, you probably want the
following to estimate the standard deviation:

exp(xbar + s2'/2)*sqrt(exp(s2') - 1).
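
A minimal sketch of the plug-in formulas above, in Python using only
the standard library. It treats s2 as the variance of the
log-transformed data and assumes the data really are log-normal. The
function name lognormal_summary, the mle_variance switch and the titer
values are hypothetical and used only for illustration.

```python
import math

def lognormal_summary(values, mle_variance=True):
    """Plug-in estimates of log-normal summaries from raw positive data."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    xbar = sum(logs) / n
    sum_sq = sum((x - xbar) ** 2 for x in logs)
    # s2' = SS/n is the maximum likelihood estimate; s2 = SS/(n-1) is unbiased.
    s2 = sum_sq / n if mle_variance else sum_sq / (n - 1)
    mean = math.exp(xbar + s2 / 2)            # exp(m + t2/2)
    sd = mean * math.sqrt(math.exp(s2) - 1)   # exp(m + t2/2)*sqrt(exp(t2) - 1)
    return {"geometric_mean": math.exp(xbar), "mean": mean, "sd": sd}

# Hypothetical titers (illustrative numbers only).
titers = [20, 40, 40, 80, 80, 160, 320, 1280]
for name, value in lognormal_summary(titers).items():
    print(f"{name}: {value:.1f}")
```

With 7 to 12 subjects the two variance estimates differ by a factor of
n/(n-1), which is large enough to change the reported standard
deviation visibly.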

With that all said, a log-normal distribution can be completely
described with the geometric mean for an estimate of central tendency
and the coefficient of variation for an estimate of variability. It is
my opinion that these are the two things that you should report. A
maximum likelihood estimate of the coefficient of variation is given by
sqrt(exp(s2') - 1). This is what I think the preferred estimate of this
quantity is. Another way to estimate CV is to find the sample standard
deviation of the values (not log transformed) and divide this by the
arithmetic mean of the values (not log transformed). You can check to
see that these estimates will be similar, if you like. But, again, I
recommend the maximum likelihood estimate.
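
A minimal sketch comparing the two CV estimates, in Python using only
the standard library. Here the raw-scale CV is taken as the sample
standard deviation divided by the arithmetic mean, which is the usual
definition of a coefficient of variation. The function names and the
titer values are hypothetical and used only for illustration.

```python
import math
import statistics

def cv_lognormal_mle(values):
    """CV from the log scale: sqrt(exp(s2') - 1), with s2' = SS/n."""
    logs = [math.log(v) for v in values]
    s2_mle = statistics.pvariance(logs)  # population variance divides by n
    return math.sqrt(math.exp(s2_mle) - 1)

def cv_raw(values):
    """CV on the original scale: sample SD divided by arithmetic mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical titers (illustrative numbers only).
titers = [20, 40, 40, 80, 80, 160, 320, 1280]
print(f"log-scale (MLE) CV: {cv_lognormal_mle(titers):.2f}")
print(f"raw-scale CV:       {cv_raw(titers):.2f}")
```

How closely the two agree depends on how well the log-normal
assumption holds and on the sample size; a large gap in a small, skewed
sample is itself a hint that the assumption deserves a second look.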

Sincerely,

Brian Smith

On 4 Aug 2003 at 11:44:54, SMITH_BRIAN_P.-a-.Lilly.com sent the message


I never try and disagree with Nick if I can help it. There are always
pearls of wisdom in what he says even when he is trying to be
controversial.

One thing I would like to add, however, with pharmacokinetic data there
is often useful information that is imparted with a good measure of
center and a good measure of variability. First, however, an
underappreciated fact: if your data come from a log-normal distribution,
then the geometric mean is the maximum likelihood estimate of the
median. With that said, I have heard pharmacokineticists describe, for
instance, how widely the drug circulates based on the central tendency
estimate of the volume of distribution. Additionally, use of
coefficient of variation becomes a way that the variability in one drug
can be compared to other drugs. I see these measures as useful in our
ability to describe the drug. Displaying them in a table does not
immediately allow for understanding of variability. I would imagine
that by the time you get to say a dozen observations, looking at the
values becomes hard to decipher. At least it is for me.

In theoretical statistics a lot of time is spent with the notion of
sufficient statistics. Given a distribution, a set of sufficient
statistics allows one to estimate that distribution. The raw
observations are indeed sufficient regardless of the distribution.
Now, if you are willing to assume, for instance, that a set of AUCs
has a log-normal distribution (in my experience this is usually a
reasonable assumption), then the geometric mean and coefficient of
variation are sufficient. That is, I can completely estimate the
distribution based on these two estimates. For a log-normal
distribution we would say that these estimates are minimally
sufficient, since you cannot completely describe a log-normal
distribution with one statistic.
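
A minimal sketch of what "completely estimate the distribution" can
mean in practice, in Python using only the standard library. Under the
log-normal assumption, a reported geometric mean and CV give back the
log-scale parameters via m = ln(GM) and t2 = ln(1 + CV^2), and from
those any percentile. The function names and the example numbers are
hypothetical and used only for illustration.

```python
import math
from statistics import NormalDist

def lognormal_from_gm_cv(geometric_mean, cv):
    """Recover the log-scale mean m and variance t2 from GM and CV."""
    m = math.log(geometric_mean)
    t2 = math.log(1 + cv ** 2)  # inverts CV = sqrt(exp(t2) - 1)
    return m, t2

def lognormal_quantile(geometric_mean, cv, p):
    """Percentile of the log-normal distribution implied by GM and CV."""
    m, t2 = lognormal_from_gm_cv(geometric_mean, cv)
    return math.exp(NormalDist(m, math.sqrt(t2)).inv_cdf(p))

# Hypothetical report: geometric mean AUC of 100 with a CV of 50%.
for p in (0.05, 0.5, 0.95):
    print(f"{p:.0%} percentile: {lognormal_quantile(100.0, 0.5, p):.1f}")
```

The 50% percentile returned here is the geometric mean itself, which
matches the point that the geometric mean estimates the median of a
log-normal distribution.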

Practically speaking it all comes down to usefulness. Tables of raw
values, histograms, and estimates of parameters, all have a useful
place in helping people understand the data. In the case of the
elephant and the blind men we see that the set of sufficient statistics
may very well be (a wall, a spear, a snake, a tree, a fan, and a rope).
We need all of these statistics (and probably more) to understand the
distribution of an elephant. On the other hand, if the data is
log-normal, the geometric mean and coefficient of variation will
suffice.

Sincerely,

Brian Smith

On 5 Aug 2003 at 07:36:20, Nick Holford (n.holford.at.auckland.ac.nz) sent the message


Brian,

Your point about sufficient statistics is well made. However, in the
specific case that Sarah asked about I think it was clear to her that a
log-normal distribution was not a good description of her data. In the
absence of an adequate description for the distribution there are no
minimally sufficient statistics. That is why I suggested an empirical
approach (show the data itself) for a more honest description.

Sarah did not say what her main goal was for this data. IMHO the
arithmetic mean and standard deviation are as good as anything for
descriptive statistics if there is no clear goal.

Nick

Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New
Zealand
email:n.holford.-at-.auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
