Dear All,
I have some questions regarding the calculation of standard errors (or
confidence intervals) of Bayesian (PK-)parameter estimates.
Suppose the following expression is to be minimized:
SSE = sum((observations - predictions)^2 / sigma^2)
      + sum((individual_parameter - population_parameter)^2 / omega^2)
Where observations are from a single patient only and predictions are
obtained from some nonlinear equation (e.g. a PK-model).
1) Is it (mathematically) correct to calculate SEs for the parameter
estimates like this:
M=number of parameters
n=number of observations
MSE=SSE/(n+M-M) # Mean squared error
SE = sqrt(MSE.*diagonal(inverse(hessian)))
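For concreteness, here is a minimal Python sketch of the calculation proposed in 1), assuming the objective has already been minimized and its Hessian obtained (e.g. numerically); all function and variable names are illustrative, not from any particular package:

import numpy as np

def bayesian_sse(theta, times, obs, sigma, theta_pop, omega, model):
    # SSE = sum((obs - pred)^2 / sigma^2) + sum((theta - theta_pop)^2 / omega^2)
    pred = model(theta, times)
    return (np.sum((obs - pred) ** 2 / sigma ** 2)
            + np.sum((theta - theta_pop) ** 2 / omega ** 2))

def asymptotic_se(sse, hessian, dof):
    # dof as proposed above, e.g. n + M - M; see the later replies for n + M
    mse = sse / dof
    cov = mse * np.linalg.inv(hessian)   # SE = sqrt(MSE * diag(inv(Hessian)))
    return np.sqrt(np.diag(cov))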
2) However, 1) gives only asymptotic SEs, which are known to have some
disadvantages. Therefore, in population modeling, non-parametric
resampling techniques (bootstrap, jack-knife) are increasingly
applied. Such methods, however, wouldn't make much sense for single-
subject data with only a few observations.
Are there other methods that could be used for such situations in
order to calculate confidence intervals? Maybe log-likelihood-profiling?
Or, can inference statistics be applied on the posterior distribution
of a parameter?
3) How can a confidence band around the predicted curve be constructed?
Thanks and regards, Andreas.
--
Andreas Lindauer
Department of Clinical Pharmacy
Institute of Pharmacy
University of Bonn
An der Immenburg 4
D-53121 Bonn
The following message was posted to: PharmPK
Andreas: I cannot see how you would calculate an SSE for an individual
patient when the data collected are released as a single value for each time
point, which is the norm. Please explain.
The following message was posted to: PharmPK
Ed,
Maybe my nomenclature was confusing. By SSE I essentially meant a Bayesian
least squares criterion, i.e. an objective function value that is minimized
by some nonlinear regression algorithm (e.g. Levenberg-Marquardt) which
provides a Hessian matrix.
There should be no problem with data obtained at different time points.
Sorry for the confusion, Andreas.
Dear Ed and Andreas:
You are asking a very good question. It seems to me that
you wish to have some idea of the uncertainties present in an
individual patient's Bayesian posterior parameter distribution. I
don't quite understand why you wish to do this for a parameter
distribution which is assumed to have some particular shape. It does
not get you the most likely posterior distribution. It just gets you
the estimated means and covariances. Also, why do you wish to do this
at all? Just to know, or perhaps to use this information somehow in
calculating the uncertainties which will result from a dosage regimen
based on those posterior parameter estimates and those variances? It
seems to me that you can do better, and can also see the results
graphically, by using nonparametric (NP) approaches.
First, use an NP approach to make a population PK/PD model.
There are theorems by Mallet, Lindsay, and Carathéodory that prove that
you do not have to examine a continuous distribution over all its
possible parameter spaces, but that the maximum likelihood
distribution "can be found" in a discrete set of points that
constitute the most likely joint parameter distribution. In this way,
no assumptions need to be made about the shape of any continuous
function such as normal, lognormal, etc. The NP approach finds the
most likely set of support points given the raw data and the error
model used. You wind up with up to 1 support point per subject studied
in the population. You get the entire discrete distribution. You will
see unsuspected subpopulations such as faster and slower metabolizers,
for example, at that early stage of analysis.
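As an illustration of the idea only (this is not the NPML/NPAG algorithm itself), here is a short Python sketch that estimates the probabilities of a fixed, hypothetical grid of candidate support points by maximum likelihood, using the standard EM update for mixture weights; most candidate points end up with negligible probability:

import numpy as np

def np_support_weights(lik, n_iter=1000):
    # lik[i, k] = likelihood of subject i's data under candidate support point k
    lik = np.asarray(lik, dtype=float)
    n_subjects, n_points = lik.shape
    w = np.full(n_points, 1.0 / n_points)
    for _ in range(n_iter):
        mixture = lik @ w                              # per-subject mixture likelihood
        w = w * (lik / mixture[:, None]).mean(axis=0)  # EM update for the weights
    return w / w.sum()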
What does this do for you? You don't need to assume any
distribution. You don't need to obtain the "estimators" (means and
covariances) of a distribution. They will come anyway. What you get is
the entire distribution by estimating its NP support points. Each
support point consists of a point estimate for each model parameter
(vol, clearance, rate constants) and an estimate of the probability
associated with each set of estimates. So now you have many models
(support points) rather than just one, as with parametric approaches.
You also now have a tool with which to estimate the errors associated
with future predictions of serum concentrations, for example, that
result from a dosage regimen.
How are you going to develop an initial dosage regimen when
you can see that some patients are fast metabolizers and others are
slow, and others you don't even know about yet? How do you control
MOST PRECISELY a patient when all you know about him/her is that s/he
is a member of such a population? How do you CONTROL a patient most
precisely who at this stage is represented only by a vaguely known
"kinda - sorta" multimodal parameter distribution?
Using the NP joint parameter distribution, it is easy to
get multiple predictions of future serum concentrations and other
responses, one from each model support point, weighted by the
probability of that support point. These weighted predictions can be
compared with the desired target goal to be achieved at the desired
time. You can compute the weighted squared error of the failure of
that regimen to hit the target. You can then find the regimen which
specifically minimizes that error. This is the most precise dosage
regimen you can develop to hit any target goal. It is the optimal way
to use all the information based on the data you have obtained up to
now. This is called the Multiple Model (MM) approach. This approach has
been and is widely used in the aerospace community for flight control
and spacecraft guidance systems, and for tracking (and hitting)
hostile targets taking evasive action. Our lab uses it for developing
drug dosage regimens. See www.lapk.org, and click on new developments,
teaching topics, and software.
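A toy Python sketch of the Multiple Model idea just described, under the simplifying (and purely illustrative) assumptions of a one-compartment IV-bolus model, a single target concentration at one time, and a set of candidate doses; real MM software of course optimizes full regimens:

import numpy as np

def mm_best_dose(supports, probs, candidate_doses, t_target, c_target):
    # supports[k] = (V, CL) of support point k; probs[k] = its probability
    supports = np.asarray(supports, dtype=float)
    V, CL = supports[:, 0], supports[:, 1]
    best, best_err = None, np.inf
    for dose in candidate_doses:
        c_pred = (dose / V) * np.exp(-(CL / V) * t_target)  # one-compartment IV bolus
        err = np.sum(probs * (c_pred - c_target) ** 2)      # probability-weighted squared miss
        if err < best_err:
            best, best_err = dose, err
    return best, best_err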
Now we are getting there. We have a joint parameter
distribution in the form of these NP distributions. We give the
patient a dosage regimen and do TDM, monitoring the serum
concentrations at various times, to obtain a Bayesian posterior
parameter distribution for that individual patient. We are now getting
at the key point of the discussion. Now, instead of using the maximum
a posteriori probability (MAP) Bayesian approach, we simply start with
the prior probability of each support point in the population model, examine
the data, and compute the Bayesian posterior probability of each
support point given the data. Most support points do not fit the data
well at all. Their posterior probability becomes very small or
negligible. The few that do predict the patient's data well become
much more likely. You now have a few (or maybe only one) posterior
support points. These points constitute the Bayesian posterior joint
NP parameter distribution. You can see graphically the much narrower,
more precise, bands of prediction of the patient's data. You can see
graphically what you have learned about the patient by the TDM you
have done. You will also get the patient's Bayesian posterior
parameter means, medians, modes, and covariances as well.
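The posterior step described above amounts to a few lines of code; a hedged sketch, assuming the likelihood of the patient's TDM data under each support point has already been computed (names are illustrative):

import numpy as np

def posterior_support_probs(prior_probs, patient_lik):
    # prior_probs[k]: population probability of support point k
    # patient_lik[k]: likelihood of this patient's TDM data under point k
    post = np.asarray(prior_probs) * np.asarray(patient_lik)
    return post / post.sum()   # most points collapse to ~0; the few that fit dominate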
You can now use this patient's entire Bayesian posterior
parameter distribution to develop the same kind of MM dosage regimen
we described before, but now using this much more precise posterior
model to get the most precise hit on the desired target goal.
The NP approaches do not yet give you rigorous confidence
bands, but they do give you easily seen 95 percentile distances
concerning the estimates of the past and also of the expected future
concentrations. There is no need to use parametric approaches when you
don't even know what the shape of the parameter distribution is. And,
as you say, the parametric approaches give only asymptotic confidence
estimates.
Our clinical software does the job I think you are
interested in. Go to www.lapk.org. Click on teaching topics, on new
developments, and on software. You can download demo versions of this
MM-USCPACK software for population modeling and for clinically
practical approaches to dosage optimization for patients. I hope you
like it.
Very best regards,
Roger Jelliffe
The following message was posted to: PharmPK
Roger,
Thank you very much for your extensive comment on this. You are right:
some idea of parameter uncertainty is what I would like to have. I think
in a parametric Bayesian estimation scenario one would like to have not
only point estimates of the most probable PK parameters, but also some
idea of whether the estimate is precise enough to give an adequate dose
recommendation.
The NP approach really has some advantages over parametric methods. You
explained them very convincingly in your last mail. Generally, I am much
in favour of the NP approach for therapeutic drug monitoring; however,
there is one major drawback. You mentioned it in your second paragraph:
"First, use an NP approach to make a population PK/PD model."
Suppose the pharmacy department of a hospital wants to implement TDM for
a new drug (not one already included in the USC*PACK ;-) ). In order to
use the NP method for TDM they would first have to perform a population
PK study to get a sufficient number of support points. For the parametric
approach, however, the necessary information may already be published in
a PopPK analysis of this drug using e.g. NONMEM.
I think as long as 99% of PopPK studies are done with parametric methods,
the use of the NP method for TDM will be limited, despite its obvious
advantages.
Very best regards, Andreas.
The following message was posted to: PharmPK
Dear Andreas,
You asked:
> 1) Is it (mathematically) correct to calculate SEs for the parameter
> estimates like this:
>
> M=number of parameters
> n=number of observations
> MSE=SSE/(n+M-M) # Mean squared error
>
> SE = sqrt(MSE.*diagonal(inverse(hessian)))
From numerous Monte Carlo simulations I concluded that SEs can be
calculated accurately from the latter equation, if MSE is calculated
from:
MSE = SSE/(n+M)
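In code, this simply changes the denominator in the earlier sketch; for example (a trivial, illustrative helper):

def mse_bayesian(sse, n, M):
    # degrees of freedom n + M, as suggested by the Monte Carlo simulations above
    return sse / (n + M)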
> 2) However, 1) gives only asymptotic SEs which are known to have
>some disadvantages. Therefore in population modeling non-parametric
> resampling techniques (bootstrap, jack-knife) are increasingly
> applied. Such methods, however, wouldn't make much sense for single-
>subject data with only few observations.
Asymptotic SEs do indeed have some disadvantages, but they may still be
useful and sufficiently precise for estimating confidence intervals.
You may find some useful information in my paper:
Proost JH, Eleveld DJ. Performance of an Iterative Two-Stage Bayesian
technique for population pharmacokinetic analysis of rich data sets.
Pharm Res 2006; 23: 2748-2759 (Erratum in Pharm Res 2007; 24: 1599).
Please see the Erratum for the correct version of Eq. 6. I can send
you the pdfs.
> Are there other methods that could be used for such situations in
> order to calculate confidence intervals? Maybe
>log-likelihood-profiling?
This is indeed a good alternative approach. Fix all parameters except
for one, and find the parameter values where the objective function
-2*LogLikelihood is increased by the critical value of the F
distribution with df1 = 1 and df2 = infinity (or better: the df
associated with the prior distribution). For a 95% confidence
interval, use alpha = 0.025 (critical value 3.84, equal to 1.96^2).
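A hedged Python sketch of this profiling procedure, assuming objfun(theta) returns -2*log-likelihood (including the Bayesian prior term) and theta_hat is the fitted estimate; the bracketing step and all names are illustrative:

import numpy as np
from scipy.optimize import brentq

def profile_ci_bound(objfun, theta_hat, i, direction, crit=3.84, step=0.05):
    # Find where -2*LL rises by 'crit' (3.84 for a 95% CI) when only parameter i is varied
    base = objfun(theta_hat)

    def excess(value):
        theta = np.array(theta_hat, dtype=float)
        theta[i] = value
        return objfun(theta) - base - crit

    x = theta_hat[i]
    while excess(x) < 0:                       # walk outward until the bound is bracketed
        x += direction * step * max(abs(theta_hat[i]), 1.0)
    return brentq(excess, min(theta_hat[i], x), max(theta_hat[i], x))

Calling this with direction = -1 and direction = +1 gives the lower and upper bounds of the confidence interval, respectively.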
> Or, can inference statistics be applied on the posterior
>distribution of a parameter?
Yes, but this requires that the posterior distribution is known, and
that was the problem.
> 3) How can a confidence band around the predicted curve be
>constructed?
See the abovementioned paper (Eq. 15).
best regards,
Johannes H. Proost
Dept. of Pharmacokinetics and Drug Delivery
University Centre for Pharmacy
Antonius Deusinglaan 1
9713 AV Groningen, The Netherlands
Email: j.h.proost.-at-.rug.nl
The following message was posted to: PharmPK
I've been following this discussion with interest.
It seems clear that posterior parameter estimates (mean, variance,
covariance) are possible, even straightforward. Having come from the
aerospace industry, I've been surprised that methods such as Kalman
filters
(and their extensions) don't seem to be very commonplace in PK.
Regarding the nonparametric methods discussed here: I love their
generality, but it seems that there are two barriers to using them for
everyday work:
1. Don't the nonparametric estimates require much more data? That is,
if the parametric assumptions (approximately) hold, parametric
methods require much less data.
2. If the methods are not well known or accepted, then it may be a
challenge to get your work technically reviewed, published, or
accepted by
decision-makers (management, regulators).
Can you folks please comment on these? I'd be interested in the current
thinking.
Thanks,
Scott
G. Scott Lett, Ph.D.
The BioAnalytics Group
241 Forsgate Dr.
Suite 209
Jamesburg, NJ 08831
Dear Andreas:
Thanks for your note and your comments. Yes. It is not only
a problem of knowing the errors in the Bayesian posterior parameter
estimates, it is also, as you say, the ability to predict, based on
the Bayesian posterior joint parameter density, the precision with
which a dosage regimen is likely to hit the target, and then,
specifically, to maximize that precision. That is what our approaches
do.
For getting population parameter estimates from the
literature, you can do that, and use a Monte Carlo simulator to
generate fictive patient data files. You can also state the errors
with which doses are prepared and given, the errors in recording their
timing, the assay errors, and the timing errors of getting the
samples. From the resulting data files, you can then make an NPAG
population model, and then put this model in the clinical software to
use it. Michael Neely has done this for several HIV drugs.
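A hedged sketch of the Monte Carlo step described above, assuming published lognormal population distributions for a one-compartment IV-bolus model and a simple additive assay error; the actual USCPACK workflow, error polynomials and file formats are not reproduced here:

import numpy as np

rng = np.random.default_rng(0)

def simulate_fictive_patients(n_patients, dose, times, pop_mean, pop_cv, assay_sd):
    # pop_mean / pop_cv: e.g. {"V": 30.0, "CL": 5.0} and {"V": 0.3, "CL": 0.25} from the literature
    patients = []
    for _ in range(n_patients):
        V = rng.lognormal(np.log(pop_mean["V"]), pop_cv["V"])
        CL = rng.lognormal(np.log(pop_mean["CL"]), pop_cv["CL"])
        conc = (dose / V) * np.exp(-(CL / V) * np.asarray(times))
        conc_obs = conc + rng.normal(0.0, assay_sd, size=len(times))   # assay error
        patients.append({"V": V, "CL": CL, "times": times, "conc": conc_obs})
    return patients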
I am at a loss to understand why so many people continue to
use pop methods that are only approximate, not statistically
consistent, and which have no way to estimate the precision with which
a regimen will hit a target.
Anyone who wishes to can go to our web site www.lapk.org,
click on software, and download our demonstration software to evaluate
it both for its ability to make a NP pop model, but also to plan,
monitor, and adjust maximally precise dosage regimens for patient care.
Very best regards,
Roger Jelliffe
Dear Scott:
Thanks for your very good comments. I am also surprised
that Kalman filtering is not more widely understood in the PK
community. We use an extended Kalman filter in our clinical software
for multiple model (MM) Bayesian adaptive control of drug dosage
regimens. It works very well where appropriate. We really do optimal
open-loop feedback stochastic adaptive control.
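For readers unfamiliar with the technique, here is a toy scalar Kalman filter for a one-compartment concentration state with first-order elimination; this only illustrates the predict/update cycle, not the extended filter used in the MM-USCPACK software, and all parameters are assumed values:

import numpy as np

def kalman_track(times, y, ke, x0, p0, q, r):
    # q: process-noise variance, r: assay-noise variance (illustrative values)
    x, p = x0, p0
    t_prev = times[0]
    estimates = []
    for t, obs in zip(times, y):
        a = np.exp(-ke * (t - t_prev))            # decay over the elapsed interval
        x, p = a * x, a * a * p + q               # predict
        k = p / (p + r)                           # Kalman gain
        x, p = x + k * (obs - x), (1.0 - k) * p   # update with the new measurement
        estimates.append(x)
        t_prev = t
    return np.array(estimates)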
As to the nonparametric (NP) methods, there are no real
barriers.
1. Most modelers, though, really do not care about
math. They are accustomed to using structural models in a certain
software package. They almost never ask what such a package really
does for them. Some sort of results are usually enough. How the
results are obtained, they don't really care. If it gets means within
about 2%, that is fine. Variances and correlations are usually not
considered as to their reliability. Go to almost any modeling
workshop. This is usually the case.
2. The NP methods do not require more data. Indeed, they
work better than parametric methods, especially in data-poor settings.
However, you are quite correct that no method does well with sparse or
uninformative data. You can see the results degrade as you withhold
data points from an analysis. However, you never see the actual
distributions with parametric methods - only the means, SDs, and
covariances. With the NP methods it is much easier: since you are
estimating the entire distribution, you can see the plots of the
estimated densities degrade as the data, and the experimental design,
become less informative. You never get away from the issues of proper
experimental design. You cannot make a silk purse from a sow's ear. I
guess with many policies for therapeutic drug monitoring (TDM) as they
are today, you only put lipstick on a pig. Also, with the NP approach,
there are NO outliers, as there is no assumed shape of the
distribution. This is very useful.
3. Yes. It has been quite difficult to get our work
accepted by conventional statisticians who review many papers. It has
been extremely hard to put across the difference between Bayesian
forecasting (which usually does not specify a target, but shows you
what the regimen predicts), and really optimized Bayesian stochastic
adaptive CONTROL of a system, where a specific target is selected, and
the regimen to hit it most precisely is developed.
4. We have talked with the FDA in many settings, but
the usual response there and almost everywhere else, has been
something like
"Yes, OK, ............" and then inertia, except in one instance.
Very disappointing. Clinical Pharmacology is dead. It used to be a
real clinical specialty. Then came PK and quantitative approaches. The
physicians of the time couldn't understand them. Now they are dead or
retired, and the specialty has become one oriented 99.44% about the
drug industry. Good clinical pharmacists do what they can. A few
physicians do what they can. But the practical, good and precise use
of drugs is hardly thought of, and is not taught effectively in any
medical school I have seen, except 1 or 2. NOT here, for example. Look
at the discussion we are having now. And you are the only one without
parametric statistical blinders! Good for you!
5. The PK/PD community needs to see that their work is
not done in cultural isolation just for the drug companies. Models of
potentially toxic drugs need to be used in settings of optimal control
of PK/PD systems for maximally precise patient care. It is NOT enough
to go to meetings and talk to each other about what a great thing pop
modeling is. It is NOT enough to look up the literature about how
drugs are dosed today. We need real SOFTWARE TOOLS to do the job of
Bayesian adaptive control optimally. We need to reach the physicians
who strongly resist any applications to therapy which are more than
simple memorized rituals, issued as guidelines by some committee that
never saw that physician's patient. Nobody knows his/
her patient better than the clinician at the bedside. Clinicians need
to be trained to set their own target goals for each patient according
to THAT patient's perceived need, to develop regimens which hit those
targets with minimum error, and to take the responsibility for what
they do. They need to take the responsibility as the patient's best
advocate, and not evade it by following some guideline set up by
people who never saw that patient, or knew the infecting bug, for
example.
6. Look at the way the aerospace industry works. Look
what they do! Air travel is so safe now that it would be most
difficult to make ANY "statistically significant" reduction in the
number of air crashes. But Rolls-Royce, I hear, monitor all their
engines in flight by satellite. If there is any trouble over the
Atlantic, for example, the pilots and the controllers hear it right
NOW, and can discuss it and what to do about it. This is clearly NOT
"cost-effective" in the medical culture.
7. But did you know that an episode of grade 3-4 graft-
versus-host disease in a child with a bone marrow transplant costs an
extra 100,000 euros to treat, while it costs only 30,000 euros extra to
monitor the child and avoid it? Nathalie Bleyzac, in Lyon, writes poorly
in English, as she is French, and so her work doesn't get published
well. But this is her work. Sander Vinks and his group
showed in The Hague that model-based, goal-oriented TDM also improved
care and reduced hospital stay by 6 days. I think it is Athanassios
Iliadis in Marseille who now has patients with testicular CA treated
with cisplatin, using Bayesian adaptive control, some of whom have 15
year follow-ups! When complications are reduced by good Bayesian
adaptive control, all the rest that goes with it also gets reduced.
The medical community today would NEVER spend the extra money to
monitor by satellite, but I sure feel better flying knowing that the
engines are looked after like that!
8. The way things are now is that the pharmacy and the
laboratory see only the extra costs of monitoring and adjusting the
therapy with toxic drugs. They do NOT see the results of what they do.
I think the hospital administrators should see this, but what do they
know yet? They sure don't read this chatbox.
9. Further, the medical schools never teach this way to
use drugs. They give lip service to it by teaching baby PK with linear
regression on logs of levels, but never anything clinically useful, in
any medical school I have visited worldwide. Especially, anything
Bayesian is "too hard" and "too much work for what it is worth". These
were the student comments I got when I was teaching such an elective
course for 3rd- and 4th-year students. That is what my students said to the
curriculum committee. I have seen many student applicants to med
school who had good math backgrounds and who had the ability to
understand these approaches. When I would see them again in 3rd year,
all that ability was gone. It had been brainwashed out of them by the
medical culture of intuitive judgment and memorization. It was
dropped. No one teaches decision theory in med school either. What a
tragedy! At least you can go to Tufts and get a fellowship in it. Good
for Tufts! Most of the good work is done by thoughtful pharmacists,
but most do not have the clinical training to really use their
clinical judgment, in addition to the analyses, to responsibly select
a target goal. Some do, most don't. Hardly any physician does, that's
for sure. This is a terrible indictment of the medical, and also some
of the pharmaceutical communities, as well as the drug industry, when
it comes to suggesting dosage regimens of toxic drugs. Marketing a
drug needing TDM is currently the kiss of death.
10. Again, go to our web site www.lapk.org. Click around on
teaching topics, new advances, and software. Let's talk more.
All the best, and many thanks again!
Roger Jelliffe