
PyMC with Observations on Multiple Variables

I'm using an example of linear regression from *Bayesian Methods for Hackers* but having trouble expanding it to my usage. I have observations on a random variable, an assumed distr…

Solution 1:

Data

First, note that the total likelihood of repeated Bernoulli trials is exactly a binomial likelihood, so there is no need to expand your data to individual trials. I'd also suggest using a Pandas DataFrame to manage your data - it helps keep things tidy:

import pandas as pd

df = pd.DataFrame({
    'n': [0, 5, 10, 15, 20, 25],
    'trials': [120, 111, 78, 144, 280, 55],
    'successes': [1, 2, 1, 3, 7, 1]
})
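To see why aggregating is safe: the product of the individual Bernoulli likelihoods and the binomial likelihood of the summed counts differ only by the binomial coefficient, which does not depend on p and therefore cannot affect the posterior. A quick check with SciPy, using the first row of the data above:

```python
import numpy as np
from scipy.stats import bernoulli, binom

n, k = 120, 1  # first row of df: 120 trials, 1 success
outcomes = np.array([1] * k + [0] * (n - k))  # the same data as individual trials

# The two log-likelihoods differ only by log C(n, k), a constant in p,
# so Bayesian inference on p is identical either way.
diffs = [binom.logpmf(k, n, p) - bernoulli.logpmf(outcomes, p).sum()
         for p in (0.01, 0.05, 0.1)]
print(diffs)  # every entry is log C(120, 1) = log(120), independent of p
```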

Solution

This will help simplify the model, but the real solution is to add a shape argument to the p random variable so that PyMC3 knows how to interpret the one-dimensional parameters. The fact is that you do want a different p distribution for each n case you have, so there is nothing conceptually wrong here.

import pymc3 as pm

with pm.Model() as model:
    # conversion rate hyperparameters
    alpha = pm.Uniform("alpha_n", 5, 13)
    beta = pm.Uniform("beta_n", 1000, 1400)

    # switchpoint prior
    n_sat = pm.Gamma("n_sat", alpha=20, beta=2, testval=10)

    a_gamma = pm.Gamma("a_gamma", alpha=18, beta=15)
    b_gamma = pm.Gamma("b_gamma", alpha=18, beta=27)

    # NB: I removed pm.Deterministic b/c (a|b)_slope[0] is constant 
    #     and this causes issues when using ArViZ
    a_slope = 1 + (df.n.values/n_sat)*(a_gamma-1)
    b_slope = 1 + (df.n.values/n_sat)*(b_gamma-1)

    a = pm.math.switch(df.n.values >= n_sat, a_gamma, a_slope)
    b = pm.math.switch(df.n.values >= n_sat, b_gamma, b_slope)

    # conversion rates
    p = pm.Beta("p", alpha=alpha*a, beta=beta*b, shape=len(df.n))

    # observations
    pm.Binomial("observed", n=df.trials, p=p, observed=df.successes)

    trace = pm.sample(5000, tune=10000)

This samples nicely

[trace plot]

and yields reasonable intervals on the conversion rates

[posterior intervals for the conversion rates]

but the fact that the posteriors for alpha_n and beta_n go right up to your prior boundaries is a bit concerning:

[posterior plots for alpha_n and beta_n]

I think the reason for this is that you only run 55-280 trials per condition, and if the conditions were independent (worst case), conjugacy tells us your Beta hyperparameters should be in that range. Since you are doing a regression, the best-case scenario for information sharing across conditions would put your hyperparameters in the range of the sum of trials (788) - but that's an upper limit. Because your priors sit outside this range, the concern is that you're forcing the model to be more precise in its estimates than you really have the evidence to support. However, this can be justified if the prior is based on strong independent evidence.

Otherwise, I'd suggest expanding the ranges on those priors that affect the final alpha*a and beta*b numbers (the sums of those should be close to your trial counts in the posterior).
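To make the conjugacy argument concrete: with a Beta(α, β) prior on a single p and k successes in n binomial trials, the posterior is Beta(α + k, β + n − k), so α + β acts like a count of prior "pseudo-trials" that competes directly with the n real trials. A rough sketch with values near the middle of the priors above (illustrative numbers, not posterior estimates):

```python
alpha0, beta0 = 9.0, 1200.0   # roughly mid-range of Uniform(5, 13) and Uniform(1000, 1400)
n, k = 120, 1                 # first condition: 120 trials, 1 success

# Conjugate Beta-Binomial update
alpha_post, beta_post = alpha0 + k, beta0 + (n - k)

# Fraction of the posterior mean contributed by the prior rather than the data
prior_weight = (alpha0 + beta0) / (alpha0 + beta0 + n)
print(round(prior_weight, 2))  # 0.91: the prior carries ~10x the weight of the data
```

With over 1200 pseudo-trials against 120 real ones, the data can barely move the estimate, which is exactly the over-precision described above.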


Alternative Model

I'd probably do something along the following lines, which I think has a more transparent parameterization, though it's not completely identical to your model:

with pm.Model() as model_br_sp:
    # regression coefficients
    alpha = pm.Normal("alpha", mu=0, sd=1)
    beta = pm.Normal("beta", mu=0, sd=1)

    # saturation parameters
    saturation_point = pm.Gamma("saturation_point", alpha=20, beta=2)
    max_success_rate = pm.Beta("max_success_rate", 1, 9)

    # probability of conversion
    success_rate = pm.Deterministic("success_rate",
                                    pm.math.switch(df.n.values > saturation_point,
                                                   max_success_rate,
                                                   max_success_rate*pm.math.sigmoid(alpha + beta*df.n.values)))

    # observations
    pm.Binomial("successes", n=df.trials, p=success_rate, observed=df.successes)

    trace_br_sp = pm.sample(draws=5000, tune=10000)

Here we map the predictor space to probability space through a sigmoid that maxes out at the maximum success rate. The prior on the saturation point is identical to yours, while that on the maximum success rate is weakly informative (Beta[1, 9] - though I will say it runs on a flat prior nearly as well). This also samples well,
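The link function can be sketched in plain NumPy to see its shape (the parameter values here are purely illustrative, not posterior estimates):

```python
import numpy as np

def success_rate(n, alpha, beta, saturation_point, max_success_rate):
    """Scaled sigmoid below the saturation point, flat at
    max_success_rate beyond it."""
    sig = 1.0 / (1.0 + np.exp(-(alpha + beta * n)))
    return np.where(n > saturation_point, max_success_rate, max_success_rate * sig)

n = np.array([0, 5, 10, 15, 20, 25])
rates = success_rate(n, alpha=-2.0, beta=0.3,
                     saturation_point=10.0, max_success_rate=0.1)
# rates climb toward 0.1 and stay flat once n exceeds the saturation point
print(rates)
```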

[trace plot]

and gives similar intervals (though the switchpoint seems to dominate more):

[posterior intervals]

We can compare the two models and see that there isn't a significant difference in their explanatory power:

import arviz as az

model_compare = az.compare({'Binomial Regression w/ Switchpoint': trace_br_sp,
                            'Original Model': trace})
az.plot_compare(model_compare)

[model comparison plot]
