http://research.microsoft.com/infernet/

It’s been used on datasets of up to 100M records via its support for chunking the data.

It offers variational EM and Expectation Propagation as its main inference algorithms (preliminary Gibbs sampling support is in there as well). For a comparison with BUGS, read this thread:

http://community.research.microsoft.com/forums/t/4823.aspx

We’re planning a new release shortly with a number of optimisations that should increase the size of dataset that can be used without chunking.

Hope that helps,

John Winn, Infer.NET team

With the dentistry data, I found it didn’t take a huge number of samples to converge, but the posteriors were rather diffuse, especially for the item-level difficulty parameters (in the beta-binomial-by-item and logistic models) and the hierarchical parameters for annotators (beta-binomial-by-annotator and logistic models). I also found it hard to fit the multiplicative slope parameter from Qu et al. (also in the Uebersax and Grove models); the posteriors were all over the place, with very little difference in log likelihood.
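
To make the beta-binomial-by-annotator setup concrete, here’s a minimal sketch of its marginal likelihood in Python. The counts are made up, not the dentistry data, and the only structure assumed is a Beta(a, b) prior on each annotator’s accuracy:

```python
import math

def log_beta(a, b):
    """Log of the Beta function, computed via lgamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k, n, a, b):
    """Log P(k correct out of n) with the annotator's accuracy
    integrated out against a Beta(a, b) prior."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

# Hypothetical per-annotator counts: (correct, total) vs. a gold standard.
counts = [(45, 50), (30, 50), (48, 50)]

def neg_log_lik(a, b):
    """Negative marginal log likelihood of the hierarchical (a, b)."""
    return -sum(beta_binomial_logpmf(k, n, a, b) for k, n in counts)
```

The flatness of neg_log_lik over a wide range of (a, b) is exactly where the diffuse posteriors on the hierarchical parameters show up.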

Also, BUGS pitched a fit (throwing underflow/overflow exceptions) when I tried to swap in a probit link for the logit. I’ve seen this problem mentioned before.
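
The probit underflow is easy to reproduce outside BUGS: Φ(x) underflows to zero for x below roughly −38, so a naive log Φ(x) blows up in the tail. A sketch of a numerically stable log-probit (erfc near the middle, the standard asymptotic tail expansion for the deep lower tail; the −10 cutover is my own choice, not anything from BUGS):

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def log_probit(x):
    """Numerically stable log Phi(x) (log of the standard-normal CDF)."""
    if x > -10.0:
        # erfc is accurate here and the CDF itself does not underflow.
        return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))
    # Deep lower tail: Phi(x) ~ phi(x)/(-x) * (1 - 1/x^2 + ...),
    # so take logs term by term instead of underflowing to zero.
    return (-0.5 * x * x - LOG_SQRT_2PI - math.log(-x)
            + math.log1p(-1.0 / (x * x)))
```

log_probit(-40.0) stays finite (around −805), where the naive log of the CDF has already hit log(0).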

The other problem with BUGS and R is scaling: I’m about to create a 200K-item data set using Mechanical Turk, and I may need to swap over to something like Hal Daumé’s hierarchical Bayesian compiler (HBC). Judging from the HBC documentation, though, it doesn’t look like it supports hierarchical generalized linear models such as logistic or probit regression.

I’ll have a look at the rest when I get the chance. The dentistry data isn’t too bad to fit; some of the other Qu et al. data is harder. The paper is also full of typos. An interesting question is whether MCMC will require a massive number of samples to fully explore the posterior for some of these models.

What’s the CVS project name? I couldn’t find it on the webpage listing all the project names, and it seems to be necessary for doing a checkout.

I might also be able to speed up the sampler by starting it at the ML solution found by EM, if EM turns out to be fast and robust.
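
For a simple latent class model, the EM iterations are cheap enough that this seems worth trying. A toy sketch of the idea (a two-class binomial mixture over made-up vote counts, nothing from the actual data) whose output could seed the sampler:

```python
def em_two_class(counts, iters=200):
    """EM for a two-class binomial mixture: item i gets k_i positive
    votes out of n_i; its latent class has success rate p0 or p1,
    with mixing weight pi for class 1.  Returns an ML estimate."""
    pi, p0, p1 = 0.5, 0.3, 0.7                    # crude starting point
    for _ in range(iters):
        # E-step: responsibility of class 1 for each item.
        resp = []
        for k, n in counts:
            l1 = pi * p1 ** k * (1.0 - p1) ** (n - k)
            l0 = (1.0 - pi) * p0 ** k * (1.0 - p0) ** (n - k)
            resp.append(l1 / (l1 + l0))
        # M-step: responsibility-weighted ML updates.
        pi = sum(resp) / len(resp)
        p1 = (sum(r * k for r, (k, n) in zip(resp, counts))
              / sum(r * n for r, (k, n) in zip(resp, counts)))
        p0 = (sum((1 - r) * k for r, (k, n) in zip(resp, counts))
              / sum((1 - r) * n for r, (k, n) in zip(resp, counts)))
    return pi, p0, p1
```

This only finds a local maximum, of course, so whether it helps the sampler depends on how multimodal the real posterior is.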

I really need to handle the varying-panel situation, in which not every coder annotates every item.
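
The usual way I’d represent that is long format — one (item, annotator, label) triple per judgment actually made — so the model never assumes a full panel. A sketch with made-up item and annotator names:

```python
from collections import defaultdict

# Long format: one (item, annotator, label) triple per judgment made.
# Nothing requires every annotator to see every item.
annotations = [
    ("item1", "ann1", 1),
    ("item1", "ann2", 0),
    ("item2", "ann1", 1),
    ("item3", "ann3", 1),
]

by_item = defaultdict(list)
for item, annotator, label in annotations:
    by_item[item].append((annotator, label))

# A model can now loop over by_item and condition only on the
# judgments each item actually received -- no full-panel assumption.
```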

Here’s the link to Ken Beath’s randomLCA package, which implements Qu, Tan and Kutner’s (1996) random-effects model 2LCR:

http://cran.r-project.org/web/packages/randomLCA/index.html

As an aside, I find it confusing when documentation includes speculative comments about what might be coming next rather than sticking to what’s implemented.
