4) Which approach are you using? The Generalized Poisson Estimator, or estimating the convexity effect of the exponential by looking at the sample variance of the log-likelihood? The former is purer; the latter may be more practical, if ugly.
our first approach is the simplest: stochastic variational inference. consider a likelihood that factorizes over datapoints. stochastic variational inference then computes stochastic gradients of the variational objective function at each iteration by subsampling a "minibatch" of data at random.
i reckon the techniques you suggest would work as we move forward!
Edit: ah never mind, variational inference, got it! I was thinking stochastic HMC!
---
OK, but that only gets you an unbiased estimate of the log-likelihood. MCMC and HMC do work with noisy estimators, but they require unbiased estimates of the likelihood itself.
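To make the distinction concrete, here is a tiny sketch that enumerates every possible minibatch of a toy dataset, so the expectations are exact rather than simulated. The per-datapoint log-likelihood values and the helper name `minibatch_estimates` are made up for illustration.

```python
from itertools import combinations
import math

def minibatch_estimates(log_liks, batch_size):
    """Rescaled minibatch estimates of the total log-likelihood,
    one for every possible minibatch of the given size."""
    scale = len(log_liks) / batch_size
    return [scale * sum(batch) for batch in combinations(log_liks, batch_size)]

# Hypothetical per-datapoint log-likelihoods (illustrative numbers only).
log_liks = [-1.2, -0.3, -2.5, -0.8, -1.9, -0.6]
full_log_lik = sum(log_liks)

estimates = minibatch_estimates(log_liks, batch_size=2)

# Averaging over all minibatches recovers the full log-likelihood ...
mean_log_estimate = sum(estimates) / len(estimates)
# ... but averaging exp(estimate) overshoots exp(full log-likelihood),
# because exp is convex (Jensen's inequality).
mean_lik_estimate = sum(math.exp(e) for e in estimates) / len(estimates)

print(full_log_lik, mean_log_estimate)
print(math.exp(full_log_lik), mean_lik_estimate)
```

The first printed pair agrees up to floating-point rounding; in the second pair the averaged exponential comes out larger than the true likelihood.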
At the very least, you need to do a convexity adjustment by measuring the variance inside your minibatch. Or you can use the Poisson technique, which gets you unbiased estimates of exp(x) from unbiased estimates of x (albeit at the cost of introducing a lot of variance).
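A rough sketch of both options, not anyone's actual implementation. It assumes a noisy estimator of x with Gaussian noise of known variance; in a minibatch setting, with n of N datapoints sampled with replacement, that variance would itself have to be estimated, roughly as N² s² / n with s² the sample variance of the per-datapoint log-likelihoods. The function names, the toy values, and the tuning constants `lam` and `c` are all illustrative choices.

```python
import numpy as np

def convexity_adjusted(x_hat, var_hat):
    """Approximately unbiased estimate of exp(x) from a noisy x_hat.

    Assumes x_hat is roughly Gaussian around x with variance var_hat,
    in which case E[exp(x_hat)] = exp(x + var_hat / 2).
    """
    return np.exp(x_hat - 0.5 * var_hat)

def poisson_estimator(draw_x, lam, c, rng):
    """Unbiased estimate of exp(x) from iid unbiased estimates of x.

    draw_x(k) must return an array of k independent unbiased estimates.
    lam (Poisson rate) and c (centering constant) are tuning parameters:
    the estimator is unbiased for any choice, but its variance is not.
    """
    k = rng.poisson(lam)
    xs = draw_x(k)
    # np.prod of an empty array is 1.0, which covers the k == 0 case.
    return np.exp(lam + c) * np.prod((xs - c) / lam)

rng = np.random.default_rng(0)
x_true, noise_sd = 0.5, 0.3  # toy values, assumed for the demo

def draw_x(k):
    # Stand-in for k independent minibatch estimates of x.
    return rng.normal(x_true, noise_sd, size=k)

n_rep = 50_000
noisy = draw_x(n_rep)

naive = np.exp(noisy).mean()                             # biased upward
adjusted = convexity_adjusted(noisy, noise_sd**2).mean()
poisson = np.mean([
    poisson_estimator(draw_x, lam=2.0, c=-1.5, rng=rng)  # c: rough guess near x - lam
    for _ in range(n_rep)
])

print(np.exp(x_true), naive, adjusted, poisson)
```

The convexity adjustment is only as good as the Gaussian assumption and the variance estimate. The Poisson estimator needs neither, but it can return negative values whenever a draw falls below `c`, and its variance grows quickly if `lam` and `c` are poorly chosen.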