Bayes Theorem
Bayes' theorem:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)}$$
Prior predictive distribution

$$p(y) = \int p(y \mid \theta)\,p(\theta)\,d\theta$$
Posterior predictive distribution

$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\,p(\theta \mid y)\,d\theta$$
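As a concrete sketch of these two integrals, the Beta-Binomial model admits both predictive distributions in closed form. The helper names below (`log_beta`, `beta_binomial_pmf`) are illustrative, not from any particular library:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """log of the Beta function B(a, b) via log-gamma, for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """Prior predictive P(k successes in n trials) under a Beta(a, b) prior:
    p(k) = C(n, k) * B(a + k, b + n - k) / B(a, b)."""
    return comb(n, k) * exp(log_beta(a + k, b + n - k) - log_beta(a, b))

# With a flat Beta(1, 1) prior, the prior predictive over k = 0..n is uniform.
probs = [beta_binomial_pmf(k, 10, 1, 1) for k in range(11)]

# Posterior predictive after observing y successes in m trials: same formula
# with the updated parameters (a + y, b + m - y).
post_probs = [beta_binomial_pmf(k, 10, 1 + 7, 1 + 3) for k in range(11)]
```

Both lists sum to one; the posterior predictive shifts its mass toward the observed success rate.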
Fundamental Distributions
| Name | PDF/PMF | Mean | Variance | Mode |
|---|---|---|---|---|
| Bernoulli$(p)$ | $p^x(1-p)^{1-x}$ | $p$ | $p(1-p)$ | $1$ if $p>1/2$, else $0$ |
| Binomial$(n,p)$ | $\binom{n}{x}p^x(1-p)^{n-x}$ | $np$ | $np(1-p)$ | $\lfloor (n+1)p \rfloor$ |
| Geometric$(p)$ | $(1-p)^{x-1}p$ | $1/p$ | $(1-p)/p^2$ | $1$ |
| Poisson$(\lambda)$ | $\lambda^x e^{-\lambda}/x!$ | $\lambda$ | $\lambda$ | $\lfloor \lambda \rfloor$ |
| Uniform$(a,b)$ | $1/(b-a)$ | $(a+b)/2$ | $(b-a)^2/12$ | any $x \in [a,b]$ |
| Normal$(\mu,\sigma^2)$ | $\frac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2/2\sigma^2}$ | $\mu$ | $\sigma^2$ | $\mu$ |
| Exponential$(\lambda)$ | $\lambda e^{-\lambda x}$ | $1/\lambda$ | $1/\lambda^2$ | $0$ |
| Gamma$(\alpha,\beta)$ | $\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$ | $\alpha/\beta$ | $\alpha/\beta^2$ | $(\alpha-1)/\beta$ for $\alpha \ge 1$ |
| Beta$(\alpha,\beta)$ | $\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | $\frac{\alpha-1}{\alpha+\beta-2}$ for $\alpha,\beta>1$ |
| Chi-squared$(k)$ | $\frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}$ | $k$ | $2k$ | $\max(k-2, 0)$ |
| Student-$t(\nu)$ | $\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}\bigl(1+\frac{x^2}{\nu}\bigr)^{-\frac{\nu+1}{2}}$ | $0$ ($\nu>1$) | $\frac{\nu}{\nu-2}$ ($\nu>2$) | $0$ |
Table 1: Single Variate Distributions
Functions
Beta Function

$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$$

Properties:
- $B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$
- $B(\alpha, \beta) = B(\beta, \alpha)$
Conjugate Prior
The idea of a conjugate prior is that for a given likelihood we choose a prior distribution such that, after observing data and applying Bayes' theorem, the posterior distribution belongs to the same family as the prior.
That is, if $p(\theta)$ and $p(\theta \mid y)$ have the same distributional form, then the prior is called a conjugate prior for the likelihood model.
This is useful because it makes Bayesian updating analytically tractable. Instead of performing difficult integration or numerical approximation, we can often derive the posterior parameters in closed form.
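A minimal sketch of such a closed-form update, using the Beta-Bernoulli pair (the function name is illustrative): the posterior after observing 0/1 data is simply Beta with the success and failure counts added to the prior parameters.

```python
def beta_bernoulli_update(a, b, data):
    """Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta posterior.
    No integration is needed; only counts of successes and failures."""
    successes = sum(data)
    failures = len(data) - successes
    return a + successes, b + failures

# Three successes and one failure under a flat Beta(1, 1) prior:
a_post, b_post = beta_bernoulli_update(1.0, 1.0, [1, 1, 0, 1])
posterior_mean = a_post / (a_post + b_post)  # 4/6
```

The same counting pattern underlies all conjugate pairs: the data enter the posterior only through sufficient statistics.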
Conjugate Prior for Exponential Families
Note the general exponential family:

$$p(x \mid \eta) = h(x)\exp\{\eta^\top T(x) - A(\eta)\}$$

Likelihood of a sequence of i.i.d. samples $x_{1:N}$:

$$p(x_{1:N} \mid \eta) = \left(\prod_{n=1}^{N} h(x_n)\right)\exp\left\{\eta^\top \sum_{n=1}^{N} T(x_n) - N A(\eta)\right\}$$

So a conjugate prior for that likelihood is

$$p(\eta \mid \chi, \nu) \propto \exp\{\eta^\top \chi - \nu A(\eta)\}$$

The posterior is

$$p(\eta \mid x_{1:N}, \chi, \nu) \propto \exp\left\{\eta^\top \left(\chi + \sum_{n=1}^{N} T(x_n)\right) - (\nu + N) A(\eta)\right\}$$
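The posterior update above is purely additive in the hyperparameters: $\chi \to \chi + \sum_n T(x_n)$ and $\nu \to \nu + N$. A small sketch of that rule (function and parameter names are assumptions for illustration):

```python
def conjugate_update(chi, nu, data, T=lambda x: x):
    """Update conjugate-prior hyperparameters for an exponential-family
    likelihood: chi accumulates sufficient statistics, nu counts samples."""
    return chi + sum(T(x) for x in data), nu + len(data)

# Example with T(x) = x, as in the Poisson likelihood (identity sufficient
# statistic): the sufficient-statistic total and the sample count are all
# the posterior needs.
chi_post, nu_post = conjugate_update(2.0, 1.0, [3, 5, 4])
```

This is why conjugate updates are cheap: the data are compressed into a fixed-size statistic regardless of $N$.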
Proper and Improper Prior Distributions
A prior is called proper if it is a valid probability distribution:

$$\int p(\theta)\,d\theta = 1$$

And improper if

$$\int p(\theta)\,d\theta = \infty$$
- If a prior is proper, so is the posterior.
- If a prior is improper, the posterior may be either proper or improper.
In theory, all priors are acceptable as long as the resulting posterior is proper.
Fisher Information Matrix

$$\mathcal{I}(\theta)_{ij} = \mathbb{E}\left[\frac{\partial \log p(x \mid \theta)}{\partial \theta_i}\,\frac{\partial \log p(x \mid \theta)}{\partial \theta_j}\right] = -\mathbb{E}\left[\frac{\partial^2 \log p(x \mid \theta)}{\partial \theta_i\,\partial \theta_j}\right]$$

(The second equality holds under the usual regularity conditions.)
Jeffreys' Prior

$$p(\theta) \propto \sqrt{\det \mathcal{I}(\theta)}$$

Jeffreys' prior is invariant under reparameterization of $\theta$.
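As a worked one-parameter case, the Fisher information of a single Bernoulli$(p)$ observation can be computed directly from the definition (the function name is illustrative):

```python
from math import sqrt

def bernoulli_fisher(p):
    """Fisher information of one Bernoulli(p) draw via the score definition
    I(p) = E[(d/dp log p(x|p))^2]; the closed form is 1 / (p(1-p))."""
    score1 = 1 / p          # d/dp log p,      when x = 1
    score0 = -1 / (1 - p)   # d/dp log(1 - p), when x = 0
    return p * score1**2 + (1 - p) * score0**2

# Jeffreys' prior is then proportional to sqrt(I(p)) = p^(-1/2) (1-p)^(-1/2),
# i.e. the Beta(1/2, 1/2) density up to normalization.
jeffreys_unnorm = sqrt(bernoulli_fisher(0.3))
```

The expectation over the two score values collapses to $1/p + 1/(1-p) = 1/(p(1-p))$, recovering the familiar Beta$(1/2, 1/2)$ Jeffreys prior.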
Pivotal Quantities
For the binomial and other single-parameter models, different principles give (slightly) different noninformative prior distributions. But for two cases—location parameters and scale parameters—all principles seem to agree[1].
Location Parameter
If the density has the form $p(y - \theta)$, then $\theta$ is a location parameter, and the noninformative prior is uniform: $p(\theta) \propto 1$.
Scale Parameter
If the density has the form $\frac{1}{\sigma}\,p(y/\sigma)$, then $\sigma$ is a scale parameter, and the noninformative prior is $p(\sigma) \propto 1/\sigma$ (equivalently, uniform in $\log \sigma$).
Predictive Accuracy
People care about predictive accuracy in two different ways. The first is to assume that the model is all we know and to check its posterior predictions. The second is to compare several candidate models. Even if all of the models being considered have mismatches with the data, it can be informative to evaluate their predictive accuracy, compare them, and consider where to go next [2].
KL Divergence

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx \;\geq\; 0$$
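For discrete distributions the integral becomes a sum, which makes the defining properties easy to check numerically (the function name is an assumption for illustration):

```python
from math import log

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) = sum_i p_i log(p_i / q_i),
    in nats. Terms with p_i = 0 contribute 0 by convention."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)  # strictly positive since p != q
d_qp = kl_divergence(q, p)  # generally different from d_pq: KL is asymmetric
```

Note that $D_{\mathrm{KL}}(p\|q) = 0$ exactly when $p = q$, and the divergence is not symmetric in its arguments.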
Linear Algebra
Convex Combination
A subset $C$ of a vector space is said to be convex if for all vectors $u, v \in C$ and all scalars $\lambda$ in $[0, 1]$, the vector $\lambda u + (1 - \lambda)v$ is in $C$.
Via induction, this can be seen to be equivalent to the requirement that for all vectors $v_1, \dots, v_k \in C$, and for all scalars $\lambda_1, \dots, \lambda_k \geq 0$ such that $\lambda_1 + \cdots + \lambda_k = 1$, the convex combination $\lambda_1 v_1 + \cdots + \lambda_k v_k$ is in $C$.
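A convex combination can be sketched directly from the definition: nonnegative weights that sum to one, applied componentwise (the function name is illustrative):

```python
def convex_combination(vectors, weights):
    """Return sum_i weights[i] * vectors[i], checking that the weights are
    nonnegative and sum to 1 (i.e. that this is a convex combination)."""
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1) < 1e-12
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

# Equal weights give the midpoint of two points:
midpoint = convex_combination([[0, 0], [2, 4]], [0.5, 0.5])  # [1.0, 2.0]
```

With $k = 2$ this is exactly the $\lambda u + (1-\lambda)v$ form from the definition above.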
Bibliography
- [1] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and others, *Bayesian Data Analysis*, 3rd ed. Boca Raton, FL: CRC Press, 2013. [Online]. Available: https://stat.columbia.edu/~gelman/book/
- [2] A. Gelman, J. Hwang, and A. Vehtari, “Understanding predictive information criteria for Bayesian models,” Statistics and Computing, vol. 24, no. 6, pp. 997–1016, Nov. 2014, doi: 10.1007/s11222-013-9416-2.