vbpca-py | Joshua C. Macdonald

vbpca-py implements variational Bayesian principal component analysis with first-class support for missing data. The algorithm jointly infers latent components, noise variance, and effective dimensionality while propagating uncertainty through the entire estimation pipeline.
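To make the description above concrete, here is a minimal NumPy sketch of mean-field variational Bayesian PCA that only ever reads observed entries. It is a generic textbook-style formulation, not vbpca-py's implementation or API. The following are all assumptions made for illustration: the model (zero-mean rows, a unit Gaussian prior on scores, an ARD prior on loading columns, and isotropic noise), the point updates for the noise and ARD hyperparameters, the SVD initialization, and the function name `vbpca_masked`.

```python
import numpy as np

def vbpca_masked(X, mask, n_components, n_iter=200):
    """Illustrative mean-field VB-PCA fitted to observed entries only (hypothetical helper).

    Assumed model: x_ij = a_i . s_j + noise for observed (i, j), s_j ~ N(0, I),
    a_i ~ N(0, diag(va)) with per-component ARD variances va, noise variance v.
    X is (D, N) with rows assumed centered; mask is boolean (D, N), True = observed.
    Entries where mask is False are never read, so they may hold NaN.
    """
    D, N = X.shape
    K = n_components
    I = np.eye(K)
    Xz = np.where(mask, X, 0.0)              # zero-filled copy, used only to initialize

    U, sv, Vt = np.linalg.svd(Xz, full_matrices=False)
    A = U[:, :K] * sv[:K] / np.sqrt(N)       # (D, K) posterior means of loadings
    S = Vt[:K] * np.sqrt(N)                  # (K, N) posterior means of scores
    SigA = np.zeros((D, K, K))               # per-row posterior covariances of a_i
    SigS = np.zeros((N, K, K))               # per-column posterior covariances of s_j
    v = X[mask].var()                        # noise variance, initialized from the data
    va = np.ones(K)                          # ARD prior variances
    rows, cols = np.nonzero(mask)

    for _ in range(n_iter):
        # q(s_j): only rows observed in column j contribute.
        EAA = A[:, :, None] * A[:, None, :] + SigA          # E[a_i a_i^T]
        for j in range(N):
            o = mask[:, j]
            SigS[j] = np.linalg.inv(I + EAA[o].sum(axis=0) / v)
            S[:, j] = SigS[j] @ (A[o].T @ X[o, j]) / v

        # q(a_i) under the ARD prior N(0, diag(va)).
        ESS = S.T[:, :, None] * S.T[:, None, :] + SigS      # E[s_j s_j^T]
        prior_prec = np.diag(1.0 / np.maximum(va, 1e-12))
        for i in range(D):
            o = mask[i]
            SigA[i] = np.linalg.inv(prior_prec + ESS[o].sum(axis=0) / v)
            A[i] = SigA[i] @ (S[:, o] @ X[i, o]) / v

        # Noise variance: expected squared error, averaged over observed entries only.
        a, s = A[rows], S[:, cols].T
        err = (X[rows, cols] - np.sum(a * s, axis=1)) ** 2
        err += np.einsum("nk,nkl,nl->n", a, SigS[cols], a)
        err += np.einsum("nk,nkl,nl->n", s, SigA[rows], s)
        err += np.einsum("nkl,nlk->n", SigA[rows], SigS[cols])
        v = err.mean()

        # ARD update: a component whose loadings shrink toward zero gets va -> 0.
        va = (A ** 2 + np.diagonal(SigA, axis1=1, axis2=2)).mean(axis=0)
    return A, S, SigA, SigS, v, va

# Example: rank-2 data, 20% of entries missing (stored as NaN), 5 candidate components.
rng = np.random.default_rng(0)
D, N = 10, 200
clean = rng.normal(size=(D, 2)) @ rng.normal(size=(2, N))
mask = rng.random((D, N)) > 0.2
X = np.where(mask, clean + 0.1 * rng.normal(size=(D, N)), np.nan)
A, S, SigA, SigS, v, va = vbpca_masked(X, mask, n_components=5)
rmse_missing = np.sqrt(np.mean((A @ S - clean)[~mask] ** 2))
```

In this sketch the posterior covariances `SigA` and `SigS` are carried alongside the means, so every latent quantity comes with an uncertainty estimate. Components whose ARD variance in `va` collapses toward zero can be read as switched off, and the number that survive serves as a rough estimate of the effective dimensionality.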

Key features:

Native MCAR/MAR/MNAR missing-data handling without imputation
Automatic relevance determination for component selection
C++-accelerated dense and sparse masked kernels via pybind11
Full scikit-learn estimator API (fit, transform, score)
Uncertainty quantification on all latent quantities
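Masked kernels of the kind listed above compute sufficient statistics over observed entries only. The NumPy/SciPy sketch below is a hypothetical pure-Python reference for two such statistics: per-column Gram sums of the loadings, computed from a dense boolean mask, and per-column projections of the data, computed from a sparse CSC matrix that stores only observed values. The function names are illustrative, and this is not the package's C++ code.

```python
import numpy as np
import scipy.sparse as sp

def masked_gram_dense(A, mask):
    """G[j] = sum over observed rows i of column j of a_i a_i^T, shape (N, K, K)."""
    # The mask zeroes out the contribution of missing rows; nothing is imputed.
    return np.einsum("dn,dk,dl->nkl", mask.astype(A.dtype), A, A)

def masked_proj_sparse(A, Xs):
    """B[j] = sum over stored (observed) entries x_ij of column j of a_i * x_ij, shape (N, K)."""
    Xs = Xs.tocsc()                      # column-compressed: each column's rows are contiguous
    N = Xs.shape[1]
    B = np.zeros((N, A.shape[1]))
    for j in range(N):
        lo, hi = Xs.indptr[j], Xs.indptr[j + 1]
        rows = Xs.indices[lo:hi]         # observed row indices in column j
        B[j] = A[rows].T @ Xs.data[lo:hi]
    return B

# Small example: the sparse matrix is built from observed coordinates only.
rng = np.random.default_rng(1)
D, N, K = 6, 8, 3
A = rng.normal(size=(D, K))
X = rng.normal(size=(D, N))
mask = rng.random((D, N)) < 0.7
rows, cols = np.nonzero(mask)
Xs = sp.csc_matrix((X[rows, cols], (rows, cols)), shape=(D, N))

G = masked_gram_dense(A, mask)
B = masked_proj_sparse(A, Xs)
```

Building the CSC matrix from observed coordinates, rather than from a dense array, keeps an observed zero as a stored entry, so in this sketch a true zero is not confused with a missing value. The sparse path only visits stored entries, so its cost scales with the number of observations rather than with D × N.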

Available on PyPI and archived at Zenodo.

The companion paper (Macdonald et al., 2024) develops the Bayesian rank estimation methodology using posterior predictive eigenvalue testing.
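The sketch below shows one generic way to read "posterior predictive eigenvalue testing". For each posterior draw of a rank-k probabilistic PCA model, simulate a replicate dataset and compute its (k+1)-th sample-covariance eigenvalue. The reported value is the fraction of replicates that reach the observed eigenvalue. This is an illustrative assumption, not the procedure from the paper. In particular, the "posterior draws" here are stand-ins built by jittering the closed-form maximum-likelihood PPCA solution, and all function names are hypothetical.

```python
import numpy as np

def kth_eigenvalue(Y, k):
    """(k+1)-th largest eigenvalue of the sample covariance of Y (rows = observations)."""
    return np.linalg.eigvalsh(np.cov(Y, rowvar=False))[::-1][k]

def ml_ppca(X, k):
    """Closed-form maximum-likelihood PPCA, used here only to build stand-in draws."""
    lam, V = np.linalg.eigh(np.cov(X, rowvar=False))
    lam, V = lam[::-1], V[:, ::-1]                  # sort eigenpairs in descending order
    sigma2 = lam[k:].mean()                         # discarded variance is treated as noise
    W = V[:, :k] * np.sqrt(np.maximum(lam[:k] - sigma2, 0.0))
    return W, sigma2

def ppc_eigenvalue_pvalue(X, draws, k, seed=0):
    """Fraction of replicates whose (k+1)-th eigenvalue is >= the observed one."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    observed = kth_eigenvalue(X, k)
    hits = []
    for W, sigma2 in draws:
        Z = rng.normal(size=(N, W.shape[1]))        # latent scores for the replicate
        Y = Z @ W.T + np.sqrt(sigma2) * rng.normal(size=(N, D))
        hits.append(kth_eigenvalue(Y, k) >= observed)
    return float(np.mean(hits))

def stand_in_draws(X, k, rng, n_draws=200):
    """Hypothetical 'posterior draws': the ML solution with small Gaussian jitter on W."""
    W, sigma2 = ml_ppca(X, k)
    return [(W + 0.01 * rng.normal(size=W.shape), sigma2) for _ in range(n_draws)]

# Simulated rank-2 data with isotropic noise.
rng = np.random.default_rng(0)
N, D = 200, 10
X = rng.normal(size=(N, 2)) @ rng.normal(size=(2, D)) + 0.1 * rng.normal(size=(N, D))
X -= X.mean(axis=0)

p_rank1 = ppc_eigenvalue_pvalue(X, stand_in_draws(X, 1, rng), k=1)
p_rank2 = ppc_eigenvalue_pvalue(X, stand_in_draws(X, 2, rng), k=2)
```

A p-value near zero for rank k means the data carry more variance in the (k+1)-th direction than a rank-k model can reproduce, which points to at least k + 1 components. On this simulated rank-2 data, the rank-1 check is expected to give essentially zero, while the rank-2 check should not reject.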

References

arXiv

Bayesian estimation of the number of significant principal components for cultural data

Joshua C. Macdonald, Javier Blanco-Portillo, Marcus W. Feldman, and Yoav Ram

2024


@article{macdonald2024vbpca,
  title = {{Bayesian} estimation of the number of significant principal components for cultural data},
  author = {Macdonald, Joshua C. and Blanco-Portillo, Javier and Feldman, Marcus W. and Ram, Yoav},
  year = {2024},
  archiveprefix = {arXiv},
  eprint = {2409.12129},
  doi = {10.48550/arXiv.2409.12129},
}