vbpca-py

Variational Bayesian PCA for incomplete data with native missing-data handling, uncertainty quantification, and C++-accelerated kernels. scikit-learn compatible.

vbpca-py implements variational Bayesian principal component analysis with first-class support for missing data. The algorithm jointly infers latent components, noise variance, and effective dimensionality while propagating uncertainty through the entire estimation pipeline.

Key features:

  • Native MCAR/MAR/MNAR missing-data handling without imputation
  • Automatic relevance determination for component selection
  • C++-accelerated dense and sparse masked kernels via pybind11
  • Full scikit-learn estimator API (fit, transform, score)
  • Uncertainty quantification on all latent quantities
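
To make "missing-data handling without imputation" concrete, here is a minimal NumPy sketch of a low-rank fit that uses only the observed entries. It is an illustration under stated assumptions, not vbpca-py's API or implementation: the helper `masked_pca_als` is hypothetical, missing values are assumed to be encoded as NaN, and the method is a plain ridge-regularized alternating least squares point estimate rather than the variational Bayesian updates (so it gives no ARD pruning and no posterior uncertainty).

```python
import numpy as np


def masked_pca_als(X, n_components, n_iter=50, ridge=1e-3, seed=0):
    """Fit X ~ Z @ W.T + mu using only the observed (non-NaN) entries of X.

    Hypothetical illustration helper; not part of vbpca-py.
    X has shape (n_samples, n_features); every column needs at least
    one observed value so that its mean is defined.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)                    # True where an entry is observed
    n, d = X.shape
    k = n_components

    mu = np.nanmean(X, axis=0)             # per-feature mean over observed entries
    Xc = np.where(mask, X - mu, 0.0)       # centered data; missing cells are never read

    W = rng.standard_normal((d, k))        # loadings
    Z = np.zeros((n, k))                   # scores
    reg = ridge * np.eye(k)                # small ridge keeps each solve well-posed

    for _ in range(n_iter):
        # Update each sample's scores from the features observed for that sample.
        for i in range(n):
            o = mask[i]
            Wo = W[o]
            Z[i] = np.linalg.solve(Wo.T @ Wo + reg, Wo.T @ Xc[i, o])
        # Update each feature's loadings from the samples where it is observed.
        for j in range(d):
            o = mask[:, j]
            Zo = Z[o]
            W[j] = np.linalg.solve(Zo.T @ Zo + reg, Zo.T @ Xc[o, j])
    return Z, W, mu


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_true = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 10)) + 3.0
    held_out = rng.random(X_true.shape) < 0.1   # hide roughly 10% of entries
    X = np.where(held_out, np.nan, X_true)

    Z, W, mu = masked_pca_als(X, n_components=2)
    X_hat = Z @ W.T + mu
    rmse = np.sqrt(np.mean((X_hat[held_out] - X_true[held_out]) ** 2))
    print(f"held-out RMSE: {rmse:.4f}")
```

The missing cells never enter any least-squares solve, so no imputed values influence the fit. A variational Bayesian treatment replaces these point estimates with posterior distributions over the scores, loadings, and noise variance.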

Available on PyPI and archived at Zenodo.

The companion paper (Macdonald et al., 2024) develops the Bayesian rank estimation methodology using posterior predictive eigenvalue testing.

References

  1. Joshua C. Macdonald, Javier Blanco-Portillo, Marcus W. Feldman, et al. (2024). Bayesian estimation of the number of significant principal components for cultural data. arXiv.