# decompy `decompy` is a Python package containing several robust algorithms for matrix decomposition and analysis. The types of algorithms include * Robust PCA or SVD-based methods * Matrix completion methods * Robust matrix or tensor factorization methods. * Matrix rank estimation methods. ![](https://img.shields.io/pypi/pyversions/decompy) [![](https://img.shields.io/pypi/v/decompy)](https://pypi.org/project/decompy/) [![](https://img.shields.io/badge/license-BSD3_Clause-green)](https://github.com/subroy13/decompy/blob/master/LICENSE) ![PyPI - Downloads](https://img.shields.io/pypi/dm/decompy) ![](https://img.shields.io/pepy/dt/decompy) [![decompy](https://snyk.io/advisor/python/decompy/badge.svg)](https://snyk.io/advisor/python/decompy) [![Tests Status](tests-badge.svg?dummy=8484744)](./reports/junit/junit.xml) [![Test Coverage](coverage.svg?dummy=4345432)](.) [![Flake8](flake8-badge.svg?dummy=3436543)](./reports/flake8/index.html) ## Features - Data decomposition using various methods - Support for sparse decomposition, low-rank approximation, and more - User-friendly API for easy integration into your projects - Extensive documentation and examples ## Installation You can install `decompy` using pip: ```bash pip install decompy ``` ## Usage Here's a simple example demonstrating how to use decompy for data decomposition: ```python import numpy as np from decompy.matrix_factorization import RobustSVDDensityPowerDivergence # Load your data data = np.arange(100).reshape(20,5).astype(np.float64) # Perform data decomposition algo = RobustSVDDensityPowerDivergence(alpha = 0.5) result = algo.decompose(data) # Access the decomposed components U, V = result.singular_vectors(type = "both") S = result.singular_values() low_rank_component = U @ S @ V.T sparse_component = data - low_rank_component print(low_rank_component) print(sparse_component) ``` While the singular values are about 573 and 7.11 for this case (check the `S` variable), it can get highly affected if you use the simple SVD and change a single entry of the `data` matrix. ``` s2 = np.linalg.svd(data, compute_uv = False) print(np.round(s2, 2)) # estimated by usual SVD print(np.diag(np.round(S, 2))) # estimated by robust SVD data[1, 1] = 10000 # just change a single entry s3 = np.linalg.svd(data, compute_uv = False) print(np.round(s3, 2)) # usual SVD shoots up s4 = algo.decompose(data).singular_values() print(np.diag(np.round(s4, 2))) ``` You can find more example notebooks in the **examples** folder. For more detailed usage instructions, please refer to the **documentation**. ## Contributing Contributions are welcome! If you find any issues or have suggestions for improvements, please create an issue or submit a pull request on the GitHub repository. For contributing developers, please refer to [CONTRIBUTING.md](CONTRIBUTING.md) file. ## License This project is licensed under the [BSD 3-Clause License](LICENSE). ## List of Algorithms available in the `decompy` library ### Matrix Factorization Methods Currently, there are **19** matrix factorization methods available in `decompy`, as follows: * Alternating Direction Method [(Yuan and Yang, 2009)](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.400.8797) - `matrix_factorization/adm.py` * Augmented Lagrangian Method [(Tang and Nehorai)](https://ieeexplore.ieee.org/document/5766144) - `matrix_factorization/alm.py` * AS-RPCA: Active Subspace: Towards Scalable Low-Rank Learning [(Liu and Yan, 2012)](http://dl.acm.org/citation.cfm?id=2421487) - `matrix_factorization/asrpca.py` * Dual RPCA [(Lin et al. 2009)](http://arxiv.org/abs/1009.5055) - `matrix_factorization/dual.py` * Exact Augmented Lagrangian Method [(Lin, Chen and Ma, 2010)](https://arxiv.org/abs/1009.5055) - `matrix_factorization/ealm.py` * Robust PCA using Fast PCP Method [(Rodriguez and Wohlberg, 2013)](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6738015) - `matrix_factorization/fpcp.py` * Grassmann Average [(Hauberg et al. 2014)](http://files.is.tue.mpg.de/black/papers/RGA2014.pdf) [website](http://ps.is.tuebingen.mpg.de/project/Robust_PCA) - `matrix_factorization/ga.py` * Trimmed Grassmann Average [(Hauberg et al. 2014)](http://files.is.tue.mpg.de/black/papers/RGA2014.pdf) [website](http://ps.is.tuebingen.mpg.de/project/Robust_PCA) - `matrix_factorization/ga.py` (with `trim_percent` value more than 0) * Inexact Augmented Lagrangian Method [(Lin et al. 2009)](http://arxiv.org/abs/1009.5055) [website](http://perception.csl.illinois.edu/matrix-rank/sample_code.html) - `matrix_factorization/ialm.py` * L1 Filtering [(Liu et al. 2011)](http://arxiv.org/abs/1108.5359) - `matrix_factorization/l1f.py` * Linearized ADM with Adaptive Penalty [(Lin et al. 2011)](http://arxiv.org/abs/1109.0367) - `matrix_factorization/ladmap.py` * MoG-RPCA: Mixture of Gaussians RPCA [(Zhao et al. 2014)](http://jmlr.org/proceedings/papers/v32/zhao14.pdf) [website](http://www.cs.cmu.edu/~deyum/index.htm) - `matrix_factorization/mog.py` * Outlier Pursuit [Xu et al, 2011](https://guppy.mpe.nus.edu.sg/~mpexuh/papers/OutlierPursuit-TIT.pdf) - `matrix_factorization/op.py` * Principal Component Pursuit (PCP) Method [(Candes et al. 2009)](https://arxiv.org/abs/0912.3599) - `matrix_factorization/pcp.py` * RegL1-ALM: Robust low-rank matrix approximation with missing data and outliers [(Zheng et al. 2012)](https://sites.google.com/site/yinqiangzheng/home/zheng_CVPR12_robust%20L1-norm%20low-rank%20matrix%20factorization.pdf) [website](https://sites.google.com/site/yinqiangzheng/) - `matrix_factorization/regl1alm.py` * Robust SVD using Density Power Divergence (rSVDdpd) Algorithm [(Roy et al, 2023)](https://arxiv.org/abs/2109.10680) - `matrix_factorization/rsvddpd.py` * SVT: Singular Value Thresholding [(Cai et al. 2008)](http://arxiv.org/abs/0810.3286) [website](http://perception.csl.illinois.edu/matrix-rank/sample_code.html) - `matrix_factorization/svt.py` * Symmetric Alternating Direction Augmented Lagrangian Method (SADAL) [(Goldfarb et al. 2010)](http://arxiv.org/abs/0912.4571) - `matrix_factorization/sadal.py` * Robust PCA using Variational Bayes method [(Babacan et al 2012)](https://ieeexplore.ieee.org/document/6194350) - `matrix_factorization/vbrpca.py` ### Methods to be added (Coming soon) * R2PCP: Riemannian Robust Principal Component Pursuit [(Hintermüller and Wu, 2014)](http://link.springer.com/article/10.1007/s10851-014-0527-y) * DECOLOR: Contiguous Outliers in the Low-Rank Representation [(Zhou et al. 2011)](http://arxiv.org/abs/1109.0882) [website1](https://sites.google.com/site/eeyangc/software/decolor) [website2](https://fling.seas.upenn.edu/~xiaowz/dynamic/wordpress/?p=144) ## Rank Estimation Methods In `rankmethods/penalized.py` - * Elbow method * Akaike's Information Criterion (AIC) - https://link.springer.com/chapter/10.1007/978-1-4612-1694-0_15 * Bayesian Information Criterion (BIC) - https://doi.org/10.1214/aos/1176344136 * Bai and Ng's Information Criterion for spatiotemporal decomposition (PC1, PC2, PC3, IC1, IC2, IC3) - https://doi.org/10.1111/1468-0262.00273 * Divergence Information Criterion (DIC) - https://doi.org/10.1080/03610926.2017.1307405 In `rankmethods/cvrank.py` - * Gabriel style Cross validation - http://www.numdam.org/item/JSFS_2002__143_3-4_5_0/ * Wold style cross validation separate row and column deletion - https://www.jstor.org/stable/1267581 * Bi-cross validation (Owen and Perry) - https://doi.org/10.1214/08-AOAS227 In `rankmethods/bayes.py` - * Bayesian rank estimation method by Hoffman - https://www.jstor.org/stable/27639896