(not) Just Another Biomarker (nJAB)#
Not Just Another Biomarker (nJAB) is a Python package for the analysis of clinical and omics data for biomarker discovery. It collects functionality used in two papers.
njab is a collection of Python functions building on top of pandas, scikit-learn, statsmodels, pingouin, numpy and more…
It aims to formalize a procedure for biomarker discovery which was first developed for a paper on alcohol-related liver disease, based on mass spectrometry-based proteomics measurements of blood plasma samples:
Niu, L., Thiele, M., Geyer, P. E., Rasmussen, D. N., Webel, H. E., Santos, A., Gupta, R., Meier, F., Strauss, M., Kjaergaard, M., Lindvig, K., Jacobsen, S., Rasmussen, S., Hansen, T., Krag, A., & Mann, M. (2022). “Noninvasive Proteomic Biomarkers for Alcohol-Related Liver Disease.” Nature Medicine 28 (6): 1277–87. nature.com/articles/s41591-022-01850-y
The approach was formalized for an analysis of inflammation markers in a cohort of patients with alcohol-related cirrhosis, based on Olink-based proteomics measurements of blood plasma samples:
Mynster Kronborg, T., Webel, H., O’Connell, M. B., Danielsen, K. V., Hobolth, L., Møller, S., Jensen, R. T., Bendtsen, F., Hansen, T., Rasmussen, S., Juel, H. B., & Kimer, N. (2023). Markers of inflammation predict survival in newly diagnosed cirrhosis: a prospective registry study. Scientific Reports, 13(1), 1–11. nature.com/articles/s41598-023-47384-2
Installation#
Install the released version from PyPI using pip:
pip install njab
or install directly from GitHub:
pip install git+https://github.com/RasmussenLab/njab.git
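To confirm that the installation worked, the installed version can be looked up with the standard library. This is a minimal sketch that relies only on `importlib.metadata`; the helper name `installed_version` is made up for illustration and is not part of njab.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "njab" is the distribution name used in the pip commands above
njab_version = installed_version("njab")
print(njab_version or "njab is not installed in this environment")
```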
Tutorials#
The tutorials can be found in the project documentation with their output, or can be run directly in Colab.
Explorative Analysis of survival dataset#
The tutorial builds on an example dataset on survival in prostatic cancer.
The main steps in the tutorial are:
Data loading and inspection
Uncontrolled binary tests and t-tests for binary and continuous variables, respectively
ANCOVA analysis controlling for age and weight, corrected for multiple testing
Kaplan-Meier plots for significant features
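To illustrate what a Kaplan-Meier plot displays, the sketch below computes the Kaplan-Meier survival estimate in plain Python. It does not use njab or show how the tutorial produces its plots; the function name `kaplan_meier` and the follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with an observed event.

    durations: follow-up time per subject
    events: 1 if the event was observed at that time, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of subjects at risk who survive time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # remove events and censored subjects alike
    return curve


# Hypothetical follow-up times in months (not taken from the tutorial dataset)
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"t={time:>2}  S(t)={prob:.3f}")
```

Censored subjects lower the number at risk without lowering the survival estimate itself, which is why the curve only steps down at times with an observed event.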
Biomarker discovery tutorial#
All steps are described in the tutorial, where you can load your own data with minor adaptations. The tutorial builds on a curated Alzheimer dataset from omiclearn. See the Alzheimer Data section for more information.
The main steps in the tutorial are:
Load and prepare data for machine learning
Find a good set of features using cross validation
Evaluate and inspect your model retrained on the entire training data
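One part of preparing data for machine learning is handling missing values without leaking information from the test set. The sketch below fills missing values with medians computed on the training data only, in plain Python. It is an illustration under assumptions, not njab's implementation: the helper names `fit_medians` and `impute` and the feature names are hypothetical.

```python
from statistics import median


def fit_medians(train_rows):
    """Compute the per-feature median of the observed (non-None) training values."""
    features = train_rows[0].keys()
    return {
        f: median(row[f] for row in train_rows if row[f] is not None)
        for f in features
    }


def impute(rows, medians):
    """Return copies of the rows with missing values replaced by the given medians."""
    return [
        {f: (medians[f] if value is None else value) for f, value in row.items()}
        for row in rows
    ]


# Hypothetical measurements; None marks a missing value
train = [
    {"protein_a": 1.0, "protein_b": 10.0},
    {"protein_a": None, "protein_b": 30.0},
    {"protein_a": 3.0, "protein_b": None},
    {"protein_a": 5.0, "protein_b": 20.0},
]
test = [{"protein_a": None, "protein_b": None}]

medians = fit_medians(train)          # learned from the training split only
train_filled = impute(train, medians)
test_filled = impute(test, medians)   # test data reuses the training medians
print(medians)
print(test_filled)
```

Reusing the training medians for the test split keeps the evaluation honest, because no statistic of the test data influences the preprocessing.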
You can also find an executed version on this website, see links below.
- Explorative Analysis
- Logistic regression model
- Set parameters
- Setup
- Clinical data
- Target
- Test IDs
- Combine clinical and olink data
- Data Splits
- Output folder
- Collect test predictions
- Fill missing values with training median
- Principal Components
- UMAP
- Baseline Model - Logistic Regression
- Logistic Regression
- Receiver Operating Curve of final model
- Precision-Recall Curve for final model
- Coefficients with/out std. errors
- Selected Features
- Plot training data scores
- Test data scores
- Performance evaluations
- Multiplicative decomposition
- Plot TP, TN, FP and FN on PCA plot
- Save annotation of errors for manual analysis
- Outputs
Python Package reference#
You can browse the documentation for each module of the package, including the API reference and the source code.