Explorative Analysis#

Uses the prostate time series dataset provided by the SurvSet package.

%pip install njab openpyxl
Hide code cell output
Requirement already satisfied: njab in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (0.0.6)
Collecting openpyxl
  Downloading openpyxl-3.1.2-py2.py3-none-any.whl.metadata (2.5 kB)
Requirement already satisfied: omegaconf in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (2.3.0)
Requirement already satisfied: lifelines in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.28.0)
Requirement already satisfied: numpy in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (1.26.4)
Requirement already satisfied: pandas in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (2.2.2)
Requirement already satisfied: scikit-learn in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (1.4.2)
Requirement already satisfied: statsmodels in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.14.2)
Requirement already satisfied: umap-learn in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.5.6)
Requirement already satisfied: matplotlib in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (3.8.4)
Requirement already satisfied: mrmr-selection in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.2.8)
Requirement already satisfied: pingouin in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.5.4)
Requirement already satisfied: seaborn in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.13.2)
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl.metadata (1.8 kB)
Requirement already satisfied: scipy>=1.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (1.13.0)
Requirement already satisfied: autograd>=1.5 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (1.6.2)
Requirement already satisfied: autograd-gamma>=0.3 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (0.5.0)
Requirement already satisfied: formulaic>=0.2.2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (1.0.1)
Requirement already satisfied: contourpy>=1.0.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (1.2.1)
Requirement already satisfied: cycler>=0.10 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (24.0)
Requirement already satisfied: pillow>=8 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (10.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pandas->njab) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pandas->njab) (2024.1)
Requirement already satisfied: category-encoders in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (2.6.3)
Requirement already satisfied: jinja2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (3.1.4)
Requirement already satisfied: tqdm in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (4.66.4)
Requirement already satisfied: joblib in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (1.4.2)
Requirement already satisfied: polars>=0.12.5 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (0.20.25)
Requirement already satisfied: antlr4-python3-runtime==4.9.* in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from omegaconf->njab) (4.9.3)
Requirement already satisfied: PyYAML>=5.1.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from omegaconf->njab) (6.0.1)
Requirement already satisfied: pandas-flavor in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pingouin->njab) (0.6.0)
Requirement already satisfied: tabulate in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pingouin->njab) (0.9.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from scikit-learn->njab) (3.5.0)
Requirement already satisfied: patsy>=0.5.6 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from statsmodels->njab) (0.5.6)
Requirement already satisfied: numba>=0.51.2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from umap-learn->njab) (0.59.1)
Requirement already satisfied: pynndescent>=0.5 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from umap-learn->njab) (0.5.12)
Requirement already satisfied: future>=0.15.2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from autograd>=1.5->lifelines->njab) (1.0.0)
Requirement already satisfied: interface-meta>=1.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from formulaic>=0.2.2->lifelines->njab) (1.3.0)
Requirement already satisfied: typing-extensions>=4.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from formulaic>=0.2.2->lifelines->njab) (4.11.0)
Requirement already satisfied: wrapt>=1.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from formulaic>=0.2.2->lifelines->njab) (1.16.0)
Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from numba>=0.51.2->umap-learn->njab) (0.42.0)
Requirement already satisfied: six in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from patsy>=0.5.6->statsmodels->njab) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from jinja2->mrmr-selection->njab) (2.1.5)
Requirement already satisfied: xarray in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pandas-flavor->pingouin->njab) (2024.5.0)
Downloading openpyxl-3.1.2-py2.py3-none-any.whl (249 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 250.0/250.0 kB 4.6 MB/s eta 0:00:00
?25hDownloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.2
Note: you may need to restart the kernel to use updated packages.
Hide code cell source
from functools import partial
from pathlib import Path
import logging

from IPython.display import display

import numpy as np
import pandas as pd

import sklearn
import seaborn
from lifelines.plotting import add_at_risk_counts

import matplotlib.pyplot as plt

from njab.plotting.km import compare_km_curves, log_rank_test
import njab
import njab.plotting

njab.pandas.set_pandas_options()
njab.plotting.set_font_sizes('x-small')
seaborn.set_style("whitegrid")
plt.rcParams['figure.figsize'] = [4.0, 4.0]

Set parameters#

TARGET = 'event'
TIME_KM = 'time'
FOLDER = 'prostate'
CLINIC = 'https://raw.githubusercontent.com/ErikinBC/SurvSet/main/SurvSet/_datagen/output/prostate.csv'
val_ids: str = ''  # List of comma separated values or filepath
#
# list or string of csv, eg. "var1,var2"
clinic_cont = ['age']
# list or string of csv, eg. "var1,var2"
clinic_binary = ['male', 'AD']
# List of comma separated values or filepath
da_covar = 'num_age,num_wt'
Hide code cell source
print(f"Time To Event: {TIME_KM}")
print(f"event (target) variable: {TARGET}")
Time To Event: time
event (target) variable: event
Hide code cell content
FOLDER = Path(FOLDER)
FOLDER.mkdir(exist_ok=True, parents=True)
FOLDER
PosixPath('prostate')

Inspect the data:

Hide code cell source
clinic = pd.read_csv(CLINIC, index_col=0).dropna(how='any')
clinic.columns.name = 'feat_name'  # ! check needs to be implemented
cols_clinic = njab.pandas.get_colums_accessor(clinic)
clinic = clinic.astype({var: 'int'
                        for var in ['event',
                                    'time',
                                    'num_age',
                                    'num_wt',
                                    'num_sbp',
                                    'num_dbp',
                                    'num_sz',
                                    'num_sg',
                                    'num_sdate',
                                    'fac_stage']}
                       )
clinic
feat_name event time num_age num_wt num_sbp num_dbp num_hg num_sz num_sg num_ap num_sdate fac_stage fac_rx fac_pf fac_hx fac_ekg fac_bm
pid
0 0 72 75 76 15 9 13.799 2 8 0.300 2,778 3 0.2 mg estrogen normal activity N heart strain N
2 1 40 69 102 14 8 13.398 3 9 0.300 2,933 3 5.0 mg estrogen normal activity Y heart strain N
3 1 20 75 94 14 7 17.598 4 8 0.900 2,999 3 0.2 mg estrogen in bed < 50% daytime Y benign N
4 0 65 67 99 17 10 13.398 34 8 0.500 3,002 3 placebo normal activity N normal N
5 1 24 71 98 19 10 15.100 10 11 0.600 3,086 3 0.2 mg estrogen normal activity N normal N
6 1 46 75 100 14 10 13.000 13 9 0.800 3,099 3 placebo normal activity N benign N
7 0 62 73 114 17 11 12.600 3 9 0.600 3,100 3 placebo normal activity Y heart strain N
8 0 61 60 110 12 8 14.600 4 10 0.700 3,134 3 1.0 mg estrogen normal activity N normal N
9 0 60 78 107 13 8 13.000 21 6 0.400 3,140 3 1.0 mg estrogen normal activity Y old MI N
10 0 60 77 89 15 8 15.600 3 8 0.600 3,141 3 1.0 mg estrogen normal activity N normal N
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
491 0 51 76 114 30 18 11.600 6 9 0.500 3,440 3 1.0 mg estrogen normal activity Y heart strain N
492 0 51 68 98 14 8 13.398 6 9 0.700 3,440 3 5.0 mg estrogen normal activity N normal N
493 1 14 64 100 14 7 14.000 23 11 0.600 3,395 4 0.2 mg estrogen in bed < 50% daytime N normal N
494 1 27 75 102 16 8 14.199 5 11 29.598 3,430 4 0.2 mg estrogen normal activity N normal N
495 0 51 60 123 14 10 14.500 6 8 0.500 3,429 4 1.0 mg estrogen normal activity Y heart strain Y
496 1 5 73 100 19 10 16.797 2 11 2.900 3,433 4 5.0 mg estrogen normal activity Y normal N
497 1 0 78 108 11 6 15.799 9 13 0.600 3,297 3 1.0 mg estrogen normal activity N normal N
498 1 41 78 127 16 10 15.799 5 9 0.500 3,349 3 0.2 mg estrogen normal activity Y heart strain N
499 0 53 77 93 17 10 16.000 11 9 0.800 3,357 3 1.0 mg estrogen normal activity N heart block or conduction def N
500 1 19 82 96 12 6 12.398 32 13 8.398 3,311 4 5.0 mg estrogen normal activity Y heart strain N

483 rows × 17 columns

Descriptive statistics of non-numeric variables:

clinic.describe(include='object')
feat_name fac_rx fac_pf fac_hx fac_ekg fac_bm
count 483 483 483 483 483
unique 4 4 2 8 2
top placebo normal activity N normal N
freq 122 435 276 161 404

Set the binary variables and convert them to categories:

vars_binary = ['fac_hx', 'fac_bm']
clinic[vars_binary] = clinic[vars_binary].astype('category')

Covariates to adjust for:

Hide code cell source
check_isin_clinic = partial(njab.pandas.col_isin_df, df=clinic)
covar = check_isin_clinic(da_covar)
covar
['num_age', 'num_wt']

Set continous variables

vars_cont = [
    'num_age', 'num_wt', 'num_sbp', 'num_dbp', 'num_hg', 'num_sz', 'num_sg',
    'num_ap', 'num_sdate', 'fac_stage'
]

Collect outputs#

in an excel file:

Hide code cell source
fname = FOLDER / '1_differential_analysis.xlsx'
files_out = {fname.name: fname}
writer = pd.ExcelWriter(fname)
print(f"Output will be written to: {fname}")
Output will be written to: prostate/1_differential_analysis.xlsx

Differences between groups defined by target#

Perform an uncontrolled t-test for continous and binomal test for binary variables. Besides the test results, summary statistics of both groups are provided.

First, set the binary target as a boolean mask:

happend = clinic[TARGET].astype(bool)

Univariate t-test for continous variables#

Hide code cell source
ana_differential = njab.stats.groups_comparision.diff_analysis(
    clinic[vars_cont],
    happend,
    event_names=(TARGET, 'no event'),
)
ana_differential = ana_differential.sort_values(('ttest', 'p-val'))
ana_differential.to_excel(writer, "clinic continous", float_format='%.4f')
ana_differential
event no event ttest
count mean std count mean std alternative p-val cohen-d
variable
num_sz 344.000 15.779 13.124 139.000 10.986 9.033 two-sided 0.000 0.396
num_sg 344.000 10.503 2.026 139.000 9.791 1.894 two-sided 0.000 0.358
num_hg 344.000 13.259 2.020 139.000 13.826 1.629 two-sided 0.001 0.296
num_wt 344.000 98.023 13.783 139.000 101.259 12.003 two-sided 0.011 0.243
fac_stage 344.000 3.459 0.499 139.000 3.338 0.475 two-sided 0.013 0.246
num_age 344.000 72.029 7.004 139.000 70.424 6.775 two-sided 0.020 0.231
num_sdate 344.000 3,027.677 214.218 139.000 3,072.885 234.628 two-sided 0.051 0.205
num_sbp 344.000 14.410 2.419 139.000 14.252 2.497 two-sided 0.526 0.065
num_dbp 344.000 8.131 1.407 139.000 8.201 1.593 two-sided 0.649 0.048
num_ap 344.000 13.046 63.550 139.000 11.124 63.021 two-sided 0.762 0.030

Binomal test for binary variables:#

Hide code cell source
diff_binomial = []
for var in vars_binary:
    if len(clinic[var].cat.categories) == 2:
        diff_binomial.append(
            njab.stats.groups_comparision.binomtest(clinic[var],
                                                    happend,
                                                    event_names=(TARGET,
                                                                 'no-event')))
    else:
        logging.warning(
            f"Non-binary variable: {var} with {len(clinic[var].cat.categories)} categories"
        )

diff_binomial = pd.concat(diff_binomial).sort_values(
    ('binomial test', 'pvalue'))
diff_binomial.to_excel(writer, 'clinic binary', float_format='%.4f')
with pd.option_context('display.max_rows', len(diff_binomial)):
    display(diff_binomial)
event no-event binomial test
count p count p k n alternative statistic pvalue proportion_estimate
variable
fac_hx 344.000 0.491 139.000 0.273 169 344 two-sided 0.491 0.000 0.491
fac_bm 344.000 0.198 139.000 0.079 68 344 two-sided 0.198 0.000 0.198

Analaysis of covariance (ANCOVA)#

Next, we select continous variables controlling for covariates.

First the summary statistics for the target and covariates:

Hide code cell source
clinic_ancova = [TARGET, *covar]
clinic_ancova = clinic[clinic_ancova].copy()
clinic_ancova.describe(include='all')
feat_name event num_age num_wt
count 483.000 483.000 483.000
mean 0.712 71.567 98.954
std 0.453 6.970 13.364
min 0.000 48.000 69.000
25% 0.000 70.000 90.000
50% 1.000 73.000 98.000
75% 1.000 76.000 107.000
max 1.000 89.000 152.000

Discard all rows with a missing features (if present):

Hide code cell source
clinic_ancova = clinic_ancova.dropna(
)
categorical_columns = clinic_ancova.columns[clinic_ancova.dtypes == 'category']
print("Available covariates: " ", ".join(categorical_columns.to_list()))
for categorical_column in categorical_columns:
    # only works if no NA and only binary variables!
    clinic_ancova[categorical_column] = clinic_ancova[
        categorical_column].cat.codes

desc_ancova = clinic_ancova.describe()
desc_ancova.to_excel(writer, "covars", float_format='%.4f')
desc_ancova

feat_name event num_age num_wt
count 483.000 483.000 483.000
mean 0.712 71.567 98.954
std 0.453 6.970 13.364
min 0.000 48.000 69.000
25% 0.000 70.000 90.000
50% 1.000 73.000 98.000
75% 1.000 76.000 107.000
max 1.000 89.000 152.000

Remove non-varying variables (if present):

Hide code cell source
if (desc_ancova.loc['std'] < 0.001).sum():
    non_varying = desc_ancova.loc['std'] < 0.001
    non_varying = non_varying[non_varying].index
    print("Non varying columns: ", ', '.join(non_varying))
    clinic_ancova = clinic_ancova.drop(non_varying, axis=1)
    for col in non_varying:
        covar.remove(col)

Run ANCOVA:

Hide code cell source
ancova = njab.stats.ancova.AncovaOnlyTarget(
    df_proteomics=clinic[vars_cont].drop(covar, axis=1),
    df_clinic=clinic_ancova,
    target=TARGET,
    covar=covar,
    value_name='')
ancova = ancova.ancova().sort_values('p-unc')
ancova = ancova.loc[:, "p-unc":]
ancova.columns = pd.MultiIndex.from_product([['ancova'], ancova.columns],
                                            names=('test', 'var'))
ancova.to_excel(writer, "olink controlled", float_format='%.4f')
ancova.head(20)
test ancova
var p-unc np2 -Log10 pvalue qvalue rejected
feat_name Source
num_sz event 0.000 0.030 3.885 0.001 True
num_sg event 0.000 0.025 3.353 0.002 True
fac_stage event 0.017 0.012 1.776 0.045 True
num_hg event 0.025 0.010 1.608 0.049 True
num_sdate event 0.031 0.010 1.514 0.049 True
num_sbp event 0.348 0.002 0.459 0.464 False
num_ap event 0.772 0.000 0.113 0.839 False
num_dbp event 0.839 0.000 0.076 0.839 False
Hide code cell source
writer.close()

Kaplan-Meier (KM) plot for top markers#

Cutoff is defined using a univariate logistic regression

\[ \ln \frac{p}{1-p} = \beta_0 + \beta_1 \cdot x \]

the default cutoff p=0.5 corresponds to a feature value of:

\[ x = - \frac{\beta_0}{\beta_1} \]

Optional: The cutoff could be adapted to the prevalence of the target.

List of markers with significant difference between groups as defined by ANCOVA:

rejected = ancova.query("`('ancova', 'rejected')` == True")
rejected
test ancova
var p-unc np2 -Log10 pvalue qvalue rejected
feat_name Source
num_sz event 0.000 0.030 3.885 0.001 True
num_sg event 0.000 0.025 3.353 0.002 True
fac_stage event 0.017 0.012 1.776 0.045 True
num_hg event 0.025 0.010 1.608 0.049 True
num_sdate event 0.031 0.010 1.514 0.049 True

Settings for plots

class_weight = 'balanced'
y_km = clinic[TARGET]
time_km = clinic[TIME_KM]
compare_km_curves = partial(compare_km_curves,
                            time=time_km,
                            y=y_km,
                            xlim=(0, 80),
                            xlabel='time passed',
                            ylabel=f'rate {y_km.name}')
log_rank_test = partial(
    log_rank_test,
    time=time_km,
    y=y_km,
)
TOP_N = 2  # None = all

Plot KM curves for TOP_N (here two) makers with log-rank test p-value:

for marker, _ in rejected.index[:TOP_N]:  # first case done above currently
    fig, ax = plt.subplots()
    class_weight = 'balanced'
    # class_weight=None
    model = sklearn.linear_model.LogisticRegression(class_weight=class_weight)
    model = model.fit(X=clinic[marker].to_frame(), y=happend)
    print(
        f"Intercept {float(model.intercept_):5.3f}, coef.: {float(model.coef_):5.3f}"
    )
    # offset = np.log(p/(1-p)) # ! could be adapted based on proportion of target (for imbalanced data)
    offset = np.log(0.5 / (1 - 0.5))  # ! standard cutoff of probability of 0.5
    cutoff = offset - float(model.intercept_) / float(model.coef_)
    direction = '>' if model.coef_ > 0 else '<'
    print(
        f"Custom cutoff defined by Logistic regressor for {marker:>10}: {cutoff:.3f}"
    )
    pred = njab.sklearn.scoring.get_pred(model, clinic[marker].to_frame())
    ax, kmf_0, kmf_1 = compare_km_curves(pred=pred)
    res = log_rank_test(mask=pred)
    ax.set_title(
        f'KM curve for {TARGET.lower()}'
        f' and marker {marker} \n'
        f'(cutoff{direction}{cutoff:.2f}, log-rank-test p={res.p_value:.3f})')
    ax.legend([
        f"KP pred=0 (N={(~pred).sum()})", '95% CI (pred=0)',
        f"KP pred=1 (N={pred.sum()})", '95% CI (pred=1)'
    ])
    fname = FOLDER / f'KM_plot_{marker}.pdf'
    files_out[fname.name] = fname
    njab.plotting.savefig(ax.get_figure(), fname)

    # add counts
    add_at_risk_counts(kmf_0, kmf_1, ax=ax)
    fname = FOLDER / f'KM_plot_{marker}_w_counts.pdf'
    files_out[fname.name] = fname
    njab.plotting.savefig(ax.get_figure(), fname)
Intercept -0.517, coef.: 0.039
Custom cutoff defined by Logistic regressor for     num_sz: 13.135
Intercept -1.877, coef.: 0.185
Custom cutoff defined by Logistic regressor for     num_sg: 10.139
../_images/b1ccce0ab2de9d5a68460be47313090a8159c146dd0df8a9034a7d98c4f13082.png ../_images/1dee2684ce89158192fc475054fc189aaddae7d860de188a6aedfe5e57d363c1.png