Logistic regression model#

Procedure:

Example: Alzheimers mass spectrometry-based proteomics dataset

Predict Alzheimer disease based on proteomics measurements.

# Setup colab installation
# You need to restart the runtime after running this cell
%pip install njab heatmapz openpyxl plotly

Show code cell output Hide code cell output

Requirement already satisfied: njab in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (0.0.6)
Collecting heatmapz
  Downloading heatmapz-0.0.4-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: openpyxl in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (3.1.2)
Collecting plotly
  Downloading plotly-5.22.0-py3-none-any.whl.metadata (7.1 kB)
Requirement already satisfied: omegaconf in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (2.3.0)
Requirement already satisfied: lifelines in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.28.0)
Requirement already satisfied: numpy in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (1.26.4)
Requirement already satisfied: pandas in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (2.2.2)
Requirement already satisfied: scikit-learn in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (1.4.2)
Requirement already satisfied: statsmodels in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.14.2)
Requirement already satisfied: umap-learn in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.5.6)
Requirement already satisfied: matplotlib in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (3.8.4)
Requirement already satisfied: mrmr-selection in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.2.8)
Requirement already satisfied: pingouin in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.5.4)
Requirement already satisfied: seaborn in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from njab) (0.13.2)
Requirement already satisfied: et-xmlfile in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from openpyxl) (1.1.0)
Collecting tenacity>=6.2.0 (from plotly)
  Downloading tenacity-8.3.0-py3-none-any.whl.metadata (1.2 kB)
Requirement already satisfied: packaging in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from plotly) (24.0)
Requirement already satisfied: contourpy>=1.0.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (1.2.1)
Requirement already satisfied: cycler>=0.10 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (1.4.5)
Requirement already satisfied: pillow>=8 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (10.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from matplotlib->njab) (2.9.0.post0)
Requirement already satisfied: scipy>=1.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (1.13.0)
Requirement already satisfied: autograd>=1.5 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (1.6.2)
Requirement already satisfied: autograd-gamma>=0.3 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (0.5.0)
Requirement already satisfied: formulaic>=0.2.2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from lifelines->njab) (1.0.1)
Requirement already satisfied: pytz>=2020.1 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pandas->njab) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pandas->njab) (2024.1)
Requirement already satisfied: category-encoders in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (2.6.3)
Requirement already satisfied: jinja2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (3.1.4)
Requirement already satisfied: tqdm in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (4.66.4)
Requirement already satisfied: joblib in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (1.4.2)
Requirement already satisfied: polars>=0.12.5 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from mrmr-selection->njab) (0.20.25)
Requirement already satisfied: antlr4-python3-runtime==4.9.* in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from omegaconf->njab) (4.9.3)
Requirement already satisfied: PyYAML>=5.1.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from omegaconf->njab) (6.0.1)
Requirement already satisfied: pandas-flavor in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pingouin->njab) (0.6.0)
Requirement already satisfied: tabulate in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pingouin->njab) (0.9.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from scikit-learn->njab) (3.5.0)
Requirement already satisfied: patsy>=0.5.6 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from statsmodels->njab) (0.5.6)
Requirement already satisfied: numba>=0.51.2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from umap-learn->njab) (0.59.1)
Requirement already satisfied: pynndescent>=0.5 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from umap-learn->njab) (0.5.12)
Requirement already satisfied: future>=0.15.2 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from autograd>=1.5->lifelines->njab) (1.0.0)
Requirement already satisfied: interface-meta>=1.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from formulaic>=0.2.2->lifelines->njab) (1.3.0)
Requirement already satisfied: typing-extensions>=4.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from formulaic>=0.2.2->lifelines->njab) (4.11.0)
Requirement already satisfied: wrapt>=1.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from formulaic>=0.2.2->lifelines->njab) (1.16.0)
Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from numba>=0.51.2->umap-learn->njab) (0.42.0)
Requirement already satisfied: six in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from patsy>=0.5.6->statsmodels->njab) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from jinja2->mrmr-selection->njab) (2.1.5)
Requirement already satisfied: xarray in /home/docs/checkouts/readthedocs.org/user_builds/njab/envs/latest/lib/python3.10/site-packages (from pandas-flavor->pingouin->njab) (2024.5.0)
Downloading heatmapz-0.0.4-py3-none-any.whl (5.8 kB)
Downloading plotly-5.22.0-py3-none-any.whl (16.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.4/16.4 MB 70.8 MB/s eta 0:00:00
?25hDownloading tenacity-8.3.0-py3-none-any.whl (25 kB)
Installing collected packages: tenacity, plotly, heatmapz
Successfully installed heatmapz-0.0.4 plotly-5.22.0 tenacity-8.3.0
Note: you may need to restart the kernel to use updated packages.

Set parameters#

CLINIC: str = 'https://raw.githubusercontent.com/RasmussenLab/njab/HEAD/docs/tutorial/data/alzheimer/clinic_ml.csv'  # clincial data
fname_omics: str = 'https://raw.githubusercontent.com/RasmussenLab/njab/HEAD/docs/tutorial/data/alzheimer/proteome.csv'  # omics data
TARGET: str = 'AD'  # target column in CLINIC dataset (binary)
TARGET_LABEL: Optional[str] = None  # optional: rename target variable
n_features_max: int = 5
freq_cutoff: float = 0.5  # Omics cutoff for sample completeness
VAL_IDS: str = ''  #
VAL_IDS_query: str = ''
weights: bool = True
FOLDER = 'alzheimer'
model_name = 'all'

Setup#

Load data#

clinic = pd.read_csv(CLINIC, index_col=0).convert_dtypes()
cols_clinic = njab.pandas.get_colums_accessor(clinic)
omics = pd.read_csv(fname_omics, index_col=0)

Data shapes

omics.shape, clinic.shape

((210, 1542), (210, 6))

See how common omics features are and remove feature below choosen frequency cutoff

ax = omics.notna().sum().sort_values().plot(rot=45)

../_images/4ff988969c9543033e8c33cc3829e176d01b7100a47eec49306116e6847e689c.png

Removed 248 features with more than 50.0% missing values.
Remaining features: 1294 (of 1542)

	A0A024QZX5	A0A024R0T9	A0A024R3W6	A0A024R644	A0A075B6H9	A0A075B6I0	A0A075B6I1	A0A075B6I6	A0A075B6I9	A0A075B6J9	...	Q9Y653	Q9Y696	Q9Y6C2	Q9Y6N6	Q9Y6N7	Q9Y6R7	Q9Y6X5	Q9Y6Y8	Q9Y6Y9	S4R3U6
Sample ID
Sample_000	15.912	16.852	15.571	16.481	20.246	16.764	17.584	16.988	20.054	NaN	...	16.012	15.178	NaN	15.050	16.842	19.863	NaN	19.563	12.838	12.805
Sample_001	15.936	16.874	15.519	16.387	19.941	18.786	17.144	NaN	19.067	16.188	...	15.528	15.576	NaN	14.833	16.597	20.299	15.556	19.386	13.970	12.443
Sample_002	16.112	14.523	15.935	16.416	19.251	16.832	15.671	17.012	18.569	NaN	...	15.229	14.728	13.757	15.118	17.440	19.598	15.735	20.447	12.637	12.505
Sample_003	16.107	17.032	15.802	16.979	19.628	17.852	18.877	14.182	18.985	13.438	...	15.495	14.590	14.682	15.140	17.356	19.429	NaN	20.216	12.627	12.445
Sample_004	15.603	15.331	15.375	16.679	20.450	18.682	17.081	14.140	19.686	14.495	...	14.757	15.094	14.048	15.256	17.075	19.582	15.328	19.867	13.145	12.235
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Sample_205	15.682	16.886	14.910	16.482	17.705	17.039	NaN	16.413	19.102	16.064	...	15.236	15.684	14.236	15.415	17.551	17.922	16.340	19.928	12.930	11.803
Sample_206	15.798	17.554	15.600	15.938	18.155	18.152	16.503	16.860	18.538	15.288	...	15.422	16.106	NaN	15.345	17.084	18.708	14.249	19.433	NaN	NaN
Sample_207	15.740	16.877	15.469	16.898	18.636	17.950	16.321	16.401	18.849	17.580	...	15.808	16.098	14.403	15.715	16.586	18.725	16.138	19.599	13.637	11.174
Sample_208	15.477	16.779	14.995	16.132	14.908	17.530	NaN	16.119	18.368	15.202	...	15.157	16.712	NaN	14.640	16.533	19.411	15.807	19.545	13.216	NaN
Sample_209	15.727	17.261	15.175	16.235	17.893	17.744	16.371	15.780	18.806	16.532	...	15.237	15.652	15.211	14.205	16.749	19.275	15.732	19.577	11.043	11.792

210 rows × 1294 columns

Clinical data#

View clinical data

clinic

	Kiel	Magdeburg	Sweden	male	age	AD
Sample ID
Sample_000	0	0	1	0	71	0
Sample_001	0	0	1	1	77	1
Sample_002	0	0	1	1	75	1
Sample_003	0	0	1	0	72	1
Sample_004	0	0	1	0	63	1
...	...	...	...	...	...	...
Sample_205	0	0	0	0	69	1
Sample_206	0	0	0	1	73	0
Sample_207	0	0	0	0	71	0
Sample_208	0	0	0	1	83	0
Sample_209	0	0	0	0	63	0

210 rows × 6 columns

Target#

Tabulate target and check for missing values

njab.pandas.value_counts_with_margins(clinic[TARGET])

	counts	prop
AD
0	122	0.581
1	88	0.419

if TARGET_LABEL is None:
    TARGET_LABEL = TARGET
y = clinic[TARGET].rename(TARGET_LABEL).astype(int)
clinic_for_ml = clinic.drop(TARGET, axis=1)

Test IDs#

Select some test samples:

WARNING:root:Create train and test split.

['Sample_127',
 'Sample_164',
 'Sample_175',
 'Sample_048',
 'Sample_159',
 'Sample_141',
 'Sample_174',
 'Sample_145',
 'Sample_090',
 'Sample_191',
 'Sample_038',
 'Sample_009',
 'Sample_112',
 'Sample_096',
 'Sample_146',
 'Sample_135',
 'Sample_142',
 'Sample_205',
 'Sample_186',
 'Sample_095',
 'Sample_085',
 'Sample_011',
 'Sample_156',
 'Sample_153',
 'Sample_124',
 'Sample_194',
 'Sample_061',
 'Sample_079',
 'Sample_149',
 'Sample_179',
 'Sample_197',
 'Sample_125',
 'Sample_133',
 'Sample_099',
 'Sample_067',
 'Sample_202',
 'Sample_010',
 'Sample_171',
 'Sample_018',
 'Sample_060',
 'Sample_185',
 'Sample_173']

Combine clinical and olink data#

# in case you need to subselect
feat_to_consider = clinic_for_ml.columns.to_list()
feat_to_consider += omics.columns.to_list()
feat_to_consider

Show code cell output Hide code cell output

['Kiel',
 'Magdeburg',
 'Sweden',
 'male',
 'age',
 'A0A024QZX5',
 'A0A024R0T9',
 'A0A024R3W6',
 'A0A024R644',
 'A0A075B6H9',
 'A0A075B6I0',
 'A0A075B6I1',
 'A0A075B6I6',
 'A0A075B6I9',
 'A0A075B6J9',
 'A0A075B6K4',
 'A0A075B6K5',
 'A0A075B6P5',
 'A0A075B6Q5',
 'A0A075B6R2',
 'A0A075B6R9',
 'A0A075B6S2',
 'A0A075B6S5',
 'A0A075B6S9',
 'A0A075B7D0',
 'A0A087WSV8',
 'A0A087WSY5',
 'A0A087WSY6',
 'A0A087WSZ0',
 'A0A087WTA1',
 'A0A087WTA8',
 'A0A087WTK0',
 'A0A087WTP3',
 'A0A087WTT8',
 'A0A087WTY6',
 'A0A087WU43',
 'A0A087WUM0',
 'A0A087WUT8',
 'A0A087WV17',
 'A0A087WVV2',
 'A0A087WW87',
 'A0A087WWA5',
 'A0A087WWF1',
 'A0A087WWT2',
 'A0A087WX80',
 'A0A087WXE9',
 'A0A087WXM8',
 'A0A087WXW9',
 'A0A087WYK9',
 'A0A087WYL5',
 'A0A087WZ82',
 'A0A087WZM2',
 'A0A087WZR4',
 'A0A087X089',
 'A0A087X0D5',
 'A0A087X0K0',
 'A0A087X0M8',
 'A0A087X0S5',
 'A0A087X106',
 'A0A087X117',
 'A0A087X136',
 'A0A087X152',
 'A0A087X1G7',
 'A0A087X1J7',
 'A0A087X1T7',
 'A0A087X1V2',
 'A0A087X240',
 'A0A087X253',
 'A0A0A0MQU6',
 'A0A0A0MQY7',
 'A0A0A0MR25',
 'A0A0A0MRJ6',
 'A0A0A0MRJ7',
 'A0A0A0MRZ8',
 'A0A0A0MS09',
 'A0A0A0MS15',
 'A0A0A0MS20',
 'A0A0A0MS52',
 'A0A0A0MSC4',
 'A0A0A0MSQ0',
 'A0A0A0MSX3',
 'A0A0A0MT26',
 'A0A0A0MT32',
 'A0A0A0MT36',
 'A0A0A0MT66',
 'A0A0A0MT69',
 'A0A0A0MT71',
 'A0A0A0MTC8',
 'A0A0A0MTH0',
 'A0A0A0MTI5',
 'A0A0A0MTP9',
 'A0A0A0MTQ4',
 'A0A0A0MTS2',
 'A0A0A6YY99',
 'A0A0A6YYA0',
 'A0A0A6YYG9',
 'A0A0A6YYJ2',
 'A0A0B4J1V0',
 'A0A0B4J1V2',
 'A0A0B4J1V6',
 'A0A0B4J1Y8',
 'A0A0B4J1Y9',
 'A0A0B4J1Z1',
 'A0A0B4J231',
 'A0A0B4J259',
 'A0A0B4J2B5',
 'A0A0B4J2C3',
 'A0A0B4J2D9',
 'A0A0C4DFP6',
 'A0A0C4DFZ2',
 'A0A0C4DG90',
 'A0A0C4DGV8',
 'A0A0C4DGY8',
 'A0A0C4DH07',
 'A0A0C4DH24',
 'A0A0C4DH25',
 'A0A0C4DH33',
 'A0A0C4DH34',
 'A0A0C4DH35',
 'A0A0C4DH36',
 'A0A0C4DH38',
 'A0A0C4DH67',
 'A0A0C4DH73',
 'A0A0D9SEM5',
 'A0A0D9SEP4',
 'A0A0D9SF30',
 'A0A0D9SFP6',
 'A0A0G2JHA9',
 'A0A0G2JHN4',
 'A0A0G2JIW1',
 'A0A0G2JLB3',
 'A0A0G2JLQ0',
 'A0A0G2JLV7',
 'A0A0G2JMC9',
 'A0A0G2JPR0',
 'A0A0G2JQ91',
 'A0A0G2JRN3',
 'A0A0G2JRQ6',
 'A0A0G2JSC0',
 'A0A0J9YW40',
 'A0A0J9YX35',
 'A0A0J9YXX1',
 'A0A0J9YY99',
 'A0A0U1RQC5',
 'A0A0U1RQV3',
 'A0A0U1RR20',
 'A0A0U1RR32',
 'A0A140T8Z9',
 'A0A140T913',
 'A0A140T962',
 'A0A140T971',
 'A0A140TA33',
 'A0A182DWH7',
 'A0A182DWI4',
 'A0A1B0GTM3',
 'A0A1B0GUU9',
 'A0A1B0GV23',
 'A0A1B0GVB9',
 'A0A1B0GXF2',
 'A0A1W2PQ11',
 'A0A1W2PQB1',
 'A0A1W2PR05',
 'A0A1W2PRS4',
 'A0A1X7SBT7',
 'A0A286YEY1',
 'A0A286YEY5',
 'A0A286YFE3',
 'A0A286YFJ8',
 'A0AVL1',
 'A0M8Q6',
 'A1L4H1',
 'A2A2D0',
 'A2A2V1',
 'A2A3C1',
 'A2NJV5',
 'A5D6W6',
 'A6NC48',
 'A6NCE7',
 'A6NCT7',
 'A6NFX8',
 'A6NG10',
 'A6NGN9',
 'A6NL88',
 'A6NLU5',
 'A6NNI4',
 'A6XMH3',
 'A6XND0',
 'A8MU39',
 'A8MZH3',
 'A9UJN9',
 'B0QY80',
 'B0QYF8',
 'B0YIW2',
 'B1AHL2',
 'B1AJR6',
 'B1AKZ9',
 'B1AMW1',
 'B1AN15',
 'B1AN99',
 'B1AP13',
 'B3KTY4',
 'B4DGQ7',
 'B4DHN5',
 'B4DKD2',
 'B4DPQ0',
 'B4DYV8',
 'B4E1Z4',
 'B4E3Q4',
 'B5MBX2',
 'B5MCX6',
 'B7Z2R4',
 'B7Z4G8',
 'B7ZKJ8',
 'B7ZKR5',
 'B7ZLJ8',
 'B7ZM79',
 'B8ZZ19',
 'B8ZZE5',
 'C9IY66',
 'C9J0J0',
 'C9J1I0',
 'C9J2H1',
 'C9J4G9',
 'C9J6G4',
 'C9J712',
 'C9J8T4',
 'C9JE82',
 'C9JF17',
 'C9JFR7',
 'C9JHR8',
 'C9JIZ6',
 'C9JKT8',
 'C9JP35',
 'C9JPD0',
 'C9JYY6',
 'D3DWX8',
 'D3YTI2',
 'D6R934',
 'D6R938',
 'D6R956',
 'D6R960',
 'D6RAR4',
 'D6RCE0',
 'D6RD17',
 'D6REA1',
 'D6RER2',
 'D6RF86',
 'D6RGG3',
 'D6RGW2',
 'D6RH31',
 'D6RJG0',
 'E5RGY1',
 'E5RHU3',
 'E5RIP7',
 'E5RJR5',
 'E7EMB3',
 'E7EMS2',
 'E7EN28',
 'E7EN65',
 'E7END6',
 'E7EPS8',
 'E7EPV7',
 'E7EQ64',
 'E7EQB2',
 'E7EQM8',
 'E7ESX1',
 'E7ETH0',
 'E7EU04',
 'E7EUF1',
 'E7EWD3',
 'E7EX60',
 'E9PC84',
 'E9PCD7',
 'E9PEK4',
 'E9PEX6',
 'E9PF59',
 'E9PFD7',
 'E9PG71',
 'E9PGA6',
 'E9PGC5',
 'E9PHN6',
 'E9PHY0',
 'E9PIM6',
 'E9PJP6',
 'E9PK25',
 'E9PKE3',
 'E9PKP3',
 'E9PL83',
 'E9PLK3',
 'E9PMI0',
 'E9PNW4',
 'F5GY55',
 'F5GY80',
 'F5GY87',
 'F5GZK2',
 'F5GZN3',
 'F5GZS6',
 'F5H1U3',
 'F5H265',
 'F5H2B5',
 'F5H3H7',
 'F5H423',
 'F5H5D3',
 'F6S8M0',
 'F6SYF8',
 'F6SYP7',
 'F6VDH7',
 'F8VVB6',
 'F8W1B7',
 'F8W703',
 'F8W785',
 'F8W876',
 'F8W8W7',
 'F8W9L4',
 'F8WAD8',
 'F8WD41',
 'F8WDW9',
 'F8WE04',
 'G3V0E5',
 'G3V150',
 'G3V164',
 'G3V1Q7',
 'G3V2U7',
 'G3V2W1',
 'G3V3A0',
 'G3V3X5',
 'G3V4U0',
 'G3V533',
 'G3XAI2',
 'G3XAK1',
 'G3XAP6',
 'G5E968',
 'G5E9G7',
 'H0Y4H1',
 'H0Y512',
 'H0Y5E4',
 'H0YAC1',
 'H0YDE5',
 'H0YDJ3',
 'H0YEX9',
 'H0YI30',
 'H0YLF3',
 'H0YMB1',
 'H0YNK6',
 'H3BLU2',
 'H3BM42',
 'H3BMA1',
 'H3BP20',
 'H3BPK2',
 'H3BPK3',
 'H3BRV9',
 'H3BUV8',
 'H7BY64',
 'H7BYX6',
 'H7BZJ3',
 'H7BZT7',
 'H7C0V4',
 'H7C2F2',
 'H9KV31',
 'I3L0A0',
 'I3L0A1',
 'I3L0N3',
 'I3L145',
 'I3L397',
 'I3L3E6',
 'I3L3J8',
 'I3L3R5',
 'J3KNA1',
 'J3KNC5',
 'J3KNE3',
 'J3KNF6',
 'J3KNP4',
 'J3KNV4',
 'J3KQ18',
 'J3KQG3',
 'J3KRI5',
 'J3KSN0',
 'J3QQR8',
 'J3QS03',
 'K4DIA0',
 'K7EKE8',
 'K7EKL3',
 'K7ELW0',
 'K7ENA4',
 'K7ENE5',
 'K7ER15',
 'K7ERG9',
 'K7ERI9',
 'K7ES70',
 'M0QXF7',
 'M0QZI8',
 'M0R009',
 'M0R1Q1',
 'O00187',
 'O00241',
 'O00264',
 'O00339',
 'O00391',
 'O00451',
 'O00462',
 'O00468-6',
 'O00533',
 'O00754',
 'O14498',
 'O14594',
 'O14672',
 'O14773',
 'O14791',
 'O14793',
 'O14917',
 'O15031',
 'O15041',
 'O15197',
 'O15204',
 'O15230',
 'O15240',
 'O15335',
 'O15354',
 'O15466',
 'O43157',
 'O43173',
 'O43291',
 'O43300',
 'O43405',
 'O43505',
 'O43529',
 'O43827',
 'O43852',
 'O43854',
 'O43916',
 'O60242',
 'O60243',
 'O60245',
 'O60279',
 'O60476',
 'O60568',
 'O60814',
 'O60883',
 'O60888',
 'O75051',
 'O75063',
 'O75071',
 'O75083',
 'O75084',
 'O75173',
 'O75326',
 'O75368',
 'O75460',
 'O75493',
 'O75509',
 'O75636',
 'O75711',
 'O75752',
 'O75781',
 'O75874',
 'O75882',
 'O76061',
 'O76070',
 'O94760',
 'O94769',
 'O94772',
 'O94856',
 'O94910',
 'O94919',
 'O94933',
 'O94985',
 'O94985-2',
 'O94991',
 'O95158',
 'O95206',
 'O95428',
 'O95445',
 'O95450',
 'O95467',
 'O95479',
 'O95967',
 'O95998',
 'P00338',
 'P00390',
 'P00441',
 'P00450',
 'P00488',
 'P00491',
 'P00492',
 'P00505',
 'P00533',
 'P00558',
 'P00568',
 'P00734',
 'P00738',
 'P00739',
 'P00740',
 'P00742',
 'P00747',
 'P00748',
 'P01008',
 'P01009',
 'P01011',
 'P01019',
 'P01023',
 'P01024',
 'P01031',
 'P01033',
 'P01034',
 'P01036',
 'P01042',
 'P01042-2',
 'P01210',
 'P01213',
 'P01258',
 'P01282',
 'P01303',
 'P01344',
 'P01593',
 'P01597',
 'P01599',
 'P01602',
 'P01619',
 'P01624',
 'P01700',
 'P01701',
 'P01704',
 'P01714',
 'P01717',
 'P01721',
 'P01742',
 'P01743',
 'P01768',
 'P01780',
 'P01814',
 'P01817',
 'P01833',
 'P01834',
 'P01857',
 'P01859',
 'P01860',
 'P02100',
 'P02452',
 'P02458',
 'P02461',
 'P02533',
 'P02538',
 'P02647',
 'P02649',
 'P02652',
 'P02671',
 'P02675',
 'P02679',
 'P02741',
 'P02743',
 'P02745',
 'P02747',
 'P02748',
 'P02749',
 'P02750',
 'P02751',
 'P02753',
 'P02760',
 'P02763',
 'P02765',
 'P02766',
 'P02768',
 'P02774',
 'P02787',
 'P02790',
 'P02792',
 'P02794',
 'P03950',
 'P04003',
 'P04004',
 'P04066',
 'P04075',
 'P04080',
 'P04114',
 'P04179',
 'P04180',
 'P04196',
 'P04217-2',
 'P04264',
 'P04271',
 'P04275',
 'P04406',
 'P05023',
 'P05026',
 'P05060',
 'P05067',
 'P05109',
 'P05121',
 'P05154',
 'P05155',
 'P05160',
 'P05362',
 'P05408-2',
 'P05413',
 'P05452',
 'P05534',
 'P05543',
 'P05546',
 'P05937',
 'P05997',
 'P06276',
 'P06307',
 'P06310',
 'P06312',
 'P06331',
 'P06396',
 'P06396-2',
 'P06681',
 'P06702',
 'P06727',
 'P06732',
 'P06733',
 'P07093',
 'P07093-2',
 'P07195',
 'P07225',
 'P07237',
 'P07357',
 'P07360',
 'P07585',
 'P07686',
 'P07711',
 'P07737',
 'P07858',
 'P07900',
 'P07996',
 'P07998',
 'P08185',
 'P08253',
 'P08294',
 'P08493',
 'P08571',
 'P08572',
 'P08603',
 'P08670',
 'P08697',
 'P08758',
 'P09104',
 'P09172',
 'P09211',
 'P09382',
 'P09417',
 'P09486',
 'P09603',
 'P09619',
 'P09871',
 'P09960',
 'P09972',
 'P0C0L4',
 'P0C0L5',
 'P0C6S8',
 'P0DJI8',
 'P0DMQ5',
 'P0DOY3',
 'P10124',
 'P10153',
 'P10253',
 'P10451',
 'P10451-3',
 'P10451-5',
 'P10586',
 'P10599',
 'P10619',
 'P10636-2',
 'P10643',
 'P10644',
 'P10645',
 'P10721',
 'P10909',
 'P10909-3',
 'P11021',
 'P11047',
 'P11279',
 'P11362',
 'P11362-21',
 'P11597',
 'P11717',
 'P12107',
 'P12110',
 'P12111',
 'P12277',
 'P12318',
 'P12955',
 'P13473',
 'P13489',
 'P13521',
 'P13591',
 'P13611',
 'P13645',
 'P13647',
 'P13667',
 'P13671',
 'P13796',
 'P14138',
 'P14151',
 'P14174',
 'P14209',
 'P14314-2',
 'P14543',
 'P14618',
 'P14618-2',
 'P14621',
 'P14625',
 'P15086',
 'P15144',
 'P15151-2',
 'P15169',
 'P15291',
 'P15509',
 'P15531',
 'P16035',
 'P16083',
 'P16152',
 'P16519',
 'P16870',
 'P16930',
 'P17050',
 'P17174',
 'P17302',
 'P17405',
 'P17677',
 'P17900',
 'P17931',
 'P18065',
 'P18428',
 'P18669',
 'P19021',
 'P19022',
 'P19320',
 'P19367',
 'P19652',
 'P19823',
 'P19827',
 'P19961',
 'P20366',
 'P20382',
 'P20618',
 'P20742',
 'P20774',
 'P20827',
 'P20916',
 'P20933',
 'P21333',
 'P21579',
 'P21810',
 'P22303',
 'P22304',
 'P22676',
 'P22692',
 'P22748',
 'P22792',
 'P22891',
 'P22897',
 'P23083',
 'P23142',
 'P23284',
 'P23435',
 'P23468',
 'P23468-2',
 'P23470',
 'P23471',
 'P23515',
 'P23560',
 'P23582',
 'P24592',
 'P24593',
 'P25189',
 'P25311',
 'P25774',
 'P26038',
 'P26447',
 'P26572',
 'P26992',
 'P27169',
 'P27797',
 'P27824',
 'P29120',
 'P29144',
 'P29401',
 'P29622',
 'P29966',
 'P30041',
 'P30043',
 'P30086',
 'P30101',
 'P30530',
 'P30740',
 'P31146',
 'P31150',
 'P31321',
 'P31937',
 'P31944',
 'P31946',
 'P32004',
 'P32119',
 'P33151',
 'P33908',
 'P34059',
 'P34096',
 'P35052',
 'P35241',
 'P35442',
 'P35443',
 'P35527',
 'P35542',
 'P35555',
 'P35590',
 'P35749',
 'P35754',
 'P35858',
 'P35908',
 'P36222',
 'P36871',
 'P36955',
 'P37173',
 'P37802',
 'P37837',
 'P39060',
 'P40121',
 'P40189',
 'P40925',
 'P40926',
 'P41217',
 'P41222',
 'P42785',
 'P43121',
 'P43234',
 'P43251',
 'P43652',
 'P45877',
 'P47972',
 'P48163',
 'P48539',
 'P48637',
 'P48723',
 'P48740',
 'P48740-2',
 'P48745',
 'P49006',
 'P49257',
 'P49641',
 'P50395',
 'P51693-2',
 'P51884',
 'P51888',
 'P52758',
 'P52797-2',
 'P52799',
 'P52848',
 'P53634',
 'P54289',
 'P54289-2',
 'P54802',
 'P54803',
 'P55058',
 'P55087',
 'P55268',
 'P55283',
 'P55287',
 'P55290',
 'P55291',
 'P55774',
 'P56159',
 'P56817',
 'P58401',
 'P60174',
 'P60709',
 'P61088',
 'P61278',
 'P61769',
 'P61981',
 'P62258',
 'P62736',
 'P62805',
 'P62937',
 'P62942',
 'P62993',
 'P63104',
 'P63167',
 'P68036',
 'P68104',
 'P68371',
 'P68871',
 'P69905',
 'P78310',
 'P78324',
 'P78417',
 'P78509',
 'P78539',
 'P80108',
 'P80188',
 'P80723',
 'P80748',
 'P84157-2',
 'P98095',
 'P98160',
 'P98172',
 'Q00973',
 'Q01459',
 'Q01469',
 'Q01995',
 'Q02487',
 'Q02809',
 'Q02818',
 'Q03167',
 'Q03591',
 'Q04721',
 'Q04760',
 'Q05707',
 'Q05996',
 'Q06033',
 'Q06481',
 'Q06828',
 'Q06830',
 'Q07507',
 'Q07954',
 'Q08174-2',
 'Q08380',
 'Q08554',
 'Q08629',
 'Q08ET2',
 'Q09327',
 'Q09328',
 'Q0P6D2',
 'Q10469',
 'Q10471',
 'Q10472',
 'Q12794',
 'Q12797',
 'Q12841',
 'Q12860',
 'Q12866',
 'Q12907',
 'Q13217',
 'Q13228',
 'Q13231',
 'Q13332',
 'Q13332-5',
 'Q13433',
 'Q13444',
 'Q13508',
 'Q13554',
 'Q13641',
 'Q13740',
 'Q13790',
 'Q13867',
 'Q14019',
 'Q14112',
 'Q14118',
 'Q14126',
 'Q14152',
 'Q14165',
 'Q14257',
 'Q14314',
 'Q14315',
 'Q14393',
 'Q14515',
 'Q14520',
 'Q14571',
 'Q14574',
 'Q14624',
 'Q14696',
 'Q14697',
 'Q14766',
 'Q14894',
 'Q14956',
 'Q14982',
 'Q14C87',
 'Q14DG7',
 'Q15084',
 'Q15113',
 'Q15121',
 'Q15223',
 'Q15256',
 'Q15293',
 'Q15375',
 'Q15459',
 'Q15465',
 'Q15582',
 'Q15768',
 'Q15782',
 'Q15818',
 'Q15828',
 'Q15848',
 'Q15904',
 'Q16270',
 'Q16394',
 'Q16568',
 'Q16581',
 'Q16610',
 'Q16620',
 'Q16658',
 'Q16674',
 'Q16706',
 'Q16769',
 'Q16799',
 'Q16849',
 'Q16853',
 'Q24JP5',
 'Q2TAL6',
 ...]

View data for training

	Kiel	Magdeburg	Sweden	male	age	A0A024QZX5	A0A024R0T9	A0A024R3W6	A0A024R644	A0A075B6H9	...	Q9Y653	Q9Y696	Q9Y6C2	Q9Y6N6	Q9Y6N7	Q9Y6R7	Q9Y6X5	Q9Y6Y8	Q9Y6Y9	S4R3U6
Sample ID
Sample_000	0	0	1	0	71	15.912	16.852	15.571	16.481	20.246	...	16.012	15.178	NaN	15.050	16.842	19.863	NaN	19.563	12.838	12.805
Sample_001	0	0	1	1	77	15.936	16.874	15.519	16.387	19.941	...	15.528	15.576	NaN	14.833	16.597	20.299	15.556	19.386	13.970	12.443
Sample_002	0	0	1	1	75	16.112	14.523	15.935	16.416	19.251	...	15.229	14.728	13.757	15.118	17.440	19.598	15.735	20.447	12.637	12.505
Sample_003	0	0	1	0	72	16.107	17.032	15.802	16.979	19.628	...	15.495	14.590	14.682	15.140	17.356	19.429	NaN	20.216	12.627	12.445
Sample_004	0	0	1	0	63	15.603	15.331	15.375	16.679	20.450	...	14.757	15.094	14.048	15.256	17.075	19.582	15.328	19.867	13.145	12.235
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Sample_205	0	0	0	0	69	15.682	16.886	14.910	16.482	17.705	...	15.236	15.684	14.236	15.415	17.551	17.922	16.340	19.928	12.930	11.803
Sample_206	0	0	0	1	73	15.798	17.554	15.600	15.938	18.155	...	15.422	16.106	NaN	15.345	17.084	18.708	14.249	19.433	NaN	NaN
Sample_207	0	0	0	0	71	15.740	16.877	15.469	16.898	18.636	...	15.808	16.098	14.403	15.715	16.586	18.725	16.138	19.599	13.637	11.174
Sample_208	0	0	0	1	83	15.477	16.779	14.995	16.132	14.908	...	15.157	16.712	NaN	14.640	16.533	19.411	15.807	19.545	13.216	NaN
Sample_209	0	0	0	0	63	15.727	17.261	15.175	16.235	17.893	...	15.237	15.652	15.211	14.205	16.749	19.275	15.732	19.577	11.043	11.792

210 rows × 1299 columns

Data Splits#

Separate train and test split

Output folder#

Output folder: alzheimer

Outputs#

Save outputs to excel file:

Excel-file for tables: alzheimer/log_reg.xlsx

Collect test predictions#

predictions = y_val.to_frame('true')

Fill missing values with training median#

age          10
A0A024QZX5   11
A0A024R0T9    2
A0A024R3W6   23
A0A024R644    1
             ..
Q9Y6N6        6
Q9Y6N7       10
Q9Y6X5       22
Q9Y6Y9       70
S4R3U6       65
Length: 894, dtype: int64

	age	A0A024QZX5	A0A024R0T9	A0A024R3W6	A0A024R644	A0A075B6H9	A0A075B6I0	A0A075B6I1	A0A075B6I6	A0A075B6J9	...	Q9Y5I4	Q9Y617	Q9Y653	Q9Y696	Q9Y6C2	Q9Y6N6	Q9Y6N7	Q9Y6X5	Q9Y6Y9	S4R3U6
Sample ID
Sample_000	71	15.912	16.852	15.571	16.481	20.246	16.764	17.584	16.988	NaN	...	17.187	16.859	16.012	15.178	NaN	15.050	16.842	NaN	12.838	12.805
Sample_001	77	15.936	16.874	15.519	16.387	19.941	18.786	17.144	NaN	16.188	...	17.447	16.799	15.528	15.576	NaN	14.833	16.597	15.556	13.970	12.443
Sample_002	75	16.112	14.523	15.935	16.416	19.251	16.832	15.671	17.012	NaN	...	17.410	16.288	15.229	14.728	13.757	15.118	17.440	15.735	12.637	12.505
Sample_003	72	16.107	17.032	15.802	16.979	19.628	17.852	18.877	14.182	13.438	...	17.545	17.075	15.495	14.590	14.682	15.140	17.356	NaN	12.627	12.445
Sample_004	63	15.603	15.331	15.375	16.679	20.450	18.682	17.081	14.140	14.495	...	17.297	16.736	14.757	15.094	14.048	15.256	17.075	15.328	13.145	12.235
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Sample_204	<NA>	NaN	17.279	15.287	16.513	20.183	19.674	20.251	16.334	19.778	...	15.874	15.465	15.668	15.915	14.204	15.025	NaN	15.012	12.288	10.564
Sample_206	73	15.798	17.554	15.600	15.938	18.155	18.152	16.503	16.860	15.288	...	17.109	15.035	15.422	16.106	NaN	15.345	17.084	14.249	NaN	NaN
Sample_207	71	15.740	16.877	15.469	16.898	18.636	17.950	16.321	16.401	17.580	...	16.938	16.283	15.808	16.098	14.403	15.715	16.586	16.138	13.637	11.174
Sample_208	83	15.477	16.779	14.995	16.132	14.908	17.530	NaN	16.119	15.202	...	17.155	15.920	15.157	16.712	NaN	14.640	16.533	15.807	13.216	NaN
Sample_209	63	15.727	17.261	15.175	16.235	17.893	17.744	16.371	15.780	16.532	...	16.776	15.713	15.237	15.652	15.211	14.205	16.749	15.732	11.043	11.792

168 rows × 894 columns

Impute using median of training data

median_imputer = sklearn.impute.SimpleImputer(strategy='median')

X = njab.sklearn.transform_DataFrame(X, median_imputer.fit_transform)
X_val = njab.sklearn.transform_DataFrame(X_val, median_imputer.transform)
assert X.isna().sum().sum() == 0
X.shape, X_val.shape

((168, 1299), (42, 1299))

Principal Components#

on standard normalized training data:

INFO:njab.plotting:Saved Figures to alzheimer/var_explained_by_PCs.pdf

(168, 1299)

../_images/7c6f0da8970efabec1bc46c88ef25a14741b783484610a55b457a864c8083e37.png

Plot first 5 PCs with binary target label annotating each sample::

INFO:njab.plotting:Saved Figures to alzheimer/scatter_first_5PCs.pdf

../_images/be88c44a669147318b2dc370572ab9c30561f2cf53ac252fbb4732a848430298.png

UMAP#

of training data:

INFO:njab.plotting:Saved Figures to alzheimer/umap.pdf

../_images/ea3eaf907663ec9ce50679425e90d16abce3ba257384bc4ebd9825bd31351b74.png

Baseline Model - Logistic Regression#

Based on parameters, use weighting:

if weights:
    weights = 'balanced'
    cutoff = 0.5
else:
    cutoff = None
    weights = None