Microorganisms are classified based on their optimal growth pH ranges
The final one-hot encoding matrix for the presence or absence of whole genes
| Optimum pH | Bacteria | Gene a | Gene b | ... | Gene 20794 |
| --- | --- | --- | --- | --- | --- |
| 7.25 | Genome 1 | 0 | 1 | ... | 1 |
| 8.90 | Genome 2 | 1 | 0 | ... | 0 |
| 3.40 | Genome 3 | 1 | 0 | ... | 1 |
| ... | ... | ... | ... | ... | ... |
| 5.75 | Genome 3476 | 0 | 1 | ... | 1 |
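A presence/absence matrix of this shape can be sketched in a few lines. The following is a minimal, dependency-free illustration and not the authors' pipeline: the function name `build_presence_matrix` and the toy annotations are hypothetical, and the sketch assumes that each genome is already available as a set of gene names.

```python
from typing import Dict, List, Set, Tuple


def build_presence_matrix(
    genome_genes: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (genome_ids, gene_names, matrix) where 1 = gene present, 0 = absent."""
    genomes = sorted(genome_genes)
    # Columns: every gene seen in at least one genome, sorted for reproducibility.
    genes = sorted(set().union(*genome_genes.values())) if genome_genes else []
    matrix = [
        [1 if gene in genome_genes[genome] else 0 for gene in genes]
        for genome in genomes
    ]
    return genomes, genes, matrix


# Hypothetical toy annotations (NOT the study's data).
toy_annotations = {
    "Genome 1": {"Gene b", "Gene c"},
    "Genome 2": {"Gene a"},
    "Genome 3": {"Gene a", "Gene c"},
}

genomes, genes, matrix = build_presence_matrix(toy_annotations)
print(genes)
for genome, row in zip(genomes, matrix):
    print(genome, row)
```

In a real workflow the optimum pH column would be kept as a separate target vector aligned with the row order of `genomes`.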
Performance of the XGBoost model on the test set
| Model evaluation | Value |
| --- | --- |
| MAE test above avg | 0.477490 |
| MSE test above avg | 0.443375 |
| RMSE test above avg | 0.665864 |
| R² test above avg | 0.350277 |
Performance of the XGBoost model on the validation set
| Model evaluation | Value |
| --- | --- |
| MAE validation above avg | 0.492234 |
| MSE validation above avg | 0.481866 |
| RMSE validation above avg | 0.694166 |
| R² validation above avg | 0.416336 |
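For reference, the four metrics reported for the test and validation sets follow the standard regression definitions. The sketch below computes them in plain Python; the helper name `regression_metrics` and the toy pH values are illustrative assumptions, not the evaluation code or data used in the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE and R² for paired true/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when the true values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Hypothetical optimum-pH values and predictions (NOT the study's data).
toy_true = [7.0, 8.0, 4.0, 5.0]
toy_pred = [7.5, 7.5, 4.5, 5.5]
metrics = regression_metrics(toy_true, toy_pred)
print(metrics)
```

As a consistency check on the reported values, RMSE is the square root of MSE: √0.443375 ≈ 0.665864 for the test set and √0.481866 ≈ 0.694166 for the validation set.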
Prediction accuracy on the test and validation sets at different allowable errors
| Allowable error | Accuracy on the test set | Accuracy on the validation set |
| --- | --- | --- |
| 0.5 pH units | 0.654952 | 0.632184 |
| 0.6 pH units | 0.728435 | 0.724138 |
| 0.7 pH units | 0.779553 | 0.781609 |
| 0.8 pH units | 0.837061 | 0.830460 |
| 0.9 pH units | 0.869010 | 0.867816 |
| 1.0 pH units | 0.888179 | 0.893678 |
| 2.0 pH units | 0.984026 | 0.979885 |
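The source does not spell out how accuracy is derived from a regression output. One plausible reading, assumed here, is the fraction of genomes whose predicted optimum pH lies within the allowable error of the measured value. The function name `accuracy_within` and the toy values below are hypothetical.

```python
from typing import Sequence


def accuracy_within(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float
) -> float:
    """Fraction of predictions with |predicted - true| <= tolerance (in pH units)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tolerance)
    return hits / len(y_true)


# Hypothetical values (NOT the study's data); absolute errors are
# roughly 0.25, 0.80, 0.30 and 2.15 pH units.
toy_true = [7.25, 8.90, 3.40, 5.75]
toy_pred = [7.00, 8.10, 3.70, 7.90]

for tol in (0.5, 1.0, 2.0):
    print(f"{tol} pH units: {accuracy_within(toy_true, toy_pred, tol):.2f}")
```

Under this definition accuracy can only stay the same or increase as the allowable error widens, which matches the monotonic trend in the table.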
SHAP analysis of the top 20 genes with the greatest impact
SHAP analysis of the top 20 genes among the 5485 feature genes showed that most of the high-impact genes are reported in the literature as key genes in the physiological mechanisms by which microbes respond to changes in external pH.
For instance, the Na_Ala_symp gene belongs to the sodium/alanine symporter family; the protein it encodes transports alanine together with sodium ions. The MgtE gene encodes a transmembrane Mg²⁺ transporter, which can transport Mg²⁺ or other divalent cations into cells.
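The source does not state how the top 20 genes were ranked. A common convention, assumed here, is to order features by their mean absolute SHAP value across samples. The sketch below applies that ranking to an already computed SHAP matrix; the function name, the gene labels and the numbers are all hypothetical, and computing the SHAP values themselves is outside the scope of this example.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    top_k: int = 20,
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| over samples and return the top_k."""
    n_samples = len(shap_values)
    if n_samples == 0:
        raise ValueError("shap_values must contain at least one sample")
    scores = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_samples
        scores.append((name, mean_abs))
    # Largest mean absolute contribution first.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical SHAP matrix: rows are genomes, columns are genes (made-up numbers).
toy_shap = [
    [0.4, -0.1, 0.0],
    [-0.6, 0.3, 0.1],
]
toy_genes = ["gene_A", "gene_B", "gene_C"]

top = rank_by_mean_abs_shap(toy_shap, toy_genes, top_k=2)
print(top)
```

Taking the absolute value before averaging matters: a gene that pushes predictions strongly up for some genomes and strongly down for others would otherwise average out to a small score.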
Modules used for project completion
| Name | Version | Build | Channel |
| --- | --- | --- | --- |
| _libgcc_mutex | 0.1 | conda_forge | conda-forge |
| _openmp_mutex | 4.5 | 2_gnu | conda-forge |
| _py-xgboost-mutex | 2.0 | cpu_0 | conda-forge |
| bzip2 | 1.0.8 | hd590300_5 | conda-forge |
| ca-certificates | 2024.2.2 | hbcca054_0 | conda-forge |
| joblib | 1.3.2 | pyhd8ed1ab_0 | conda-forge |
| ld_impl_linux-64 | 2.40 | h41732ed_0 | conda-forge |
| libblas | 3.9.0 | 21_linux64_openblas | conda-forge |
| libcblas | 3.9.0 | 21_linux64_openblas | conda-forge |
| libffi | 3.4.2 | h7f98852_5 | conda-forge |
| libgcc-ng | 13.2.0 | h807b86a_5 | conda-forge |
| libgfortran-ng | 13.2.0 | h69a702a_5 | conda-forge |
| libgfortran5 | 13.2.0 | ha4646dd_5 | conda-forge |
| libgomp | 13.2.0 | h807b86a_5 | conda-forge |
| liblapack | 3.9.0 | 21_linux64_openblas | conda-forge |
| libnsl | 2.0.1 | hd590300_0 | conda-forge |
| libopenblas | 0.3.26 | pthreads_h413a1c8_0 | conda-forge |
| libsqlite | 3.45.1 | h2797004_0 | conda-forge |
| libstdcxx-ng | 13.2.0 | h7e041cc_5 | conda-forge |
| libuuid | 2.38.1 | h0b41bf4_0 | conda-forge |
| libxcrypt | 4.4.36 | hd590300_1 | conda-forge |
| libxgboost | 2.0.3 | cpu_h6728c87_1 | conda-forge |
| libzlib | 1.2.13 | hd590300_5 | conda-forge |
| ncurses | 6.4 | h59595ed_2 | conda-forge |
| numpy | 1.26.4 | py310hb13e2d6_0 | conda-forge |
| openssl | 3.2.1 | hd590300_0 | conda-forge |
| pandas | 2.2.1 | py310hcc13569_0 | conda-forge |
| pip | 24.0 | pyhd8ed1ab_0 | conda-forge |
| py-xgboost | 2.0.3 | cpu_pyh0a621ce_1 | conda-forge |
| python | 3.10.13 | hd12c33a_1_cpython | conda-forge |
| python-dateutil | 2.8.2 | pyhd8ed1ab_0 | conda-forge |
| python-tzdata | 2024.1 | pyhd8ed1ab_0 | conda-forge |
| python_abi | 3.10 | 4_cp310 | conda-forge |
| pytz | 2024.1 | pyhd8ed1ab_0 | conda-forge |
| readline | 8.2 | h8228510_1 | conda-forge |
| scikit-learn | 1.4.1.post1 | py310h1fdf081_0 | conda-forge |
| scipy | 1.12.0 | py310hb13e2d6_2 | conda-forge |
| setuptools | 69.1.1 | pyhd8ed1ab_0 | conda-forge |
| six | 1.16.0 | pyh6c4a22f_0 | conda-forge |
| threadpoolctl | 3.3.0 | pyhc1e730c_0 | conda-forge |
| tk | 8.6.13 | noxft_h4845f30_101 | conda-forge |
| tzdata | 2024a | h0c530f3_0 | conda-forge |
| wheel | 0.42.0 | pyhd8ed1ab_0 | conda-forge |
| xgboost | 2.0.3 | cpu_pyhb06c54e_1 | conda-forge |
| xz | 5.2.6 | h166bdaf_0 | conda-forge |
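When reproducing the environment, it can help to confirm that the core Python packages match the versions listed above. The following is a small sketch using only the standard library; the helper name `check_versions` is hypothetical, and it assumes that the conda package names for these five libraries coincide with their Python distribution names.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions of the main Python libraries, copied from the table above.
EXPECTED_VERSIONS = {
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "scikit-learn": "1.4.1.post1",
    "scipy": "1.12.0",
    "xgboost": "2.0.3",
}


def check_versions(
    expected: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected version, installed version or None, match flag)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, ok) in check_versions(EXPECTED_VERSIONS).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{package}: expected {wanted}, found {found} [{status}]")
```

A mismatch does not necessarily change the results, but pinning the listed versions removes one source of variation when comparing metrics.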