GenoLearn Model Config

Users need to generate a model config before they can train their Machine Learning models. A model config describes the parameters of the Machine Learning model they would like to use. By default, all settings are set to the scikit-learn default values. To try several values for the same parameter, the user can enter a comma-separated list when prompted:

param prompt [default value] : value1, value2, value3, ....

This comma-separated form suits cases where the user has specific values in mind. If the values are equidistant, the user can instead use the built-in range function:

param prompt [default value] : range(start, end, step)
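The two input forms can be illustrated with a minimal Python sketch. The helper name expand_prompt_value is hypothetical and is not part of GenoLearn; the sketch also assumes that range follows the semantics of Python's built-in range (the end value is excluded), which the text above does not confirm.

```python
import re

# Hypothetical helper for illustration only; not GenoLearn's implementation.
_RANGE = re.compile(r"^range\(\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\)$")


def expand_prompt_value(text):
    """Expand a prompt answer into a list of candidate values."""
    text = text.strip()
    match = _RANGE.match(text)
    if match:
        start, end, step = (int(group) for group in match.groups())
        # Assumption: same behaviour as Python's built-in range,
        # so `end` itself is not included.
        return list(range(start, end, step))
    # Otherwise treat the answer as a comma-separated list of values.
    values = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        try:
            values.append(int(item))  # keep integers as integers
        except ValueError:
            values.append(item)       # leave anything else as a string
    return values


print(expand_prompt_value("range(50, 200, 50)"))  # [50, 100, 150]
print(expand_prompt_value("10, 100, 1000"))       # [10, 100, 1000]
```

Under that assumption, range(50, 200, 50) is shorthand for entering 50, 100, 150 by hand.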

Upon executing the model-config command, the user is prompted to configure one of two Machine Learning models:

Genolearn ({VERSION}) Command Line Interface

GenoLearn is designed to enable researchers to perform Machine Learning on their genome
sequence data such as fsm-lite or unitig files.

See https://genolearn.readthedocs.io for documentation.

Working directory: {WORKING_DIRECTORY}

Command: model-config

Select a model to configure

0.  back                                goes to the previous command

1.  logistic_regression
2.  random_forest

See the scikit-learn documentation for Logistic Regression or Random Forest for more details.

Upon successful execution, an example directory tree will look like

working directory
├── data
│   ├── genome-sequence-data.txt.gz
│   └── metadata.csv
├── feature-selection
│   ├── default-fisher
│   └── default-fisher.log
├── meta
│   └── default
├── model
│   └── random-forest
└── preprocess
    ├── dense   [.npz files]
    ├── features.txt.gz
    ├── info.json
    ├── meta.json
    ├── preprocess.log
    └── sparse  [.npz files]