DOME-ML (or simply DOME) is an acronym standing for Data, Optimization, Model and Evaluation in Machine Learning. DOME is a set of community-wide guidelines, recommendations and checklists spanning these four areas, which aim to help establish standards for supervised machine learning validation in biology. The recommendations are formulated as questions addressed to anyone wishing to implement a machine learning algorithm. Answers to these questions can easily be included in the supplementary material of published papers.
What is the scope of the recommendations?
The recommendations cover four separate aspects corresponding to the major areas of ML:
Data
Preprocessing data properly and using it in a knowledgeable manner is the only way to obtain good generalization.
Machine learning models analyse experimental biological data by extracting patterns. The extracted patterns can then be used to provide biological insights into similar data not previously seen by the model. The degree to which a model retains its performance on new data is called its generalization power. Building ML models that generalize well is the main challenge of these methodologies; otherwise, the trained models cannot be reused. Preprocessing data properly and using it in a knowledgeable manner is the only way to obtain good generalization. Some basic concerns to consider are:
Training, test and validation datasets are partially or completely overlapping. This includes both sequence/structure similarity and sets of experiments obtained under different conditions or in different batches (e.g. for next-generation sequencing data). A group-aware split, sketched after this list, can guard against such overlap;
The training dataset is too small to capture the full complexity of the underlying distribution;
Validation and test datasets are too small to provide a stable estimate of the model’s generalization power;
Training, validation and test sets are not representative of the problem domain due to, e.g., high noise levels, imbalanced classes, or large chunks of similar (redundant) data points;
Data is not released to the public.
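For instance, overlap between partitions can be controlled with a group-aware split. Below is a minimal Python/scikit-learn sketch, assuming each data point has already been assigned to a similarity cluster by an external redundancy-reduction tool (e.g. CD-HIT or MMseqs2); the arrays `X`, `y` and `cluster_ids` are hypothetical placeholders, not part of the DOME recommendations.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                # placeholder feature matrix
y = rng.integers(0, 2, size=100)             # placeholder binary labels
cluster_ids = rng.integers(0, 25, size=100)  # one similarity-cluster id per data point

# GroupShuffleSplit keeps all members of a cluster on the same side of the
# split, so near-identical data points cannot leak from training into testing.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=cluster_ids))

# Sanity check: no cluster appears in both partitions.
assert set(cluster_ids[train_idx]).isdisjoint(cluster_ids[test_idx])
```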
Optimization
Problems associated with a poor choice of optimization strategy.
Optimization, also known as model training, refers to the process of changing the values that constitute the model (its parameters and hyper-parameters) in a way that maximizes the model's ability to solve a given problem. In this section, we focus on problems associated with a poor choice of optimization strategy. Such problems may include:
Selecting a model that is too powerful and may over-fit (known as high variance [arXiv:1803.08823]);
Selecting a model that is too simple and may under-fit (known as high bias [arXiv:1803.08823]);
Parameters are optimized and/or features are selected on hold-out data that should be used only to evaluate performance. This may be particularly hard to spot for meta-predictors (a tuning sketch that avoids this pitfall follows the list);
Parameters, hyper-parameters and the optimization protocol are not specified, and/or the files supporting their specification are not open-access or do not follow a standard widely adopted by the community (if any).
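To illustrate the last two points, hyper-parameters can be tuned by cross-validation inside the training set, leaving the test set untouched until the final evaluation, and the selected values can be saved in an open format. The sketch below uses scikit-learn; the estimator, parameter grid, placeholder data and file name are illustrative assumptions.

```python
import json
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))     # placeholder features
y = rng.integers(0, 2, size=200)  # placeholder labels

# The test set is split off first and never used during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Hyper-parameters are tuned by 5-fold cross-validation on the training set
# only, so the hold-out data is never optimized on.
search = GridSearchCV(
    SVC(), param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=5)
search.fit(X_train, y_train)

# Record the selected hyper-parameters so the optimization protocol can be
# published alongside the paper.
with open("hyperparameters.json", "w") as fh:
    json.dump(search.best_params_, fh, indent=2)

print("accuracy on the untouched test set:", search.score(X_test, y_test))
```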
Model
Fallacies related to important aspects of ML models (black-box models used where interpretability is required, the dissemination level of model components, the computational requirements to execute trained models).
Good overall performance of the trained model and its ability to generalize well to unseen data are important factors that undoubtedly affect the applicability of any proposed ML research. However, a few other important aspects related to ML models must be kept in mind. These include the following fallacies:
Employing unexplainable (black-box) models in areas where interpretability is required;
The various components of a model (source code, model files, parameter configurations, executables) are not made available to the public (a packaging sketch follows this list);
The computational requirements to execute the trained models (i.e. to generate predictions on new data) are impractical.
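One way to address the dissemination concern is to release the trained model file together with machine-readable metadata on its configuration and software environment. Below is a minimal sketch, assuming a scikit-learn model; the file names and metadata fields are hypothetical choices.

```python
import json
import joblib
import numpy as np
import sklearn
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))     # placeholder features
y_train = rng.integers(0, 2, size=100)  # placeholder labels

model = LogisticRegression(C=1.0).fit(X_train, y_train)

# The serialized model file itself...
joblib.dump(model, "model.joblib")

# ...plus metadata that lets others rerun it: library version, model class
# and the full parameter configuration.
metadata = {
    "sklearn_version": sklearn.__version__,
    "model_class": type(model).__name__,
    "parameters": model.get_params(),
    "n_training_samples": int(X_train.shape[0]),
}
with open("model_metadata.json", "w") as fh:
    json.dump(metadata, fh, indent=2, default=str)
```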
Evaluation
A valid assessment methodology for any final model, with adequate and comprehensive measures.
In the implementation of a robust and trustworthy ML method, a comprehensive data description, a correct optimization protocol, and a clearly defined (and open-access) model are critical first steps; however, equally important is a valid assessment methodology for any final model. Here are a few possible risks related to model assessment and evaluation:
The selected performance measures are not adequate or comprehensive for the problem at hand;
Reported performances are highly unstable, e.g. varying strongly across data splits or resamplings (a bootstrap check, sketched after this list, helps expose this);
Obtained performances are not compared with similar studies and methods on community-agreed datasets, nor with simpler baseline methods.
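The sketch below touches all three points: it reports several complementary measures, bootstraps the test set to expose instability, and compares against a trivial baseline. The data, model and metric choices are illustrative assumptions, not DOME requirements.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))     # placeholder features
y = rng.integers(0, 2, size=300)  # placeholder labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

model = LogisticRegression().fit(X_tr, y_tr)
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# Report more than one measure: a single score can hide class imbalance.
print("MCC:", matthews_corrcoef(y_te, y_pred))
print("F1 :", f1_score(y_te, y_pred))
print("baseline MCC:", matthews_corrcoef(y_te, baseline.predict(X_te)))

# Bootstrap the test set to check how stable the reported score is.
scores = []
for _ in range(1000):
    idx = rng.integers(0, len(y_te), size=len(y_te))
    scores.append(matthews_corrcoef(y_te[idx], y_pred[idx]))
print("MCC 95% interval:", np.percentile(scores, [2.5, 97.5]))
```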