What factors should be taken into account when establishing ML monitoring?

Artificial Intelligence
July 29, 2024 · 4 Min Read

During the configuration of ML monitoring for a specific model, it is crucial to consider the following technical aspects:

  • Aligning the ML monitoring setup with the specific requirements of the use case.
  • Establishing the cadence for model retraining in accordance with desired performance metrics.
  • Defining custom metrics designed to monitor specific aspects of model behavior for validation.
  • Thoughtfully selecting a dataset that aligns with the intricacies of the monitored model.

Aligning the ML monitoring setup with the specific requirements of the use case

When establishing an ML monitoring system, it is essential to match the complexity of monitoring to how the ML service is deployed and operated. Several technical factors merit examination:

Implementation of ML Services: Evaluate the operational characteristics of the ML service, whether it runs as a real-time production service, as batch workflows orchestrated with frameworks such as Kubeflow, Dagster, or Airflow, or as an ad hoc Python script.

Feedback Loop: How quickly ground-truth feedback becomes available, together with the stability of the environment, strongly influences how often metrics are calculated and which specific metrics are selected for monitoring.

SLA Monitoring: Assess the business consequences of drops in model quality and the associated risks to be monitored. More critical models may warrant a more sophisticated monitoring setup.


Establishing the cadence for model retraining in accordance with desired performance metrics.

Evaluate the optimal frequency and associated costs of model retraining, ensuring alignment with performance objectives.

Consider how retraining is implemented: real-time monitoring of metrics with a trigger-based retraining approach, or a predetermined retraining schedule.

Address the impediments to updating the model too frequently, including complex approval processes, governance considerations, regulatory compliance, and the necessity for manual testing procedures.
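
As a rough illustration of the trigger-based approach, the sketch below assumes that ground-truth labels arrive with some delay and that a hypothetical retrain_model() entry point kicks off the retraining pipeline; both thresholds are illustrative rather than recommended values.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative thresholds; tune them to the use case and its SLA.
ACCURACY_FLOOR = 0.85        # retrain if observed accuracy drops below this
MIN_LABELS_REQUIRED = 500    # wait until enough ground-truth labels have arrived

def should_retrain(y_true: np.ndarray, y_pred: np.ndarray) -> bool:
    """Trigger-based check: retrain only when observed quality falls below the floor."""
    if len(y_true) < MIN_LABELS_REQUIRED:
        return False  # not enough feedback yet; keep the current model
    return accuracy_score(y_true, y_pred) < ACCURACY_FLOOR

# Wiring inside a scheduled monitoring job (retrain_model is a hypothetical entry point):
# if should_retrain(labels_last_30_days, predictions_last_30_days):
#     retrain_model()
```

A purely schedule-based setup would instead run the retraining job on a fixed cadence and skip this check entirely.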

Defining custom metrics designed to monitor specific aspects of model behavior for validation.

Deployment of metrics meticulously tailored to the unique characteristics of the use case, delivering a nuanced evaluation of model performance.

Formulation of heuristics acting as surrogate measures for model quality, particularly in scenarios where ground truth data is unattainable.
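
As one example of such a heuristic, the sketch below treats the share of low-confidence predictions as a label-free proxy for quality; the 0.6 confidence cutoff, the 20% alerting level, and the raise_alert helper are all assumptions for illustration.

```python
import numpy as np

def low_confidence_share(probabilities: np.ndarray, threshold: float = 0.6) -> float:
    """Label-free proxy for quality: the fraction of predictions whose
    top-class probability falls below a confidence threshold."""
    top_class_confidence = probabilities.max(axis=1)
    return float((top_class_confidence < threshold).mean())

# Example: alert when more than 20% of recent predictions are low-confidence.
# recent_probs has shape (n_predictions, n_classes); raise_alert is hypothetical.
# if low_confidence_share(recent_probs) > 0.20:
#     raise_alert("Low-confidence share above 20% -- possible quality degradation")
```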

Incorporation of key performance indicators (KPIs) and key risk indicators (KRIs) into the monitoring framework, ensuring alignment with overarching objectives and potential risks inherent to the use case.

Deployment of tailored drift detection methodologies extending beyond conventional statistical tests, enabling a more comprehensive assessment of model behavior over time.
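
One such method, sketched below under the assumption of purely numerical features, is a domain classifier: a model trained to distinguish reference rows from current rows, where a cross-validated ROC AUC well above 0.5 suggests the two windows no longer come from the same distribution. The 0.7 threshold is illustrative, not a standard value.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def domain_classifier_drift(reference: pd.DataFrame, current: pd.DataFrame) -> float:
    """Train a classifier to tell reference rows from current rows.
    Cross-validated ROC AUC near 0.5 means the windows look alike; values
    approaching 1.0 indicate the feature distributions have drifted apart."""
    X = pd.concat([reference, current], ignore_index=True)
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(current))])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return float(cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean())

# Example usage (flag_drift is a hypothetical downstream action):
# if domain_classifier_drift(reference_batch, current_batch) > 0.7:
#     flag_drift()
```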

Development of a systematic approach to monitor fairness and explainability, ensuring these critical aspects are actively tracked and evaluated within the ML system.
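
A simple starting point, sketched below, is to track the demographic parity gap, i.e. the largest difference in positive-prediction rates across groups; the column names and the 0.10 tolerance are hypothetical and must be adapted to the use case.

```python
import pandas as pd

def demographic_parity_gap(scored: pd.DataFrame,
                           prediction_col: str,
                           group_col: str) -> float:
    """Largest difference in positive-prediction rates across the groups in group_col,
    assuming prediction_col holds binary (0/1) predictions."""
    positive_rate_by_group = scored.groupby(group_col)[prediction_col].mean()
    return float(positive_rate_by_group.max() - positive_rate_by_group.min())

# Example with hypothetical column names and an illustrative tolerance:
# gap = demographic_parity_gap(scored_batch, prediction_col="approved", group_col="age_band")
# if gap > 0.10:
#     raise_alert(f"Demographic parity gap of {gap:.2f} exceeds tolerance")
```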

Thoughtfully selecting a dataset that aligns with the intricacies of the monitored model.

In scenarios involving diverse production models utilizing distinct data types, such as numerical and categorical features, tabular data, time series, natural language text, and images, establishing data quality monitoring can pose a formidable challenge. A more sophisticated approach involves leveraging a reference dataset to automatically generate diverse tests based on the provided examples and subsequently comparing new data batches against it. The strategic selection and curation of an appropriate reference dataset is as important as choosing the right metrics: an ideal reference dataset must accurately represent expected data patterns.

Furthermore, a reference dataset can serve as a baseline for conducting distribution drift comparisons, allowing for the consideration of fixed, moving, or multiple windows. The application of different reference datasets tailored to specific scenarios is a viable strategy, such as utilizing one dataset for distribution drift detection and another for generating conditions for data quality tests. This methodological approach ensures a more technical and nuanced handling of data quality monitoring in heterogeneous production models.
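
A minimal sketch of the reference-comparison idea, assuming tabular data with numerical columns and using a per-column two-sample Kolmogorov-Smirnov test as one possible drift check:

```python
import pandas as pd
from scipy.stats import ks_2samp

def drift_against_reference(reference: pd.DataFrame,
                            current: pd.DataFrame,
                            p_threshold: float = 0.05) -> pd.DataFrame:
    """Compare each shared numerical column of a new batch against the
    reference dataset with a two-sample Kolmogorov-Smirnov test."""
    rows = []
    numeric_cols = reference.select_dtypes(include="number").columns
    for col in numeric_cols.intersection(current.columns):
        result = ks_2samp(reference[col].dropna(), current[col].dropna())
        rows.append({"column": col,
                     "statistic": result.statistic,
                     "p_value": result.pvalue,
                     "drift_detected": result.pvalue < p_threshold})
    return pd.DataFrame(rows)

# Example: one reference window can drive drift tests while another (or the same one)
# supplies expectations for data-quality checks such as value ranges and missing rates.
# drift_report = drift_against_reference(reference_df, todays_batch_df)
```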
