AI Operations

Monitoring and model change control

Treat model updates as controlled product changes, not routine API tweaks.

Workbook: 30 minutes

In plain language

AI models should not be treated like static features that are finished once they launch. They behave more like living components inside the product. Their performance can shift when data, workflow conditions, or user behavior change, or when deployment environments differ. That is why monitoring and model change control are essential. Without them, an apparently successful AI feature can become gradually less reliable before the team notices.

For non-technical founders, the main idea is that model updates are product changes, not casual tweaks. If a model influences care, triage, documentation, or decision support, then changing it may affect performance, workflow burden, and risk. That deserves structured review.

What this page helps you decide

This page helps teams plan what happens after an AI feature leaves the demo environment. Model behavior can drift because data, users, sites, devices, labels, or workflows change over time.

Use it when a product roadmap includes retraining, performance monitoring, subgroup analysis, rollback, Predetermined Change Control Plan (PCCP) assumptions, or postmarket learning.

Operational metrics

Monitoring is how the team sees whether a model is still behaving as expected after launch. The goal is not only to watch technical performance. It is also to watch real workflow impact. A model can still produce outputs while quietly creating overload, bias, or poor user trust.

These metrics matter because they help distinguish between a model that is technically available and a model that is operationally healthy. Availability alone is not enough if outputs are degrading or creating poor workflow consequences.
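To make the difference between "available" and "operationally healthy" concrete, here is a minimal Python sketch that computes two workflow signals per site: how often the model raises an alert and how often a human overrides it. The `Event` fields and the site names are hypothetical; a real product would read these from its own audit log.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    site: str         # hypothetical site identifier
    alerted: bool     # the model raised an alert on this case
    overridden: bool  # a human overrode the model output on this case


def workflow_metrics(events):
    """Return alert rate and human override rate for each site."""
    counts = defaultdict(lambda: {"cases": 0, "alerts": 0, "overrides": 0})
    for e in events:
        c = counts[e.site]
        c["cases"] += 1
        c["alerts"] += int(e.alerted)
        c["overrides"] += int(e.overridden)
    return {
        site: {
            "alert_rate": c["alerts"] / c["cases"],
            "override_rate": c["overrides"] / c["cases"],
        }
        for site, c in counts.items()
    }
```

A model can keep answering every request while its override rate climbs at one site. That pattern only shows up when the numbers are broken out this way.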

Why change control matters

Many teams are used to ordinary software updates, but model changes can be trickier. A new model version may alter recommendations, change confidence behavior, or perform differently across sites and populations. If those changes are not reviewed carefully, the organization can introduce risk while believing it is making an improvement.

That is why model changes should be handled like controlled product changes. The team should know what changed, why it changed, what evidence supports the update, what risks were reviewed, and whether re-validation is needed. Fast iteration is useful, but in medtech it has to be paired with traceability and discipline.

Change workflow

Figure: Controlled update workflow for model and software changes.

A strong workflow usually includes proposal, review, testing, approval, deployment planning, and post-release monitoring. Even when the model update is small, someone should still be able to explain what was updated and why the team believes the change is safe enough to release.
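One way to make that workflow traceable is to record each change as it moves through the stages. The sketch below is illustrative only: the stage names mirror the list above, while `ChangeRecord`, its fields, and the rule that every step needs a named approver and evidence are assumptions, not a prescribed quality-system design.

```python
from dataclasses import dataclass, field

# Stage order taken from the workflow described above.
STAGES = [
    "proposal",
    "review",
    "testing",
    "approval",
    "deployment_planning",
    "post_release_monitoring",
]


@dataclass
class ChangeRecord:
    change_id: str      # hypothetical identifier, e.g. "CR-001"
    what_changed: str
    rationale: str
    stage: str = "proposal"
    history: list = field(default_factory=list)

    def advance(self, approver: str, evidence: str) -> str:
        """Move to the next stage, recording who signed off and on what evidence."""
        if not approver or not evidence:
            raise ValueError("each stage needs a named approver and evidence")
        idx = STAGES.index(self.stage)
        if idx == len(STAGES) - 1:
            raise ValueError("change is already in the final stage")
        self.history.append((self.stage, approver, evidence))
        self.stage = STAGES[idx + 1]
        return self.stage
```

Even for a small update, the `history` list then answers the two questions the workflow cares about: what was updated, and why the team judged it safe enough to release.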

Concrete advice for non-technical founders

Ask for a simple policy on when a model change is minor and when it is major. What kinds of updates require deeper review, re-validation, or new user communication? What metrics trigger investigation? What thresholds trigger rollback? If those rules are not written down, the team may be relying too much on informal judgment.

A practical founder question is: if model performance drifted at one site next month, how would we know, who would decide what to do, and how quickly could we respond? If the team cannot answer that clearly, monitoring is probably not mature enough yet.
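A written policy of this kind can be very small. The Python sketch below assumes a single monitored metric (for example sensitivity), one validated baseline, and two absolute-drop thresholds. The names `DriftPolicy`, `decide`, and `review_sites` and all numbers are hypothetical placeholders for whatever the team actually agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftPolicy:
    """Hypothetical policy: absolute drops against the validated baseline."""
    investigate_drop: float  # drop that triggers an investigation
    rollback_drop: float     # larger drop that triggers rollback


def decide(baseline: float, current: float, policy: DriftPolicy) -> str:
    """Map one metric reading to an action under the written policy."""
    drop = baseline - current
    if drop >= policy.rollback_drop:
        return "rollback"
    if drop >= policy.investigate_drop:
        return "investigate"
    return "ok"


def review_sites(baseline: float, current_by_site: dict, policy: DriftPolicy) -> dict:
    """Apply the same policy to every site so a single-site drift is visible."""
    return {
        site: decide(baseline, value, policy)
        for site, value in current_by_site.items()
    }
```

With an illustrative baseline of 0.90, an investigation threshold of 0.03, and a rollback threshold of 0.08, a site reading 0.85 would be flagged for investigation and a site reading 0.80 for rollback. Who owns each action and how fast they must respond still has to be written down separately.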

PCCP, updates, and licensing

If your product includes an authorized Predetermined Change Control Plan, model and software updates should be executed inside the approved modification protocol and performance envelope. Changes outside those bounds may be treated as significant and trigger a new submission or licence amendment. Even without a PCCP, treat each model release as a design change: update risk files, verification evidence, and release records.
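As a rough illustration of checking a candidate release against a performance envelope, the sketch below compares measured metrics with pre-specified bounds. The metric names and bounds are invented for the example; the real envelope comes from the authorized PCCP, and passing this check is not a regulatory determination.

```python
def outside_envelope(metrics: dict, envelope: dict) -> list:
    """Return the names of metrics that are missing or outside their bounds.

    `envelope` maps a metric name to an inclusive (low, high) range.
    """
    violations = []
    for name, (low, high) in envelope.items():
        value = metrics.get(name)
        if value is None or not (low <= value <= high):
            violations.append(name)
    return violations


# Hypothetical bounds, for illustration only.
EXAMPLE_ENVELOPE = {
    "sensitivity": (0.90, 1.00),
    "specificity": (0.85, 1.00),
}
```

An empty list means the candidate stays inside the stated bounds for the metrics checked. Any entry in the list is a prompt to stop and ask whether the change still fits the approved protocol.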

Post-market monitoring (PMM)

Health Canada’s MLMD guidance highlights ongoing performance, inter-compatibility, and surveillance. Founders should expect dashboards that cover not only accuracy but also calibration drift, subgroup performance, data quality, alert burden, and human override rates. Together these signals show whether safety and workflow are still acceptable.
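Two of those dashboard signals are easy to sketch. The Python below shows one common binned estimate of calibration error for a binary model (the weighted gap between mean predicted probability and observed positive rate in each bin) and a simple per-subgroup accuracy. It is a minimal sketch, not a method taken from the guidance; the function names, the ten-bin default, and the subgroup labels are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned calibration error for binary predictions.

    probs: predicted probability of the positive class, each in [0, 1]
    labels: observed outcomes, each 0 or 1
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        pos_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(mean_p - pos_rate)
    return ece


def subgroup_accuracy(preds, labels, groups):
    """Accuracy broken out by a subgroup label (site, device, age band, ...)."""
    totals, correct = {}, {}
    for pred, y, g in zip(preds, labels, groups):
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(pred == y)
    return {g: correct[g] / totals[g] for g in totals}
```

Tracking the calibration number over time, rather than reading it once, is what turns it into a drift signal. The subgroup breakdown matters because an acceptable overall accuracy can hide a subgroup that is doing much worse.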

Real-world data and incidents

Complaints, near misses, and safety signals should feed back into retraining decisions and labeling. Document how real-world evidence triggers investigation versus routine tuning.

Official references

Curriculum page last reviewed: 2026-04-22.

Summaries are for learning only; change reporting obligations depend on jurisdiction and device class.

Practical next step

Define the metrics that would trigger review, rollback, retraining, or customer communication for one AI feature.
