Artificial Intelligence for the Potato Industry — Growers, Industry & Partners
Session 2: AI in Business | Topic: How data flows through a pipeline, where ML and GenAI plug in, and how results reach you
ETL stands for Extract, Transform, and Load. It’s a standard way to move data from where it’s collected into a form that’s useful for reporting, dashboards, or AI. Once the data is cleaned and structured, you can plug in both machine learning (ML) models and generative AI (GenAI) to get numbers, predictions, and human-readable text.
AI fits into this pipeline after (or during) the transform step. You feed the prepared data into ML models for predictions, then feed the data plus those predictions into GenAI for summaries and alerts. The load step stores both the ML outputs and the generated text so they can be delivered to dashboards, apps, and SMS.
Machine learning (ML) uses patterns in your data to produce structured outputs: numbers (e.g. predicted yield in t/acre), categories (e.g. high/medium/low risk), or yes/no flags. ML models are trained on historical data and run on the same kind of structured data from your ETL pipeline. They don’t write sentences—they output columns you can chart, filter, and send to a dashboard or app.
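The following minimal sketch illustrates "trained on historical data, then run on structured data". The Python below fits a one-feature least-squares line and returns a number, a category, and a yes/no flag instead of a sentence. It assumes a single hypothetical input (season rainfall in mm), made-up historical yields, and a hypothetical 2 t/acre threshold for medium risk. A real yield model would use many more inputs and an established ML library.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    field_id: str
    predicted_yield: float  # a number, in t/acre
    risk_level: str         # a category: "low", "medium" or "high"
    below_target: bool      # a yes/no flag


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for yield = a + b * feature (one feature only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(field_id: str, rain_mm: float, a: float, b: float,
            target: float) -> Prediction:
    """Apply the fitted line to this season's value and derive structured outputs."""
    y = a + b * rain_mm
    gap = target - y  # t/acre short of target (negative means above target)
    if gap <= 0:
        risk = "low"
    elif gap <= 2.0:  # hypothetical threshold for "medium"
        risk = "medium"
    else:
        risk = "high"
    return Prediction(field_id, round(y, 1), risk, y < target)


# Hypothetical history: seasonal rainfall (mm) vs. harvested yield (t/acre)
a, b = fit_line([300, 350, 400, 450], [15.0, 16.5, 18.0, 19.5])
print(predict("field-A", 380, a, b, target=18.0))
```

Each field of `Prediction` is a column you could chart or filter. None of it is prose.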
Generative AI (GenAI) produces text (or images). In a pipeline, GenAI takes the same data—and often the ML outputs—and turns them into plain language: weekly summaries (“Field F102 is trending below target; consider extra scouting”), SMS alerts, answers to questions like “Why is field F101 predicted higher than last year?”, or short reports for management. GenAI doesn’t replace ML; it makes ML results easier to read and act on.
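One common way to keep GenAI tied to the ML results is to put those results into the prompt verbatim. The model is then asked only to phrase them. In this sketch, `generate` is a hypothetical placeholder for whatever LLM client you use, and no particular vendor API is implied. The `target_yield` column is also an assumed input.

```python
from typing import Callable


def build_prompt(rows: list[dict], audience: str = "growers") -> str:
    """Put the ML outputs into the prompt verbatim so the LLM only has to phrase them."""
    facts = [
        f"- {r['field_id']}: predicted {r['predicted_yield']:.1f} t/acre, "
        f"target {r['target_yield']:.1f} t/acre, risk {r['risk_level']}"
        for r in rows
    ]
    return (
        f"Write a two-sentence weekly update for {audience}. "
        "Use only the numbers listed below and do not invent new ones.\n"
        + "\n".join(facts)
    )


def summarize(rows: list[dict], generate: Callable[[str], str]) -> str:
    """`generate` is a hypothetical stand-in for your LLM client call."""
    return generate(build_prompt(rows)).strip()


# Hypothetical ML output row (the target value is made up for illustration)
rows = [{"field_id": "F102", "predicted_yield": 16.9,
         "target_yield": 18.0, "risk_level": "medium"}]
print(build_prompt(rows))
```

Passing `generate` in as a function keeps the sketch testable without a live model, and it lets you swap LLM providers.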
ML: predictions & scores → GenAI: summaries & alerts → Load: dashboard, app, SMS
Below is a single pipeline: data is extracted from several sources, transformed into one table, then fed to an ML model for yield and risk, and to GenAI for summaries and alerts. The results are loaded to a database and then surfaced on a dashboard, in an app, and via SMS.
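The whole flow can be sketched as a chain of plain functions, with one function per stage. Everything here is a placeholder:

- `extract` returns hard-coded rows instead of reading real sources.
- `predict` applies a fixed, made-up formula rather than a trained model.
- `generate` fills a template where a real pipeline would call an LLM.
- `load` writes to a dict instead of a database.

```python
def extract() -> list[dict]:
    # Placeholder: a real extractor would read field records, weather, history, etc.
    return [
        {"field_id": "A", "rain_mm": 380.0},
        {"field_id": "B", "rain_mm": 420.0},
        {"field_id": "C", "rain_mm": None},  # missing key value
    ]


def transform(rows: list[dict]) -> list[dict]:
    # Keep only rows that have the key values the model needs
    return [r for r in rows if r.get("field_id") and r.get("rain_mm") is not None]


def predict(rows: list[dict]) -> list[dict]:
    # Stand-in for an ML model: a fixed, made-up formula
    return [{**r, "predicted_yield": round(6.0 + 0.03 * r["rain_mm"], 1)} for r in rows]


def generate(rows: list[dict]) -> str:
    # Stand-in for GenAI: a fixed template instead of an LLM call
    parts = [f"{r['field_id']}: {r['predicted_yield']} t/acre" for r in rows]
    return "Weekly update. " + "; ".join(parts) + "."


def load(rows: list[dict], summary: str, store: dict) -> None:
    # Stand-in for a database write
    store["predictions"] = rows
    store["summary"] = summary


def run_pipeline(store: dict) -> dict:
    rows = predict(transform(extract()))
    load(rows, generate(rows), store)
    return store


print(run_pipeline({})["summary"])
```

Keeping each stage behind its own function means a real extractor, model, or LLM call can replace a stub without touching the other stages.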
Data is pulled from sources typical in potato operations:
Raw data is cleaned and combined into one table per field-season: same units, same date ranges, and no missing key values. Example shape after transform:
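The three transform rules above are same units, same date ranges, and no missing key values. They might look like this in plain Python. The column names, the inch-to-mm conversion for rainfall, the season window, and the sample records are all hypothetical. Your own sources will have different fields and quirks.

```python
from datetime import date

MM_PER_INCH = 25.4


def transform(fields: list[dict], weather: list[dict],
              season_start: date, season_end: date) -> list[dict]:
    """Build one row per field-season with consistent units and date range."""
    table = []
    for f in fields:
        # No missing key values: skip records without an ID or variety
        if not f.get("field_id") or not f.get("variety"):
            continue
        rain_mm = 0.0
        for w in weather:
            if w["field_id"] != f["field_id"]:
                continue
            # Same date range for every field
            if not (season_start <= w["date"] <= season_end):
                continue
            # Same units: convert readings reported in inches to mm
            reading = w["rain"] * MM_PER_INCH if w["unit"] == "in" else w["rain"]
            rain_mm += reading
        table.append({
            "field_id": f["field_id"],
            "season": season_start.year,
            "variety": f["variety"],
            "rain_mm": round(rain_mm, 1),
        })
    return table


# Hypothetical raw inputs from two different sources
fields = [
    {"field_id": "F1", "variety": "Russet Burbank"},
    {"field_id": "F2", "variety": None},  # dropped: missing key value
]
weather = [
    {"field_id": "F1", "date": date(2024, 5, 10), "rain": 10.0, "unit": "mm"},
    {"field_id": "F1", "date": date(2024, 6, 1), "rain": 1.0, "unit": "in"},
    {"field_id": "F1", "date": date(2024, 10, 15), "rain": 5.0, "unit": "mm"},  # outside season
]
print(transform(fields, weather, date(2024, 4, 15), date(2024, 9, 30)))
```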
The transformed table is the input to an ML model. The model adds predicted yield and risk level per field. Those outputs (plus the underlying data) are then passed to GenAI to produce short summaries and alert text.
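Scoring the transformed table can be as simple as copying each row and appending the two new columns. That way the underlying data travels with the predictions to the GenAI step. This sketch assumes a hypothetical stand-in model (a fixed formula) and a hypothetical risk rule:

- low: at or above target
- medium: up to 10% below target
- high: more than 10% below target

```python
from typing import Callable


def add_ml_columns(rows: list[dict], model: Callable[[dict], float],
                   target_t_acre: float) -> list[dict]:
    """Return new rows with predicted_yield and risk_level appended."""
    scored = []
    for row in rows:
        y = model(row)
        shortfall = (target_t_acre - y) / target_t_acre  # fraction below target
        if shortfall <= 0:
            risk = "low"
        elif shortfall <= 0.10:  # hypothetical cut-off
            risk = "medium"
        else:
            risk = "high"
        # Copy the row so the original transformed table is left unchanged
        scored.append({**row, "predicted_yield": round(y, 1), "risk_level": risk})
    return scored


def stand_in_model(row: dict) -> float:
    # Hypothetical formula standing in for a trained model
    return 6.0 + 0.03 * row["rain_mm"]


rows = [{"field_id": "A", "rain_mm": 380.0}, {"field_id": "B", "rain_mm": 420.0}]
print(add_ml_columns(rows, stand_in_model, target_t_acre=18.0))
```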
Example table after ML (two new columns):
GenAI takes this table (and optional context like “weekly update”) and generates, for example: “F101 and F103 on track. F102 below target (16.9 t/acre); medium risk—consider scouting.” That text is stored and also sent as SMS or in-app copy. The pipeline runs on a schedule (e.g. weekly) so predictions and messages stay current.
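An LLM can misquote a number. One hedge, applied before the text is stored or sent, is to check that every yield the text mentions appears in the ML table. This hypothetical guard assumes yields are written as "N t/acre" and allows a small tolerance for rounding. It does not check wording or risk labels.

```python
import re

YIELD_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*t/acre")


def unsupported_yields(text: str, rows: list[dict], tol: float = 0.05) -> list[float]:
    """Return yields quoted in `text` that match no predicted_yield in `rows`."""
    known = [r["predicted_yield"] for r in rows]
    quoted = [float(m) for m in YIELD_PATTERN.findall(text)]
    return [q for q in quoted if not any(abs(q - k) <= tol for k in known)]


def ready_to_send(text: str, rows: list[dict]) -> bool:
    """Only store or send GenAI text whose numbers all trace back to the ML table."""
    return not unsupported_yields(text, rows)


rows = [{"field_id": "F102", "predicted_yield": 16.9, "risk_level": "medium"}]
print(ready_to_send("F102 below target (16.9 t/acre); medium risk.", rows))
print(unsupported_yields("F102 is at 19.6 t/acre.", rows))
```

If the check fails, the pipeline could regenerate the text or fall back to a plain template built from the same table.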
The final dataset—including ML predictions and GenAI text—is loaded into a database or data store. From there it powers a web dashboard, a mobile app, and SMS (or email). The next section shows example outputs for each.
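Here is a minimal load step, sketched with Python's built-in `sqlite3` module and an in-memory database so it runs anywhere. The table names, column names, and run date are hypothetical. Rows are keyed on run date and field, so re-running the same week replaces rows instead of duplicating them.

```python
import sqlite3


def load(conn: sqlite3.Connection, run_date: str, rows: list[dict], summary: str) -> None:
    """Write ML predictions and the GenAI summary for one pipeline run."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS field_predictions (
               run_date TEXT, field_id TEXT, predicted_yield REAL, risk_level TEXT,
               PRIMARY KEY (run_date, field_id))"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS weekly_summaries (
               run_date TEXT PRIMARY KEY, summary_text TEXT)"""
    )
    # INSERT OR REPLACE keeps re-runs for the same week idempotent
    conn.executemany(
        "INSERT OR REPLACE INTO field_predictions VALUES (?, ?, ?, ?)",
        [(run_date, r["field_id"], r["predicted_yield"], r["risk_level"]) for r in rows],
    )
    conn.execute(
        "INSERT OR REPLACE INTO weekly_summaries VALUES (?, ?)", (run_date, summary)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # a file path or a server database in practice
rows = [{"field_id": "F102", "predicted_yield": 16.9, "risk_level": "medium"}]
load(conn, "2024-07-01", rows, "F102 below target (16.9 t/acre); medium risk.")
```

The dashboard, app, and SMS sender can then all read from these two tables, rather than each re-running the model.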
The same pipeline can feed multiple surfaces. Below are mock examples of how the ML and GenAI outputs could look on a dashboard, in an app, and in an SMS.
A simple web dashboard might show key numbers and a chart. Tiles can come from the loaded ML results; the “Weekly summary” text is from GenAI.
F101 and F103 on track. F102 below target (16.9 t/acre); medium risk—consider scouting.
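Dashboard tiles are usually small aggregates over the loaded rows. This sketch assumes rows that carry a predicted yield and risk level per field, under hypothetical column names. It treats "on track" as risk level low and uses hypothetical tile names and sample values.

```python
from statistics import mean


def dashboard_tiles(rows: list[dict], summary: str) -> dict:
    """Aggregate loaded ML rows into tile values; the summary comes from GenAI."""
    to_watch = sorted(r["field_id"] for r in rows if r["risk_level"] != "low")
    return {
        "Fields": len(rows),
        "On track": len(rows) - len(to_watch),
        "Avg predicted yield (t/acre)": round(mean(r["predicted_yield"] for r in rows), 1),
        "Fields to watch": ", ".join(to_watch) or "none",
        "Weekly summary": summary,
    }


# Hypothetical loaded rows
rows = [
    {"field_id": "A", "predicted_yield": 17.4, "risk_level": "medium"},
    {"field_id": "B", "predicted_yield": 18.6, "risk_level": "low"},
]
print(dashboard_tiles(rows, "B on track. A below target; medium risk."))
```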
The same metrics and summary can appear in a mobile app. Alerts can be push notifications; the content can be the same GenAI summary or a shorter line.
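For push alerts, one option is to notify only when a field needs attention. The alert can be either the full GenAI summary or one short line per at-risk field. The payload is a plain dict with hypothetical keys. Delivering it would go through whichever push service your app uses, and this sketch does not model that service.

```python
def push_payloads(rows: list[dict], summary: str, full_summary: bool = False) -> list[dict]:
    """Build generic push payloads only when some field is at medium or high risk."""
    at_risk = [r for r in rows if r["risk_level"] in ("medium", "high")]
    if not at_risk:
        return []  # nothing urgent; the summary can still appear in the app feed
    if full_summary:
        # Option 1: reuse the whole GenAI summary as one notification
        return [{"title": "Weekly field update", "body": summary}]
    # Option 2: one short line per at-risk field
    return [
        {
            "title": f"Field {r['field_id']}: {r['risk_level']} risk",
            "body": f"Predicted {r['predicted_yield']:.1f} t/acre. Open the app for details.",
            "field_id": r["field_id"],
        }
        for r in at_risk
    ]


# Hypothetical rows
rows = [
    {"field_id": "A", "predicted_yield": 17.4, "risk_level": "medium"},
    {"field_id": "B", "predicted_yield": 18.6, "risk_level": "low"},
]
print(push_payloads(rows, "B on track. A below target; medium risk."))
```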
SMS messages are kept short. GenAI can generate one or two lines from the ML results, e.g. for alerts or a daily/weekly digest.
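A small formatting step can keep each message within one SMS segment. This sketch assumes the common 160-character limit for the GSM-7 alphabet. Characters outside that alphabet, such as the em dash or curly quotes, typically switch a message to UCS-2 encoding, which allows only 70 characters per segment. The function therefore swaps those characters for plain equivalents before trimming at a word boundary. The replacement table is illustrative, not a complete GSM-7 mapping.

```python
# Illustrative (not exhaustive) swaps for common non-GSM-7 punctuation
REPLACEMENTS = {
    "\u2014": "-", "\u2013": "-",   # em dash, en dash
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
}


def to_sms(text: str, limit: int = 160) -> str:
    """Normalize punctuation, collapse whitespace, and trim to one segment."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    cut = text[: limit - 3]
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # avoid cutting a word in half
    return cut + "..."


print(to_sms("F102 below target (16.9 t/acre); medium risk\u2014consider scouting."))
```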
Keeping ETL, ML, and GenAI in one pipeline makes it easier to add new data sources, new models, and new ways to consume the results. Good AI depends on good data; ETL forces you to define what you extract and how you transform it before adding models.
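One way to make "new ways to consume the results" cheap to add is a small registry of output channels. Each channel is a function, and publishing loops over whatever is registered. The channel names are hypothetical, and so is the `SENT` list, which stands in for real delivery.

```python
from typing import Callable, Optional

CHANNELS: dict[str, Callable[[str], None]] = {}
SENT: list[tuple[str, str]] = []  # stands in for real delivery in this sketch


def channel(name: str):
    """Decorator that registers a function as an output channel."""
    def register(fn: Callable[[str], None]) -> Callable[[str], None]:
        CHANNELS[name] = fn
        return fn
    return register


@channel("dashboard")
def send_dashboard(text: str) -> None:
    SENT.append(("dashboard", text))


@channel("sms")
def send_sms(text: str) -> None:
    SENT.append(("sms", text))


# Adding a new surface later is one more registered function
@channel("email")
def send_email(text: str) -> None:
    SENT.append(("email", text))


def publish(text: str, names: Optional[list[str]] = None) -> None:
    """Send text to the named channels, or to every registered channel."""
    for name in (names if names is not None else list(CHANNELS)):
        CHANNELS[name](text)


publish("F102 below target; medium risk.")
```

The same pattern could hold extractors, so that a new data source is also just one more registered function.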
An ETL pipeline gives you a clear place to plug in both ML (for predictions and risk scores) and GenAI (for summaries and alerts). In a potato context, that means combining field, weather, and history into one dataset, running ML for yield and risk, then using GenAI to turn those results into plain-language updates. Loading once to a database lets you surface the same outputs on a dashboard, in an app, and via SMS so your team can act on them wherever they are.