Micro-module
Analyst ethics & data use
Educational scenarios for judgment practice. Not legal advice. Always follow data-provider terms, organizational policies, and applicable law when working with real feeds (e.g., Statcast-derived products, licensed databases).
Case A — Park adjustment headline
A blog headline reads: "New park factor proves Player X is actually 40% better than MVP candidates." The body cites a single-season expected-stat gap without showing the underlying model, sample size, or uncertainty. Your staff group chat wants to forward it to coaches tonight.
Reflection prompt
List three questions you would ask before treating the headline as decision-grade. Rewrite the headline in bounded, audit-friendly language.
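One of the questions for Case A is how much a single-season gap could move from sampling noise alone. The sketch below shows one way to put an interval around an average gap with a percentile bootstrap. It is illustrative only: the function name `bootstrap_mean_interval`, the 90% level, and the per-game gap values are assumptions made up for this example and do not come from any real feed or from the blog in the scenario.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Returns (sample_mean, lower, upper). Hypothetical helper for illustration.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return statistics.fmean(values), lower, upper


# Synthetic gaps (actual minus expected), invented for this sketch.
synthetic_gaps = [0.05, -0.02, 0.11, 0.00, 0.08, -0.04, 0.03, 0.09]
mean_gap, low, high = bootstrap_mean_interval(synthetic_gaps)
print(f"mean gap {mean_gap:.3f}, 90% interval [{low:.3f}, {high:.3f}]")
```

A bounded headline would report the interval and the sample size alongside the point estimate. A wide interval on a small sample is a reason to hold the forward to coaches, not a footnote.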
Case B — Leaky "predict HR" notebook
A notebook predicts home-run probability with very high accuracy. On review, the feature set includes post-contact outcomes that are only known after the ball is struck. A junior analyst wants to ship the leaderboard to player development.
Reflection prompt
Explain the leakage in plain language. Describe the correct time-ordering for features vs. label. Propose one QA gate that would have caught this before publication.
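One possible QA gate for Case B is a feature manifest that records when each feature first becomes known, checked against the moment the prediction is supposed to be made. The sketch below is a minimal version under stated assumptions: the stage names, the feature names, and the helpers `find_leaky_features` and `leakage_gate` are hypothetical, and the choice of "contact" as the default cutoff would depend on the actual use case.

```python
from dataclasses import dataclass

# Hypothetical availability stages, ordered in time within one plate event.
STAGE_ORDER = {"pre_pitch": 0, "pitch_flight": 1, "contact": 2, "post_contact": 3}


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    stage: str  # stage at which the value first becomes known


def find_leaky_features(features, cutoff_stage):
    """Return names of features first known after `cutoff_stage`."""
    cutoff = STAGE_ORDER[cutoff_stage]
    return [f.name for f in features if STAGE_ORDER[f.stage] > cutoff]


def leakage_gate(features, cutoff_stage="contact"):
    """Raise ValueError if any feature is only known after the cutoff."""
    leaky = find_leaky_features(features, cutoff_stage)
    if leaky:
        raise ValueError(
            f"leakage gate failed; known only after '{cutoff_stage}': {leaky}"
        )
    return True


# Illustrative manifest; the feature names are invented for this sketch.
manifest = [
    FeatureSpec("batter_handedness", "pre_pitch"),
    FeatureSpec("pitch_speed", "pitch_flight"),
    FeatureSpec("landing_distance", "post_contact"),
    FeatureSpec("play_result", "post_contact"),
]

try:
    leakage_gate(manifest)
except ValueError as err:
    print(err)
```

Run before training or publication, a check like this fails as soon as a post-contact field enters the feature set. It relies on the manifest being filled in honestly, so it complements a time-ordered validation split and does not replace it.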
Short rubric (self- or peer-review)
| Criterion | Excellent | Proficient | Developing | Needs support |
|---|---|---|---|---|
| Evidence & model spec | Names data source, target, features, and validation design; no hand-waving. | Identifies major spec elements; minor gaps only. | Vague on leakage or comparison baseline. | Headline disconnected from stated model. |
| Causal discipline | Separates association from causation; states what would be needed to support a causal claim. | Flags overclaim; suggests safer language. | Mixes causal and associational language without correction. | Treats correlation as proof. |
| Professional communication | Clear recommendation scope + uncertainty + next diagnostic step. | Readable with bounded scope. | Jargon-heavy or unclear actions. | Misleading certainty or no action. |