You'll see data with no labels, no instructions, no hints. Your job: figure out what's "normal" and flag what doesn't fit. This is anomaly detection, a core unsupervised learning task.
A grid of shapes. Most follow a pattern. Some deviate. Click the ones that feel "off."
Employee records from a fake company. Some rows are anomalous. Find them in the numbers.
| # | Department | Tenure (yr) | Salary ($k) | Perf. Score | Sick Days | Projects |
|---|---|---|---|---|---|---|
Nobody told you what "normal" looked like. Nobody gave you labeled examples of anomalies. You looked at the data, built a mental model of the pattern, and flagged what deviated. That's exactly what an anomaly detection algorithm does: it learns the distribution of "normal" and flags points that fall outside it.
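To make that concrete, here's a minimal sketch of the idea, not any particular production algorithm: model "normal" as the mean and spread of the data, then flag anything too many standard deviations out. The values are hypothetical salaries, echoing the exercise above.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Learn a simple model of 'normal' (mean and standard deviation),
    then flag values more than `threshold` standard deviations away.
    No labels involved -- the model comes from the data itself."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Hypothetical salaries in $k: most cluster near 60, one sits far outside.
salaries = [58, 61, 59, 62, 60, 57, 63, 250]
print(zscore_anomalies(salaries, threshold=2.0))  # → [250]
```

Real detectors use richer models of "normal" than a mean and a standard deviation, but the shape of the computation is the same: fit, then measure deviation.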
In supervised learning, you need someone to label every example as "normal" or "anomaly" first. Here, you figured it out from the structure of the data alone. The algorithm never sees a single label. It just learns what typical data looks like and raises a flag when something doesn't fit.
You might have noticed different things from the person next to you. Maybe you keyed in on color while they noticed size. Maybe you caught the salary outlier but missed the sick-days one. The same is true of algorithms: different approaches (isolation forests, autoencoders, clustering) surface different kinds of anomalies. There's no single "right" answer, just different models with different sensitivities and thresholds.
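You can see this effect even without switching algorithms: the same simple z-score detector, pointed at different features, flags different rows. The employee data below is made up for illustration, mirroring the salary and sick-days outliers mentioned above.

```python
from statistics import mean, stdev

# Hypothetical employee rows: (id, salary_k, sick_days)
rows = [(1, 58, 4), (2, 61, 6), (3, 59, 5), (4, 250, 5),
        (5, 60, 4), (6, 62, 48), (7, 57, 6), (8, 63, 5)]

def flags(rows, col, threshold=2.0):
    """Return the ids of rows whose value in column `col` lies more
    than `threshold` standard deviations from that column's mean."""
    values = [r[col] for r in rows]
    mu, sd = mean(values), stdev(values)
    return {r[0] for r in rows if abs(r[col] - mu) > threshold * sd}

print(flags(rows, col=1))  # salary outlier    → {4}
print(flags(rows, col=2))  # sick-days outlier → {6}
```

Each run "sees" only one kind of deviation, just as you and your neighbor did. Multivariate methods like isolation forests consider all features at once, which is one reason they can catch anomalies a single-column check misses.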