Loop 014
The production data cleanup loop
A production-data quality workflow that removes disallowed records, improves classification logic, and verifies the remaining dataset against an explicit definition.
Ready-to-use prompt
Review production records, remove anything that does not meet the allowed definition, improve the classification logic, and verify the remaining data.
Verify / stop
Every remaining record meets the allowed definition.
Representative classification tests and a post-cleanup audit confirm that the retained data is valid.
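The stop condition can be expressed as a single check. The sketch below is illustrative only: `verify_stop`, the `is_allowed` predicate, and the shape of the regression cases are hypothetical names, not part of the loop itself. It reports the loop as done only when no retained record violates the definition and every regression case still classifies as expected.

```python
from typing import Callable, Iterable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def verify_stop(
    records: Iterable[T],
    is_allowed: Callable[[T], bool],
    regression_cases: Sequence[Tuple[T, bool]],
) -> dict:
    """Return the stop decision plus the evidence behind it.

    `is_allowed` is a hypothetical predicate that encodes the allowed definition.
    `regression_cases` pairs an example record with the expected predicate result.
    """
    # Post-cleanup audit: any retained record that fails the definition.
    violations = [r for r in records if not is_allowed(r)]
    # Classification tests: any example whose result differs from the expectation.
    failures = [(r, expected) for r, expected in regression_cases
                if is_allowed(r) != expected]
    return {
        "done": not violations and not failures,
        "violations": violations,
        "regression_failures": failures,
    }
```

Returning the offending records alongside the boolean keeps the decision reviewable rather than a bare pass or fail.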
Context and guidance
Use this when
Use this when a production dataset contains records that no longer match a product, policy, taxonomy, or quality definition, and the classifier that should have rejected them let them through.
How to run it
- Write the allowed definition as explicit inclusion, exclusion, and edge-case rules before changing data.
- Audit production records, preserve a recoverable record of proposed removals, and separate clear violations from uncertain cases.
- Remove confirmed invalid records through the approved production path and improve the classifier with regression examples.
- Rerun classification tests and audit the remaining production data until every sampled and queried record meets the definition.
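The steps above can be sketched as a small three-way classifier with regression examples. Everything here is an assumption made for illustration: the `Record` fields, the category sets, and the rules stand in for whatever the real allowed definition says. The point is the shape: explicit inclusion and exclusion rules, a separate `REVIEW` verdict for uncertain cases, and examples that pin the expected verdicts.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    KEEP = "keep"      # meets the inclusion rules
    REMOVE = "remove"  # clear violation of an exclusion rule
    REVIEW = "review"  # uncertain: needs a human decision


@dataclass(frozen=True)
class Record:
    id: int
    category: str
    title: str


# Hypothetical allowed definition; replace with the real rules.
ALLOWED_CATEGORIES = {"book", "ebook"}
EXCLUDED_CATEGORIES = {"toy", "apparel"}


def classify(record: Record) -> Verdict:
    category = record.category.strip().lower()
    if category in EXCLUDED_CATEGORIES:
        return Verdict.REMOVE
    if category in ALLOWED_CATEGORIES and record.title.strip():
        return Verdict.KEEP
    # Edge cases (unknown category, blank title) are never auto-removed.
    return Verdict.REVIEW


def audit(records):
    """Group records by verdict so removals and uncertain cases stay separate."""
    buckets = {verdict: [] for verdict in Verdict}
    for record in records:
        buckets[classify(record)].append(record)
    return buckets


# Regression examples: each past mistake or edge case becomes a fixed expectation.
REGRESSION_EXAMPLES = [
    (Record(1, "Book", "Dune"), Verdict.KEEP),
    (Record(2, "toy", "Robot"), Verdict.REMOVE),
    (Record(3, "audiobook", "Dune"), Verdict.REVIEW),
    (Record(4, "book", "   "), Verdict.REVIEW),
]


def run_regression():
    """Return the examples whose verdict differs from the expected one."""
    return [(rec, expected, classify(rec))
            for rec, expected in REGRESSION_EXAMPLES
            if classify(rec) is not expected]
```

An empty list from `run_regression()` means the classifier still agrees with every recorded example. Only the `REMOVE` bucket from `audit()` is a candidate for deletion, and the `REVIEW` bucket goes to a person.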
Why it works
Fixing both the existing records and the classifier closes the immediate data problem and reduces recurrence. Explicit rules and regression examples make future cleanup decisions reviewable.
Implementation note
Follow access, retention, privacy, and audit requirements. Use backups or reversible operations where appropriate, and do not delete uncertain records without review.
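One way to keep a removal reversible is to copy rows into an archive table in the same transaction that deletes them. The sketch below uses Python's standard `sqlite3` module purely as a stand-in. The table names `records` and `records_removed`, the column layout, and both helper functions are hypothetical, and a real cleanup would go through the approved production path and its retention rules.

```python
import sqlite3


def remove_with_archive(conn: sqlite3.Connection, ids: list) -> int:
    """Archive the given rows, then delete them, in a single transaction."""
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO records_removed (id, category, title) "
            f"SELECT id, category, title FROM records WHERE id IN ({placeholders})",
            ids,
        )
        cursor = conn.execute(
            f"DELETE FROM records WHERE id IN ({placeholders})", ids
        )
    return cursor.rowcount


def restore(conn: sqlite3.Connection, ids: list) -> int:
    """Move previously archived rows back into the live table."""
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    with conn:
        conn.execute(
            "INSERT INTO records (id, category, title) "
            f"SELECT id, category, title FROM records_removed WHERE id IN ({placeholders})",
            ids,
        )
        cursor = conn.execute(
            f"DELETE FROM records_removed WHERE id IN ({placeholders})", ids
        )
    return cursor.rowcount
```

Because the archive and the delete share a transaction, a failure in either step leaves the live table unchanged. Pass only confirmed violations to `remove_with_archive`, and keep uncertain records out of the list until they have been reviewed.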