Data Ark: Defining Provenance Frameworks for AI Training Data
Although AI models depend on large-scale training datasets, there is currently little to no scholarly framework for documenting where those datasets come from, how they are assembled, and how they circulate and degrade over time.