Story
Criteria-first, semantics-later: reproducible structure discovery in image-based sciences
Key takeaway
Extracting structure from complex images under explicit optimality criteria, before any predefined labels are applied, could make scientific analysis more reproducible and transferable across fields like biology and medicine.
Quick Explainer
The key idea is to extract a stable, reproducible structural product from image data first, based on explicit optimality criteria, and only then apply semantic interpretations. This decouples the core structural representation from domain ontologies, which inevitably drift over time. The structural product can be mapped to evolving vocabularies and ontologies as needed, making image analysis more robust, open-ended, and semantically pluralistic than traditional semantics-first approaches. This criteria-first, semantics-later pattern is emerging across diverse image-based scientific disciplines, shifting the focus of "theory" from implicit label systems to explicit structural optimality.
Deep Dive
Technical Deep Dive: Criteria-first, semantics-later: reproducible structure discovery in image-based sciences
Overview
This work introduces a criteria-first, semantics-later approach to image analysis and structure discovery. The key idea is to decouple the extraction of a stable, reproducible structural product from the downstream application of semantic interpretations. This shifts the focus of "theory" from implicit label systems to explicit optimality criteria, enabling stable, transferable structural representations that can be mapped to evolving domain ontologies.
Problem & Context
Across image-based sciences, the dominant analytic paradigm remains semantics-first: measurements are mapped directly to a predefined domain ontology or label set. This approach breaks down under the very conditions that make image-based science most valuable, including long-term monitoring, cross-sensor and cross-site variability, and open-ended scientific discovery. Domain ontologies and their associated label sets inevitably drift over time, so semantic commitments in the core analytic layer become brittle.
Methodology
The proposed criteria-first, semantics-later approach involves two key steps:
- Structural Product Extraction: A structural product S is extracted from the measurement field X using explicit, inspectable optimality criteria C. This structural product can take different forms (partitions, graphs, hierarchies, fields) depending on the domain and criterion.
- Semantic Mapping: The structural product S is then mapped to one or more domain ontologies/vocabularies O_i via community-dependent semantic mappings M_i. These mappings can evolve independently without rewriting the core structural extraction.
The key principle is that structure precedes semantics: the structural product is defined by the measurement stream itself under declared criteria, rather than by a predefined domain ontology.
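The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the measurement field X is reduced to a flat list of intensities, the criterion C is between-class variance (the objective maximized in Otsu thresholding), and the vocabularies in `M_ecology` and `M_landuse` are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List, Sequence

Partition = List[int]  # one segment id per measurement


def between_class_variance(values: Sequence[float], threshold: float) -> float:
    """Criterion C (assumed for this sketch): between-class variance of a two-way split."""
    low = [v for v in values if v <= threshold]
    high = [v for v in values if v > threshold]
    if not low or not high:
        return 0.0  # a split with an empty side carries no structure
    w_low = len(low) / len(values)
    w_high = len(high) / len(values)
    mean_low = sum(low) / len(low)
    mean_high = sum(high) / len(high)
    return w_low * w_high * (mean_low - mean_high) ** 2


def extract_structure(
    values: Sequence[float],
    criterion: Callable[[Sequence[float], float], float] = between_class_variance,
) -> Partition:
    """Step 1: choose the split that maximizes the declared criterion."""
    candidates = sorted(set(values))
    best = max(candidates, key=lambda t: criterion(values, t))
    return [0 if v <= best else 1 for v in values]


def apply_mapping(partition: Partition, mapping: Dict[int, str]) -> List[str]:
    """Step 2: attach one community's vocabulary to the segment ids."""
    return [mapping[segment] for segment in partition]


# Measurement field X (toy intensities) and its structural product S.
X = [0.10, 0.12, 0.11, 0.80, 0.85, 0.79]
S = extract_structure(X)

# Two hypothetical mappings M_i over the same S; neither touches step 1.
M_ecology = {0: "water", 1: "vegetation"}
M_landuse = {0: "non-arable", 1: "arable"}
labels_ecology = apply_mapping(S, M_ecology)
labels_landuse = apply_mapping(S, M_landuse)
```

In this sketch, changing or adding a vocabulary only touches the mapping dictionaries. The partition S is computed once from X and the declared criterion, which is the separation the two steps describe.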
Results
The authors provide evidence of this criteria-first pattern emerging across diverse image-based disciplines, including:
- Earth observation and environmental monitoring
- Medical imaging
- Microscopy and bioimaging
- Seismology and geophysics
- Astronomy
- Materials science
- Robotics and 3D sensing
In each domain, a structural product is first extracted using explicit optimality criteria, before downstream semantic interpretation is applied.
Interpretation
The criteria-first approach makes reproducibility operational, as the structural product can be independently reproduced given the same measurement data and declared criteria. It also enables robust domain transfer, as the same criteria-driven extraction can be applied across modalities. For long-term monitoring and digital twins, the stable structural product can serve as a durable, semantics-free representation, with evolving ontological mappings applied as needed.
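One way to make "independently reproduced" checkable is to fingerprint the structural product together with its declared criteria, so that two runs can be compared by digest. The sketch below is an assumption about how this could look, not something the authors specify: the record layout, the `structural_fingerprint` helper, and the criteria fields are all hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Sequence


def structural_fingerprint(partition: Sequence[int], criteria: Dict[str, Any]) -> str:
    """SHA-256 digest of a structural product and the criteria that produced it."""
    record = {"criteria": criteria, "partition": list(partition)}
    # Canonical JSON (sorted keys, fixed separators) so equal records hash equally.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical criteria declaration for illustration.
criteria = {"name": "between_class_variance", "classes": 2}

run_a = structural_fingerprint([0, 0, 0, 1, 1, 1], criteria)
run_b = structural_fingerprint([0, 0, 0, 1, 1, 1], dict(criteria))  # independent rerun
run_c = structural_fingerprint([0, 0, 1, 1, 1, 1], criteria)        # different structure

print(run_a == run_b)  # identical structure and criteria give identical digests
print(run_a == run_c)  # a changed partition gives a different digest
```

The digest contains no semantic labels, so it stays valid when the ontological mappings applied on top of the structure change.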
Limitations & Uncertainties
The authors acknowledge that current practices and benchmarks often remain semantics-first, with the criteria-first structural layer implicit rather than formalized. Systematically exposing, validating, and sharing these structural products as reusable digital objects is an important next step.
What Comes Next
The authors outline a research agenda to:
- Formalize families of scientifically meaningful, computationally tractable optimality criteria
- Build benchmarks focused on structural product quality rather than semantic accuracy
- Standardize the representation, versioning, and conformance of structural products as FAIR digital objects
- Separate the governance of semantic mappings from the core structural extraction
- Develop modular, domain-general tooling for criteria-first structure discovery
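As a rough illustration of the benchmarking item in the list above, structural quality can be scored without reference to any label vocabulary. The sketch below uses the Rand index, a standard pairwise agreement measure between two partitions, as one possible choice. The authors do not name a specific metric, so this selection and the toy partitions are assumptions.

```python
from itertools import combinations
from typing import Sequence


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of element pairs on which two partitions agree (same vs. different segment)."""
    if len(a) != len(b):
        raise ValueError("partitions must cover the same elements")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0  # zero or one element: nothing to disagree on
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


reference = [0, 0, 0, 1, 1, 1]
relabeled = [7, 7, 7, 3, 3, 3]  # same structure, different segment ids
shifted = [0, 0, 1, 1, 1, 1]    # one element moved across the boundary

score_relabeled = rand_index(reference, relabeled)
score_shifted = rand_index(reference, shifted)
```

Because the score depends only on which elements are grouped together, renaming segments leaves it unchanged, while a genuine structural difference lowers it.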
Overall, this work proposes a principled inversion that relocates "theory" to the upstream extraction of stable, transferable structure, enabling more robust, open-ended, and semantically pluralistic image analysis.
