
Criteria-first, semantics-later: reproducible structure discovery in image-based sciences


Key takeaway

Extracting structure from images under explicit, declared criteria before attaching domain labels could make image analysis more reproducible and more robust to changing label systems. This matters across fields from Earth observation to medicine and microscopy.


Quick Explainer

The key idea is to first extract a stable, reproducible structural product from image data under explicit optimality criteria, and only then apply semantic interpretation. This decouples the core structural representation from domain ontologies, which inevitably drift over time. The structural product can then be mapped to evolving vocabularies and ontologies as needed. Compared with traditional semantics-first approaches, this enables more robust, open-ended, and semantically pluralistic image analysis. This criteria-first, semantics-later pattern is emerging across diverse image-based scientific disciplines, shifting the focus of "theory" from implicit label systems to explicit structural optimality.

Deep Dive

Technical Deep Dive: Criteria-first, semantics-later: reproducible structure discovery in image-based sciences

Overview

This work introduces a criteria-first, semantics-later approach to image analysis and structure discovery. The key idea is to decouple the extraction of a stable, reproducible structural product from the downstream application of semantic interpretations. This shifts the focus of "theory" from implicit label systems to explicit optimality criteria, enabling stable, transferable structural representations that can be mapped to evolving domain ontologies.

Problem & Context

Across image-based sciences, the dominant analytic paradigm remains semantics-first: measurements are mapped directly to a predefined domain ontology or label set. This approach breaks down under precisely the conditions in which image-based science is most valuable, including long-term monitoring, cross-sensor and cross-site variability, and open-ended scientific discovery. Domain ontologies and their label sets inevitably drift over time, so semantic commitments built into the core analytic layer become brittle.

Methodology

The proposed criteria-first, semantics-later approach involves two key steps:

Structural Product Extraction: A structural product S is extracted from the measurement field X using explicit, inspectable optimality criteria C. This structural product can take different forms (partitions, graphs, hierarchies, fields) depending on the domain and criterion.
Semantic Mapping: The structural product S is then mapped to one or more domain ontologies or vocabularies O_i via community-dependent semantic mappings M_i. These mappings can evolve independently without rewriting the core structural extraction.

The key principle is that structure precedes semantics: the structural product is defined by the measurement stream itself under declared criteria, rather than by a predefined domain ontology.
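
The two steps can be illustrated with a minimal Python sketch. Its assumptions are ours, not the paper's. The measurement field X is a small 2D intensity grid. The declared criterion C is Otsu's thresholding rule (maximize between-class variance) followed by 4-connected component labeling. The two vocabularies (map_cell_vocab, map_generic_vocab) and their size cutoff are hypothetical. What the sketch shows is the separation: extract_structure never sees a label set, and each mapping reads only S.

```python
from collections import deque


def otsu_threshold(values):
    """Declared criterion, part 1: Otsu's rule. Return the threshold t that
    maximizes between-class variance w0 * w1 * (m0 - m1)**2 for the split
    {v <= t} vs {v > t}. Ties keep the lowest t, so the result is deterministic."""
    levels = sorted(set(values))
    n = len(values)
    best_t, best_score = levels[0], -1.0
    for t in levels[:-1]:  # the top level would leave the upper class empty
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def extract_structure(X):
    """Step 1: structural product S = 4-connected regions of pixels above the
    Otsu threshold. Only X and the declared criterion are used; no ontology."""
    t = otsu_threshold([v for row in X for v in row])
    h, w = len(X), len(X[0])
    labels = [[0] * w for _ in range(h)]  # 0 = background
    n_regions = 0
    for i in range(h):
        for j in range(w):
            if X[i][j] > t and labels[i][j] == 0:
                n_regions += 1  # new region, numbered in raster-scan order
                labels[i][j] = n_regions
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill
                    a, b = queue.popleft()
                    for y, x in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                        if 0 <= y < h and 0 <= x < w and X[y][x] > t and labels[y][x] == 0:
                            labels[y][x] = n_regions
                            queue.append((y, x))
    return {"criterion": "otsu + 4-connectivity", "threshold": t,
            "n_regions": n_regions, "labels": labels}


def region_sizes(S):
    """Pixel count per region ID (background 0 excluded)."""
    sizes = {}
    for row in S["labels"]:
        for r in row:
            if r:
                sizes[r] = sizes.get(r, 0) + 1
    return sizes


# Step 2: two independent semantic mappings M_1, M_2 (hypothetical vocabularies).
# Each reads only S, so either can be revised without re-running extract_structure.
def map_cell_vocab(S, min_size=3):
    return {r: ("nucleus" if n >= min_size else "debris") for r, n in region_sizes(S).items()}


def map_generic_vocab(S):
    return {r: f"object-{r}" for r in region_sizes(S)}


X = [
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
    [8, 0, 0, 0, 0],
]
S = extract_structure(X)
print(S["n_regions"], map_cell_vocab(S), map_generic_vocab(S))
```

Changing min_size or adding a third vocabulary leaves S untouched. Only the semantic layer changes.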

Results

The authors provide evidence of this criteria-first pattern emerging across diverse image-based disciplines, including:

Earth observation and environmental monitoring
Medical imaging
Microscopy and bioimaging
Seismology and geophysics
Astronomy
Materials science
Robotics and 3D sensing

In each domain, a structural product is first extracted using explicit optimality criteria, before downstream semantic interpretation is applied.

Interpretation

The criteria-first approach makes reproducibility operational, as the structural product can be independently reproduced given the same measurement data and declared criteria. It also enables robust domain transfer, as the same criteria-driven extraction can be applied across modalities. For long-term monitoring and digital twins, the stable structural product can serve as a durable, semantics-free representation, with evolving ontological mappings applied as needed.
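
One way to make "independently reproducible" checkable is to record content hashes of three things: the input, the declared criteria, and the resulting structural product. An independent re-run then compares all three. The sketch below is minimal and rests on two assumptions, neither taken from the paper. The extraction is a toy 1D rule (contiguous runs above a declared threshold, in the hypothetical extract_segments). Fingerprints use SHA-256 over canonical JSON.

```python
import hashlib
import json


def extract_segments(signal, criteria):
    """Hypothetical structural extraction: half-open [start, end) runs of
    samples strictly above the declared threshold."""
    t = criteria["threshold"]
    segments, start = [], None
    for i, v in enumerate(signal):
        if v > t and start is None:
            start = i
        elif v <= t and start is not None:
            segments.append([start, i])
            start = None
    if start is not None:
        segments.append([start, len(signal)])
    return segments


def digest(obj):
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_with_provenance(signal, criteria):
    """Run the extraction and record fingerprints of data, criteria and product."""
    product = extract_segments(signal, criteria)
    return {
        "data_sha256": digest(signal),
        "criteria_sha256": digest(criteria),
        "product_sha256": digest(product),
        "product": product,
    }


def reproduces(record, signal, criteria):
    """Independent re-run: same data + same declared criteria must give the same product."""
    rerun = run_with_provenance(signal, criteria)
    keys = ("data_sha256", "criteria_sha256", "product_sha256")
    return all(rerun[k] == record[k] for k in keys)


signal = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1]
criteria = {"name": "threshold-runs", "version": "1.0", "threshold": 0.5}
record = run_with_provenance(signal, criteria)
print(record["product"], reproduces(record, signal, criteria))
```

Hashing the criteria alongside the product makes a failed check diagnosable. A mismatch in criteria_sha256 means a different declaration was used. A mismatch only in product_sha256 points at the implementation. If re-runs happen in other languages, identical float formatting in the JSON encoding is an additional assumption.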

Limitations & Uncertainties

The authors acknowledge that current practices and benchmarks often remain semantics-first, with the criteria-first structural layer implicit rather than formalized. Systematically exposing, validating, and sharing these structural products as reusable digital objects is an important next step.
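
To make exposing, validating, and sharing more concrete, a structural product can travel as a self-describing record. The record carries its declared criteria and must pass a conformance check before it is published. The sketch assumes a hypothetical schema (StructuralProduct, validate) whose four kinds are taken from the forms listed under Methodology. It does not implement FAIR or any published standard.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical schema: the field names and kinds below are illustrative only.
ALLOWED_KINDS = {"partition", "graph", "hierarchy", "field"}


@dataclass(frozen=True)
class StructuralProduct:
    product_id: str   # persistent identifier (hypothetical format)
    version: str      # version of this product, not of any ontology
    kind: str         # one of ALLOWED_KINDS
    criteria: dict    # declared optimality criteria, including their own name/version
    payload: dict     # the structure itself; no semantic labels


def validate(p):
    """Return a list of conformance problems; an empty list means 'conforms'."""
    problems = []
    if p.kind not in ALLOWED_KINDS:
        problems.append(f"unknown kind {p.kind!r}")
    if not p.criteria.get("name"):
        problems.append("criteria must declare a name")
    if p.kind == "partition":
        rows = p.payload.get("labels", [])
        if not rows or len({len(r) for r in rows}) != 1:
            problems.append("partition labels must form a non-empty rectangular grid")
        elif any(not isinstance(v, int) or v < 0 for r in rows for v in r):
            problems.append("partition labels must be non-negative integers")
    elif p.kind == "graph":
        nodes = set(p.payload.get("nodes", []))
        for a, b in p.payload.get("edges", []):
            if a not in nodes or b not in nodes:
                problems.append(f"edge ({a}, {b}) references an unknown node")
    return problems


good = StructuralProduct("demo-0001", "1.0.0", "partition",
                         {"name": "otsu+4-connectivity", "version": "1"},
                         {"labels": [[0, 1], [0, 1]]})
bad = StructuralProduct("demo-0002", "1.0.0", "graph",
                        {"name": "knn", "k": 3},
                        {"nodes": [1, 2], "edges": [[1, 3]]})
print(validate(good), validate(bad))
print(json.dumps(asdict(good), sort_keys=True))  # shareable serialization
```

Keeping the product's version separate from any ontology version is what lets the semantic layer be updated without reissuing the structure.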

What Comes Next

The authors outline a research agenda to:

Formalize families of scientifically meaningful, computationally tractable optimality criteria
Build benchmarks focused on structural product quality rather than semantic accuracy
Standardize the representation, versioning, and conformance of structural products as FAIR digital objects
Separate the governance of semantic mappings from the core structural extraction
Develop modular, domain-general tooling for criteria-first structure discovery
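
A benchmark of structural product quality needs scores that never consult a label set. One simple, label-free ingredient is agreement between two structural products of the same scene, for example from repeated runs or from two sensors. This is our illustration, not a metric the paper prescribes. The sketch measures that agreement with the Rand index on flattened partitions, using made-up example data.

```python
from itertools import combinations


def rand_index(a, b):
    """Rand index between two partitions of the same elements: the fraction of
    element pairs on which they agree (grouped together in both, or apart in
    both). It compares structure only; region IDs carry no meaning."""
    if len(a) != len(b):
        raise ValueError("partitions must cover the same elements")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Made-up flattened partitions of the same six pixels.
run_a = [1, 1, 2, 2, 0, 0]
run_b = [5, 5, 7, 7, 3, 3]   # identical structure, different region IDs
run_c = [1, 1, 1, 2, 2, 0]   # a different grouping
print(rand_index(run_a, run_b), rand_index(run_a, run_c))
```

The plain Rand index is not corrected for chance agreement. For real benchmarks, scikit-learn's sklearn.metrics.adjusted_rand_score provides the adjusted variant.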

Overall, this work proposes a principled inversion that relocates "theory" to the upstream extraction of stable, transferable structure, enabling more robust, open-ended, and semantically pluralistic image analysis.

Source

Criteria-first, semantics-later: reproducible structure discovery in image-based sciences
Preprint, arXiv (cs.AI), 2/18/2026