Establishing accurate point-to-point correspondences between non-rigid 3D shapes remains a critical challenge, particularly under non-isometric deformations and topological noise. Existing functional map pipelines suffer from two problems: ambiguities that geometric descriptors alone cannot resolve, and spatial inconsistencies that arise when truncated spectral bases are projected to dense pointwise correspondences. In this paper, we introduce SGMatch, a learning-based framework for semantic-guided non-rigid shape matching. Specifically, we design a Semantic-Guided Local Cross-Attention (SGLCA) module that integrates semantic features from vision foundation models into geometric descriptors while preserving local structural continuity. Furthermore, we introduce a regularization objective based on conditional flow matching, which supervises a time-varying velocity field to encourage spatial smoothness of the recovered correspondences. Experimental results on multiple benchmarks demonstrate that SGMatch achieves competitive performance in near-isometric settings and consistent improvements under non-isometric deformations and topological noise.
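SGMatch addresses two fundamental challenges in functional map-based shape matching: descriptor ambiguity (e.g., left-right symmetry confusion) and spatial inconsistency in the recovered pointwise correspondences. Our framework consists of three key components: dual-branch feature extraction, the Semantic-Guided Local Cross-Attention (SGLCA) module, and flow-based correspondence regularization.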
Overview of SGMatch. Given a pair of shapes, we extract geometric and semantic features, fuse them via SGLCA, estimate functional maps, and regularize correspondences through spectral heat diffusion and conditional flow matching.
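For readers unfamiliar with the functional map step in the overview, the following is a minimal NumPy sketch of the generic technique, not the SGMatch implementation. It assumes a lumped (diagonal) mass matrix, a mass-orthonormal truncated eigenbasis, and a plain Tikhonov-regularized least-squares solve; the function names and the `lam` parameter are hypothetical.

```python
import numpy as np

def estimate_functional_map(phi_x, phi_y, feat_x, feat_y, area_x, area_y, lam=1e-3):
    """Least-squares functional map C (k x k) sending spectral coefficients on X to Y.

    phi_*:  (n, k) truncated Laplace-Beltrami eigenbases (assumed mass-orthonormal)
    feat_*: (n, d) per-vertex descriptors
    area_*: (n,)   lumped vertex areas (diagonal mass matrix, an assumption)
    """
    a = phi_x.T @ (area_x[:, None] * feat_x)  # (k, d) descriptor coefficients on X
    b = phi_y.T @ (area_y[:, None] * feat_y)  # (k, d) descriptor coefficients on Y
    k = a.shape[0]
    gram = a @ a.T + lam * np.eye(k)          # symmetric positive definite
    # Minimiser of ||C a - b||_F^2 + lam ||C||_F^2 is C = b a^T gram^{-1}.
    return np.linalg.solve(gram, a @ b.T).T

def pointwise_map(c, phi_x, phi_y):
    """For each vertex of Y, return the index of its match on X.

    Uses the relation phi_y @ C ~= P @ phi_x, where P selects matched rows,
    and recovers P by nearest-neighbour search in the k-dimensional embedding.
    """
    emb_y = phi_y @ c                                             # (n_y, k)
    d2 = ((emb_y[:, None, :] - phi_x[None, :, :]) ** 2).sum(-1)   # (n_y, n_x)
    return d2.argmin(axis=1)
```

The nearest-neighbour search in `pointwise_map` is the projection step where truncating the basis to `k` functions can produce spatially inconsistent matches, which is the issue the regularization described later targets.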
Geometric branch: We adopt DiffusionNet to compute per-vertex geometric descriptors that are robust to variations in mesh resolution and sampling density.
Semantic branch: Following Diff3F, we render each shape from multiple viewpoints, extract dense semantic features using a frozen DINOv2 encoder, and back-project them onto the 3D surface. This provides vertex-level descriptors enriched with high-level semantic awareness that helps resolve symmetric ambiguities.
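The back-projection step can be illustrated with a small NumPy sketch. It assumes that rendering has already produced, for each view, a dense feature map, the pixel hit by every vertex, and a visibility mask; these inputs, the function name, and the choice of plain averaging followed by L2 normalization are assumptions made for illustration, not details confirmed by the text above.

```python
import numpy as np

def backproject_features(view_feats, pix_rc, visible):
    """Average per-pixel features onto vertices over the views that see them.

    view_feats: (V, H, W, D) dense feature maps, one per rendered view
    pix_rc:     (V, N, 2)    integer (row, col) pixel of each vertex in each view
    visible:    (V, N)       bool, True where the vertex is visible in that view
    returns     (N, D)       L2-normalised per-vertex features (zeros if never seen)
    """
    n_views, n_verts = visible.shape
    dim = view_feats.shape[-1]
    acc = np.zeros((n_verts, dim))
    cnt = np.zeros(n_verts)
    for v in range(n_views):
        idx = np.nonzero(visible[v])[0]            # vertices seen in this view
        rows, cols = pix_rc[v, idx, 0], pix_rc[v, idx, 1]
        acc[idx] += view_feats[v, rows, cols]      # gather pixel features
        cnt[idx] += 1
    mean = acc / np.maximum(cnt, 1)[:, None]       # avoid division by zero
    norm = np.linalg.norm(mean, axis=1, keepdims=True)
    return mean / np.maximum(norm, 1e-12)
```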
Geometric features alone produce ambiguous correspondences in symmetric regions (e.g., left-right limbs). Semantic features from vision foundation models provide complementary cues to resolve these ambiguities.
Rather than naively concatenating features, SGLCA uses a gating mechanism to adaptively modulate geometric features with semantic context. A lightweight MLP generates channel-wise gates from semantic features, amplifying or attenuating geometric channels based on semantic relevance.
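A channel-wise gate of this kind might look like the sketch below. The two-layer ReLU MLP, the `2 * sigmoid` output (chosen so a gate can both amplify and attenuate, with a neutral value of 1), and all parameter names are assumptions for illustration; the actual SGLCA gate may differ.

```python
import numpy as np

def semantic_gate(geo, sem, w1, b1, w2, b2):
    """Modulate geometric channels with gates predicted from semantic features.

    geo: (N, Dg) geometric descriptors
    sem: (N, Ds) semantic descriptors
    w1:  (Ds, H), b1: (H,)   first MLP layer (hypothetical parameters)
    w2:  (H, Dg), b2: (Dg,)  second MLP layer, one output per geometric channel
    """
    hidden = np.maximum(sem @ w1 + b1, 0.0)              # ReLU
    logits = hidden @ w2 + b2
    gate = 2.0 / (1.0 + np.exp(-logits))                 # 2*sigmoid, values in (0, 2)
    return geo * gate                                    # >1 amplifies, <1 attenuates
```

With all-zero weights the gate equals 1 everywhere, so the geometric features pass through unchanged; this gives a convenient identity initialization.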
Attention is then restricted to local mesh neighborhoods, where geometric queries attend to semantic keys/values. This avoids spurious global interactions while enabling geometric features to selectively incorporate semantically relevant information from their spatial vicinity.
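The neighborhood restriction can be sketched as single-head cross-attention over a fixed-size neighbor list. How neighborhoods are built (for example k-nearest neighbors or k-ring), the single head, and the projection matrices `wq`, `wk`, `wv` are assumptions made for this illustration.

```python
import numpy as np

def local_cross_attention(geo, sem, neighbors, wq, wk, wv):
    """Geometric queries attend to semantic keys/values within local neighbourhoods.

    geo:       (N, Dg) geometric features (queries)
    sem:       (N, Ds) semantic features (keys and values)
    neighbors: (N, K)  integer indices of each vertex's local neighbourhood
    wq: (Dg, d), wk: (Ds, d), wv: (Ds, d) projection matrices (hypothetical)
    returns (out, attn) with out: (N, d) and attn: (N, K)
    """
    q = geo @ wq                                   # (N, d)
    k = (sem @ wk)[neighbors]                      # (N, K, d)
    v = (sem @ wv)[neighbors]                      # (N, K, d)
    scores = np.einsum("nd,nkd->nk", q, k) / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)  # softmax over the K neighbours only
    out = np.einsum("nk,nkd->nd", attn, v)
    return out, attn
```

Because the softmax runs over `K` neighbors rather than all `N` vertices, the cost is O(NK) instead of O(N²), and distant vertices cannot influence each other directly.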
Even with improved descriptors, spectral truncation still introduces spatial inconsistency in recovered correspondences. We address this with a novel regularization based on Conditional Flow Matching (CFM).
First, spectral heat diffusion smooths the fused features to suppress local noise. Then, we define a linear interpolation path between source and transported target features, and train a velocity field to match the target transport direction. This encourages spatially adjacent vertices to follow non-divergent transport paths, discouraging abrupt local transitions and promoting smooth correspondences. An importance-weighted objective with a Charbonnier loss further improves robustness to outliers.
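The two steps can be sketched in NumPy as follows. This is a generic illustration under stated assumptions, not the SGMatch code: the diffusion uses a truncated mass-orthonormal eigenbasis with lumped areas, the path is the standard linear one x_t = (1 - t) x_0 + t x_1 with target velocity x_1 - x_0, the Charbonnier penalty is taken as sqrt(||r||² + eps²), and `velocity_fn` and `weights` are hypothetical placeholders for the learned velocity field and the importance weights.

```python
import numpy as np

def heat_diffuse(feat, phi, evals, area, t=1e-2):
    """Spectral heat diffusion: project, damp each mode by exp(-lambda * t), reconstruct."""
    coeff = phi.T @ (area[:, None] * feat)               # (k, D) spectral coefficients
    return phi @ (np.exp(-evals * t)[:, None] * coeff)   # high frequencies decay fastest

def charbonnier(resid, eps=1e-3):
    """Smooth L1-like penalty per row; behaves like L2 near zero and L1 for large residuals."""
    return np.sqrt(np.sum(resid ** 2, axis=-1) + eps ** 2)

def cfm_loss(x0, x1, velocity_fn, t, weights=None, eps=1e-3):
    """Weighted conditional flow matching loss on a linear path.

    x0: (N, D) source features
    x1: (N, D) target features transported to the source vertices
    t:  (N,)   sampled times in [0, 1]
    velocity_fn(xt, t) -> (N, D) predicted velocity (stand-in for a network)
    """
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1   # point on the linear path
    target = x1 - x0                                 # constant velocity of that path
    per_vertex = charbonnier(velocity_fn(xt, t) - target, eps)
    if weights is None:
        weights = np.ones(len(x0))
    return np.sum(weights * per_vertex) / np.sum(weights)
```

A perfect velocity field yields a loss of `eps` rather than zero under this form of the Charbonnier penalty; subtracting `eps` is a common variant if a zero minimum is preferred.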
Visualization of the effect of conditional flow matching regularization. The flow-based regularization produces spatially smoother correspondences by suppressing local mismatches.
Mean geodesic error (×100) on FAUST, SCAPE (near-isometric) and SHREC'19 (cross-dataset generalization).
| Method | FAUST | SCAPE | SHREC'19 |
|---|---|---|---|
| Axiomatic Methods | | | |
| ZoomOut | 6.1 | 7.5 | - |
| Smooth Shells | 2.5 | 4.2 | - |
| DiscreteOp | 5.6 | 13.1 | - |
| Supervised Methods | | | |
| GeomFMaps | 2.6 | 3.0 | 7.9 |
| Unsupervised Methods | | | |
| Deep Shell | 1.7 | 2.5 | 21.1 |
| AttnFMaps | 1.9 | 2.2 | 5.8 |
| ULRSSM | 1.6 | 1.9 | 4.6 |
| HybridFMap | 1.5 | 1.8 | 3.6 |
| DenoisFMap | 1.7 | 2.1 | 3.6 |
| DiffuMatch | 1.9 | 4.4 | 3.9 |
| DeepFAFM | 1.5 | 1.9 | 3.6 |
| Ours | 1.4 | 1.8 | 3.3 |
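For reference, the metric reported in these tables is commonly computed as in the sketch below. It assumes the usual protocol of normalizing geodesic distances by the square root of the target shape's surface area and a precomputed dense geodesic distance matrix; the function and argument names are hypothetical.

```python
import numpy as np

def mean_geodesic_error(pred, gt, geo_dist, total_area):
    """Mean normalised geodesic error, scaled by 100 as in the tables.

    pred, gt:   (M,) predicted and ground-truth vertex indices on the target shape
    geo_dist:   (N, N) pairwise geodesic distances on the target shape
    total_area: total surface area of the target shape
    """
    err = geo_dist[pred, gt] / np.sqrt(total_area)   # distance from prediction to truth
    return 100.0 * float(err.mean())
```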
Mean geodesic error (×100) on SMAL and DT4D-H. Our method achieves the best overall performance on SMAL, surpassing the previous state-of-the-art by 24%.
| Method | SMAL | DT4D-H (intra) | DT4D-H (inter) |
|---|---|---|---|
| Axiomatic Methods | | | |
| ZoomOut | 38.4 | 4.0 | 29.0 |
| Smooth Shells | 30.0 | 1.2 | 6.4 |
| Supervised Methods | | | |
| GeomFMaps | 8.4 | 1.9 | 4.2 |
| Unsupervised Methods | | | |
| AttnFMaps | 5.4 | 1.7 | 11.6 |
| ULRSSM | 3.9 | 0.9 | 4.1 |
| HybridFMap | 3.3 | 1.0 | 3.5 |
| DenoisFMap | 4.3 | 5.8 | 16.9 |
| DeepFAFM | 3.8 | 0.9 | 3.9 |
| Ours | 2.5 | 1.0 | 3.4 |
Mean geodesic error (×100) on TOPKIDS. Our method achieves a 34% improvement over the previous best.
| Method | TOPKIDS |
|---|---|
| Smooth Shells | 10.8 |
| Deep Shell | 13.7 |
| ULRSSM | 9.2 |
| HybridFMap | 5.0 |
| DeepFAFM | 6.2 |
| Ours | 3.3 |
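The relative improvements quoted in the captions follow directly from the tabulated errors. The short check below uses only values from the SMAL and TOPKIDS tables, taking HybridFMap as the previous best in both cases; the helper name is hypothetical.

```python
def relative_improvement(previous_best, ours):
    """Percentage reduction in error relative to the previous best method."""
    return 100.0 * (previous_best - ours) / previous_best

smal = relative_improvement(3.3, 2.5)      # HybridFMap 3.3 vs. ours 2.5 on SMAL
topkids = relative_improvement(5.0, 3.3)   # HybridFMap 5.0 vs. ours 3.3 on TOPKIDS
print(f"SMAL: {smal:.1f}%")        # SMAL: 24.2%
print(f"TOPKIDS: {topkids:.1f}%")  # TOPKIDS: 34.0%
```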
Qualitative results on SMAL and DT4D-H. Comparison of our method against DeepFAFM and HybridFMap via texture transfer. Our method produces more accurate and coherent correspondences.
@article{ye2026sgmatch,
title = {SGMatch: Semantic-Guided Non-Rigid Shape Matching with Flow Regularization},
author = {Ye, Tianwei and Mei, Xiaoguang and Xia, Yifan and Fan, Fan and Huang, Jun and Ma, Jiayi},
journal = {arXiv preprint arXiv:2603.12937},
year = {2026}
}