ICML 2026 · Spotlight

When Attributes Disagree:
Gradient Conflict in Image Aesthetic Assessment

AGREE — Attribute-guided Gradient Routing for Establishing agrEement

Ye Wang1· Maocai Dai1· Jiang Xie1· Xiuli Bi1· Fei Tao2· Xiao Li3· Hong Yu1
1Chongqing University of Posts and Telecommunications 2NewsBreak 3HK PolyU / Univ. of Oxford
Paper Code BibTeX
01 · Abstract

Hard samples in IAA share a common cause

Image Aesthetic Assessment (IAA) predicts an image's overall aesthetic score, yet aesthetics is shaped by multiple attributes whose relative importance varies across content and usage scenarios. Under end-to-end training with only overall-score supervision, attribute signals are blended, causing gradient conflict across samples dominated by different attributes — gradients partially cancel, and systematic biases persist across diverse IAA models. We propose AGREE, which learns attribute-specific subspaces and performs sample-wise gradient routing based on perturbation-estimated sensitivity. It further reduces feature coupling via semantic anchors and improves robustness through error-aware reweighting. AGREE is plug-and-play for existing end-to-end IAA models, consistently improving five benchmarks (AVA, LAPIS, AADB, TAD66K, PARA) across six representative baselines.

TL;DR

Shared-parameter IAA training induces gradient conflict across attribute-dominant sample subsets. AGREE decouples attribute branches and routes updates per sample — a plug-and-play module yielding up to +4.8% SRCC overall and +38.5% SRCC on hard samples.

  1. Optimization-oriented analysis showing that shared-parameter IAA training induces persistent, attribute-related biases consistent across multiple baselines.
  2. AGREE, a plug-and-play framework that adapts update allocation to sample-specific attribute relevance without additional supervision or architectural changes.
  3. Comprehensive experiments on 5 datasets × 6 baselines demonstrate consistent improvements, with the largest gains concentrated on previously hard-to-optimize samples.
02 · Motivation

Different models, the same mistakes

We inspected prediction errors of several mainstream IAA methods and found a striking phenomenon: while models achieve competitive average performance, they collectively fail on the same subset of hard samples — forming stable error clusters in the prediction-label space. This cross-model consistency suggests an optimization-level issue rather than random noise.

Normal vs Hard samples

(a) Baseline models (hollow) achieve competitive performance on normal samples but collectively fail on hard samples; AGREE (filled) substantially reduces these shared errors.

Gradient conflict illustration

(b) Gradient conflict: subset gradients point in opposing directions on coupled parameters, partially canceling — leaving under-represented samples persistently hard to optimize.

03 · Method

Four mechanisms to establish agreement

AGREE mitigates gradient conflict via four complementary mechanisms operating at the parameter, feature, sample, and loss levels respectively — all without modifying the baseline architecture.

AGREE Framework Overview

Overview of AGREE: a plug-and-play framework with four mechanisms — Attribute-aware Decoupling, Multimodal Fusion with semantic anchors, Sensitivity-guided Routing, and Error-aware Reweighting.

01 / Parameter

Attribute Decoupling

Decomposes the image into five attribute dimensions, modeled by independent adapter branches.

02 / Feature

Multimodal Fusion

Frozen LLaVA text anchors injected per branch provide semantic separation across attributes.

03 / Sample

Sensitivity Routing

Per-sample attribute sensitivity, computed offline via perturbation, guides feature fusion.

04 / Loss

Error-aware MSE

EMA-tracked errors dynamically upweight persistently hard samples during training.

04 · Results

State of the art across five benchmarks

+0%
SRCC · AVA
+0%
SRCC · LAPIS
+0%
SRCC · Hard
−0%
MAE · Hard
Method AVA LAPIS PARA AADB TAD66K
SRCCPLCC SRCCPLCC SRCCPLCC SRCCPLCC SRCCPLCC
TANet IJCAI'22.684.675.694.706.815.853.564.575.425.457
EAT MM'23.752.755.802.809.891.924.601.611.476.503
ELTA ICML'24.698.704.686.696.826.851.592.583.413.430
EAMB-Net TIM'24.702.707.823.825.853.876.638.640.427.428
HKD-IAA TMM'24.753.754.781.760.867.899.671.672.395.377
AesPrompt TMM'25.724.719.792.812.875.896.562.577.452.475
AGREE w/ EAT.789.791.853.853.911.940.669.678.488.511
AGREE w/ HKD-IAA.761.759.859.860.879.916.682.689.432.458
Δ Improvement↑4.8%↑4.8%↑4.4%↑4.2%↑2.2%↑1.7%↑1.6%↑2.5%↑2.5%↑1.6%

Bold = best, underline = second-best. AGREE integrated with two representative baselines (EAT and HKD-IAA) jointly achieves SOTA across all five benchmarks.

05 · Plug-and-Play

Consistent gains across six baselines

AGREE plugs into diverse IAA architectures, yielding average SRCC gains of 4.2%–12.6% across datasets and improving hard samples by up to 38.5%. Even strong baselines benefit substantially, confirming gains orthogonal to existing architectures.

Plug-and-play dumbbell results across four metrics

Dumbbell plots across four metrics: (a) SRCC ↑ · (b) PLCC ↑ · (c) MAE ↓ · (d) RMSE ↓. Each line connects baseline (hollow) to +AGREE (filled). overall · hard samples.

Interactive: explore improvements across datasets & metrics

Switch dataset and metric — each row is one baseline; the bar shows how far +AGREE moves it from the baseline value.

Overall set Hard subset ○ baseline   ● + AGREE
06 · Analysis

Gradient conflict, quantified and resolved

We quantitatively verify that AGREE eliminates inter-attribute gradient conflicts. Cosine similarities across attribute branches drop to near zero (|cos|<0.05), and the training-dynamic conflict ratio decreases by 24%, leading to smoother optimization and faster convergence.

Dominant-attribute marginalization analysis

Hard samples are dominated by under-represented attributes. (a) The dominant-attribute marginalization M is significantly larger on common-error samples (Cohen's d = 4.42). (b) Their M distribution shifts right and separates from normal samples. (c) M positively correlates with baseline error (Pearson r = 0.41).

Gradient conflict analysis

(a) Baseline EAT shows widespread negative cosine similarities between attribute pairs. (b) AGREE keeps off-diagonal entries near zero. (c) AGREE reduces the conflict ratio by 24% and stabilizes training dynamics.

07 · Ablation Playground

Toggle modules and see each contribution

AGREE comprises four mechanisms. Click the toggles below to enable/disable any combination — results recompute live from the paper's ablation table (AVA, EAMB-Net backbone).

Decoupling
Multimodal Fusion
Sensitivity Routing
Error-aware MSE
SRCC ↑
.715
+0.013
PLCC ↑
.719
+0.012
MAE ↓
.411
−0.009
RMSE ↓
.527
−0.013

Baseline EAMB-Net (no toggles): SRCC .702 / PLCC .707 / MAE .420 / RMSE .540. Δ shown relative to baseline.

08 · Citation

If our work helps yours

@inproceedings{wang2026agree,
  title     = {When Attributes Disagree: Gradient Conflict in Image Aesthetic Assessment},
  author    = {Wang, Ye and Dai, Maocai and Xie, Jiang and Bi, Xiuli and
               Tao, Fei and Li, Xiao and Yu, Hong},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2026}
}