ViT-LPATA: A Vision Transformer Model for Autism Detection in Children Using Facial Images
DOI:
CSTR:
Author:
Affiliation: Hunan University of Technology
Fund Project:

Hunan Provincial Natural Science Foundation (2025JJ70029); Scientific Research Project of Hunan Education Department (23A0423)

Abstract:

To address the difficulty of recognizing subtle differences in facial biomarkers in children with autism, ViT-LPATA, a predictive model for autism, is proposed; it combines a Learnable Positional Encoding Enhancement (LPEE) module with an Adaptive Token Aggregation (ATA) module. The model leverages the LPEE module to dynamically capture facial geometric deformation features and integrates the ATA module to strengthen the feature representation of pathological regions, thereby establishing precise mappings of biomarker differences. Experiments on a publicly available autism facial dataset show that ViT-LPATA achieves the best performance, with 99.2% accuracy and an AUC of 0.940.
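The abstract names two mechanisms: a learnable positional encoding added to the patch tokens, and an adaptive, score-based aggregation of tokens. A minimal NumPy sketch of those two ideas is below; all shapes, parameter names, and initializations are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 196 patch tokens (a 14x14 grid) of dimension 64.
num_tokens, dim = 196, 64
tokens = rng.normal(size=(num_tokens, dim))

# Learnable positional encoding: a per-token parameter added to the
# patch embeddings (randomly initialized here; in training it would
# be updated by gradient descent, letting the model adapt positions
# to facial geometry).
pos_embed = rng.normal(scale=0.02, size=(num_tokens, dim))
tokens_pe = tokens + pos_embed

# Adaptive token aggregation: score each token with a learned vector,
# softmax the scores into weights, and pool tokens by those weights,
# so informative regions contribute more to the final representation.
score_w = rng.normal(size=(dim,))
scores = tokens_pe @ score_w
weights = np.exp(scores - scores.max())
weights /= weights.sum()
pooled = weights @ tokens_pe  # shape (dim,): one aggregated feature vector

print(pooled.shape)
```

The pooled vector would then feed a classification head; the softmax weights make the aggregation input-dependent, unlike fixed mean pooling.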

History
  • Received: May 9, 2025
  • Revised: June 24, 2025
  • Accepted: July 30, 2025
  • Online:
  • Published: