Abstract: To address the difficulty of recognizing subtle differences in facial biomarkers in children with autism, this work combines a Learnable Positional Encoding Enhancement (LPEE) module with an Adaptive Token Aggregation (ATA) module and proposes ViT-LPATA, a predictive model for autism. The LPEE module dynamically captures facial geometric deformation features, while the ATA module strengthens the feature representation of pathological regions, thereby establishing a precise mapping of biomarker differences. Experiments on a publicly available autism facial dataset show that ViT-LPATA achieved the best performance, with 99.2% accuracy and an AUC of 0.940.
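The abstract does not specify implementation details for the two modules. As one plausible reading, a learnable positional encoding is simply a trainable tensor added to the patch embeddings, and adaptive token aggregation can be realized as learned score-based weighted pooling over tokens. The following NumPy sketch illustrates these two ideas only; all shapes, weights, and names are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 16 patch tokens with 64-dim embeddings.
num_tokens, dim = 16, 64
tokens = rng.standard_normal((num_tokens, dim))

# Learnable positional encoding (LPEE-style idea): a trainable parameter
# added elementwise to token embeddings. Here it is randomly initialized;
# during training it would be updated by gradients like any other weight.
pos_embed = 0.02 * rng.standard_normal((num_tokens, dim))
tokens = tokens + pos_embed

# Adaptive token aggregation (ATA-style idea, one plausible form): score
# each token with a learned projection, softmax the scores across tokens,
# and pool by weight so more informative regions contribute more.
w_score = rng.standard_normal((dim, 1))
scores = tokens @ w_score                    # shape (num_tokens, 1)
weights = np.exp(scores - scores.max())
weights = weights / weights.sum()            # softmax over tokens
pooled = (weights * tokens).sum(axis=0)      # shape (dim,)

print(pooled.shape)
```

In a full Vision Transformer the pooled representation would then feed a classification head; this sketch stops at the aggregation step.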