Abstract: Current research on Device-Free Localization (DFL) focuses primarily on localization from a single signal source, which is susceptible to multipath propagation and environmental interference. To address the insufficient accuracy and stability of single-source methods in complex indoor environments, this paper proposes an indoor localization method based on multi-modal fusion of Wi-Fi Channel State Information (CSI) fingerprint images and ZigBee Received Signal Strength Indication (RSSI) data. First, Hampel filtering is applied to preprocess the CSI signals, and the CSI amplitude and phase information are combined to form high-resolution fingerprint images. For the RSSI signals, data packets collected by the ZigBee sensor network undergo outlier removal and matrix transformation to generate the corresponding fingerprint data. Inspired by image classification tasks, a lightweight ECA-CNN network is designed to extract features from the CSI fingerprint images, while a Transformer network is trained on the RSSI fingerprint data. Finally, a soft voting method fuses the outputs of the two models, each trained on its own fingerprint database, to produce the final location classification. Experimental results demonstrate that this method significantly improves localization accuracy and robustness in indoor environments, effectively overcoming the limitations of single-source localization.
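The two generic building blocks named in the abstract, Hampel filtering and soft voting, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `hampel_filter` and `soft_vote`, the window size, the 3-sigma threshold, and the equal fusion weight are all assumptions chosen for illustration, and the sketch assumes each model outputs one probability per candidate location.

```python
from statistics import median


def hampel_filter(samples, half_window=3, n_sigmas=3.0):
    """Replace outliers in a 1-D sequence with the local median.

    half_window and n_sigmas are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    scale = 1.4826  # scales the MAD to estimate sigma for Gaussian data
    cleaned = list(samples)
    for i in range(len(samples)):
        lo = max(0, i - half_window)
        hi = min(len(samples), i + half_window + 1)
        window = samples[lo:hi]
        med = median(window)
        # Median absolute deviation of the window around its median.
        mad = median(abs(x - med) for x in window)
        if abs(samples[i] - med) > n_sigmas * scale * mad:
            cleaned[i] = med
    return cleaned


def soft_vote(prob_csi, prob_rssi, weight_csi=0.5):
    """Fuse two per-class probability vectors by weighted averaging.

    weight_csi is a hypothetical parameter; equal weighting is assumed.
    Returns the index of the winning class and the fused probabilities.
    """
    if len(prob_csi) != len(prob_rssi):
        raise ValueError("both models must score the same set of locations")
    fused = [
        weight_csi * p + (1.0 - weight_csi) * q
        for p, q in zip(prob_csi, prob_rssi)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

For example, `soft_vote([0.5, 0.4, 0.1], [0.1, 0.7, 0.2])` selects location index 1 even though the first vector alone would favor index 0, which shows how averaging probabilities lets a confident model override a marginal decision by the other.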