Abstract: Smart grid substation operations often take place in hazardous environments and pose significant threats to the safety of power personnel. Relying solely on manual supervision can lead to inadequate oversight. To meet the demand for technology that identifies improper operations in substation work scenarios, this paper proposes a substation safety action recognition method that helps prevent misoperation and strengthens safety management. The method uses a dual-branch transformer network to extract spatial and temporal information from a video dataset of operational behaviors in complex substation environments. First, to capture the spatial-temporal correlation of personnel behaviors in smart grid substations, we devise a sparse attention module and a segmented linear attention module, which are embedded in the spatial-branch transformer and the temporal-branch transformer, respectively. Then, to avoid redundancy between spatial and temporal information, we fuse the temporal and spatial features in a decoupled manner using a tensor decomposition fusion module. Experimental results indicate that the proposed method accurately detects improper operational behaviors in substation work scenarios and outperforms existing methods in detection and recognition accuracy.
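To make the idea of fusing spatial and temporal features through a tensor decomposition more concrete, the following is a minimal sketch only. The abstract does not specify the decomposition, so this assumes a CP-style rank-R factorization of a bilinear fusion tensor; the function names (`low_rank_fusion`, `full_tensor_fusion`), the feature sizes, and the random weights are hypothetical and chosen purely for illustration. The sketch shows that the factorized form gives the same output as the full fusion tensor without ever building that tensor.

```python
import numpy as np


def low_rank_fusion(spatial, temporal, w_s, w_t):
    """Fuse two feature vectors with a rank-R factorized bilinear map.

    spatial: (d_s,), temporal: (d_t,)
    w_s: (R, d_s + 1, d_out), w_t: (R, d_t + 1, d_out)  (assumed shapes)
    """
    # Append a constant 1 so that single-branch terms survive the product.
    s = np.append(spatial, 1.0)
    t = np.append(temporal, 1.0)
    # Project each branch separately with its own rank-R factors.
    proj_s = np.einsum("a,rao->ro", s, w_s)
    proj_t = np.einsum("b,rbo->ro", t, w_t)
    # Multiply the projections element-wise and sum over the rank index.
    return (proj_s * proj_t).sum(axis=0)


def full_tensor_fusion(spatial, temporal, w_s, w_t):
    """Reference: build the full (d_s+1, d_t+1, d_out) tensor explicitly."""
    s = np.append(spatial, 1.0)
    t = np.append(temporal, 1.0)
    full = np.einsum("rao,rbo->abo", w_s, w_t)
    return np.einsum("a,b,abo->o", s, t, full)


# Hypothetical sizes: 8-dim spatial, 6-dim temporal, rank 3, 4-dim output.
rng = np.random.default_rng(0)
spatial_feat = rng.normal(size=8)
temporal_feat = rng.normal(size=6)
w_s = rng.normal(size=(3, 9, 4))
w_t = rng.normal(size=(3, 7, 4))

fused = low_rank_fusion(spatial_feat, temporal_feat, w_s, w_t)
reference = full_tensor_fusion(spatial_feat, temporal_feat, w_s, w_t)
```

In this sketch each branch is projected by its own factors before the two are combined, which is one possible reading of fusing "in a decoupled manner"; the module in the paper may differ.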