
Table of Contents

    25 December 2022, Volume 50 Issue 12
    2022, 50(12):  0-0. 
    Computer Science & Technology
    XING Xiaofen, LI Minsheng, XU Xiangmin
    2022, 50(12):  1-12.  doi:10.12141/j.issn.1000-565X.220165

    Common image sentiment transformation methods assume that transferring image color also transfers image sentiment. However, because image content also influences sentiment, color transfer alone cannot completely transfer sentiment, and a suitable reference image must be obtained before color can be transferred. In practice, it is difficult to obtain reference images that are similar to the target image in sentiment and similar to the source image in content, and the semantic consistency of local objects needs to be considered when transferring color. Therefore, this paper proposed an image sentiment transformation method based on adaptive brightness adjustment. Drawing on the significant correlation established in psychology between image brightness and image sentiment (also known as the valence value, abbreviated as V value), the method adaptively adjusts brightness through a deep neural network, ISTNet, to convert the image to the target sentiment. First, an image and its corresponding true V value were obtained from an existing image emotion dataset, and a series of images with different brightness was generated by changing the image brightness. Then, the pseudo V values of these images, which share the same content but differ in brightness, were predicted by a pre-trained image V value regression model. Finally, ISTNet was trained with these images and pseudo V values to learn the relationship between brightness adjustment and sentiment change. In practical application, no reference image is needed: the image and the target V value are input directly into ISTNet to obtain an output image with the corresponding sentiment. The experimental results show that this method outperforms the existing color-based image sentiment transformation methods.
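
    As a rough illustration of the data preparation step described above, the sketch below generates copies of one image at several brightness levels. It assumes linear scaling of 8-bit RGB values with clipping; the abstract does not state how brightness was actually varied, and the names adjust_brightness and brightness_series are hypothetical. The V value regression model and ISTNet itself are not reproduced here.

```python
def adjust_brightness(pixels, factor):
    """Scale every 8-bit channel value by `factor`, clipping to [0, 255].

    Linear scaling is an assumption made for illustration only.
    """
    return [
        [tuple(min(255, max(0, round(c * factor))) for c in px) for px in row]
        for row in pixels
    ]


def brightness_series(pixels, factors):
    """Return one brightness-adjusted copy of the image per factor."""
    return {f: adjust_brightness(pixels, f) for f in factors}


# A tiny 1x2 RGB "image" stands in for a dataset sample.
image = [[(100, 150, 200), (10, 20, 30)]]
series = brightness_series(image, [0.5, 1.0, 1.5])
```

    Each copy in such a series would then be scored by the regression model to obtain its pseudo V value before training.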

    LU Yiqin, PAN Zhoushuang, ZHANG Yang, et al
    2022, 50(12):  13-19.  doi:10.12141/j.issn.1000-565X.220384

    Knowledge graphs provide underlying support for many intelligent information service applications in fields such as intelligent search, public safety, finance and medical care. However, existing knowledge graphs are usually incomplete, and knowledge graph completion has become an urgent problem. Existing knowledge graph completion models ignore the rich information carried by neighbor nodes and relationships, and often simply concatenate neighbor nodes and relationships, overlooking the fact that different relationships and neighbor nodes differ in importance to a node. To solve this problem, this paper proposed a knowledge graph completion method (ICGAT) based on an interactive connection graph attention network. The method first discovers potential relationships by finding two-hop neighbor nodes, and expands the triples of each node. It then fuses the relationship in each triple with the features of the node, and interactively connects nodes with their neighbor nodes, using 4 space vectors to represent the interactively connected relationship. Finally, the interactive connection vectors were input into the graph attention network to obtain the weights of relationships and neighbor nodes with respect to the node, which indicate their importance. In order to effectively represent triples with complex relationships such as one-to-many and many-to-many, the method used the RotatE model as a pre-training model. The experimental results on the link prediction task show that ICGAT improves the mean rank and HR@10 indicators on the WN18RR and FB15k-237 datasets to a certain extent, indicating that ICGAT can improve the accuracy of link prediction.
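
    For readers unfamiliar with the two link prediction indicators named above, the sketch below shows how mean rank and HR@10 are conventionally computed from the 1-based rank of the correct entity in each test query. The rank values are made up for illustration and are not results from the paper.

```python
def mean_rank(ranks):
    """Average 1-based rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Illustrative ranks for five test triples (not data from the paper).
ranks = [1, 3, 12, 7, 50]
mr = mean_rank(ranks)
hr10 = hits_at_k(ranks, k=10)
```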

    DU Qiliang, XIANG Zhaoyi, TIAN Lianfang, et al
    2022, 50(12):  20-29.  doi:10.12141/j.issn.1000-565X.220055

    Human action recognition has received much attention in the field of computer vision because of its important role in public safety. However, when fusing the neighborhood features of multi-scale nodes, existing graph convolutional networks usually adopt a direct summation method, in which the same importance is attached to each feature, so it is difficult to focus on important features and is not conducive to the establishment of optimal nodal relationships. In addition, the two-stream fusion method, which averages the prediction results of different models, ignores the potential data distribution differences and the fusion effect is not good. To this end, this paper proposed a two-stream adaptive attention graph convolutional network for human action recognition. Firstly, a multi-order adjacency matrix that adaptively balances the weights was designed to focus the model on more important domains. Secondly, a multi-scale spatio-temporal self-attention module and a channel attention module were designed to enhance the feature extraction capability of the model. Finally, a two-stream fusion network was proposed to improve the fusion effect by using the data distribution of the two-stream prediction results to determine the fusion coefficients. On the two subdatasets of cross subject and cross view of NTU RGB+D, the recognition accuracy of the algorithm is 92.3% and 97.5%, respectively; while on the Kinetics-Skeleton dataset, it reaches 39.8%, both of which are higher than the existing algorithms, indicating the superiority of the algorithm in human motion recognition.

    YU Lubin, TIAN Lianfang, DU Qiliang
    2022, 50(12):  30-40.  doi:10.12141/j.issn.1000-565X.210541

    Object tracking is of great significance in computer vision tasks. Recently, with the development of deep learning, tracking algorithms based on Siamese networks have been extensively applied because of their excellent capabilities. However, the performance of the existing Siamese network modules degrades significantly in difficult situations such as large target deformation, low resolution, and complex backgrounds. To address these issues, this paper proposed a tracking algorithm based on a multi-stream attention Siamese network. The algorithm first constructs super-resolution modules and data enhancement modules, which perform super-resolution and data augmentation on the target templates, respectively, so as to improve the feature representation ability of the target template. Then, three backbone networks were used to extract the features of the original target template, the super-resolution target template, and the data-augmented target template, respectively, and their features were fused; meanwhile, a channel attention module and a spatial attention module were applied in the backbone network to improve the feature extraction capability. Finally, the fused feature map and the feature map to be searched were input into the region proposal network module to obtain the target tracking information. The experimental results show that the algorithm achieved a precision of 0.919 and a success rate of 0.707 on the OTB100 dataset, and an accuracy of 0.642 and a robustness of 0.149 on the VOT2018 dataset, with an operation speed higher than 20 times per second in real scenarios, demonstrating excellent tracking performance and robustness in handling various complex scenarios.

    YI Qingming, LÜ Renyi, SHI Min, et al
    2022, 50(12):  41-48.  doi:10.12141/j.issn.1000-565X.220095

    Because of slow detection and heavy parameter counts, deep neural networks are ill-suited to deployment in mobile application scenarios, which are constrained in computing resources but demand high-speed computation. To improve the inference speed of object detection and achieve a better tradeoff between detection accuracy and inference speed, this paper proposed a lightweight object detection network named MDDNet, which combines multi-scale dilated convolution and multi-scale deconvolution. Firstly, a lightweight detection backbone network was designed based on an efficient single-stage strategy, and depthwise separable convolution was introduced to reduce the number of parameters of the baseline and further speed up feature extraction. Secondly, two feature extension branches based on multi-scale dilated convolution were added to the backbone network, connected to the ends of the final and the penultimate residual layers of the basic network, respectively. The features of the two branches were fused in the prediction layer to augment the texture features of the shallow feature maps. Thirdly, a multi-scale deconvolution module was introduced and connected to the deep feature network layers to increase the size of the feature map, and the shallow feature maps of the previous layer with different scales were then fused so as to enrich the semantic and detailed information of the features, improving the detection accuracy. Finally, the parameters of the prior bounding boxes were optimized in the prediction layer with the K-means clustering method, so that the prior bounding boxes could better match the ground truth of the objects, achieving higher object recognition accuracy. The experimental results show that MDDNet has about 7.21×10⁶ parameters. The average accuracy is 58.7% and 76.0% on the KITTI and Pascal VOC datasets, respectively, while the corresponding inference speed reaches 55 f/s and 52 f/s, respectively. Therefore, MDDNet achieves a decent tradeoff among the number of parameters, detection speed, and detection accuracy, and it can be applied to real-time object detection on mobile terminals.
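
    The parameter saving that motivates depthwise separable convolution can be checked with simple arithmetic. The sketch below compares the weight counts of a standard convolution and a depthwise separable one, ignoring biases; the layer sizes are illustrative assumptions and are not taken from MDDNet.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
ratio = sep / std  # equals 1/c_out + 1/kernel**2
```

    For this example layer the separable form needs roughly 12% of the weights of the standard convolution.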

    YU Ying, HE Penghao, XU Chaoyue
    2022, 50(12):  49-59.  doi:10.12141/j.issn.1000-565X.220025

    Image inpainting is of great significance and value in computer vision tasks. In recent years, image inpainting models based on deep learning have been widely used in this field. However, the existing models make insufficient use of the valid information in the damaged image and are disturbed by the mask information, which leads to partial loss of structure and blurred details in the repaired image. Therefore, this paper proposed an image inpainting model based on residual attention fusion and gated information distillation. The model consists of two parts, a generator and a discriminator. The generator uses U-Net as its backbone and consists of an encoder and a decoder, while the discriminator is a Markov discriminator consisting of six convolutional layers. Residual attention fusion blocks were used in the encoder and decoder, respectively, to enhance the utilization of valid information in the damaged image and reduce the interference of mask information. In addition, a gated information distillation block was embedded in the skip connections between the encoder and decoder to further extract the low-level features of the damaged image. The experimental results on public face and street view datasets show that the proposed model achieves better repair performance in semantic structure and texture details, and outperforms the five comparison models in structural similarity, peak signal-to-noise ratio, mean absolute error, mean square error and Fréchet distance, demonstrating that its inpainting quality is superior to that of the compared models.
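
    Three of the pixel-level indicators listed above have simple closed forms. The sketch below computes mean absolute error, mean square error and peak signal-to-noise ratio for two flattened 8-bit images; the pixel values are invented for illustration, and structural similarity and Fréchet distance are not covered.

```python
import math


def mean_absolute_error(a, b):
    """Average absolute pixel difference between two equal-length images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_square_error(a, b):
    """Average squared pixel difference between two equal-length images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = mean_square_error(a, b)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_value ** 2 / mse)


# Illustrative flattened grayscale images (not data from the paper).
original = [0, 255, 128, 64]
repaired = [0, 250, 130, 64]
mae = mean_absolute_error(original, repaired)
mse = mean_square_error(original, repaired)
quality_db = psnr(original, repaired)
```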

    LIU Xiaolan, SHI Zongyu, YE Zehui, et al
    2022, 50(12):  60-70.  doi:10.12141/j.issn.1000-565X.220069

    Traditional multi-view clustering assumes complete data. In practical tasks, however, limitations of the information acquisition method often leave some views with missing data, which gives rise to the problem of incomplete multi-view clustering. Most of the existing clustering models for this problem are based on non-negative matrix factorization or distance graphs; their co-optimization strategy can easily make the solution unstable, and the global structure cannot be fully characterized. In order to improve the quality of the clustering graph, this paper proposed an incomplete multi-view clustering algorithm, ALIMSC, based on low-rank subspace clustering and anchor graphs. The algorithm first obtained a benchmark similarity matrix of the data with APMC, an incomplete multi-view subspace clustering algorithm based on anchor graphs, and embedded it in the low-rank subspace clustering model. The similarity matrix was then obtained by dimension-raising alignment and weighted fusion, and the final clustering graph was obtained by making this similarity matrix as consistent as possible with the benchmark similarity matrix. By imposing a rank minimization constraint on the similarity matrix of each view, ALIMSC characterized the low-dimensional subspace distribution of high-dimensional data and emphasized the subspace structure of the data on the basis of the original anchor graph, that is, the block diagonality reflected in the clustering graph. Experimental results on several public datasets show that the proposed algorithm outperforms the classical incomplete multi-view algorithms.

    CAI Xiaodong, ZENG Zhiyang
    2022, 50(12):  71-79.  doi:10.12141/j.issn.1000-565X.220180

    Previous session-based recommendation systems usually capture users’ consumption preferences from their recent transaction records. This approach ignores the influence of global transaction information and friends’ preferences on users’ transaction behavior, resulting in less accurate recommendations. To solve this problem, this paper proposed a social recommendation model, AFGSRec, based on an adaptive fusion of global collaborative features. Firstly, a heterogeneous graph neural network was used to model users and their historical transaction information on the social network, capturing global collaborative features and social influence among friends. Secondly, a graph neural network based on a selection mechanism was designed to filter out the node transition features irrelevant to the current session and capture user preferences more accurately. Thirdly, an adaptive fusion method was designed to dynamically capture the impact of global collaborative features on users’ current preferences and improve the model’s recommendation accuracy. Finally, a dynamic cyclical learning rate was used to help the model better handle saddle points during training and improve the convergence speed of AFGSRec. The experimental results show that AFGSRec is robust; both the HR (Hit Rate) and MRR (Mean Reciprocal Rank) of AFGSRec exceed those of the state-of-the-art model SERec. On the Gowalla dataset, HR@10 and HR@20 are increased by 1.91% and 1.15%, respectively; MRR@10 and MRR@20 are increased by 5.05% and 4.83%, respectively. On the Delicious dataset, HR@10 and HR@20 are increased by 2.45% and 1.19%, respectively; MRR@10 and MRR@20 are increased by 4.84% and 4.32%, respectively.
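
    The abstract does not specify the form of its cyclical learning rate. As one common possibility, the sketch below implements a triangular schedule in which the rate rises linearly from a base value to a maximum and back again; the function name triangular_lr and all numeric settings are assumptions made for illustration, not the configuration used for AFGSRec.

```python
def triangular_lr(step, base_lr, max_lr, half_cycle):
    """Triangular cyclical learning rate (an assumed schedule shape).

    The rate climbs from base_lr to max_lr over `half_cycle` steps,
    then descends back to base_lr over the next `half_cycle` steps.
    """
    position = (step % (2 * half_cycle)) / half_cycle  # in [0, 2)
    scale = position if position <= 1 else 2 - position
    return base_lr + (max_lr - base_lr) * scale


# Hypothetical settings: rate oscillates between 0.001 and 0.01.
schedule = [triangular_lr(s, 0.001, 0.01, 100) for s in (0, 50, 100, 150, 200)]
```

    Periodically raising the rate in this way is often motivated by helping the optimizer move off flat regions such as saddle points.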

    WANG Jie, XIA Xiaoming
    2022, 50(12):  80-88.  doi:10.12141/j.issn.1000-565X.220013

    Named entity recognition is a fundamental task of natural language processing (NLP) and plays an important role in many downstream NLP tasks, including information extraction and machine translation. The existing named entity recognition methods are usually based on sequence labeling and extract the entities within each sentence independently, ignoring the semantic information between sentences. Named entity recognition methods based on machine reading comprehension encode important prior information about the entity class, which makes it easier to distinguish similar classification labels and reduces the difficulty of model learning. However, they still model only at the sentence level and ignore the semantic information between sentences, which easily causes inconsistent entity labeling across sentences. To this end, this paper extended sentence-level named entity recognition to text-level named entity recognition, and proposed a BiLSTM-BiDAF named entity recognition model based on machine reading comprehension. First, to utilize the context information within the whole text, the NEZHA pre-trained language model was used to obtain information about the full text, and local features were further captured through BiLSTM, so as to strengthen the model’s ability to capture locally dependent information. Then, a bidirectional attention flow was introduced to learn the semantic association between the text and the entity category. Finally, to predict the position of entities in the text, a boundary detector based on the gating mechanism was designed to strengthen the correlation of the entity boundaries. At the same time, an answer count detector was established to identify unanswerable questions. Experimental results on the CCKS2020 Chinese electronic medical records dataset and the CMeEE dataset show that the model can effectively identify document-level and sentence-level named entities, with F1 reaching 84.76% and 57.35%, respectively.

    Electronics, Communication & Automation Technology
    CHEN Fangjiong, LIU Mingxing, FU Zhenhua, et al
    2022, 50(12):  89-100.  doi:10.12141/j.issn.1000-565X.220040

    In order to cope with the complex underwater acoustic channel environment and improve the convergence speed and symbol error rate performance of the channel equalization algorithm, this paper proposed a zero attraction sparse control proportional minimum symbol error rate decision feedback equalization algorithm. On the basis of the sparse control proportional minimum symbol error rate decision feedback equalization algorithm, this algorithm added a sparse constraint in the form of an approximate l0 norm to the objective function, which pulls small-amplitude equalizer taps toward zero. At the same time, phase-locked loop technology was introduced in the channel equalization process to eliminate the influence of jitter phase noise. The traditional phase-locked loop technology is based on the minimum mean square error criterion. However, existing literature and related experimental simulations have demonstrated that the symbol error rate is not necessarily the smallest when the mean square error of the system is the smallest. To address this problem, a phase-locked loop phase tracking algorithm based on the minimum symbol error rate criterion was proposed and embedded in the sparse equalization algorithm. On the Matlab platform, experiments were carried out on a static underwater acoustic channel and a real time-varying underwater acoustic channel, respectively. The results show that the sparse control proportional minimum bit error rate decision feedback equalization algorithm with the approximate l0 norm constraint converges faster in the absence of time-varying phase noise; under channel conditions affected by time-varying phase noise, the phase tracking algorithm based on the minimum bit error rate criterion has faster convergence and better bit error rate performance than the phase tracking algorithm based on the minimum mean square error criterion.
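
    To make the zero attraction idea concrete, the sketch below uses a widely used exponential approximation of the l0 norm, in which each tap w contributes 1 - exp(-beta * |w|), and applies one gradient step of that penalty. This is an assumed form chosen for illustration; the abstract does not give the paper's exact approximation, real-valued taps are assumed for simplicity, and the parameter names beta and kappa are hypothetical.

```python
import math


def approx_l0(taps, beta):
    """Exponential approximation of the l0 norm (assumed form)."""
    return sum(1.0 - math.exp(-beta * abs(w)) for w in taps)


def zero_attraction(taps, beta, kappa):
    """Apply one zero-attraction step to real-valued equalizer taps.

    Each tap moves toward zero by kappa times the gradient of the
    approximate l0 penalty, so small taps are pulled much harder
    than large ones.
    """
    updated = []
    for w in taps:
        sign = (w > 0) - (w < 0)
        updated.append(w - kappa * beta * sign * math.exp(-beta * abs(w)))
    return updated


# Hypothetical taps: two small, one dominant, one already zero.
taps = [0.01, -0.02, 1.0, 0.0]
new_taps = zero_attraction(taps, beta=5.0, kappa=0.001)
```

    In a full equalizer this step would be combined with the error-driven tap update, which is omitted here.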

    ZHENG Juanyi, MU Jinyu, XING Lirong, et al
    2022, 50(12):  101-108.  doi:10.12141/j.issn.1000-565X.220017

    In a millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) system with a lens antenna array, the number of radio frequency (RF) links is much smaller than the number of antennas, so the high-dimensional channel must be recovered from the low-dimensional effective measurement signal by channel estimation. Current channel estimation methods mostly exploit the sparsity of the beamspace channel, transforming channel estimation into a compressed sensing problem that is then solved with different methods. To overcome the limitation that the approximate message passing (AMP) algorithm needs channel prior information in channel estimation, this paper proposed an improved channel estimation algorithm. Firstly, a new noise term was derived based on the AMP algorithm and fitted with a convolutional neural network (CNN). Then the iterative denoising process was unfolded into a deep network to solve the linear inverse transformation from the measurement signal to the channel. Finally, the initially estimated channel was further optimized by a residual noise removal network. In addition, controllable parameters were introduced to increase the flexibility of the channel estimation process, and the sensing matrix was jointly trained with the other network parameters to improve the channel estimation accuracy. The proposed algorithm was verified in terms of channel estimation accuracy and system transmission quality, with theoretical derivation and system simulation analysis carried out on the Saleh-Valenzuela channel model. Simulation results show that the proposed algorithm has fewer model parameters and less computation than the traditional algorithm, and can improve the accuracy of channel estimation and the transmission quality of the communication system.

    CHEN Peng, CHEN Yang, WANG Wei
    2022, 50(12):  109-123.  doi:10.12141/j.issn.1000-565X.220277

    In recent years, the rapid growth of “low, small and slow” UAVs (Unmanned Aerial Vehicles) and their uncontrolled flying have posed a serious threat to urban security and public safety. How to effectively locate “low, small and slow” UAVs in complex low-altitude situations has become an urgent social problem. Owing to blind spots and weak radiation intensity, radar and photoelectric detection perform poorly in close-range positioning. Acoustic positioning effectively compensates for these disadvantages thanks to its low sensor cost, flexible array placement and small positioning error. This paper summarized the acoustic positioning methods for UAVs. The spectrum of UAV noise was analyzed, and the noise signals were found to have a strong line spectrum structure. These line spectra are rich in harmonic frequency components, have a high signal-to-noise ratio and are highly resistant to interference. Firstly, the acoustic characteristics of rotor noise were used to show that the UAV can be located from both the time domain and the frequency domain perspectives. Secondly, the principles of acoustic localization methods were introduced, algorithm simulation results were given, and the root mean square errors of the time domain and frequency domain localization methods were compared. Next, the acoustic localization schemes for low-altitude UAVs implemented at universities at home and abroad in recent years were surveyed, and the planar array based on time difference of arrival (TDOA) was found to be the most widely used and to give better localization results. Finally, future prospects for acoustic localization of low-altitude UAVs were discussed.
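
    The time difference of arrival mentioned above is commonly estimated by cross-correlating the signals of two microphones. The sketch below does this by brute force on a toy pulse; the 48 kHz sampling rate and the signals are illustrative assumptions, and a real array would use many more samples and several microphone pairs.

```python
def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a.

    The lag is chosen to maximize the cross-correlation of the two signals.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


SAMPLE_RATE_HZ = 48000    # assumed sampling rate
SPEED_OF_SOUND = 343.0    # m/s in air at about 20 degrees Celsius

# Toy pulse received at microphone B three samples later than at microphone A.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)
tdoa_seconds = lag / SAMPLE_RATE_HZ
path_difference_m = tdoa_seconds * SPEED_OF_SOUND
```

    Each such path difference constrains the source to a hyperbola with the two microphones as foci, and intersecting the constraints from several pairs yields a position estimate.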

    MENG Fanyi, LIU Zhiheng, WANG Yu, et al
    2022, 50(12):  124-131.  doi:10.12141/j.issn.1000-565X.220143

    The receiving and transmitting modes of a symmetrical bidirectional amplifier share the same amplifier core, which reduces the complexity of the matching network structure and the chip area. In order to further reduce the area of the symmetrical bidirectional amplifier chip, this paper proposed a bidirectional matching technology that integrates the parasitic parameters of transistors under different working states, and explored the relationship between the node impedance variation of silicon-based transistors and the impedance of matching circuits under different bias states. Based on the Leibniz Institute for High Performance Microelectronics (IHP) 0.13 μm SiGe BiCMOS process, a 207-215 GHz high-gain, switchless symmetrical bidirectional amplifier was designed. By switching the circuit bias, the amplifier eliminates the need for a single-pole double-throw switch in the communication system. The mirror symmetry of the chip layout was optimized to ensure the consistency of the forward and reverse energy of the amplifier. Full-wave electromagnetic simulation and circuit simulation results show that, in the working frequency band, the maximum gain of each channel of the bidirectional amplifier is 28.6 dB; the minimum noise figure is 16 dB; the minimum values of the input and output reflection coefficients S11 and S22 of the bidirectional matching network are -13.6 dB and -15.5 dB, respectively; the power consumption of the chip is 63 mW; and the core area is only 0.17 mm². These results show that the bidirectional matching network can achieve excellent input, output and noise matching while saving chip area. The switchless silicon bidirectional amplifier designed in this paper operates at frequencies above 200 GHz and features high gain and a compact area. It greatly reduces the chip area and the cost of the RF front-end, and can be applied to terahertz microsystems.

    YANG Jinsheng, CHEN Hongpeng, GUAN Xin, et al
    2022, 50(12):  132-141.  doi:10.12141/j.issn.1000-565X.220042

    Manual segmentation of brain tumor areas in magnetic resonance imaging (MRI) images is time-consuming and laborious, and it is easily influenced by individual subjectivity. Reliable and efficient semi-automatic or automatic segmentation of brain tumors is therefore particularly important for medically assisted diagnosis. In recent years, convolutional neural network-based methods for automatic segmentation of brain tumor images have made great progress, but the existing methods still cannot effectively fuse the large-scale contour features and small-scale texture details of tumor images, and the rich global background information is ignored during training. In view of these problems, this paper proposed a multi-scale lightweight brain tumor image segmentation network, MSL-Net. First, the base convolution in the U-Net network was replaced with an improved hierarchical decoupled convolution to expand the receptive field while efficiently exploring multi-scale multi-view spatial information. Then, a bidirectional feature pyramid network structure was introduced at the skip connections to fuse multi-scale features, and a hybrid loss function combining the generalized Dice loss function and the Focal loss function was used to improve segmentation accuracy and accelerate convergence in the case of pixel count imbalance between tumor and non-tumor regions. Experimental results on the BraTS 2019 dataset show that the Dice similarity coefficients of the proposed MSL-Net network in the overall tumor region, core tumor region and enhanced tumor region are 0.9003, 0.8306 and 0.7770, respectively, and the number of parameters and computation (floating-point operations per second) are 3.9×10⁵ and 3.16×10¹⁰, respectively. Compared with the current state-of-the-art methods, the proposed method achieves high segmentation accuracy while remaining lightweight.
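
    The Dice similarity coefficient used to report the results above measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. The sketch below computes it for flattened binary masks; the mask values are invented for illustration, and the hybrid generalized Dice and Focal loss used for training is not reproduced.

```python
def dice_coefficient(pred, target):
    """Dice similarity of two flattened binary masks (values 0 or 1).

    Returns 1.0 when both masks are empty, a common convention.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Illustrative masks: 1 marks a tumor pixel (not data from the paper).
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
score = dice_coefficient(predicted, reference)
```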

    YAN Hongli, HUANG Linnan, GAO Yueming, et al
    2022, 50(12):  142-150.  doi:10.12141/j.issn.1000-565X.210752

    Surface electrical impedance myography (sEIM) is important for evaluating muscle imbalance and muscle diseases. The mixed impedance signals of the subcutaneous multilayer tissue captured by surface electrodes contain diverse redundant components. Taking the mixed signals captured by sEIM as the blind signals and the muscle layer impedance as the source signal, this paper proposed a method for separating muscle layer impedance based on impedance equivalent analysis and blind source separation, in order to improve the sensitivity of sEIM in detecting changes in the state of the target muscle. Firstly, a multilayer cylindrical finite element model of a limb was constructed for simulation. Secondly, a sensitivity method was employed to calculate the impedance contribution of each tissue layer so that redundant weak signals could be excluded, reducing the task to a blind source separation problem targeting the muscle layer. Finally, the separation effects of independent component analysis, principal component analysis, and equivariant adaptive separation via independence (EASI) were compared by numerical simulation and in vivo experiments to obtain the optimal solution and verify its feasibility. The results show that, with the EASI-based method for separating muscle layer impedance, the correlation coefficient is above 0.98, the noise immunity is approximately 0.8, and the error cross talking (ECT) converges to 0.876. The muscle layer impedance separated in the in vivo experiments is consistent with the known characteristics of human impedance, indicating that separating muscle layer impedance with EASI can better enhance the sensitivity of sEIM in detecting changes in the state of the target muscle.
