2022 Computer Science & Technology
Deep neural networks are slow at detection and heavy in parameters, which makes them hard to deploy in mobile application scenarios that are constrained in computing resources yet demand high-speed computation. To improve the inference speed of object detection and achieve a better tradeoff between detection accuracy and inference speed, this paper proposed a lightweight object detection network named MDDNet, which combines multi-scale dilated convolution and multi-scale deconvolution. Firstly, a lightweight detection backbone network was designed based on an efficient single-stage strategy, and depthwise separable convolution was introduced to reduce the number of parameters of the baseline and further speed up feature extraction. Secondly, two feature extension branches based on multi-scale dilated convolution were added to the backbone network and connected to the ends of the final and the penultimate residual layers of the basic network, respectively. The features of the two branches were fused in the prediction layer to augment the texture features of the shallow feature maps. Thirdly, a multi-scale deconvolution module was introduced and connected to the deep feature network layers to increase the size of the feature map, and the shallow feature maps of the previous layer with different scales were then fused to enrich both the semantic information and the detailed information of the features, improving the detection accuracy. Finally, the parameters of the prior bounding boxes were optimized in the prediction layer with the K-means clustering method, so that the prior bounding boxes could better match the ground truth of the objects, achieving higher object recognition accuracy. The experimental results show that MDDNet has about 7.21×10^6 parameters. The average accuracy is 58.7% and 76.0% on the KITTI and Pascal VOC datasets, respectively, and the corresponding inference speed reaches 55 f/s and 52 f/s. Therefore, MDDNet achieves a decent tradeoff among parameter count, detection speed, and detection accuracy, and it can be applied to real-time object detection on mobile terminals.
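The parameter saving that the MDDNet abstract attributes to depthwise separable convolution can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the layer shape (3×3 kernel, 128 input channels, 256 output channels) is hypothetical and not taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weight count of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration (not from MDDNet).
k, c_in, c_out = 3, 128, 256
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
ratio = separable / standard  # equals 1 / c_out + 1 / k**2

print(standard, separable, round(ratio, 3))
```

For this assumed shape the separable layer needs roughly 11.5% of the weights of the standard layer, which is the kind of reduction a lightweight backbone relies on.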
Human action recognition has received much attention in the field of computer vision because of its important role in public safety. However, when fusing the neighborhood features of multi-scale nodes, existing graph convolutional networks usually adopt direct summation, which attaches the same importance to every feature; this makes it difficult to focus on important features and hinders the establishment of optimal node relationships. In addition, the common two-stream fusion method, which averages the prediction results of different models, ignores potential differences in data distribution, so the fusion effect is poor. To this end, this paper proposed a two-stream adaptive attention graph convolutional network for human action recognition. Firstly, a multi-order adjacency matrix that adaptively balances the weights was designed to focus the model on the more important neighborhoods. Secondly, a multi-scale spatio-temporal self-attention module and a channel attention module were designed to enhance the feature extraction capability of the model. Finally, a two-stream fusion network was proposed, which improves the fusion effect by using the data distribution of the two-stream prediction results to determine the fusion coefficients. On the cross-subject and cross-view benchmarks of NTU RGB+D, the recognition accuracy of the algorithm is 92.3% and 97.5%, respectively, and on the Kinetics-Skeleton dataset it reaches 39.8%. All of these results are higher than those of existing algorithms, indicating the superiority of the algorithm in human action recognition.
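The contrast between direct summation and an adaptively weighted multi-order adjacency matrix can be sketched as follows. The abstract does not give the actual formulation, so everything here is an assumption: the orders are taken to be powers of the adjacency matrix, the weights are a softmax over one score per order, and the function names are hypothetical. In a trained model the scores would be learned; here they are fixed numbers.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multi_order(adjacency, order_scores):
    """Weighted sum of A^1 .. A^K, with softmax(order_scores) as the weights (assumed form)."""
    weights = softmax(order_scores)
    n = len(adjacency)
    fused = [[0.0] * n for _ in range(n)]
    power = adjacency
    for weight in weights:
        for i in range(n):
            for j in range(n):
                fused[i][j] += weight * power[i][j]
        power = matmul(power, adjacency)  # move on to the next order
    return fused


# Three joints connected in a chain: 0 - 1 - 2.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
fused = fuse_multi_order(chain, [0.0, 0.0])  # equal scores give equal weights
print(fused)
```

With equal scores the result is a plain average of the first- and second-order matrices; unequal scores shift the emphasis toward one order, which is the behaviour that direct summation cannot express.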
Named entity recognition is a fundamental task of natural language processing (NLP) and plays an important role in many downstream NLP tasks, such as information extraction and machine translation. Existing named entity recognition methods are usually based on sequence labeling and extract the entities within each sentence independently, so they ignore the semantic information between sentences. Named entity recognition methods based on machine reading comprehension encode important prior information about the entity class, which makes similar classification labels easier to distinguish and reduces the difficulty of model learning. However, these methods still model only at the sentence level and ignore the semantic information between sentences, which easily leads to inconsistent entity labeling across sentences. To this end, this paper extended sentence-level named entity recognition to text-level named entity recognition and proposed a BiLSTM-BiDAF named entity recognition model based on machine reading comprehension. First, to utilize the context information within the whole text, the NEZHA pre-trained language model was used to obtain information from the full text, and local features were further captured through BiLSTM to strengthen the model’s ability to capture locally dependent information. Then, a bidirectional attention flow was introduced to learn the semantic association between the text and the entity category. Finally, to predict the positions of entities in the text, a boundary detector based on a gating mechanism was designed to strengthen the correlation between entity boundaries. At the same time, an answer count detector was established to identify unanswerable questions. Experimental results on the CCKS2020 Chinese electronic medical records dataset and the CMeEE dataset show that the model can effectively identify document-level and sentence-level named entities, with F1 reaching 84.76% and 57.35%, respectively.
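The F1 scores reported in the abstract above are commonly computed at the entity level. The sketch below assumes exact-match scoring, where a predicted entity counts as correct only if its start, end, and category all match a gold entity; the abstract does not state the exact protocol, and the spans and labels in the example are invented for illustration.

```python
def entity_f1(gold, predicted):
    """Return (precision, recall, f1) for exact-match entity spans.

    Each entity is a (start, end, category) tuple; duplicates are ignored.
    """
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: character offsets and category labels are placeholders.
gold = [(0, 2, "disease"), (5, 7, "drug"), (10, 12, "symptom")]
predicted = [(0, 2, "disease"), (5, 7, "symptom"), (10, 12, "symptom"), (15, 16, "drug")]

precision, recall, f1 = entity_f1(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

The second prediction has the right boundaries but the wrong category, so under exact matching it counts as both a false positive and a missed gold entity.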
Common image sentiment transformation methods are based on the assumption that transferring image color can transfer image sentiment. However, because of the influence of image content, transferring image color cannot completely transfer image sentiment, and a suitable reference image must be obtained before the color can be transferred. In practical applications, it is difficult to obtain reference images that are similar to the target image in sentiment and similar to the source image in content, and the semantic consistency of local objects also needs to be considered when transferring image color. Therefore, this paper proposed an image sentiment transformation method based on adaptive brightness adjustment. Based on the significant correlation, established in psychology, between image brightness and image sentiment (also known as the valence value, abbreviated as V value), the method adaptively adjusts brightness through the deep neural network ISTNet to convert an image to the target sentiment. First, an image and its corresponding true V value were obtained from an existing image emotion dataset, and a series of images with different brightness was generated by changing the image brightness. Then, the pseudo V values of these images, which share the same content but differ in brightness, were predicted by a pre-trained image V value regression model. Finally, ISTNet was trained with these images and pseudo V values to learn the internal relationship between brightness adjustment and sentiment change. In practical applications, no reference image is needed: the image and the target V value are fed directly into ISTNet to obtain an output image with the corresponding sentiment. The experimental results show that this method performs better than existing color-based image sentiment transformation methods.
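The data preparation step described above, producing several versions of one image that differ only in brightness, can be sketched in a few lines. The abstract does not say how brightness is changed, so the linear scaling of 8-bit RGB values used here is an assumption, and the function names and scaling factors are hypothetical.

```python
def adjust_brightness(image, factor):
    """Scale every 8-bit channel value by `factor` and clip the result to [0, 255].

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [
        [tuple(min(255, max(0, round(channel * factor))) for channel in pixel) for pixel in row]
        for row in image
    ]


def brightness_series(image, factors):
    """Map each factor to a brightness-adjusted copy of the same image content."""
    return {factor: adjust_brightness(image, factor) for factor in factors}


# A tiny 1 x 2 image used only for illustration.
image = [[(100, 150, 200), (10, 20, 30)]]
series = brightness_series(image, [0.5, 1.0, 1.5])

for factor, variant in series.items():
    print(factor, variant)
```

In the pipeline the abstract describes, each variant would then be scored by the pre-trained V value regression model to obtain its pseudo V value; that model is not reproduced here.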
Traditional multi-view clustering assumes complete data. In practical tasks, however, limitations of the information acquisition method often leave some views with missing data, which leads to the problem of incomplete multi-view clustering. Most existing clustering models for this problem are based on non-negative matrix factorization or distance graphs, and their co-optimization strategy easily makes the solution unstable and cannot fully characterize the global structure. To improve the quality of the clustering graph, this paper proposed ALIMSC, an incomplete multi-view clustering algorithm based on low-rank subspace clustering and anchor graphs. The algorithm first obtained the benchmark similarity matrix of the data with APMC, an incomplete multi-view subspace clustering algorithm based on anchor graphs, and embedded this matrix in the low-rank subspace clustering model. The similarity matrix was obtained by dimensionality-ascending alignment and weighted fusion, and the final clustering graph was obtained by making this similarity matrix as consistent as possible with the benchmark similarity matrix. By imposing a rank minimization constraint on the similarity matrix of each view, ALIMSC characterized the low-dimensional subspace distribution of high-dimensional data and, on the basis of the original anchor graph, emphasized the subspace structure of the data, that is, the block diagonality reflected in the clustering graph. Experimental results on several public datasets show that the proposed algorithm outperforms classical incomplete multi-view clustering algorithms.
Previous session-based recommendation systems usually capture users’ consumption preferences from their recent transaction records, an approach that ignores the influence of global transaction information and friends’ preferences on users’ transaction behavior and therefore produces less accurate recommendations. To solve this problem, this paper proposed AFGSRec, a social recommendation model based on the adaptive fusion of global collaborative features. Firstly, a heterogeneous graph neural network was used to model users and their historical transaction information on the social network, capturing global collaborative features and the social influence among friends. Secondly, a graph neural network based on a selection mechanism was designed to filter out the node transition features irrelevant to the current session and capture user preferences more accurately. Thirdly, an adaptive fusion method was designed to dynamically capture the impact of global collaborative features on users’ current preferences and improve the model’s recommendation accuracy. Finally, a dynamic cyclical learning rate was used to help the model better handle saddle points during training and improve the convergence speed of AFGSRec. The experimental results show that AFGSRec is robust, and that both its HR (Hit Rate) and MRR (Mean Reciprocal Rank) exceed those of the state-of-the-art model SERec. On the Gowalla dataset, HR@10 and HR@20 are increased by 1.91% and 1.15%, respectively, and MRR@10 and MRR@20 are increased by 5.05% and 4.83%, respectively. On the Delicious dataset, HR@10 and HR@20 are increased by 2.45% and 1.19%, respectively, and MRR@10 and MRR@20 are increased by 4.84% and 4.32%, respectively.
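The two metrics reported for AFGSRec, HR@K and MRR@K, have standard definitions that can be sketched as follows. The sketch assumes one target item per session and a ranked list of recommended items per session; the item identifiers are placeholders, and nothing here reproduces the paper's evaluation code.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of sessions whose target item appears in the top-k recommendations."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k):
    """Mean reciprocal rank of the target item; a target outside the top k contributes 0."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)


# Placeholder sessions: three ranked lists and the item each user actually chose.
ranked_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "d", "z"]

hr = hit_rate_at_k(ranked_lists, targets, 2)
mrr = mrr_at_k(ranked_lists, targets, 2)
print(round(hr, 3), round(mrr, 3))
```

HR@K only records whether the target is retrieved, while MRR@K also rewards placing it near the top, which is why the two metrics can improve by different amounts on the same dataset.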