2024 Computer Science and Technology
To address the problem that deep learning-based rolling bearing fault diagnosis algorithms need a large amount of labeled data and diagnose poorly when the number of samples is limited, this paper proposed a small-sample rolling bearing fault diagnosis method based on the Gramian angular difference field (GADF) and generative adversarial networks (GAN). Firstly, a data enhancement method based on the GADF transform was proposed, which converts a few 1D vibration signals into 2D GADF images; the images are then cropped into GADF subgraphs to obtain a large number of image samples. Then, a conditional generative adversarial network (CGAN) was combined with Wasserstein GAN with gradient penalty (WGAN-GP) to construct a novel generative adversarial network, which improves training stability through conditional auxiliary information and the gradient penalty, and uses a dynamic coordinate attention mechanism to enhance the spatial perception of the model, so as to generate high-quality samples. Finally, the generated samples were used to train the classifier, and the diagnosis results were obtained on the validation set. Two sets of bearing fault diagnosis experiments in a small-sample environment were conducted using the Southeast University dataset and the Case Western Reserve University dataset, respectively. The results show that, compared with traditional generative adversarial networks as well as advanced small-sample fault diagnosis methods, the proposed method obtains the best results in five fault diagnosis metrics, including accuracy and precision, and can accurately diagnose the type of bearing faults under small-sample conditions.
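The GADF step described in the abstract above can be illustrated with a minimal pure-Python sketch. It uses the usual definition of the Gramian angular difference field (rescale the signal to [-1, 1], take the arccosine, then form sin(phi_i - phi_j)). The function names `gadf` and `crop`, and the square cropping scheme, are assumptions made for illustration and are not taken from the paper.

```python
import math

def gadf(signal):
    """Gramian angular difference field of a 1D sequence (illustrative sketch)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in signal]
    # Clamp to guard against floating-point drift outside [-1, 1].
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    phi = [math.acos(s) for s in scaled]
    # Entry (i, j) is sin(phi_i - phi_j).
    return [[math.sin(a - b) for b in phi] for a in phi]

def crop(image, top, left, size):
    """Cut a size x size subgraph out of a 2D image (hypothetical cropping scheme)."""
    return [row[left:left + size] for row in image[top:top + size]]

image = gadf([0.0, 1.0, 2.0, 3.0])
patch = crop(image, 1, 1, 2)
```

Sliding the crop window over one GADF image yields many subgraphs from a single signal, which is the sense in which cropping enlarges the sample set.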
Actual job-shop scheduling problems often exhibit high complexity, and the scheduling algorithm needs to consider more constraints, which increases the difficulty of solving the problem. To address the challenge of out-of-order processing for different batches and processes in the flexible job shop's batch scheduling scenario, it is necessary to overcome the low utilization rates of existing job shop machines and the unbalanced workload distribution among machines of the same type. Therefore, this paper constructed a flexible job shop equal-batch scheduling model that incorporates partially out-of-order execution of processes. Firstly, based on the widely adopted fast non-dominated sorting genetic algorithm (NSGA-Ⅱ), this paper introduced a novel two-stage coding structure that integrates batch information and process sorting information. The priority rule method was used to obtain the initial population. With minimizing the completion time, the machine load equilibrium rate and the total machine load as the optimization goals, the greedy algorithm was used to obtain the optimal value of the model, and then the processing path of different batches was dynamically constructed. Then, the optimized objective functions were sorted, and the non-dominated sorting process was added step by step, which addresses the difficulty of optimizing multiple objective functions at the same time and improves the solving efficiency. Finally, taking the wood products processing workshop of a printing and packaging enterprise as an example, the scheduling process was realized according to the field operation information. The results show that, compared with the priority scheduling rules, the completion time of the proposed method is shortened by 6.6% and the average machine load balancing variance is reduced by 10.7%, and, compared with the genetic algorithm, the average is reduced by 53.3%, thus verifying the feasibility of the method. This method can meet the high-performance scheduling requirements of the flexible workshop of printing and packaging enterprises.
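To make the non-dominated sorting idea in the abstract above concrete, here is a small sketch that peels Pareto fronts from a set of candidate schedules scored on three minimization objectives. It is a plain layer-peeling version, not the fast bookkeeping used in NSGA-Ⅱ, and the objective tuples (completion time, load imbalance, total load) are made-up values for illustration only.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Return fronts as lists of indices; front 0 holds the non-dominated points."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if nothing still remaining dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical schedules: (completion time, load imbalance, total machine load).
schedules = [(10, 0.2, 30), (12, 0.1, 28), (11, 0.3, 31), (13, 0.4, 35)]
fronts = non_dominated_sort(schedules)
```

Schedules 0 and 1 trade completion time against load balance, so neither dominates the other and both land in the first front.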
Zero-shot image semantic segmentation is one of the important tasks in the visual field of zero-shot learning, aiming to segment novel categories unseen during training. The current distribution of visual features based on pixel-level visual feature generation is inconsistent with real visual feature distribution. The synthesized visual features inadequately reflect class semantic information, leading to low discriminability in these features. Some existing generative methods consume significant computational resources to obtain the discriminative information conveyed by semantic features. In view of the above problems, this paper proposed a zero-shot image semantic segmentation network called SVCCNet, which is based on semantic-visual consistency constraints. SVCCNet uses a semantic-visual consistency constraint module to facilitate the mutual transformation between semantic features and visual features, enhancing their correlation and diminishing the disparity between the spatial structures of real and synthesized visual features, which mitigates the inconsistency problem between the distributions of synthesized and real visual features. The semantic-visual consistency constraint module achieves the correspondence between visual features and class semantics through two mutually constrained reconstruction mappings, while maintaining low model complexity. Experimental results on the PASCAL-VOC and PASCAL-Context datasets demonstrate that SVCCNet outperforms mainstream methods in terms of pixel accuracy, mean accuracy, mean intersection over union (IoU), and harmonic IoU.
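The evaluation metrics named in the abstract above can be sketched as follows. The code assumes the common convention in zero-shot segmentation that harmonic IoU is the harmonic mean of the mean IoU over seen classes and the mean IoU over unseen classes. The confusion-matrix layout and the function names are assumptions for illustration.

```python
def mean_iou(confusion, classes):
    """Mean IoU over the given class indices.

    confusion[t][p] counts pixels whose true class is t and predicted class is p.
    Classes that never occur (zero union) are skipped.
    """
    n = len(confusion)
    ious = []
    for c in classes:
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[t][c] for t in range(n)) - tp
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0

def harmonic_iou(miou_seen, miou_unseen):
    """Harmonic mean of seen-class and unseen-class mean IoU."""
    total = miou_seen + miou_unseen
    return 2 * miou_seen * miou_unseen / total if total else 0.0

# Toy two-class confusion matrix; class 0 plays the seen class, class 1 the unseen one.
confusion = [[8, 2], [1, 9]]
h = harmonic_iou(mean_iou(confusion, [0]), mean_iou(confusion, [1]))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model that segments seen classes well but fails on unseen ones scores poorly on this metric.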
With the development of autonomous driving technology, deep reinforcement learning has become an important means to realize the efficient driving policy learning. However, the implementation of autonomous driving is faced with the challenges brought by the complex and changeable traffic scenes, and the existing deep reinforcement learning methods have the problems of single scene adaptation ability and slow convergence speed. To address these issues and to improve the scene adaptability and policy learning efficiency of autonomous vehicles, this paper proposed a multi-task assisted driving policy learning method. Firstly, this method constructed the encoder-multi-task decoder module based on the deep residual network, squeezing high-dimensional driving scenes into low-dimensional representations, and adopted multi-task-assisted learning of semantic segmentation, depth estimation and speed prediction to improve the scene information richness of low-dimensional representations. Then, the low-dimensional representation was used as the state input to build a decision network based on reinforcement learning, and the multi-constraint reward function was designed to guide the learning of driving strategies. Finally, simulation experiments were conducted in CARLA. The experimental results show that, compared to classic methods such as DDPG and TD3, the proposed method improves the training process through multi-task assistance and learns better driving policies. It achieves higher task success rates and driving scores in several typical urban driving scenarios such as roundabouts and intersections, demonstrating excellent decision-making capabilities and scene adaptability.
With the development of science and technology, the accuracy of 3D point cloud acquisition equipment has been continuously improved, and the acquisition of massive 3D point cloud data has become a reality. However, the irregular distribution and huge number of data points of 3D point cloud bring great challenges to data storage and transmission. Therefore, 3D point cloud coding is imperative. From the perspective of data sampling, this paper transforms the 3D point cloud coding problem into a 3D point cloud sampling-reconstruction problem, and proposes a sampling-based 3D point cloud geometry coding framework. In this framework, firstly, the down-sampling method is used to sample the original 3D point cloud to the sparse 3D point cloud with a specified number of points. Then, the sparse 3D point cloud is encoded using any existing coding methods (the number of encoding points is significantly reduced, which can effectively reduce the encoding rate). Finally, by using the proposed upsampling method, the decoded sparse 3D point cloud is interpolated as a high-quality dense 3D point cloud similar to the shape of the original input point cloud. Experimental results show that, as compared with the latest G-PCC provided by MPEG, the proposed 3D point cloud geometry coding framework improves the objective quality of the reconstructed 3D point cloud by 5.49 dB on average, and presents better subjective visual effect.
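The first stage of the framework above, down-sampling a dense point cloud to a specified number of points, can be illustrated with farthest point sampling. The abstract does not say which down-sampling method the authors use, so farthest point sampling is a stand-in chosen here for illustration, and the function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_sampling(points, m):
    """Pick m points that spread out over the cloud, starting from points[0]."""
    if not 0 < m <= len(points):
        raise ValueError("m must be between 1 and the number of points")
    chosen = [0]
    # dist[i] is the squared distance from point i to its nearest chosen point.
    dist = [squared_distance(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, squared_distance(p, points[nxt])) for d, p in zip(dist, points)]
    return [points[i] for i in chosen]

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
sparse = farthest_point_sampling(cloud, 2)
```

The sparse cloud is what would be handed to an existing codec; the upsampling stage that restores density after decoding is specific to the paper and is not sketched here.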
Due to hand self-occlusion and the lack of depth information, 3D hand pose estimation from monocular RGB images is not accurate enough in estimating the relative depth of joints, and the generated hand pose often violates the biomechanical constraints of the hand. To solve this problem, a deep neural network based on prior knowledge and mesh supervision is proposed, which combines the prior knowledge contained in the hand structure with hand mesh information. The articulated structure of the hand skeleton implies that there exists a specific relationship between the projections of the 3D hand pose onto the 2D image plane and along the depth direction, but the differences in hand structure between individuals make it difficult to describe this relationship intuitively and formally. Therefore, this paper proposes to fit it through learning. Specific relationships also exist between joint positions and bone lengths of the same finger, between bending directions of different segments of the same finger, and between bending directions of different fingers; these relationships are designed as loss functions to supervise network training. The proposed neural network generates hand meshes in parallel with hand poses, supervises the network training through mesh annotation, and optimizes the pose estimation without increasing the network complexity. Furthermore, the neural network is trained using a mixed dataset to further improve its generalization capability. Experimental results show that the proposed method outperforms other methods in terms of internal cross-validation accuracy on multiple datasets, cross-dataset validation accuracy, and time and space complexity of the model. In summary, the prior knowledge of the hand skeleton and the mesh supervision improve the accuracy of pose estimation while keeping the neural network compact.
Named entity recognition for traditional Chinese medicine (TCM) classics is the basis for constructing TCM knowledge graphs, and is of great significance for the extraction and intelligent presentation of TCM knowledge. However, the knowledge system of TCM has a huge structure, and the publicly available corpus is scarce and semantically complex. Most current research focuses on the expression of character vectors, and does not fully consider the rich semantic features in the structural characteristics of special Chinese characters. Moreover, due to the rich semantic meaning of Chinese characters, there are still problems of insufficient expression of potential features and of polysemy. In this paper, a named entity recognition method based on SiKuBERT and multivariate data embedding is proposed by combining the corpus features of ancient Chinese medicine books with the structural information of ancient Chinese characters. In this method, the word feature information is created by SiKuBERT, and on this basis, word features and radical features are embedded to capture the semantic information of Chinese characters, so that characters with similar radical sequences are close to each other in the vector space. Then, the method is used to identify the names of people, herbal medicines, diseases, pathologies, and meridians in the Materia Medica dataset. The experimental results show that the proposed method is able to effectively extract five types of entities in the text, with an F1 score of 86.66%, a precision rate of 86.95%, and a recall rate of 86.37%. As compared with the SiKuBERT-CRF model based on word features, the proposed method integrates the word information with the structural information of traditional Chinese characters, which enhances the entity recognition effect, and the overall F1 score is improved by 2.83 percentage points. Moreover, the proposed method is most effective in the recognition of Chinese herbal medicine names and disease names with significant radicals, with the corresponding F1 scores being improved by 3.48 and 0.97 percentage points, respectively, as compared with the SiKuBERT-CRF model based on word features. In general, the proposed method outperforms other mainstream deep learning models and shows good generalization ability.
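As a quick consistency check on the figures reported above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the stated precision (86.95%) and recall (86.37%); the function name is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    return 2 * precision * recall / (precision + recall)

reported_f1 = 86.66
recomputed_f1 = round(f1_score(86.95, 86.37), 2)
print(recomputed_f1)  # 86.66, matching the reported value
```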
The skyline query in road networks has important application value in the fields such as intelligent transportation, point of interest discovery, and location services. In order to solve the problem of low efficiency of skyline queries in road network environment and the lack of privacy of query results, a differential privacy-based skyline query method in road network environment is proposed. In this method, first, aiming at the characteristics of large data amount and complex data in the initial dataset of road network environment, the dataset is preprocessed, and three pruning rules are proposed based on the properties of the skyline layer divided by distance attributes and the Voronoi diagram of the road network. Next, based on the pruning rules, a dataset pruning algorithm in road network environment is proposed, which can effectively filter out a large amount of redundant data. Then, for the filtered dataset, a storage method of grid index is utilized to save the storage space. Furthermore, a skyline extension tree based on grid index is designed, and an algorithm for querying global candidate skyline point sets is proposed based on the extension tree and the corresponding pruning rules. Finally, for the query result set, a differential privacy budget allocation model is employed to allocate privacy budgets, and a result set publishing algorithm based on information divergence is proposed, thus effectively improving the privacy of data information. Experimental results show that the proposed query method achieves a query accuracy of more than 99%. It improves the query efficiency by more than 10%, as compared with the traditional skyline query methods in larger datasets. When the total differential privacy budget is 0.01, 0.10, 0.50 and 1.00, the relative error of the proposed privacy budget allocation method is lower than that of the equal difference and equal ratio allocation methods.
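The two baseline budget allocation schemes mentioned at the end of the abstract above can be sketched as follows. This is one plausible reading: "equal difference" is taken as an arithmetic progression of shares and "equal ratio" as a geometric progression, each normalized so the shares sum to the total privacy budget. The function names, the number of levels, and the ratio are assumptions; the paper's own allocation model is not reproduced here.

```python
def arithmetic_allocation(total_budget, levels):
    """Split the budget into shares proportional to 1, 2, ..., levels."""
    weights = [i + 1 for i in range(levels)]
    scale = total_budget / sum(weights)
    return [w * scale for w in weights]

def geometric_allocation(total_budget, levels, ratio=2.0):
    """Split the budget into shares proportional to 1, ratio, ratio**2, ..."""
    weights = [ratio ** i for i in range(levels)]
    scale = total_budget / sum(weights)
    return [w * scale for w in weights]

# The total budgets are the values used in the reported experiments.
for eps in (0.01, 0.10, 0.50, 1.00):
    arith = arithmetic_allocation(eps, 4)
    geom = geometric_allocation(eps, 4)
```

Under sequential composition the shares must sum to the total budget, which both helpers guarantee by normalizing the weights.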
Most existing Deepfake face forgery detection algorithms suffer from insufficient generalization even though their intra-dataset detection performance is fairly good. This is because these methods mainly rely on local features that are prone to overfitting, which leads to unsatisfactory cross-dataset detection performance. In order to solve this problem, a face forgery detection method based on multi-scale spatiotemporal features and tampering probability is proposed, which maintains good performance in cross-dataset testing, in cross-forgery testing and under video compression by detecting the inevitable temporal inconsistency between consecutive frames in Deepfake videos. The proposed detection method consists of three modules: a multi-scale spatiotemporal feature extraction module is employed to reveal the discontinuous traces of fake videos in the temporal domain, a three-dimensional dual-attention module is designed to adaptively compute the correlation between multi-scale spatiotemporal features, and an auxiliary supervision module is used to predict the tampering probabilities of randomly selected pixels to form a supervision mask. Then, the proposed algorithm is compared with the baseline algorithm and the latest relevant works on large-scale public standard databases such as FF++, DFD, DFDC and CDF. Experimental results show that the proposed algorithm has the best overall performance for cross-dataset testing and video compression, and above-average performance for cross-forgery testing. Meanwhile, it maintains good average performance for all intra-dataset testing. All the experiments demonstrate the effectiveness of the proposed algorithm.
The occurrence of fires has brought huge losses to society. The task of forest fire prevention and control is becoming increasingly urgent, and global warming has made this problem more complicated. Deep learning plays an important role in all walks of life. A large number of models are constantly designed and proposed, and there are various ways to improve the models. Therefore, this article proposed the EfficientNet-E model, which uses the more advanced efficient channel attention (ECA) module to replace the SE module in EfficientNet. It improves the performance of the model by enhancing the performance of the attention mechanism. Compared with the SE module, the ECA module better retains the information during transmission, allowing the data features to be more fully retained, thus enabling the model to be optimized. To verify the performance of the EfficientNet-E model and the advantages of EfficientNet's design idea in forest fire identification compared with traditional models, this article selected representatives of classic models, ResNet and DenseNet, as comparison references, and conducted related experiments in combination with EfficientNet and EfficientNet-E. The experiment selected 3,303 forest fire, non-fire and smoke pictures. The results of multiple rounds of tests show that EfficientNet-E is better than the conventional classic deep learning models in identifying forest fire data, and compared with the original EfficientNet's average accuracy of 89.28%, EfficientNet-E's average accuracy (90.04%) is clearly higher. The standard deviation is smaller and the training stability is better, which confirms the better performance of the improved EfficientNet-E.
Social recommendation models based on graph neural networks have performed well in improving recommendation systems. However, the existing methods ignore the possible feature mismatch between the queried target users and content nodes and their neighbors, which introduces noise and reduces the model performance. To solve this problem, this paper proposed a social recommendation model, DNSSR. Firstly, it constructed a relational graph containing multiple relationships between users and items, with richer information associations between nodes in the graph. Then the dynamic neighborhood sampling mechanism was used to obtain neighbor nodes that are more consistent with the characteristics of the target query pair, reducing noise information. In addition, in order to further improve the predictive performance of the model, this paper designed an enhanced graph neural network to model the sampled relationship subgraphs. It can distinguish the importance of different neighboring nodes and select more reliable information sources to obtain more robust user and item embedding vectors for rating prediction. The experimental results show that the prediction error of this model is significantly reduced compared to that of other advanced models, proving the effectiveness of the methods proposed in the paper. In particular, if the dynamic neighborhood sampling mechanism is removed, the RMSE and MAE of DNSSR increase by 6.05% and 7.31%, respectively, on the Ciao dataset, and by 3.49% and 5.41%, respectively, on the Epinions dataset, which demonstrates the effectiveness of the mechanism in reducing noise interference and improving the performance of social recommendation models.
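For reference, the two error metrics quoted in the abstract above, RMSE and MAE, can be computed as in the sketch below. The ratings are made-up values for illustration, not data from the Ciao or Epinions datasets.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def relative_increase(with_ablation, full_model):
    """Percentage by which an error metric grows when a component is removed."""
    return 100.0 * (with_ablation - full_model) / full_model

# Hypothetical ratings on a 1-5 scale.
predicted = [3.0, 4.0, 5.0]
actual = [3.0, 5.0, 3.0]
errors = (rmse(predicted, actual), mae(predicted, actual))
```

RMSE squares each error before averaging, so it penalizes large rating mistakes more heavily than MAE does; reporting both gives a fuller picture of prediction quality.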
With their complex two-dimensional structure, offline handwritten mathematical expressions are difficult to recognize due to the variable scale of their symbols and the wide variation in writing styles. This paper proposed a mutual learning model based on multi-scale feature fusion. Firstly, to enhance the model's ability to extract fine-grained information from expressions and comprehend the semantic information of global two-dimensional structures, multi-scale feature fusion was introduced in the encoding stage. Secondly, paired handwritten and printed mathematical expressions were introduced for training the mutual learning model, which includes a decoder loss and a context matching loss to learn LaTeX grammar and the semantic invariance between handwritten and printed mathematical expressions, respectively, so as to improve the robustness of the model to different writing styles. Experimental validation was performed on the CROHME 2014/2016/2019 datasets. After introducing the multi-scale feature fusion mechanism, the expression correctness rate reaches 55.25%, 52.31%, and 53.72%, respectively. After introducing the mutual learning mechanism, the expression correctness rate reaches 55.43%, 53.53%, and 53.79%, respectively. The expression correctness rate reaches 58.88%, 55.10%, and 57.05% after introducing both mechanisms at the same time. The experiments show that the proposed method can effectively extract the features in formulas at different scales and, through the mutual learning mechanism, overcome the problems of different handwriting styles and a small amount of data. In addition, the experimental results on the HME100K dataset verified the effectiveness of the proposed model.
Given the faster speed of low-precision floating-point operations, more and more high-performance applications are using mixed-precision schemes for acceleration. Large artificial intelligence (AI) models accelerated in this way have also received wide attention. Recently, the HPL-AI (High Performance LINPACK for Accelerator Introspection) benchmark has been proposed to evaluate the mixed-precision computing performance of high-performance systems. For this benchmark, this study designed and optimized a single-node implementation of HPL-AI on Kunpeng and Ascend heterogeneous platforms. In order to balance the load of the AI processors, the tasks were evenly distributed to the AI processors through a cyclic task allocation strategy. A task allocation strategy with an interval value was used to improve the continuity of data transmission and thereby reduce the data transmission time between the CPU and the AI processors. Without affecting the calculation accuracy, the computation on the CPU side was reduced by canceling the data scaling. The final experimental results show that the HPL-AI benchmark has the fastest mixed-precision floating-point arithmetic speed when the interval value is 8; at the same time, canceling the data scaling does not affect the accuracy of the HPL-AI benchmark results. Compared with the non-optimized HPL-AI benchmark implementation on the heterogeneous platform of Kunpeng and Ascend, the optimization strategy proposed in this paper improves the mixed-precision floating-point arithmetic speed by about 29%, which lays a solid foundation for the further optimization of the single-node HPL-AI benchmark and the deployment of the multi-node HPL-AI benchmark.
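The cyclic task allocation with an interval value described above can be sketched as a block-cyclic assignment: consecutive tasks are grouped into chunks of `interval` tasks, and the chunks are dealt to the AI processors in round-robin order. This reading of "interval value" is an assumption, as are the function and parameter names; the abstract does not give the exact scheme.

```python
def cyclic_allocation(num_tasks, num_processors, interval=1):
    """Assign task indices to processors round-robin in chunks of `interval` tasks."""
    if num_processors < 1 or interval < 1:
        raise ValueError("num_processors and interval must be positive")
    assignment = [[] for _ in range(num_processors)]
    for task in range(num_tasks):
        # Tasks in the same chunk share a processor, so their data is contiguous.
        processor = (task // interval) % num_processors
        assignment[processor].append(task)
    return assignment

plain = cyclic_allocation(8, 2)                # [[0, 2, 4, 6], [1, 3, 5, 7]]
chunked = cyclic_allocation(8, 2, interval=2)  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```

With a larger interval each processor receives longer runs of adjacent tasks, which is one way contiguous transfers between the CPU and an accelerator could be obtained while the load stays balanced.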
With the progress of network technology, applications such as vehicle networks, the industrial Internet of Things and 5G ultra-reliable low-delay communication (uRLLC) all require time-sensitive networking (TSN) to ensure ultra-low-delay deterministic data transmission. TSN traffic scheduling requires a fast and accurate scheduling algorithm. The existing exact solution methods are of high complexity and cannot meet the real-time requirements in large-scale joint scheduling. This paper designed a routing optimization genetic algorithm (Routing-GA) with better performance. By combining routing and traffic scheduling constraints, it improves the efficiency of the scheduling algorithm through routing optimization and supports link load balancing in scheduling. This strategy increases the space and flexibility of scheduling, and retains the ability of meta-heuristic algorithms to find near-optimal solutions quickly. It can deal with the large-scale joint TSN routing and scheduling problem simply and effectively. Routing-GA takes the minimum end-to-end delay of time-sensitive flows as the optimization objective, considers routing and TSN constraints jointly, and provides a genetic algorithm coding method with low complexity, high efficiency and high scalability according to the characteristics of TSN transmission problems. In addition, in order to improve the performance of the scheduling algorithm, a crossover and mutation mechanism was proposed to optimize the route length and link load balancing. The experimental results show that the implemented Routing-GA can effectively reduce the end-to-end delay and significantly improve the solution quality. The evolution rate can reach 24.42%, and the average iteration time is only 12% of that of the traditional genetic algorithm (GA). It can effectively improve the performance of the algorithm and meet the constraint requirements of TSN scheduling.