Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like may be used herein to describe various concepts, but these concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, without departing from the scope of the present application, a first image may be referred to as a second image, and similarly, a second image may be referred to as a first image.
As used herein, "at least one" includes one, two, or more than two; "a plurality" includes two or more than two; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality. For example, if a plurality of elements includes 3 elements, "each" refers to every one of the 3 elements, and "any" refers to any one of the 3 elements, which may be the first, the second, or the third.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics. Because research in this field involves natural language, i.e., the language people use every day, it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, and knowledge graphs.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specifically studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures, so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The solution provided by the embodiments of the present application trains a word vector acquisition submodel, a feature vector acquisition submodel, and a dialog state determination submodel using artificial-intelligence-based machine learning, and the training process may be carried out in combination with invoking a state conversion model. The trained word vector acquisition submodel, feature vector acquisition submodel, and dialog state determination submodel are then used to determine the dialog state.
The dialog state determination method provided by the embodiments of the present application may be performed by a computer device, which may be a terminal or a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, or a smart watch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
The method provided by the embodiments of the present application can be used in intelligent dialog scenarios.
For example, in an intelligent dialog scenario:
after obtaining the dialog sentences of a plurality of dialog turns, the computer device determines the dialog state of the target dialog turn using the dialog state determination method provided by the embodiments of the present application, generates, according to the dialog state, a reply sentence corresponding to the user's question sentence, and provides the reply sentence to the user, thereby implementing intelligent human-computer dialog.
For another example, in an intelligent translation scenario:
after obtaining the dialog sentences of a plurality of dialog turns, the computer device determines the dialog state of the target dialog turn using the dialog state determination method provided by the embodiments of the present application, determines, according to the dialog state, a translation of the question sentence sent by the user, and provides the translation to the user, thereby implementing intelligent human-computer translation.
Fig. 1 is a flowchart of a dialog state determination method provided in an embodiment of the present application. The method is applied to a computer device and, as shown in Fig. 1, includes the following steps:
101. The computer device obtains the dialog sentences of a plurality of dialog turns.
The dialog sentences of each dialog turn include at least one of a question sentence or a reply sentence, where a question sentence is a sentence input by the user and a reply sentence is a sentence with which the dialog system replies to a question sentence. The dialog sentences of a dialog turn may therefore include only a question sentence, both a question sentence and a reply sentence, or only a reply sentence.
During the interaction between the user and the dialog system, the user inputs a sentence and the dialog system replies to it. Dialog turns may therefore be delimited in either of two ways. In the first, a question sentence input by the user together with the reply sentence that follows it forms one dialog turn, so that each dialog turn includes one question sentence and one reply sentence. In the second, a reply sentence together with the question sentence that follows it forms one dialog turn; in this case the first dialog turn may include only a question sentence, and from the second dialog turn onward each dialog turn includes one reply sentence and one question sentence.
For example, when the user inputs a question sentence and the dialog system replies with a reply sentence, the question sentence and the reply sentence are used as the dialog sentences of one dialog turn. Alternatively, when the user inputs a first question sentence, the system replies with a first reply sentence, and the user then inputs a second question sentence, the first question sentence forms one dialog turn, and the first reply sentence and the second question sentence form another dialog turn.
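The two ways of delimiting dialog turns described above can be sketched as follows. This is a minimal illustration that assumes each utterance carries a hypothetical speaker label ("user" or "system"); the function names are illustrative and are not part of the described method.

```python
from typing import List, Tuple

# Each utterance is (speaker, text); speaker is the hypothetical label "user" or "system".
Utterance = Tuple[str, str]


def turns_question_then_reply(utterances: List[Utterance]) -> List[List[str]]:
    """First way: a user question plus the system reply that follows it form one turn."""
    turns: List[List[str]] = []
    for speaker, text in utterances:
        if speaker == "user" or not turns:
            turns.append([text])      # a question opens a new turn
        else:
            turns[-1].append(text)    # the reply completes the current turn
    return turns


def turns_reply_then_question(utterances: List[Utterance]) -> List[List[str]]:
    """Second way: the first turn holds only the first question; each later turn
    holds a system reply plus the user question that follows it."""
    turns: List[List[str]] = []
    for speaker, text in utterances:
        if speaker == "system" or not turns:
            turns.append([text])      # a reply (or the very first question) opens a new turn
        else:
            turns[-1].append(text)    # the following question joins that turn
    return turns


dialog = [("user", "q1"), ("system", "r1"), ("user", "q2")]
print(turns_question_then_reply(dialog))   # [['q1', 'r1'], ['q2']]
print(turns_reply_then_question(dialog))   # [['q1'], ['r1', 'q2']]
```

Either convention yields an ordered list of turns, so the last element can serve as the target dialog turn in the steps that follow.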
The plurality of dialog turns include a target dialog turn and at least one historical dialog turn preceding the target dialog turn. The target dialog turn may be the last of the plurality of dialog turns, in which case all the other dialog turns are historical dialog turns of the target dialog turn. The dialog turns may be arranged in the order in which they were generated. For example, if the obtained dialog turns, in generation order, are dialog turns 1, 2, 3, and 4, then dialog turn 4 is the target dialog turn, and dialog turns 1, 2, and 3 are its historical dialog turns.
It should be noted that the computer device may be a terminal running a dialog system, in which case the terminal obtains the dialog sentences of the plurality of dialog turns through the dialog system. Alternatively, the computer device may be a server acting as the background server of the dialog system: the terminal obtains the dialog sentences of the plurality of dialog turns through interaction via the dialog system and sends them to the server, and the server receives them.
In the embodiments of the present application, the dialog sentences of the dialog turns are generated through interaction between a user and the dialog system. The obtained dialog sentences all belong to the same user identifier and result from multiple interactions between that user identifier and the dialog system. The user logs in to the terminal with the user identifier and interacts with the dialog system running on the terminal. Obtaining the dialog sentences of a plurality of dialog turns corresponding to the same user identifier allows the dialog state of that user identifier in the target dialog turn to be determined subsequently.
102. The computer device obtains a word vector set of each dialog turn according to the dialog sentences of that dialog turn.
The word vector set of a dialog turn includes the word vector of each word in the dialog sentences of that dialog turn, and different words have different word vectors.
For any dialog turn, the dialog sentences include at least one of a question sentence and a reply sentence, and each of these sentences includes at least one word. The dialog sentences of the dialog turn therefore include at least one word, and obtaining the word vector of each word yields the word vector set of the dialog turn.
The word vectors in the word vector set may be arranged in the order in which the corresponding words appear in the dialog sentences. For example, if the dialog sentences of a dialog turn include word 1, word 2, word 3, and word 4, then word vector A of word 1, word vector B of word 2, word vector C of word 3, and word vector D of word 4 are obtained, and the word vector set of the dialog turn includes word vectors A, B, C, and D.
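A minimal sketch of step 102, assuming whitespace tokenization and a hypothetical lookup table of randomly initialized vectors; in the described method a trained word vector acquisition submodel would produce these vectors instead. The names build_embedding_table and word_vector_set are illustrative only.

```python
import random
from typing import Dict, List

Vector = List[float]


def build_embedding_table(vocab: List[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical lookup table: one random vector per word (a trained model would learn these)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def word_vector_set(turn_sentences: List[str], table: Dict[str, Vector],
                    unk: Vector) -> List[Vector]:
    """Split the turn's sentences into words and look each one up in order,
    so the i-th vector in the set belongs to the i-th word of the turn."""
    words = [w for sentence in turn_sentences for w in sentence.split()]
    return [table.get(w, unk) for w in words]  # unknown words map to a shared vector


table = build_embedding_table(["book", "a", "hotel", "cheap"], dim=4)
vectors = word_vector_set(["book a hotel", "cheap"], table, unk=[0.0] * 4)
print(len(vectors))  # 4
```

Keeping the vectors in word order mirrors the A, B, C, D example above.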
103. The computer device performs weighted fusion processing on the word vector sets of the plurality of dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in those word vector sets, to obtain a first fusion feature vector corresponding to each preset dimension identifier.
The dialog sentences generated in a dialog scenario may contain a large amount of information belonging to different dimensions, such as a hotel name dimension, a home address dimension, or an occupation dimension. In the embodiments of the present application, the dimensions to which the target dialog turn relates and the information in those dimensions can be determined through analysis, so that the dialog state of the target dialog turn can be determined.
A preset dimension identifier is a preset identifier indicating a corresponding dimension, for example a hotel name dimension identifier or a hotel address dimension identifier. A dimension vector is the vector representing the corresponding preset dimension identifier; different dimensions have different preset dimension identifiers, and hence different dimension vectors. The computer device may set a plurality of preset dimension identifiers and then select, from them, the target dimension identifiers corresponding to the target dialog turn.
The dimension may be referred to as a slot, and the information on the dimension may be referred to as slot information.
The first fusion feature vector is a feature vector obtained by fusing the word vector sets of the plurality of dialog turns, and represents the fused features of the dialog sentences of the plurality of dialog turns.
The similarity between the dimension vector of a preset dimension identifier and a word vector represents the likelihood that the word corresponding to the word vector belongs to the preset dimension indicated by that identifier: the greater the similarity, the more likely the word belongs to the preset dimension, and the smaller the similarity, the less likely.
For each preset dimension identifier, the higher the similarity between a word vector and the dimension vector of that identifier, the larger the weight of the word vector, and the lower the similarity, the smaller the weight. Weighted fusion processing is performed on the word vector sets of the plurality of dialog turns using these weights, yielding the first fusion feature vector corresponding to the preset dimension identifier.
The above steps are repeated for the dimension vector of each of the plurality of preset dimension identifiers to obtain the first fusion feature vector of each preset dimension identifier. Because different preset dimension identifiers have different dimension vectors, the same word vector may have different similarities to different dimension vectors, so the first fusion feature vectors obtained for different preset dimension identifiers may differ.
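Step 103 can be sketched as dot-product attention in which each dimension vector acts as the query. The text does not specify the similarity measure or how weights are derived; this sketch assumes the dot product and a softmax, which makes each weight grow with its similarity as described. It attends over all words at once; the implementations described below instead split the fusion into word-level and turn-level stages. The slot names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Similarity between two vectors (assumed here to be the dot product)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Positive weights that sum to 1 and increase with the score."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_for_slot(slot_vec: Vector, turns: List[List[Vector]]) -> Vector:
    """Weighted fusion of every word vector of every turn for one preset dimension."""
    words = [w for turn in turns for w in turn]
    weights = softmax([dot(slot_vec, w) for w in words])
    return [sum(wt * w[i] for wt, w in zip(weights, words)) for i in range(len(slot_vec))]


def fuse_all_slots(slot_vecs: Dict[str, Vector],
                   turns: List[List[Vector]]) -> Dict[str, Vector]:
    """Repeat the fusion once per preset dimension identifier."""
    return {slot: fuse_for_slot(vec, turns) for slot, vec in slot_vecs.items()}


turns = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two turns with one 2-d word vector each
fused = fuse_all_slots({"hotel-name": [10.0, 0.0], "hotel-area": [0.0, 10.0]}, turns)
```

Because each slot uses its own query, "hotel-name" is dominated by the first word vector and "hotel-area" by the second, which is the per-dimension difference the paragraph above describes.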
104. The computer device determines at least one target dimension identifier corresponding to the target dialog turn, and the target keyword corresponding to each target dimension identifier, according to the similarity between the word vectors of the preset keywords corresponding to each preset dimension identifier and the corresponding first fusion feature vector.
Preset keywords are keywords that belong to the corresponding preset dimension identifier and may be set in advance. In the embodiments of the present application, each preset dimension identifier may have one or more preset keywords. For example, the preset keywords corresponding to the hotel name dimension identifier may be hotel A, hotel B, hotel C, and so on.
Since the first fusion feature vector corresponding to each preset dimension identifier has been obtained in step 103, at least one target dimension identifier is determined from the plurality of preset dimension identifiers according to the similarity between the word vectors of the preset keywords of each preset dimension identifier and the corresponding first fusion feature vector, and the target keyword of each target dimension identifier is determined from that identifier's preset keywords.
105. The computer device determines the at least one target dimension identifier and the target keyword corresponding to each target dimension identifier as the dialog state of the target dialog turn.
The dialog state represents the dialog topics contained in the dialog sentences. Taking the determined target dimension identifiers and their target keywords as the dialog state of the target dialog turn indicates that the dialog sentences of the target dialog turn relate to the target dimensions and that the information they contain includes the target keywords. The intention of the target dialog turn can subsequently be determined from this dialog state, and a reply sentence can then be generated for the user.
In the method provided by the embodiments of the present application, the dialog sentences of a plurality of dialog turns are obtained; a word vector set is obtained for each dialog turn; weighted fusion processing is performed on the word vector sets of the dialog turns according to the dimension vector of each preset dimension identifier to obtain a first fusion feature vector for each preset dimension identifier; at least one target dimension identifier corresponding to the target dialog turn and the target keyword of each target dimension identifier are determined; and these are determined as the dialog state of the target dialog turn. Fusing the dialog sentences of the target dialog turn with those of the historical dialog turns enriches the available information, and performing the weighted fusion separately for each preset dimension identifier improves the accuracy of each first fusion feature vector, thereby improving the accuracy of the obtained dialog state.
In a possible implementation, performing weighted fusion processing on the word vector sets of the plurality of dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in those word vector sets, to obtain the first fusion feature vector corresponding to each preset dimension identifier, includes:
for each preset dimension identifier, performing weighted fusion processing on the word vectors in the word vector set of each dialog turn according to the similarity between the dimension vector of the preset dimension identifier and those word vectors, to obtain a first feature vector of each dialog turn;
performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors, to obtain a second fusion feature vector corresponding to the preset dimension identifier; and
fusing the second fusion feature vector with the first feature vector of the target dialog turn to obtain the first fusion feature vector corresponding to the preset dimension identifier.
In another possible implementation, for each preset dimension identifier, performing weighted fusion processing on the word vectors in the word vector set of each dialog turn according to the similarity between the dimension vector of the preset dimension identifier and those word vectors, to obtain the first feature vector of each dialog turn, includes:
for each preset dimension identifier and each dialog turn, determining a first similarity between the dimension vector of the preset dimension identifier and each word vector in the word vector set of the dialog turn;
determining a first weight of each word vector according to its first similarity, where the first weight of each word vector is positively correlated with the corresponding first similarity; and
performing weighted fusion processing on the word vectors in the word vector set according to their first weights to obtain the first feature vector of the dialog turn.
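The word-level step above can be sketched for one preset dimension identifier as follows. The dot-product similarity and softmax normalization are assumptions; any mapping that makes the first weight increase with the first similarity would fit the description. The function name first_feature_vectors is illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(vectors[0]))]


def first_feature_vectors(slot_vec: Vector, turns: List[List[Vector]]) -> List[Vector]:
    """For one preset dimension: attend over the words of each turn separately."""
    result = []
    for words in turns:
        first_sims = [dot(slot_vec, w) for w in words]   # first similarities
        first_weights = softmax(first_sims)              # first weights, increasing in similarity
        result.append(weighted_sum(first_weights, words))
    return result


h = first_feature_vectors([10.0, 0.0], [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]])
```

The result holds one first feature vector per dialog turn, in turn order.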
In another possible implementation, performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors, to obtain the second fusion feature vector corresponding to the preset dimension identifier, includes:
determining a second similarity between the dimension vector of the preset dimension identifier and the first feature vector of each dialog turn;
determining a second weight of each dialog turn according to its second similarity, where the second weight of each dialog turn is positively correlated with the corresponding second similarity; and
performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to their second weights to obtain the second fusion feature vector corresponding to the preset dimension identifier.
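The turn-level step can be sketched in the same way, with the first feature vectors of the turns as keys. As before, the dot product and softmax are assumed choices rather than the stated method, and second_fusion_vector is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(vectors[0]))]


def second_fusion_vector(slot_vec: Vector, turn_vecs: List[Vector]) -> Tuple[List[float], Vector]:
    """Weight each turn's first feature vector by its similarity to the dimension vector."""
    second_sims = [dot(slot_vec, h) for h in turn_vecs]   # second similarities
    second_weights = softmax(second_sims)                 # one second weight per turn
    return second_weights, weighted_sum(second_weights, turn_vecs)


weights, g = second_fusion_vector([0.0, 5.0], [[1.0, 0.0], [0.0, 1.0]])
```

Returning the weights alongside the fused vector makes it easy to inspect which turns contributed most to a given dimension.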
In another possible implementation, before the weighted fusion processing is performed on the first feature vectors of the plurality of dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors to obtain the second fusion feature vector, the method further includes:
for any dialog turn, performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to a third similarity between the first feature vector of that dialog turn and the first feature vector of each of the plurality of dialog turns, and taking the fused feature vector as the adjusted first feature vector of that dialog turn.
In another possible implementation, performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to the third similarity between the first feature vector of any dialog turn and the first feature vectors of the plurality of dialog turns, and taking the fused feature vector as the adjusted first feature vector of that dialog turn, includes:
determining a third weight of each of the plurality of dialog turns according to the third similarity between the first feature vector of the given dialog turn and the first feature vector of each of the plurality of dialog turns, where the third weight of each dialog turn is positively correlated with the corresponding third similarity; and
performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to their third weights, and taking the fused feature vector as the adjusted first feature vector of the given dialog turn.
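The adjustment above resembles self-attention across turns: each turn's first feature vector queries all turns, including itself. A minimal sketch, assuming dot-product similarity and softmax weights without the learned projections a real attention layer would typically use; adjust_by_turn_attention is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(vectors[0]))]


def adjust_by_turn_attention(turn_vecs: List[Vector]) -> List[Vector]:
    """Replace each turn's first feature vector by a similarity-weighted mix of all turns."""
    adjusted = []
    for h_i in turn_vecs:
        third_weights = softmax([dot(h_i, h_j) for h_j in turn_vecs])  # third weights
        adjusted.append(weighted_sum(third_weights, turn_vecs))
    return adjusted


adj = adjust_by_turn_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each adjusted vector stays closest to its own turn while absorbing context from related turns.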
In another possible implementation, before the weighted fusion processing is performed on the first feature vectors of the plurality of dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors to obtain the second fusion feature vector, the method further includes:
obtaining a position vector of each dialog turn, where the position vector of a dialog turn represents the position of that dialog turn among the plurality of dialog turns; and
fusing the first feature vector of each dialog turn with the corresponding position vector, and taking the fused feature vector as the adjusted first feature vector of that dialog turn.
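The text does not say how position vectors are built or fused. One common choice, used here purely as an assumption, is the sinusoidal encoding popularized by Transformer models, fused by element-wise addition; a learned position table would fit the description equally well. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def position_vector(pos: int, dim: int) -> Vector:
    """Sinusoidal encoding of a turn index (an assumed choice)."""
    vec = []
    for i in range(dim):
        angle = pos / (10000 ** ((i - i % 2) / dim))   # paired dimensions share a frequency
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def add_turn_positions(turn_vecs: List[Vector]) -> List[Vector]:
    """Fuse each first feature vector with its position vector by element-wise addition."""
    dim = len(turn_vecs[0])
    return [[x + p for x, p in zip(h, position_vector(k, dim))]
            for k, h in enumerate(turn_vecs)]


print(position_vector(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Addition keeps the vector dimension unchanged, so the adjusted first feature vectors can be passed directly to the turn-level weighting.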
In another possible implementation, fusing the second fusion feature vector with the first feature vector of the target dialog turn to obtain the first fusion feature vector corresponding to the preset dimension identifier includes:
determining a fourth weight for the first feature vector of the target dialog turn and a fifth weight for the second fusion feature vector, where the fourth weight and the fifth weight sum to 1; and
performing weighted fusion processing on the first feature vector of the target dialog turn and the second fusion feature vector according to the fourth weight and the fifth weight, respectively, to obtain the first fusion feature vector corresponding to the preset dimension identifier.
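A weight pair that always sums to 1 is commonly produced by a sigmoid gate. The sketch below assumes that approach, with hypothetical learned parameters gate_params and bias applied to the concatenation of the two vectors; the text itself does not specify how the fourth weight is computed.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(target_vec: Vector, context_vec: Vector,
                 gate_params: Vector, bias: float = 0.0) -> Tuple[float, float, Vector]:
    """Fuse the target turn's first feature vector with the second fusion feature vector.

    gate_params (length 2 * dim) and bias are hypothetical learned parameters.
    """
    z = sum(w * x for w, x in zip(gate_params, target_vec + context_vec)) + bias
    fourth = sigmoid(z)       # weight of the target turn's first feature vector
    fifth = 1.0 - fourth      # weight of the second fusion feature vector; the two sum to 1
    fused = [fourth * t + fifth * c for t, c in zip(target_vec, context_vec)]
    return fourth, fifth, fused


fourth, fifth, fused = gated_fusion([2.0, 0.0], [0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
print(fused)  # [1.0, 1.0]
```

With all-zero gate parameters the gate is 0.5, so the result is the plain average; training would shift the balance between the current turn and the dialog history.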
In another possible implementation, determining at least one target dimension identifier corresponding to the target dialog turn and the target keyword corresponding to each target dimension identifier, according to the similarity between the word vectors of the preset keywords corresponding to each preset dimension identifier and the corresponding first fusion feature vector, includes:
for each preset dimension identifier, determining a fourth similarity between the word vector of each of the preset keywords corresponding to the preset dimension identifier and the first fusion feature vector;
selecting a candidate keyword from the preset keywords, where the fourth similarity of the candidate keyword is greater than that of every other preset keyword; and
in response to the candidate keyword not being the empty keyword, determining the preset dimension identifier as a target dimension identifier and the candidate keyword as its target keyword.
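The selection rule above can be sketched as an argmax over preset keywords per dimension. This assumes dot-product similarity and represents the empty keyword with a hypothetical label "none", treated as the keyword meaning that the dimension has no value in the dialog; the slot and keyword names are illustrative.

```python
from typing import Dict, List

Vector = List[float]
EMPTY_KEYWORD = "none"  # hypothetical label for the empty keyword


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_dialog_state(fused_by_slot: Dict[str, Vector],
                        keywords_by_slot: Dict[str, Dict[str, Vector]]) -> Dict[str, str]:
    """Map each target dimension identifier to its target keyword."""
    state: Dict[str, str] = {}
    for slot, fused in fused_by_slot.items():
        candidates = keywords_by_slot[slot]
        # Candidate keyword: the preset keyword with the largest fourth similarity.
        best = max(candidates, key=lambda kw: dot(candidates[kw], fused))
        if best != EMPTY_KEYWORD:
            state[slot] = best   # this slot becomes a target dimension identifier
    return state


fused = {"hotel-name": [1.0, 0.0], "hotel-area": [0.0, 1.0]}
keywords = {
    "hotel-name": {"hotel A": [1.0, 0.0], "hotel B": [0.5, 0.0], "none": [0.0, 1.0]},
    "hotel-area": {"north": [1.0, 0.0], "none": [0.0, 1.0]},
}
print(select_dialog_state(fused, keywords))  # {'hotel-name': 'hotel A'}
```

Dimensions whose best match is the empty keyword are simply left out, so the returned mapping is the dialog state of the target dialog turn.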
In another possible implementation, obtaining the word vector set of each dialog turn according to the dialog sentences of each dialog turn includes:
invoking the word vector acquisition submodel in a dialog state determination model to obtain the word vector set of each dialog turn according to the dialog sentences of that dialog turn.
Performing weighted fusion processing on the word vector sets of the plurality of dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in those word vector sets, to obtain the first fusion feature vector corresponding to each preset dimension identifier, includes:
invoking the feature vector acquisition submodel in the dialog state determination model to perform weighted fusion processing on the word vector sets of the plurality of dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in those word vector sets, to obtain the first fusion feature vector corresponding to each preset dimension identifier.
Determining at least one target dimension identifier corresponding to the target dialog turn and the target keyword corresponding to each target dimension identifier according to the similarity between the word vectors of the preset keywords corresponding to each preset dimension identifier and the corresponding first fusion feature vector, and determining the at least one target dimension identifier and the target keyword corresponding to each as the dialog state of the target dialog turn, includes:
invoking the dialog state determination submodel in the dialog state determination model to determine at least one target dimension identifier corresponding to the target dialog turn and the target keyword corresponding to each target dimension identifier according to the similarity between the word vectors of the preset keywords corresponding to each preset dimension identifier and the corresponding first fusion feature vector, and to determine the at least one target dimension identifier and the target keyword corresponding to each as the dialog state of the target dialog turn.
In another possible implementation, invoking the feature vector acquisition submodel in the dialog state determination model to perform weighted fusion processing on the word vector sets of the plurality of dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in those word vector sets, to obtain the first fusion feature vector corresponding to each preset dimension identifier, includes:
invoking a first attention layer in the feature vector acquisition submodel to, for each preset dimension identifier, perform weighted fusion processing on the word vectors in the word vector set of each dialog turn according to the similarity between the dimension vector of the preset dimension identifier and those word vectors, to obtain the first feature vector of each dialog turn;
invoking a second attention layer in the feature vector acquisition submodel to perform weighted fusion processing on the first feature vectors of the plurality of dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors, to obtain the second fusion feature vector corresponding to the preset dimension identifier; and
invoking a feature fusion layer in the feature vector acquisition submodel to fuse the second fusion feature vector with the first feature vector of the target dialog turn, to obtain the first fusion feature vector corresponding to the preset dimension identifier.
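The three layers above can be composed into a single callable, sketched below. This is a hypothetical structure, not the patented model: it assumes dot-product attention without learned projections, a fixed fourth weight in place of a learned gate, and the last turn as the target turn, and it omits the optional third attention layer.

```python
import math
from typing import List

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention: softmax(query . key) weights, then a weighted sum of the keys."""
    scores = [_dot(query, k) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e / total * k[i] for e, k in zip(exps, keys)) for i in range(len(keys[0]))]


class FeatureVectorSubmodel:
    """Hypothetical composition of the first attention, second attention and feature fusion layers."""

    def __init__(self, fourth_weight: float = 0.5):
        # Fixed for illustration; a trained model would compute or learn this weight.
        self.fourth_weight = fourth_weight

    def first_attention_layer(self, slot_vec: Vector, turns: List[List[Vector]]) -> List[Vector]:
        return [_attend(slot_vec, words) for words in turns]

    def second_attention_layer(self, slot_vec: Vector, turn_vecs: List[Vector]) -> Vector:
        return _attend(slot_vec, turn_vecs)

    def feature_fusion_layer(self, target_vec: Vector, context_vec: Vector) -> Vector:
        a = self.fourth_weight
        return [a * t + (1.0 - a) * c for t, c in zip(target_vec, context_vec)]

    def __call__(self, slot_vec: Vector, turns: List[List[Vector]]) -> Vector:
        turn_vecs = self.first_attention_layer(slot_vec, turns)
        context = self.second_attention_layer(slot_vec, turn_vecs)
        # The last turn is taken as the target dialog turn.
        return self.feature_fusion_layer(turn_vecs[-1], context)


model = FeatureVectorSubmodel()
out = model([1.0, 0.0], [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]])
```

Calling the model once per dimension vector yields the first fusion feature vector of each preset dimension identifier.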
In another possible implementation manner, before the second attention layer in the feature vector acquisition sub-model is invoked to perform weighted fusion processing on the first feature vectors of the multiple dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors, to obtain the second fusion feature vector corresponding to the preset dimension identifier, the method further includes:
invoking a third attention layer in the feature vector acquisition sub-model to perform weighted fusion processing on the first feature vectors of the multiple dialog turns according to a third similarity between the first feature vector of any dialog turn and the first feature vectors of the multiple dialog turns, and taking the fused feature vector as the adjusted first feature vector of that dialog turn.
In another possible implementation manner, before the second attention layer in the feature vector acquisition sub-model is invoked to perform weighted fusion processing on the first feature vectors of the multiple dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors, to obtain the second fusion feature vector corresponding to the preset dimension identifier, the method further includes:
invoking a third attention layer in the feature vector acquisition sub-model to acquire a position vector of each dialog turn, where the position vector of a dialog turn represents the position of that dialog turn among the multiple dialog turns; and performing fusion processing on the first feature vector of each dialog turn and the corresponding position vector, and taking the fused feature vector as the adjusted first feature vector of that dialog turn.
In another possible implementation, the method further includes:
obtaining sample dialog sentences of a plurality of sample dialog turns and a sample dialog state corresponding to each sample dialog turn, where the sample dialog sentences of each sample dialog turn include at least one of a sample user question sentence or a sample reply sentence, and the plurality of sample dialog turns include a target sample dialog turn and at least one historical sample dialog turn preceding the target sample dialog turn;
invoking the word vector acquisition sub-model to acquire a word vector set of each sample dialog turn according to the sample dialog sentences of that sample dialog turn, where the word vector set of a sample dialog turn includes a word vector of each word in the sample dialog sentences of that sample dialog turn;
invoking the feature vector acquisition sub-model to perform weighted fusion processing on the word vector sets of the target sample dialog turn and the corresponding historical sample dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in those word vector sets, to obtain a first sample fusion feature vector corresponding to each preset dimension identifier;
invoking the dialog state determination sub-model to determine, according to the word vectors of the plurality of preset keywords corresponding to each preset dimension identifier and the corresponding first sample fusion feature vector, at least one prediction dimension identifier corresponding to the target sample dialog turn and a prediction keyword corresponding to each prediction dimension identifier, and determining the at least one prediction dimension identifier and the corresponding prediction keywords as the predicted dialog state of the target sample dialog turn; and
adjusting the word vector acquisition sub-model, the feature vector acquisition sub-model, and the dialog state determination sub-model according to the predicted dialog state of the target sample dialog turn and the corresponding sample dialog state.
In another possible implementation manner, after the sample dialog sentences of the plurality of sample dialog turns and the sample dialog state corresponding to each sample dialog turn are obtained, the method further includes:
taking any sample dialog turn among the plurality of sample dialog turns as the target sample dialog turn, and taking the sample dialog turns preceding it as the historical sample dialog turns of that target sample dialog turn.
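A minimal sketch of how the training samples described above might be assembled, assuming each sample dialog turn is represented by a single string; the function name `build_training_samples` is hypothetical. Every turn becomes a target turn, and all turns before it form its history.

```python
from typing import List, Tuple


def build_training_samples(turns: List[str]) -> List[Tuple[str, List[str]]]:
    """Use every sample dialog turn as a target turn; the turns before it form its history."""
    samples = []
    for i, target in enumerate(turns):
        history = turns[:i]  # all sample dialog turns preceding the target turn
        samples.append((target, history))
    return samples
```

Note that the first turn receives an empty history; skipping it would be one way to respect the condition that a target sample dialog turn has at least one historical sample dialog turn.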
In another possible implementation manner, before the word vector acquisition sub-model, the feature vector acquisition sub-model, and the dialog state determination sub-model are adjusted according to the predicted dialog state of the target sample dialog turn and the corresponding sample dialog state, the method further includes:
obtaining a plurality of sample change probabilities of each preset dimension identifier, where each sample change probability represents the change between the sample dialog states of two adjacent sample dialog turns among the plurality of sample dialog turns; and
for each preset dimension identifier, invoking a state conversion model to process the first sample fusion feature vectors of every two adjacent sample dialog turns corresponding to that preset dimension identifier, to obtain a plurality of predicted change probabilities corresponding to that preset dimension identifier, where the sample change probabilities and the predicted change probabilities of the same preset dimension identifier correspond one to one.
Correspondingly, adjusting the word vector acquisition sub-model, the feature vector acquisition sub-model, and the dialog state determination sub-model according to the predicted dialog state of the target sample dialog turn and the corresponding sample dialog state includes:
adjusting the word vector acquisition sub-model, the feature vector acquisition sub-model, the dialog state determination sub-model, and the state conversion model according to the plurality of predicted change probabilities and the plurality of sample change probabilities of each preset dimension identifier, and the predicted dialog state and the corresponding sample dialog state of each sample dialog turn.
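The text does not specify the training objective. As one hedged possibility, the adjustment could minimize a joint loss: a negative log-likelihood term for the predicted dialog state plus a binary cross-entropy term between the predicted and sample change probabilities. Both loss forms, the `change_weight` factor, the dictionary layout, and all function names below are assumptions for illustration; the gradient of such a loss would then drive the update of the four models.

```python
import math
from typing import Dict, List

EPS = 1e-7  # clamp probabilities away from 0 and 1 so log() stays finite


def state_loss(pred: Dict[str, Dict[str, float]], gold: Dict[str, str]) -> float:
    """Negative log-likelihood of the sample keyword (possibly the null keyword) per dimension."""
    return sum(-math.log(max(pred[dim][kw], EPS)) for dim, kw in gold.items())


def change_loss(pred: Dict[str, List[float]], gold: Dict[str, List[float]]) -> float:
    """Binary cross-entropy between predicted and sample change probabilities, paired one to one."""
    total = 0.0
    for dim, gold_probs in gold.items():
        for p, y in zip(pred[dim], gold_probs):
            p = min(max(p, EPS), 1.0 - EPS)
            total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total


def joint_loss(pred_state: Dict[str, Dict[str, float]], gold_state: Dict[str, str],
               pred_change: Dict[str, List[float]], gold_change: Dict[str, List[float]],
               change_weight: float = 1.0) -> float:
    """Hypothetical joint objective: state term plus weighted change-probability term."""
    return state_loss(pred_state, gold_state) + change_weight * change_loss(pred_change, gold_change)
```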
Fig. 2 is a flowchart of a dialog state determination method provided in an embodiment of the present application. The method is applied to a computer device and, as shown in Fig. 2, includes:
201. A computer device obtains dialog sentences of a plurality of dialog turns.
The dialog sentences of each dialog turn include at least one of a question sentence or a reply sentence, and the plurality of dialog turns include a target dialog turn and at least one historical dialog turn preceding the target dialog turn.
In a possible implementation manner, the computer device is a server, namely the background server of a dialog system, and the dialog system runs on a terminal. In this case, step 201 may include: the terminal sends a target user identifier to the server, and the server queries a dialog sentence library for the dialog sentences of the plurality of dialog turns corresponding to the target user identifier.
The target user identifier is the user identifier logged in on the terminal. The dialog sentence library stores the dialog sentences exchanged between the dialog system and users, each stored in association with the corresponding user identifier, so the dialog sentences of the plurality of dialog turns can be retrieved from the library by the target user identifier.
202. The computer device acquires a word vector set of each dialog turn according to the dialog sentences of that dialog turn.
The word vector set of a dialog turn includes a word vector of each word in the dialog sentences of that dialog turn.
203. For each preset dimension identifier, the computer device performs weighted fusion processing on the word vectors in the word vector set of each dialog turn according to the similarity between the dimension vector of the preset dimension identifier and those word vectors, to obtain a first feature vector of each dialog turn.
The first feature vector represents the fused features of the word vectors of the corresponding dialog turn.
For any preset dimension identifier and any dialog turn, the word vector set of the dialog turn includes a plurality of word vectors. The weight of each word vector is determined according to the similarity between the dimension vector of the preset dimension identifier and that word vector, and the word vectors are then fused according to their weights; the result is the first feature vector of the dialog turn corresponding to the preset dimension identifier.
Since the computer device has obtained the word vector sets of the plurality of dialog turns, it can obtain the first feature vector of each dialog turn for a given preset dimension identifier and, when there are a plurality of preset dimension identifiers, the first feature vector of each dialog turn for each of them. Because different preset dimension identifiers have different dimension vectors, their similarities to the same word vector may differ, so the same word vector may receive different weights, and the same dialog turn may have different first feature vectors for different preset dimension identifiers.
For example, if the plurality of dialog turns are dialog turns 1, 2 and 3 and the plurality of preset dimension identifiers are A, B and C, a first feature vector is obtained for each of dialog turns 1 to 3 under each of identifiers A, B and C, giving 3 × 3 = 9 first feature vectors in total, which may all differ from one another.
In one possible implementation, step 203 may include:
2031. For each preset dimension identifier and each dialog turn, determining a first similarity between the dimension vector of the preset dimension identifier and each word vector in the word vector set of the dialog turn.
The first similarity represents how similar the dimension vector and the word vector are. For any word vector and the dimension vector of the preset dimension identifier, the first similarity may be determined based on the Euclidean distance, the cosine distance, or the like.
2032. Determining a first weight of each word vector according to the first similarity corresponding to that word vector in the word vector set of the dialog turn.
The first weight represents the degree of influence of the corresponding word vector on the feature vector of the dialog turn: the larger the first weight, the greater the influence. The first weight of each word vector is positively correlated with its first similarity, that is, the larger the first similarity of a word vector, the larger its first weight, and vice versa.
In one possible implementation, step 2032 may include: determining the sum of the first similarities of the word vectors in the word vector set of the dialog turn as a total similarity, and taking the ratio of the first similarity of any word vector to the total similarity as the first weight of that word vector. The first weights of the word vectors in the word vector set of the dialog turn therefore sum to 1.
It should be noted that the embodiment of the present application is described by taking as an example determining the first weight of each word vector according to its first similarity. In another embodiment, steps 2031 and 2032 may be omitted and the first weight determined as follows: for any preset dimension identifier and any dialog turn, the product of any word vector in the word vector set of the dialog turn and the dimension vector of the preset dimension identifier (that is, their inner product) is taken as the first weight of that word vector.
2033. Performing weighted fusion processing on the word vectors in the word vector set according to their first weights, to obtain the first feature vector of the dialog turn.
For the word vectors in the word vector set of the dialog turn, the products of each word vector and its first weight are fused to obtain the first feature vector of the dialog turn.
In one possible implementation, step 2033 may include: performing weighted summation on the word vectors according to their first weights to obtain the first feature vector of the dialog turn, that is, determining the product of each word vector and its first weight, and taking the sum of these products as the first feature vector of the dialog turn.
In another possible implementation, step 2033 may include: performing weighted averaging on the word vectors according to their first weights to obtain the first feature vector of the dialog turn, that is, determining the product of each word vector and its first weight, determining the sum of these products, and taking the ratio of that sum to the number of word vectors as the first feature vector of the dialog turn.
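A minimal sketch of steps 2031 to 2033 for one preset dimension identifier and one dialog turn. The text leaves the similarity measure open; this sketch assumes a similarity derived from the Euclidean distance as `1 / (1 + distance)`, which is always positive, so dividing by the total similarity yields first weights that sum to 1. Vectors are plain Python lists, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed first similarity: 1 / (1 + Euclidean distance), always in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def first_feature_vector(dim_vec: Vector, word_vecs: List[Vector], average: bool = False) -> Vector:
    # Step 2031: first similarity between the dimension vector and each word vector.
    sims = [similarity(dim_vec, w) for w in word_vecs]
    # Step 2032: first weight = first similarity / total similarity (weights sum to 1).
    total = sum(sims)
    weights = [s / total for s in sims]
    # Step 2033: weighted summation of the word vectors.
    size = len(word_vecs[0])
    fused = [sum(wt * w[k] for wt, w in zip(weights, word_vecs)) for k in range(size)]
    if average:
        # Weighted-average variant: divide the weighted sum by the number of word vectors.
        fused = [x / len(word_vecs) for x in fused]
    return fused
```

For example, with `dim_vec = [0.0, 0.0]` and word vectors `[0.0, 0.0]` and `[3.0, 4.0]`, the distances are 0 and 5, the similarities 1 and 1/6, and the first weights 6/7 and 1/7, so the weighted sum is `[3/7, 4/7]`.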
204. The computer device performs weighted fusion processing on the first feature vectors of the plurality of dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors, to obtain a second fusion feature vector corresponding to the preset dimension identifier.
The second fusion feature vector represents the fused features of the dialog sentences of the plurality of dialog turns.
For any preset dimension identifier, the weight of each dialog turn is determined according to the similarity between the dimension vector of the preset dimension identifier and the first feature vector of that dialog turn, and the first feature vectors of the dialog turns are fused according to these weights to obtain the second fusion feature vector corresponding to the preset dimension identifier.
Because the computer device has obtained a plurality of preset dimension identifiers, a second fusion feature vector is obtained for each of them. Because different preset dimension identifiers have different dimension vectors, their similarities to the first feature vector of the same dialog turn may differ, so the same dialog turn may receive different weights and the second fusion feature vectors of different preset dimension identifiers may differ.
In one possible implementation, step 204 may include:
2041. Determining a second similarity between the dimension vector of the preset dimension identifier and the first feature vector of each dialog turn.
The second similarity represents how similar the dimension vector is to the first feature vector of the dialog turn. For the first feature vector of any dialog turn, the second similarity is determined from the dimension vector of the preset dimension identifier and that first feature vector, for example based on the Euclidean distance, the cosine distance, or the like.
2042. Determining a second weight of each dialog turn according to the second similarity corresponding to that dialog turn.
The second weight represents the degree of influence of the first feature vector of the corresponding dialog turn on the second fusion feature vector: the larger the second weight, the greater the influence. The second weight of each dialog turn is positively correlated with its second similarity, that is, the larger the second similarity of a dialog turn, the larger its second weight, and vice versa.
In one possible implementation, step 2042 may include: determining the sum of the second similarities of the plurality of dialog turns as the total similarity, and taking the ratio of the second similarity of any dialog turn to the total similarity as the second weight of that dialog turn. The second weights of the plurality of dialog turns therefore sum to 1.
2043. Performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to their second weights, to obtain the second fusion feature vector corresponding to the preset dimension identifier.
In one possible implementation, step 2043 may include: performing weighted summation on the first feature vectors of the plurality of dialog turns according to their second weights, that is, determining the product of the first feature vector of each dialog turn and its second weight, and taking the sum of these products as the second fusion feature vector corresponding to the preset dimension identifier.
In another possible implementation, step 2043 may include: performing weighted averaging on the first feature vectors of the plurality of dialog turns according to their second weights, that is, determining the product of the first feature vector of each dialog turn and its second weight, determining the sum of these products, and taking the ratio of that sum to the number of dialog turns as the second fusion feature vector of the preset dimension identifier.
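The two-level fusion of steps 203 and 204 can be sketched for several preset dimension identifiers at once. As in the previous sketch, the `1 / (1 + Euclidean distance)` similarity and the helper names are assumptions, and only the weighted-summation variants are shown. The example makes the point above concrete: the same dialog turns yield different second fusion feature vectors under different dimension vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def attention_pool(query: Sequence[float], vectors: Sequence[Sequence[float]]) -> Vector:
    """Weighted summation of `vectors`, weighting each by its normalized similarity to `query`."""
    sims = [1.0 / (1.0 + math.dist(query, v)) for v in vectors]  # assumed similarity measure
    total = sum(sims)
    weights = [s / total for s in sims]  # normalized weights sum to 1
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(len(vectors[0]))]


def second_fusion_vectors(dim_vecs: Dict[str, Vector], turns: List[List[Vector]]) -> Dict[str, Vector]:
    """Map each preset dimension identifier to its second fusion feature vector."""
    result = {}
    for dim_id, dim_vec in dim_vecs.items():
        # Step 203: one first feature vector per dialog turn for this dimension identifier.
        first_vecs = [attention_pool(dim_vec, word_vecs) for word_vecs in turns]
        # Step 204 (2041 to 2043): fuse the dialog turns' first feature vectors.
        result[dim_id] = attention_pool(dim_vec, first_vecs)
    return result


turns = [[[0.0, 0.0]], [[3.0, 4.0]]]  # two dialog turns, one word vector each
print(second_fusion_vectors({"A": [0.0, 0.0], "B": [3.0, 4.0]}, turns))
```

Here identifier A yields approximately `[0.43, 0.57]` and identifier B approximately `[2.57, 3.43]`, because each dimension vector favors the dialog turn closest to it.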
It should be noted that this embodiment of the present application is described by taking as an example the case where, for any preset dimension identifier, once the first feature vector of each dialog turn has been determined, the first feature vectors are directly subjected to weighted fusion according to the dimension vector of the preset dimension identifier to obtain its second fusion feature vector. In another embodiment, the first feature vector of each dialog turn may first be adjusted, and step 204 is then performed on the adjusted first feature vectors.
In a possible implementation manner, the first feature vector of each dialog turn may be adjusted in any of the following three manners:
First manner: performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to a third similarity between the first feature vector of any dialog turn and the first feature vectors of the plurality of dialog turns, and taking the fused feature vector as the adjusted first feature vector of that dialog turn.
The adjusted first feature vector of each dialog turn is thus obtained by fusing the unadjusted first feature vectors of the plurality of dialog turns, and the adjusted first feature vector of every dialog turn is obtained in this manner. This adjustment merges the features of the dialog sentences of the other dialog turns into each dialog turn, which improves the accuracy of the feature vector of each dialog turn.
In one possible implementation, the first manner may include the following steps 1 and 2:
Step 1: determining a third weight of each of the plurality of dialog turns according to the third similarity between the first feature vector of any dialog turn and the first feature vector of each of the plurality of dialog turns.
The third weight of each dialog turn is positively correlated with the corresponding third similarity: the larger the third similarity, the larger the third weight, and vice versa.
In one possible implementation, step 1 may include: taking the ratio of the third similarity corresponding to any one of the plurality of dialog turns to the sum of the third similarities of the plurality of dialog turns as the third weight of that dialog turn.
Step 2: performing weighted fusion processing on the first feature vectors of the plurality of dialog turns according to their third weights, and taking the fused feature vector as the adjusted first feature vector of the dialog turn in question.
In one possible implementation, step 2 may include: performing weighted summation on the first feature vectors of the plurality of dialog turns according to their third weights, to obtain the adjusted first feature vector of the dialog turn in question.
In another possible implementation, step 2 may include: performing weighted averaging on the first feature vectors of the plurality of dialog turns according to their third weights, to obtain the adjusted first feature vector of the dialog turn in question.
It should be noted that the first manner is described by taking a single adjustment of the first feature vector of each dialog turn as an example. In another embodiment, after the first feature vectors of the dialog turns have been adjusted once, the once-adjusted first feature vectors are adjusted a second time in the same manner to obtain twice-adjusted first feature vectors, and this process is repeated until the number of adjustments reaches a preset number, after which step 204 is performed on the first feature vectors of the dialog turns obtained after the preset number of adjustments. Each adjustment is performed in the same way and is not described again here.
Second manner: acquiring a position vector of each dialog turn, fusing the first feature vector of each dialog turn with the corresponding position vector, and taking the fused feature vector as the adjusted first feature vector of that dialog turn.
The position vector of a dialog turn represents the position of that dialog turn among the plurality of dialog turns. Fusing the first feature vector of each dialog turn with its position vector strengthens the relationship among the dialog turns and improves the accuracy of the first feature vectors.
Third manner: acquiring a position vector of each dialog turn, fusing the first feature vector of each dialog turn with the corresponding position vector, and taking the fused feature vector as the once-adjusted first feature vector of that dialog turn; then performing weighted fusion processing on the once-adjusted first feature vectors of the plurality of dialog turns according to the third similarity between the once-adjusted first feature vector of any dialog turn and the once-adjusted first feature vectors of the plurality of dialog turns, and taking the fused feature vector as the twice-adjusted first feature vector of that dialog turn.
That is, the first feature vector of each dialog turn is first fused with its position vector as in the second manner, and the resulting once-adjusted first feature vectors are then adjusted again as in the first manner to obtain the twice-adjusted first feature vectors.
In addition, after the twice-adjusted first feature vectors are obtained, the adjustment in the first manner may be repeated, and when the number of adjustments reaches the preset number, step 204 is performed on the first feature vectors of the plurality of dialog turns obtained after the preset number of adjustments.
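The three adjustment manners can be sketched together. The text does not specify how position vectors are obtained or fused, or how the third similarity is measured; this sketch assumes the sinusoidal position encoding used in Transformer models, fusion by element-wise addition, and a `1 / (1 + Euclidean distance)` similarity with weighted summation. A learned position embedding or concatenation would be equally plausible. `rounds` plays the role of the preset number of adjustments, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def position_vector(pos: int, size: int) -> Vector:
    """Assumed position vector: sinusoidal encoding of the dialog turn's index."""
    vec = []
    for i in range(size):
        angle = pos / (10000 ** (2 * (i // 2) / size))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def add_positions(first_vecs: List[Vector]) -> List[Vector]:
    """Second manner: fuse each first feature vector with its position vector by addition."""
    return [[x + p for x, p in zip(v, position_vector(pos, len(v)))]
            for pos, v in enumerate(first_vecs)]


def adjust_turns(first_vecs: List[Vector], rounds: int = 1) -> List[Vector]:
    """First manner (steps 1 and 2), repeated `rounds` times."""
    current = [list(v) for v in first_vecs]
    for _ in range(rounds):
        adjusted = []
        for query in current:
            # Step 1: third similarities to every dialog turn, normalized into third weights.
            sims = [1.0 / (1.0 + math.dist(query, v)) for v in current]
            total = sum(sims)
            weights = [s / total for s in sims]
            # Step 2: weighted summation of all dialog turns' first feature vectors.
            adjusted.append([sum(w * v[k] for w, v in zip(weights, current))
                             for k in range(len(query))])
        current = adjusted  # each round starts from the previous round's vectors
    return current


def third_manner(first_vecs: List[Vector], rounds: int = 1) -> List[Vector]:
    """Third manner: add position vectors, then apply the first manner."""
    return adjust_turns(add_positions(first_vecs), rounds)
```

For two one-dimensional first feature vectors `[0.0]` and `[6.0]`, one round of `adjust_turns` gives `[0.75]` and `[5.25]`: each turn keeps weight 7/8 on itself and 1/8 on the other, so repeated rounds pull the turns' vectors toward each other. Updating every turn from the previous round's vectors keeps the result independent of the order in which turns are processed.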
205. The computer device performs fusion processing on the second fusion feature vector and the first feature vector of the target dialog turn, to obtain a first fusion feature vector corresponding to the preset dimension identifier.
In the embodiment of the present application, the goal is to obtain the dialog state of the target dialog turn. Therefore, after the second fusion feature vector, which fuses the dialog sentences of the plurality of dialog turns, is obtained, it is fused again with the first feature vector of the target dialog turn to strengthen the features of the dialog sentences of the target dialog turn, yielding the first fusion feature vector corresponding to the preset dimension identifier.
In one possible implementation, step 205 may include:
2051. Determining a fourth weight of the first feature vector of the target dialog turn and a fifth weight of the second fusion feature vector.
The fourth weight and the fifth weight sum to 1, and may be set manually or obtained through model training.
2052. Performing weighted fusion processing on the second fusion feature vector and the first feature vector of the target dialog turn according to the fourth weight and the fifth weight, to obtain the first fusion feature vector corresponding to the preset dimension identifier.
The weighted fusion of the second fusion feature vector and the first feature vector of the target dialog turn may be either a weighted summation or a weighted averaging of the two vectors.
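A minimal sketch of steps 2051 and 2052, assuming the fourth weight is supplied as a parameter (it could equally be a trained value) and the fifth weight is derived as its complement so that the two sum to 1. The weighted-average variant follows the convention used earlier in this description and divides the weighted sum by the number of fused vectors, here 2. The function name is hypothetical.

```python
from typing import List

Vector = List[float]


def first_fusion_vector(second_fused: Vector, target_first: Vector,
                        fourth_weight: float = 0.5, average: bool = False) -> Vector:
    if not 0.0 <= fourth_weight <= 1.0:
        raise ValueError("fourth_weight must lie in [0, 1]")
    fifth_weight = 1.0 - fourth_weight  # fourth and fifth weights sum to 1
    # Fourth weight applies to the target turn's first feature vector, fifth to the second fusion vector.
    fused = [fourth_weight * t + fifth_weight * s for t, s in zip(target_first, second_fused)]
    if average:
        fused = [x / 2 for x in fused]  # weighted average over the two fused vectors
    return fused
```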
It should be noted that the embodiment of the present application is described by taking as an example first obtaining the first feature vector of each dialog turn and then fusing these first feature vectors to obtain the first fusion feature vector. In another embodiment, steps 203 to 205 need not be performed; instead, the word vector sets of the plurality of dialog turns may be directly subjected to weighted fusion processing according to the similarity between the dimension vector of each preset dimension identifier and each word vector in those word vector sets, to obtain the first fusion feature vector corresponding to each preset dimension identifier.
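For the single-level alternative just described, a minimal sketch pools the word vectors of all dialog turns together and weights each by its similarity to the dimension vector. The `1 / (1 + Euclidean distance)` similarity and the function name are assumptions, as in the earlier sketches.

```python
import math
from typing import List

Vector = List[float]


def direct_first_fusion(dim_vec: Vector, turns: List[List[Vector]]) -> Vector:
    """Fuse the word vectors of all dialog turns directly, without per-turn first feature vectors."""
    words = [w for turn in turns for w in turn]  # word vector sets of all dialog turns, pooled
    sims = [1.0 / (1.0 + math.dist(dim_vec, w)) for w in words]  # assumed similarity
    total = sum(sims)
    return [sum(s / total * w[k] for s, w in zip(sims, words)) for k in range(len(dim_vec))]
```

Because all words compete in a single normalization, a dialog turn with many words can receive more total weight here than in the two-level scheme of steps 203 to 205, where each turn is first reduced to one first feature vector.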
206. The computer device determines at least one target dimension identifier corresponding to the target dialog turn, and a target keyword corresponding to each target dimension identifier, according to the similarity between the word vectors of the preset keywords corresponding to each preset dimension identifier and the corresponding first fusion feature vector.
A preset keyword is a keyword belonging to a preset dimension that describes that dimension, and may be set in advance. In the embodiment of the present application, each preset dimension identifier may have one or more preset keywords. For example, the preset keywords corresponding to a hotel name dimension identifier may be hotel A, hotel B, hotel C, and so on.
After the first fusion feature vectors corresponding to the plurality of preset dimension identifiers are obtained, at least one target dimension identifier is determined from the plurality of preset dimension identifiers according to the similarity between the word vectors of the preset keywords of each preset dimension identifier and the corresponding first fusion feature vector, and the target keyword of each target dimension identifier is determined from the preset keywords of that target dimension identifier.
In one possible implementation, step 206 may include the following steps 2061-2063:
2061. For each preset dimension identifier, determine a fourth similarity between the word vector of each of the plurality of preset keywords corresponding to the preset dimension identifier and the first fusion feature vector.
The fourth similarity represents the similarity between the word vector of a preset keyword and the first fusion feature vector.
2062. Select a candidate keyword from the plurality of preset keywords.
The fourth similarity of the candidate keyword is greater than the fourth similarities of the other preset keywords among the plurality of preset keywords.
Optionally, for any preset dimension identifier, the similarity between each preset keyword corresponding to the preset dimension identifier and the first fusion feature vector corresponding to that identifier is determined, and the preset keyword with the maximum similarity to the first fusion feature vector is selected as the candidate keyword. Doing this for every preset dimension identifier yields a candidate keyword for each of the plurality of preset dimension identifiers.
2063. In response to the candidate keyword not being the null keyword, determine the preset dimension identifier as a target dimension identifier and the candidate keyword as the target keyword.
The null keyword indicates that the first fusion feature vector of the corresponding preset dimension identifier does not match that preset dimension identifier, that is, the features of the dialog statements of the target dialog turn do not match the preset dimension identifier.
In this embodiment of the application, the plurality of preset keywords corresponding to each preset dimension identifier includes a null keyword. When the candidate keyword of a preset dimension identifier is the null keyword, the preset dimension identifier does not match the features of the dialog statements of the target dialog turn; when the candidate keyword is not the null keyword, the candidate keyword of that preset dimension identifier matches the features of the dialog statements of the target dialog turn.
Among the plurality of preset dimension identifiers, each preset dimension identifier whose candidate keyword is not the null keyword is selected as a target dimension identifier, and its candidate keyword is used as the target keyword.
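Steps 2061 to 2063 amount to an argmax over each dimension's preset keywords followed by a null-keyword filter. The following Python sketch illustrates this under stated assumptions: cosine similarity stands in for the fourth similarity (the embodiment does not fix the measure), the null keyword is represented by the hypothetical string "none", and the dimension names, keywords and toy vectors are illustrative only.

```python
import math
from typing import Dict, List

NULL_KEYWORD = "none"  # hypothetical representation of the null keyword


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, assumed here as the fourth similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_dialog_state(
    keyword_vectors: Dict[str, Dict[str, List[float]]],
    fusion_vectors: Dict[str, List[float]],
) -> Dict[str, str]:
    """Return {target dimension identifier: target keyword} for the target turn."""
    state = {}
    for dim_id, keywords in keyword_vectors.items():
        fused = fusion_vectors[dim_id]
        # Steps 2061-2062: the keyword most similar to the first fusion feature vector.
        candidate = max(keywords, key=lambda kw: cosine(keywords[kw], fused))
        # Step 2063: only a non-null candidate makes this a target dimension.
        if candidate != NULL_KEYWORD:
            state[dim_id] = candidate
    return state


keyword_vectors = {
    "hotel-name": {"hotel A": [1.0, 0.0], "hotel B": [0.0, 1.0], NULL_KEYWORD: [-1.0, 0.0]},
    "restaurant-name": {"restaurant B": [1.0, 1.0], NULL_KEYWORD: [-1.0, -1.0]},
}
fusion_vectors = {"hotel-name": [0.9, 0.1], "restaurant-name": [-0.5, -0.6]}
print(select_dialog_state(keyword_vectors, fusion_vectors))  # {'hotel-name': 'hotel A'}
```

An empty result corresponds to the case in which every candidate keyword is the null keyword, so that no dialog state is obtained for the target dialog turn.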
207. The computer device determines the at least one target dimension identifier and the target keyword corresponding to each target dimension identifier as the dialog state of the target dialog turn.
The dialog state represents the dialog topics contained in the dialog statements. For example, the dialog state of the target dialog turn may include: the hotel name dimension identifier with hotel name A, and the restaurant name dimension identifier with restaurant name B.
Taking the at least one target dimension identifier and the corresponding target keywords as the dialog state of the target dialog turn indicates that the dialog statements of the target dialog turn relate to the target dimensions and that the information they contain includes the target keywords. The intention of the target dialog turn can subsequently be determined from this dialog state, and a reply sentence can then be generated for the user.
It should be noted that this embodiment is described using an example in which the dialog state of the target dialog turn is obtained. In another embodiment, when the candidate keywords of all preset dimension identifiers are null keywords, it is determined that no dialog state is obtained for the target dialog turn. In response, a preset question sentence is sent to the user; or any one of the preset dimension identifiers is recommended to the user to prompt the user to input information for that preset dimension identifier.
In the method provided by this embodiment, the dialog statements of a plurality of dialog turns are obtained and a word vector set is obtained for each dialog turn; the word vector sets of the dialog turns are weighted and fused according to the dimension vector of each preset dimension identifier to obtain the first fusion feature vector corresponding to each preset dimension identifier; and at least one target dimension identifier corresponding to the target dialog turn, together with the target keyword of each target dimension identifier, is determined and taken as the dialog state of the target dialog turn. Fusing the dialog statements of the target dialog turn with those of the historical dialog turns enriches the information available from the dialog statements, and performing the weighted fusion separately for each preset dimension identifier improves the accuracy of the first fusion feature vector of each preset dimension identifier, and thus the accuracy of the obtained dialog state.
In addition, for any preset dimension identifier, the first feature vector of each dialog turn is adjusted according to the first feature vectors of the plurality of dialog turns, so that the adjusted first feature vector of each dialog turn incorporates features of the dialog statements of the other dialog turns. This strengthens the relationship among the dialog turns and improves the accuracy of the adjusted feature vector of each dialog turn and of the determined dialog state.
Moreover, the position vector of each dialog turn is fused when the first feature vector of each dialog turn is adjusted, which strengthens the contextual connection among the plurality of dialog turns and further improves the accuracy of the determined dialog state.
On the basis of the embodiment shown in fig. 2, the dialog statements of the plurality of dialog turns may also be processed by a dialog state determination model to obtain the dialog state of the target dialog turn, as described in the following embodiment.
Fig. 3 is a flowchart of a dialog state determination method provided in an embodiment of the present application. The method is applied to a computer device and, as shown in fig. 3, includes the following steps:
301. The computer device obtains the dialog statements of a plurality of dialog turns.
This step is similar to step 201 above and is not repeated here.
302. The computer device calls the word vector acquisition sub-model in the dialog state determination model and obtains the word vector set of each dialog turn according to the dialog statements of that dialog turn.
In this embodiment of the application, the dialog statements of the plurality of dialog turns can be processed by the dialog state determination model to obtain the dialog state of the target dialog turn. The dialog state determination model includes a word vector acquisition sub-model, a feature vector acquisition sub-model and a dialog state determination sub-model, where the feature vector acquisition sub-model may include a first attention layer, a second attention layer, a third attention layer and a feature fusion layer.
The word vector acquisition sub-model is used to obtain the word vector set of each dialog turn: the dialog statements of each dialog turn are input into the word vector acquisition sub-model, which processes the dialog statements of each dialog turn separately to obtain the word vector set of each dialog turn.
In addition, since the dialog statements of each dialog turn include at least one of a question sentence and a reply sentence, when the dialog statements of each dialog turn are input into the word vector acquisition sub-model, a whole-sentence token can be added at the start of the dialog statements and sentence separators can be added between the question sentence and the reply sentence and at the end, so that the word vector acquisition sub-model can accurately obtain the word vector set of each dialog turn.
For example, if the dialog statements of a dialog turn include question sentence 1 and reply sentence 2, the whole-sentence token is [CLS] and the sentence separator is [SEP], the input to the word vector acquisition sub-model is "[CLS] question sentence 1 [SEP] reply sentence 2 [SEP]" or "[CLS] reply sentence 2 [SEP] question sentence 1 [SEP]". When the dialog statements of the dialog turn include only question sentence 1, the input is "[CLS] [SEP] question sentence 1 [SEP]" or "[CLS] question sentence 1 [SEP] [SEP]".
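The input layout of step 302 can be sketched as a small string-building helper. The function name build_turn_input and its question_first flag are hypothetical; the sketch only reproduces the token layout of the example above, whereas a BERT tokenizer would normally insert [CLS] and [SEP] itself when given a sentence pair.

```python
from typing import Optional

CLS, SEP = "[CLS]", "[SEP]"


def build_turn_input(question: Optional[str], reply: Optional[str], question_first: bool = True) -> str:
    """Lay out one dialog turn as [CLS] first [SEP] second [SEP].

    A missing sentence (None or empty) leaves its slot empty, so its [SEP] still appears.
    """
    first, second = (question, reply) if question_first else (reply, question)
    parts = [CLS]
    for sentence in (first, second):
        if sentence:
            parts.append(sentence)
        parts.append(SEP)
    return " ".join(parts)


print(build_turn_input("question sentence 1", "reply sentence 2"))
# [CLS] question sentence 1 [SEP] reply sentence 2 [SEP]
print(build_turn_input("question sentence 1", None, question_first=False))
# [CLS] [SEP] question sentence 1 [SEP]
```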
In one possible implementation, step 302 may include: calling the word vector acquisition sub-model $\mathrm{BERT}_1$ to obtain the word vector set $h_t$ of the $t$-th dialog turn according to the question sentence $R_t$ and the reply sentence $U_t$ included in the dialog statements of the $t$-th dialog turn. The question sentence $R_t$, the reply sentence $U_t$ and the word vector set $h_t$ satisfy the following relationship:
$$h_{t} = \mathrm{BERT}_{1}([R_{t}, U_{t}])$$
where BERT (Bidirectional Encoder Representations from Transformers) denotes the word vector acquisition sub-model.
303. The computer device calls the first attention layer in the feature vector acquisition sub-model and, for each preset dimension identifier, performs weighted fusion on the plurality of word vectors in the word vector set of each dialog turn according to the similarity between the dimension vector of the preset dimension identifier and those word vectors, to obtain the first feature vector of each dialog turn.
The first attention layer of the feature vector acquisition sub-model jointly considers the plurality of word vectors in the word vector set of each dialog turn, according to the dimension vector of a preset dimension identifier, to obtain the first feature vector of each dialog turn.
The first attention layer introduces an attention mechanism: the dimension vector of each preset dimension identifier and the word vector set of each dialog turn are input into the first attention layer, which performs weighted fusion on the word vectors in the set according to the similarity between the dimension vector and each word vector, obtaining the first feature vector of the dialog turn for that preset dimension identifier. This improves the accuracy of the obtained first feature vector.
In addition, the dimension vector of each preset dimension identifier may be obtained through another feature vector acquisition model, or may be stored in a database in advance and obtained directly by querying the database when it is needed.
Optionally, the feature vector acquisition model $\mathrm{BERT}_2$ is called to obtain, for any preset dimension identifier $s$, the dimension vector $h^{s}$ of the preset dimension identifier $s$, which satisfies the following relationship:
$$h^{s} = \mathrm{BERT}_{2}(s)$$
Optionally, the first attention layer is called and, for any preset dimension identifier $s$, the first feature vector $c^{s}_{t}$ of the $t$-th dialog turn is obtained according to the dimension vector $h^{s}$ of the preset dimension identifier $s$ and the word vector set $h_{t}$ of the $t$-th dialog turn. The dimension vector $h^{s}$, the word vector set $h_{t}$ and the first feature vector $c^{s}_{t}$ satisfy the following relationship:
$$c^{s}_{t} = \mathrm{Multihead}(h^{s}, h_{t}, h_{t})$$
where Multihead() represents the multi-head attention function of the first attention layer, with the dimension vector $h^{s}$ as the query and the word vector set $h_{t}$ as the keys and values.
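The weighted fusion performed by the first attention layer can be illustrated with a single attention head. This is a simplified sketch, assuming scaled dot-product attention without the learned query/key/value projections or the multiple heads of a real Multihead() layer; the dimension vector acts as the query and the turn's word vectors act as both keys and values. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention (simplified stand-in for Multihead())."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def first_feature_vector(dimension_vector: Vector, word_vectors: List[Vector]) -> Vector:
    """Fuse one turn's word vectors, weighting each by its similarity to the dimension vector."""
    return attend(dimension_vector, word_vectors, word_vectors)


# Toy example: the dimension vector is more similar to the first word vector.
vec = first_feature_vector([1.0, 0.0], [[4.0, 0.0], [0.0, 4.0]])
print(vec)  # most of the weight goes to the first word vector
```

Because the softmax weights sum to 1, the first feature vector is a convex combination of the turn's word vectors.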
304. The computer device calls the second attention layer in the feature vector acquisition sub-model and performs weighted fusion on the first feature vectors of the plurality of dialog turns according to the similarity between the dimension vector of the preset dimension identifier and those first feature vectors, to obtain the second fusion feature vector corresponding to the preset dimension identifier.
The second attention layer of the feature vector acquisition sub-model jointly considers the first feature vectors of the plurality of dialog turns according to the dimension vector of the preset dimension identifier, and obtains the second fusion feature vector in which the plurality of dialog turns are fused.
The second attention layer introduces an attention mechanism: for any preset dimension identifier, the first feature vectors of the plurality of dialog turns are weighted and fused according to the similarity between the dimension vector of the preset dimension identifier and the first feature vector of each dialog turn, yielding the second fusion feature vector corresponding to the preset dimension identifier and improving its accuracy.
In a possible implementation, the second attention layer is called. For any preset dimension identifier $s$, the dimension vector $h^{s}$ of the preset dimension identifier $s$, the first feature vector set $C^{s} = [c^{s}_{1}, c^{s}_{2}, \dots, c^{s}_{t}]$ of the plurality of dialog turns, and the second fusion feature vector $\hat{c}^{s}$ corresponding to the preset dimension identifier $s$ satisfy the following relationship:
$$\hat{c}^{s} = \mathrm{Multihead}(h^{s}, C^{s}, C^{s})$$
where Multihead() represents the multi-head attention function of the second attention layer, and $C^{s}$ is the first feature vector set of the plurality of dialog turns, in which the first feature vector of each dialog turn serves as one element.
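Steps 303 and 304 together form a two-level attention: word level within each turn, then turn level across turns, both queried by the same dimension vector. The sketch below chains the two levels under the same simplifying assumptions as a plain single-head reading of Multihead() (scaled dot product, no learned projections); the names attend and second_fusion_vector are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention (simplified stand-in for Multihead())."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[i] for e, v in zip(exps, values)) for i in range(len(values[0]))]


def second_fusion_vector(
    dimension_vector: Vector, turns_word_vectors: List[List[Vector]]
) -> Tuple[List[Vector], Vector]:
    # Step 303: word-level attention gives one first feature vector per dialog turn.
    first_vectors = [attend(dimension_vector, words, words) for words in turns_word_vectors]
    # Step 304: turn-level attention over the first feature vectors, same query.
    return first_vectors, attend(dimension_vector, first_vectors, first_vectors)


turns = [[[2.0, 0.0], [0.0, 2.0]], [[0.0, 2.0]]]  # turn 1 has two words, turn 2 has one
first_vectors, fused = second_fusion_vector([1.0, 0.0], turns)
print(first_vectors, fused)
```

Turn 1 contains a word aligned with the dimension vector, so it receives the larger turn-level weight in the fused result.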
As shown in fig. 4, different colors of the words correspond to different weights, which decrease in the order color 3, color 2, color 1, color 0. Different heads of the multi-head attention assign different weights to dialog turn 1. Using the weight of each dialog turn in the multi-head attention, which reflects the similarity between the first feature vector of that dialog turn and the preset dimension identifier, the second attention layer fuses dialog turn 1 and dialog turn 2 to obtain the second fusion feature vector.
It should be noted that this embodiment is described using an example in which, for any dimension identifier, once the first feature vector of each dialog turn is determined, the first feature vectors of the dialog turns are directly weighted and fused according to the dimension vector of the preset dimension identifier to obtain the second fusion feature vector of that preset dimension identifier.
In another possible implementation, the first feature vector of each dialog turn is adjusted before step 304. For any dimension identifier, the adjustment may be performed in any of the following three manners:
In the first manner, the third attention layer in the feature vector acquisition sub-model is called to perform weighted fusion on the first feature vectors of the plurality of dialog turns according to a third similarity between the first feature vector of any dialog turn and the first feature vectors of the plurality of dialog turns, and the fused feature vector is used as the adjusted first feature vector of that dialog turn.
The adjusted first feature vector of each dialog turn obtained by the third attention layer is thus a weighted fusion of the first feature vectors of the dialog turns before adjustment, and the adjusted first feature vector of every dialog turn is obtained in this way. Because each dialog turn is adjusted, features of the dialog statements of the other dialog turns are merged into it, which improves the accuracy of the feature vector of each dialog turn.
It should be noted that in the first manner the third attention layer adjusts the first feature vector of each dialog turn only once. In another embodiment, the third attention layer may include a plurality of attention units, each of which adjusts the first feature vector of each dialog turn once, with the output of the previous attention unit used as the input of the next. After the first feature vectors of the dialog turns have been adjusted multiple times by these attention units, step 304 is performed on the adjusted first feature vectors output by the last attention unit. Each attention unit adjusts in a similar way, which is not repeated here.
In the second manner, the third attention layer in the feature vector acquisition sub-model is called to obtain a position vector of each dialog turn, which represents the position of that dialog turn among the plurality of dialog turns; the first feature vector of each dialog turn is fused with its position vector, and the fused feature vector is used as the adjusted first feature vector of that dialog turn.
In the third manner, the third attention layer in the feature vector acquisition sub-model is called to obtain the position vector of each dialog turn, and the first feature vector of each dialog turn is fused with its position vector to obtain the first-adjusted first feature vector of each dialog turn. Then, according to a third similarity between the first-adjusted first feature vector of any dialog turn and the first-adjusted first feature vectors of the plurality of dialog turns, the first-adjusted first feature vectors of the plurality of dialog turns are weighted and fused, and the result is used as the second-adjusted first feature vector of that dialog turn.
In the third manner, the third attention layer may include two attention units. The first feature vector and the position vector of each dialog turn are input into the first attention unit, which merges the position vector into the first feature vector of each dialog turn; the adjusted first feature vectors output by the first attention unit are input into the second attention unit, which adjusts them again and outputs the re-adjusted first feature vector of each dialog turn.
In addition, the third attention layer may include more attention units, each of which adjusts the first feature vector of each dialog turn once, with the output of the previous attention unit used as the input of the next. The first attention unit fuses the first feature vector of each dialog turn with its position vector, the second attention unit adjusts the resulting first feature vectors again, and each subsequent attention unit adjusts again the first feature vectors of the plurality of dialog turns output by the previous unit, so that the first feature vector of each dialog turn is adjusted multiple times; step 304 is then performed on the first feature vectors output by the last attention unit. Each attention unit may include two normalization sublayers, a feedforward neural network sublayer and a multi-head self-attention sublayer.
In a possible implementation, the third attention layer may include X attention units. For any preset dimension identifier $s$, the first attention unit of the third attention layer fuses the first feature vector of each dialog turn with the corresponding position vector to obtain the 1st first feature vector set $m_{0}$ of the plurality of dialog turns after the first adjustment. The set $m_{0}$ is input into the second attention unit of the third attention layer, which adjusts the first-adjusted first feature vectors of the plurality of dialog turns again. In general, the n-th attention unit adjusts the first feature vectors of the plurality of dialog turns again to obtain the n-th first feature vector set $m_{n}$, and the X-th attention unit finally outputs the X-th first feature vector set $m_{X}$ of the plurality of dialog turns.
The 1st first feature vector set $m_{0}$ satisfies the following relationship:
$$m_{0} = [\,c^{s}_{1} + PE(1),\ c^{s}_{2} + PE(2),\ \dots,\ c^{s}_{t} + PE(t)\,]$$
where $c^{s}_{1}$, $c^{s}_{2}$ and $c^{s}_{t}$ represent the first feature vectors of the 1st, 2nd and $t$-th dialog turns corresponding to the preset dimension identifier $s$, and $PE(1)$, $PE(2)$ and $PE(t)$ represent the position vectors of the 1st, 2nd and $t$-th dialog turns.
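The embodiment does not specify how the position vector PE(t) is computed. The sketch below assumes the sinusoidal position encoding popularized by the Transformer architecture and assumes that fusing a first feature vector with its position vector means element-wise addition; both choices, and the function names, are illustrative rather than taken from the source.

```python
import math
from typing import List

Vector = List[float]


def position_vector(pos: int, dim: int) -> Vector:
    """Assumed sinusoidal PE(pos): sin on even indices, cos on odd indices."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def initial_feature_set(first_vectors: List[Vector]) -> List[Vector]:
    """m_0: add PE(t) to the first feature vector of turn t, with turns numbered from 1."""
    return [
        [x + p for x, p in zip(vec, position_vector(t, len(vec)))]
        for t, vec in enumerate(first_vectors, start=1)
    ]


m0 = initial_feature_set([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
print(m0)
```

Sinusoidal encodings need no training and give each turn index a distinct vector; a learned position embedding table would be an equally plausible alternative.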
The n-th first feature vector set $m_{n}$ satisfies the following relationship:
$$m_{n} = \mathrm{FFN}(\mathrm{Multihead}(m_{n-1}, m_{n-1}, m_{n-1}))$$
$$\mathrm{FFN}(x) = \max(0,\ xW_{1} + b_{1})W_{2} + b_{2}$$
where $n$ denotes the n-th of the X attention units of the third attention layer and is a positive integer not less than 1; $m_{n-1}$ represents the (n-1)-th first feature vector set; Multihead() represents the multi-head attention function of the third attention layer; FFN() represents the transformation function (feedforward network) in the third attention layer; max takes the maximum; $x$ is the input of FFN(); and $W_{1}$, $b_{1}$, $W_{2}$ and $b_{2}$ are adjustment parameters, which may be any constants.
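One attention unit of the third attention layer, m_n = FFN(Multihead(m_{n-1}, m_{n-1}, m_{n-1})), can be sketched in plain Python. Assumptions: a single head without learned projections stands in for Multihead(), the normalization sublayers and any residual connections are omitted, vectors are row vectors so that x W1 is a row-times-matrix product, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec_row(x: Vector, W: Matrix) -> Vector:
    """Row vector times matrix: x (length d_in) times W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def ffn(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """FFN(x) = max(0, x W1 + b1) W2 + b2."""
    hidden = [max(0.0, h + b) for h, b in zip(matvec_row(x, W1), b1)]
    return [o + b for o, b in zip(matvec_row(hidden, W2), b2)]


def self_attention(m: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention: each turn attends to all turns."""
    d = len(m[0])
    out = []
    for q in m:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in m]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[i] for e, v in zip(exps, m)) for i in range(d)])
    return out


def attention_unit(m_prev: List[Vector], W1, b1, W2, b2) -> List[Vector]:
    """m_n = FFN(Multihead(m_{n-1}, m_{n-1}, m_{n-1})), FFN applied to each turn."""
    return [ffn(row, W1, b1, W2, b2) for row in self_attention(m_prev)]


def stack_units(m0: List[Vector], params: Sequence[Tuple]) -> List[Vector]:
    """Apply the X units in order; each unit's output feeds the next, giving m_X."""
    m = m0
    for W1, b1, W2, b2 in params:
        m = attention_unit(m, W1, b1, W2, b2)
    return m


I2, Z = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]  # identity weights, zero biases
print(stack_units([[1.0, 0.0], [0.0, 1.0]], [(I2, Z, I2, Z), (I2, Z, I2, Z)]))
```

With identity weights and zero biases the FFN reduces to a ReLU, so the toy run only shows each turn being mixed with the other turn by self-attention.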
The X-th first feature vector set $m_{X}$ satisfies the following relationship:
$$m_{X} = \tilde{C}^{s} = [\,\tilde{c}^{s}_{1},\ \tilde{c}^{s}_{2},\ \dots,\ \tilde{c}^{s}_{t}\,]$$
where X represents the total number of attention units in the third attention layer and is a positive integer not less than 1, and $\tilde{C}^{s}$ represents the updated first feature vector set of the plurality of dialog turns corresponding to the preset dimension identifier $s$, in which the adjusted first feature vector of each dialog turn serves as one element.
305. The computer device calls the feature fusion layer in the feature vector acquisition sub-model and fuses the second fusion feature vector with the first feature vector of the target dialog turn to obtain the first fusion feature vector corresponding to the preset dimension identifier.
The feature fusion layer of the feature vector acquisition sub-model fuses the second fusion feature vector, which combines the plurality of dialog turns, with the first feature vector of the target dialog turn.
In one possible implementation, step 305 may include: calling the feature fusion layer to determine a fourth weight $g_{s,t}$ of the first feature vector $c^{s}_{t}$ of the target dialog turn and a fifth weight $1 - g_{s,t}$ of the second fusion feature vector $\hat{c}^{s}$, and performing weighted fusion on the second fusion feature vector $\hat{c}^{s}$ and the first feature vector $c^{s}_{t}$ according to the fourth weight and the fifth weight, to obtain the first fusion feature vector $f^{s}_{t}$ corresponding to the preset dimension identifier $s$. The first fusion feature vector $f^{s}_{t}$, the first feature vector $c^{s}_{t}$, the second fusion feature vector $\hat{c}^{s}$, the fourth weight $g_{s,t}$ and the fifth weight $1 - g_{s,t}$ satisfy the following relationship:
$$g_{s,t} = \sigma\left([\,c^{s}_{t};\ \hat{c}^{s}\,] \cdot W_{g}\right)$$
$$f^{s}_{t} = g_{s,t} \odot c^{s}_{t} + (1 - g_{s,t}) \odot \hat{c}^{s}$$
where σ() represents the sigmoid (logistic) activation function; $W_{g}$ denotes an adjustment parameter whose entries may be any constants, and is a matrix of 2d rows and d columns, d being a positive integer not less than 1; $[\,\cdot\,;\,\cdot\,]$ denotes concatenation of the two d-dimensional vectors into one 2d-dimensional vector; ⊙ denotes element-wise (dimension-by-dimension) multiplication; and · denotes the vector-matrix dot product.
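The gated fusion of step 305 can be sketched as follows. It assumes the concatenation order [first feature vector; second fusion feature vector] and a per-dimension gate produced by the 2d x d matrix W_g; the function name gated_fusion is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(first_vec: Vector, second_fused: Vector, W_g: List[Vector]) -> Tuple[Vector, Vector]:
    """g = sigmoid([first_vec; second_fused] . W_g), W_g of shape 2d x d;
    fused = g * first_vec + (1 - g) * second_fused, element-wise.
    Returns (fused, g)."""
    concat = first_vec + second_fused  # list concatenation, length 2d
    d = len(first_vec)
    g = [sigmoid(sum(concat[i] * W_g[i][j] for i in range(2 * d))) for j in range(d)]
    fused = [gj * a + (1.0 - gj) * b for gj, a, b in zip(g, first_vec, second_fused)]
    return fused, g


# With an all-zero W_g every gate is 0.5, so the result is the plain average.
fused, g = gated_fusion([2.0, 0.0], [0.0, 4.0], [[0.0, 0.0] for _ in range(4)])
print(fused, g)  # [1.0, 2.0] [0.5, 0.5]
```

Because the gate is per dimension, the layer can keep some components from the target turn and take others from the dialog history.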
It should be noted that in this embodiment the first fusion feature vector is obtained through the first attention layer, the second attention layer and the feature fusion layer. In another embodiment, steps 303 to 305 may be skipped, and the feature vector acquisition sub-model may be called to obtain the first fusion feature vector corresponding to each preset dimension identifier in another manner.
306. The computer device calls the dialog state determination sub-model in the dialog state determination model; determines at least one target dimension identifier corresponding to the target dialog turn, and the target keyword corresponding to each target dimension identifier, according to the similarity between the word vectors of the preset keywords corresponding to each preset dimension identifier and the corresponding first fusion feature vector; and determines the at least one target dimension identifier and the corresponding target keywords as the dialog state of the target dialog turn.
The dialog state determination sub-model is used to obtain the dialog state of the target dialog turn.
The word vectors of the preset keywords of each preset dimension and the first fusion feature vector corresponding to each preset dimension are input into the dialog state determination sub-model, which outputs the dialog state of the target dialog turn.
In addition, the word vectors of the preset keywords corresponding to each preset dimension identifier may be obtained by calling another feature vector acquisition model, or may be stored in a database in advance and obtained directly by querying the database when they are needed.
Optionally, another feature vector acquisition model $\mathrm{BERT}_2$ is called to obtain, for any preset keyword $\upsilon_{s}$ corresponding to any preset dimension identifier $s$, the word vector $h^{\upsilon_{s}}$ of the preset keyword $\upsilon_{s}$, which satisfies the following relationship:
$$h^{\upsilon_{s}} = \mathrm{BERT}_{2}(\upsilon_{s})$$
where $\mathrm{BERT}_2$ is a BERT (Bidirectional Encoder Representations from Transformers) model used here to obtain the word vectors of the preset keywords.
In the method provided by this embodiment, the dialog statements of a plurality of dialog turns are obtained and a word vector set is obtained for each dialog turn; the word vector sets of the dialog turns are weighted and fused according to the dimension vector of each preset dimension identifier to obtain the first fusion feature vector corresponding to each preset dimension identifier; and at least one target dimension identifier corresponding to the target dialog turn, together with the target keyword of each target dimension identifier, is determined and taken as the dialog state of the target dialog turn. Fusing the dialog statements of the target dialog turn with those of the historical dialog turns enriches the information available from the dialog statements, and performing the weighted fusion separately for each preset dimension identifier improves the accuracy of the first fusion feature vector of each preset dimension identifier, and thus the accuracy of the obtained dialog state.
In addition, for any preset dimension identifier, the first feature vector of each dialog turn is adjusted according to the first feature vectors of the plurality of dialog turns, so that the adjusted first feature vector of each dialog turn incorporates features of the dialog statements of the other dialog turns. This strengthens the relationship among the dialog turns and improves the accuracy of the adjusted feature vector of each dialog turn and of the determined dialog state.
Moreover, the position vector of each dialog turn is fused when the first feature vector of each dialog turn is adjusted, which strengthens the contextual connection among the plurality of dialog turns and further improves the accuracy of the determined dialog state.
Finally, using the word vector acquisition sub-model, the feature vector acquisition sub-model and the dialog state determination sub-model of the dialog state determination model improves the accuracy of the obtained dialog state.
On the basis of the embodiment shown in fig. 3, the word vector acquisition sub-model, the feature vector acquisition sub-model and the dialog state determination sub-model included in the dialog state determination model need to be trained before they are called. The training process is described in the following embodiment.
Fig. 5 is a flowchart of a training method of a dialog state determination model provided in an embodiment of the present application. The method is applied to a computer device and, as shown in fig. 5, includes the following steps:
501. The computer device obtains the sample dialog statements of a plurality of sample dialog turns and the sample dialog state corresponding to each sample dialog turn.
The sample dialog statements of each sample dialog turn include at least one of a sample user question sentence or a sample reply sentence, and the plurality of sample dialog turns include a target sample dialog turn and at least one historical sample dialog turn before it. The sample dialog state of each sample dialog turn represents the dialog topics contained in the sample dialog statements of that turn, and may be obtained by manually labeling the sample dialog statements of the plurality of sample dialog turns.
502. The computer device calls the word vector acquisition sub-model and obtains the word vector set of each sample dialog turn according to the sample dialog statements of that sample dialog turn.
The word vector acquisition sub-model may be an initialized word vector acquisition sub-model.
The word vector set of a sample dialog turn includes a word vector for each word in the sample dialog statements of that turn. Inputting the sample dialog statements of each sample dialog turn into the word vector acquisition sub-model separately yields the word vector set of each sample dialog turn.
503. The computer device calls the first attention layer and, for each preset dimension identifier, performs weighted fusion on the word vectors in the word vector set of the target sample dialog turn according to the similarity between the dimension vector of the preset dimension identifier and each of those word vectors, to obtain a first feature vector of the target sample dialog turn. The word vector set of each corresponding historical sample dialog turn is processed in the same way to obtain a first feature vector of that historical sample dialog turn.
In this embodiment of the application, the feature vector acquisition sub-model includes the first attention layer, a second attention layer and a feature fusion layer, through which the first sample fusion feature vector corresponding to each preset dimension identifier is obtained.
504. The computer device calls the second attention layer and performs weighted fusion on the first feature vectors of the target sample dialog turn and of the corresponding historical sample dialog turns according to the similarity between the dimension vector of the preset dimension identifier and each of these first feature vectors, to obtain a second sample fusion feature vector corresponding to the preset dimension identifier.
505. The computer device calls the feature fusion layer and fuses the second sample fusion feature vector with the first feature vector of the target sample dialog turn, to obtain the first sample fusion feature vector corresponding to the preset dimension identifier.
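Steps 503 to 505 can be sketched in plain Python as below. This is a minimal, single-head illustration under stated assumptions, not the embodiment's implementation: similarity is taken to be a dot product, the weights are a softmax over the similarities (so each weight grows with its similarity), and the feature fusion layer is modeled as a scalar sigmoid gate whose two weights sum to 1. The names `attend`, `fuse_for_slot` and `gate_w` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Fuse `keys` with softmax weights over their dot-product similarity to `query`."""
    scores = [dot(query, k) for k in keys]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(keys[0]))]


def fuse_for_slot(dim_vector: Vector, turns: List[List[Vector]], gate_w: Vector) -> Vector:
    """First fusion feature vector for one preset dimension identifier.

    `turns` holds the word vectors of each turn, oldest first; the last turn is the
    target turn. `gate_w` (length 2d) is a hypothetical gate parameter.
    """
    # Step 503: word-level attention gives one first feature vector per turn.
    first_vecs = [attend(dim_vector, words) for words in turns]
    # Step 504: turn-level attention gives the second fusion feature vector.
    second = attend(dim_vector, first_vecs)
    # Step 505: gate between the target turn's first feature vector and `second`.
    target = first_vecs[-1]
    g = 1.0 / (1.0 + math.exp(-dot(gate_w, target + second)))  # list concat: length 2d
    return [g * a + (1.0 - g) * b for a, b in zip(target, second)]
```

With `gate_w` set to all zeros the gate value is 0.5, so the result is the plain average of the target turn's first feature vector and the second fusion feature vector.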
It should be noted that, in this embodiment of the application, the first sample fusion feature vector corresponding to each preset dimension identifier is obtained through the first attention layer, the second attention layer and the feature fusion layer. In another embodiment, steps 503 to 505 need not be executed, and the feature vector acquisition sub-model may be invoked to obtain the first sample fusion feature vector corresponding to each preset dimension identifier in another manner.
506. The computer device calls the dialog state determination sub-model and, according to the word vectors of the plurality of preset keywords corresponding to each preset dimension identifier and the corresponding first sample fusion feature vector, determines at least one predicted dimension identifier corresponding to the target sample dialog turn and a predicted keyword corresponding to each predicted dimension identifier. The at least one predicted dimension identifier and the predicted keyword corresponding to each predicted dimension identifier are determined as the predicted dialog state of the target sample dialog turn.
507. The computer device adjusts the word vector acquisition sub-model, the first attention layer, the second attention layer, the feature fusion layer and the dialog state determination sub-model according to the predicted dialog state of the target sample dialog turn and the corresponding sample dialog state.
The computer device may determine a loss value of the word vector acquisition sub-model, the first attention layer, the second attention layer, the feature fusion layer and the dialog state determination sub-model according to the difference between the predicted dialog state and the corresponding sample dialog state, and adjust these sub-models and layers according to the loss value, so as to improve their accuracy.
It should be noted that, in this embodiment of the application, the word vector acquisition sub-model, the first attention layer, the second attention layer, the feature fusion layer and the dialog state determination sub-model are trained. In another embodiment, step 507 need not be executed; instead, the word vector acquisition sub-model, the feature vector acquisition sub-model and the dialog state determination sub-model may be adjusted according to the predicted dialog state of the target sample dialog turn and the corresponding sample dialog state.
It should be noted that, in the embodiment of the present application, the training process is described by taking any one target sample dialog turn as an example, but in another embodiment, training may also be performed according to a plurality of target sample dialog turns.
In one possible implementation, after step 501, any sample dialog turn among the plurality of sample dialog turns is taken as a target sample dialog turn, and the sample dialog turns before it are taken as the historical sample dialog turns of that target sample dialog turn.
By taking each of the obtained sample dialog turns in turn as the target sample dialog turn, a plurality of sample dialog turn sets are obtained, each including one target sample dialog turn and its corresponding historical sample dialog turns.
For example, if the obtained sample dialog turns are sample dialog turns 1 to 5 and each is taken in turn as the target sample dialog turn, 5 sample dialog turn sets are obtained: the first set includes sample dialog turn 1; the second set includes sample dialog turns 1 and 2; the third set includes sample dialog turns 1 to 3; the fourth set includes sample dialog turns 1 to 4; and the fifth set includes sample dialog turns 1 to 5.
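The construction of sample dialog turn sets described above amounts to taking every prefix of the turn sequence. A minimal sketch, using the hypothetical helper name `turn_sets`; the last element of each set is its target sample dialog turn and the preceding elements are its historical sample dialog turns:

```python
from typing import List, TypeVar

T = TypeVar("T")


def turn_sets(turns: List[T]) -> List[List[T]]:
    """Return one set per turn: the turn itself (last) preceded by all earlier turns."""
    return [turns[: i + 1] for i in range(len(turns))]


sets = turn_sets([1, 2, 3, 4, 5])
targets = [s[-1] for s in sets]     # target sample dialog turns
histories = [s[:-1] for s in sets]  # corresponding historical sample dialog turns
```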
In one possible implementation, before step 507, the method further includes the following steps 1 and 2:
Step 1: Obtain a plurality of sample change probabilities of each preset dimension identifier.
A sample change probability represents the difference between the sample dialog states corresponding to two adjacent sample dialog turns among the plurality of sample dialog turns. For the sample dialog states corresponding to any two adjacent sample dialog turns, the sample change probability of the two turns is 0 if the two sample dialog states are the same, and 1 if they are different.
Because a plurality of sample dialog turns are obtained and every two adjacent sample dialog turns correspond to one sample change probability, a plurality of sample change probabilities can be obtained for each preset dimension identifier.
For example, if the obtained sample dialog turns are sample dialog turns 1 to 5, 4 sample change probabilities can be obtained from the sample dialog states corresponding to these turns: one for sample dialog turns 1 and 2, one for sample dialog turns 2 and 3, one for sample dialog turns 3 and 4, and one for sample dialog turns 4 and 5.
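A minimal sketch of how the sample change probabilities of step 1 could be derived from labeled sample dialog states, assuming each state is stored as a mapping from preset dimension identifier to keyword (a missing entry meaning the dimension is absent). The dimension name `hotel-area` and its keywords are hypothetical examples.

```python
from typing import Dict, List, Optional

State = Dict[str, Optional[str]]  # preset dimension identifier -> keyword


def sample_change_labels(states: List[State], dims: List[str]) -> Dict[str, List[int]]:
    """For each dimension: 0 if two adjacent turns share the same value, otherwise 1."""
    return {
        s: [0 if states[t].get(s) == states[t - 1].get(s) else 1
            for t in range(1, len(states))]
        for s in dims
    }


states = [{}, {"hotel-area": "north"}, {"hotel-area": "north"},
          {"hotel-area": "south"}, {"hotel-area": "south"}]
labels = sample_change_labels(states, ["hotel-area"])
```

Five turns yield four labels per dimension, matching the example above.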
Step 2: For each preset dimension identifier, call a state transition model to process the first fusion feature vectors of every two adjacent sample dialog turns corresponding to the preset dimension identifier, to obtain a plurality of predicted change probabilities corresponding to the preset dimension identifier.
The state transition model is used to obtain a predicted change probability according to the first fusion feature vectors of two adjacent sample dialog turns.
Each of the obtained sample dialog turns is taken as the target sample dialog turn and steps 503 to 505 are executed, so that the first fusion feature vector of each sample dialog turn is obtained. For any preset dimension identifier, the state transition model is called to obtain one predicted change probability for every two adjacent sample dialog turns, so that a plurality of predicted change probabilities are obtained for the preset dimension identifier; their number equals the number of sample dialog turns minus 1. The sample change probabilities and the predicted change probabilities corresponding to the same preset dimension identifier are in one-to-one correspondence.
Accordingly, step 507 may include:
adjusting the word vector acquisition sub-model, the first attention layer, the second attention layer, the feature fusion layer, the dialog state determination sub-model and the state transition model according to the obtained plurality of predicted change probabilities of each preset dimension identifier, the plurality of sample change probabilities of each preset dimension identifier, the predicted dialog state of each sample dialog turn and the corresponding sample dialog state.
The computer device may determine a loss value of the word vector acquisition sub-model, the first attention layer, the second attention layer, the feature fusion layer and the dialog state determination sub-model according to the difference between the predicted dialog state and the corresponding sample dialog state and the difference between the plurality of predicted change probabilities and the plurality of sample change probabilities of each preset dimension identifier, and adjust these sub-models and layers according to the loss value, so as to improve their accuracy.
Optionally, for any preset dimension identifier $s$, a first loss value $j_{s,t}$ corresponding to $s$ and the $t$-th sample dialog turn is determined according to the first fusion feature vector $h_{s,t}$ corresponding to $s$ and the $t$-th sample dialog turn, and the word vector $v_{y_{s,t}}$ of the sample keyword $y_{s,t}$ corresponding to the $t$-th sample dialog turn. The first loss value satisfies the following relationships:

$$O_{s,t} = \mathrm{LayerNorm}\big(\mathrm{Linear}(\mathrm{Dropout}(h_{s,t}))\big)$$

$$p(y_{s,t} \mid U_{\le t}, R_{\le t}) = \frac{\exp\big(-\lVert O_{s,t} - v_{y_{s,t}} \rVert_2\big)}{\sum_{z \in Z} \exp\big(-\lVert O_{s,t} - v_{z} \rVert_2\big)}$$

$$j_{s,t} = -\log p(y_{s,t} \mid U_{\le t}, R_{\le t})$$

where $U$ represents a question statement in a sample dialog turn, $R$ represents a reply statement in a sample dialog turn, $U_{\le t}$ represents the question statements from the first to the $t$-th sample dialog turn, $R_{\le t}$ represents the reply statements from the first to the $t$-th sample dialog turn, $z$ represents any preset keyword in the preset keyword set $Z$ corresponding to $s$, $\exp$ represents the exponential function with the natural constant $e$ as its base, $\lVert \cdot \rVert_2$ denotes the L2 norm, and $O_{s,t}$ represents the feature vector obtained by linearly transforming $h_{s,t}$; $\mathrm{Dropout}()$ represents the random discard (dropout) function, $\mathrm{Linear}()$ represents the linear transformation function, and $\mathrm{LayerNorm}()$ represents the normalization function.
Correspondingly, according to the plurality of preset dimension identifiers and the plurality of sample dialog turns, the first loss value sum $L_{dst}$ corresponding to the predicted dialog state and the corresponding sample dialog state satisfies the following relationship:

$$L_{dst} = -\sum_{s \in Y} \sum_{t=1}^{G} \log p(y_{s,t} \mid U_{\le t}, R_{\le t})$$

where $Y$ represents the set of the plurality of preset dimension identifiers, $G$ represents the total number of the plurality of dialog turns, $y_{s,t}$ represents the sample keyword corresponding to the preset dimension identifier $s$ and the $t$-th dialog turn, $s$ represents any preset dimension identifier in $Y$, $p(y_{s,t} \mid U_{\le t}, R_{\le t})$ represents the probability that the model assigns to the sample keyword $y_{s,t}$, $U$ represents a question statement in a sample dialog turn, $R$ represents a reply statement in a sample dialog turn, $U_{\le t}$ represents the question statements from the first to the $t$-th sample dialog turn, and $R_{\le t}$ represents the reply statements from the first to the $t$-th sample dialog turn.
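The first loss value sum can be sketched as follows, assuming $O_{s,t}$ has already been computed (dropout, the linear layer and layer normalization are omitted) and each preset keyword has a word vector. The probability is a softmax over negative L2 distances; the sketch also assumes that the predicted keyword of step 506 is the keyword with the highest probability, i.e. the nearest keyword vector. All function names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def keyword_probs(o: Vector, keyword_vecs: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative L2 distances between O_{s,t} and each keyword vector."""
    neg = {z: -l2(o, v) for z, v in keyword_vecs.items()}
    top = max(neg.values())  # subtract the max for numerical stability
    exps = {z: math.exp(d - top) for z, d in neg.items()}
    total = sum(exps.values())
    return {z: e / total for z, e in exps.items()}


def predict_keyword(o: Vector, keyword_vecs: Dict[str, Vector]) -> str:
    probs = keyword_probs(o, keyword_vecs)
    return max(probs, key=probs.get)


def dst_loss(outputs: Dict[Tuple[str, int], Vector],
             gold: Dict[Tuple[str, int], str],
             keyword_sets: Dict[str, Dict[str, Vector]]) -> float:
    """L_dst: negative log-probability of the sample keyword, summed over (s, t)."""
    total = 0.0
    for (s, t), o in outputs.items():
        total -= math.log(keyword_probs(o, keyword_sets[s])[gold[(s, t)]])
    return total


keywords = {"hotel-area": {"north": [0.0, 0.0], "south": [3.0, 4.0]}}
outputs = {("hotel-area", 1): [0.0, 0.0]}
gold = {("hotel-area", 1): "north"}
```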
Optionally, for the preset dimension identifier $s$, the predicted change probability $p_{s,t}$ between the $t$-th sample dialog turn and the $(t-1)$-th sample dialog turn is determined from the first fusion feature vectors $h_{s,t}$ and $h_{s,t-1}$ of the two turns. The quantities involved are as follows: $\sigma()$ represents the sigmoid (logistic regression) activation function; $W_p$ represents an adjustment parameter matrix of $d$ rows and $d$ columns, which may be any constant matrix; $\tilde{h}_{s,t}$ and $\tilde{h}_{s,t-1}$ represent the feature vectors obtained by nonlinearly transforming the first fusion feature vectors $h_{s,t}$ and $h_{s,t-1}$, respectively; $\odot$ denotes element-wise (dimension-by-dimension) multiplication of vectors; $W_c$ represents an adjustment parameter vector, which may be any constant vector and is a column vector of $2d$ rows; and $\tanh()$ represents the nonlinear transformation function.
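The paragraph above lists the components of the state transition model; how they are combined is an assumption in the sketch below. It uses $\tilde h = \tanh(W_p h)$ for both turns and $p_{s,t} = \sigma\big(W_c^\top[\tilde h_{s,t};\ \tilde h_{s,t} \odot \tilde h_{s,t-1}]\big)$, which matches the stated $2d$ size of $W_c$. Treat this arrangement and the function names as hypothetical, not as the embodiment's formula.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def change_probability(h_t: Vector, h_prev: Vector, w_p: Matrix, w_c: Vector) -> float:
    """Predicted change probability between turn t-1 and turn t for one dimension.

    Hypothetical arrangement: tanh(W_p h) for both turns, then a 2d vector built
    from the current turn and the element-wise product of both turns.
    """
    a_t = [math.tanh(x) for x in matvec(w_p, h_t)]      # d x d matrix times d vector
    a_prev = [math.tanh(x) for x in matvec(w_p, h_prev)]
    z = a_t + [x * y for x, y in zip(a_t, a_prev)]      # concatenation, length 2d
    logit = sum(w * x for w, x in zip(w_c, z))          # W_c has 2d entries
    return 1.0 / (1.0 + math.exp(-logit))               # sigmoid
```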
Accordingly, according to the obtained plurality of predicted change probabilities and plurality of sample change probabilities of each preset dimension identifier, a second loss value sum $L_{stp}$ corresponding to the predicted change probabilities and the sample change probabilities is obtained. $L_{stp}$ accumulates, over every preset dimension identifier $s \in Y$ and every pair of adjacent sample dialog turns ($t = 2, \dots, G$), the loss between the predicted change probability $p_{s,t}$ and the sample change probability $q_{s,t}$, where $p_{s,t}$ represents the predicted change probability between the $t$-th and the $(t-1)$-th sample dialog turns, $q_{s,t}$ represents the sample change probability between the $t$-th and the $(t-1)$-th sample dialog turns, $Y$ represents the set of the plurality of preset dimension identifiers, and $G$ represents the total number of the plurality of dialog turns.
Correspondingly, the total loss value $L_{joint}$ of the model, determined according to the obtained plurality of predicted change probabilities of each preset dimension identifier, the plurality of sample change probabilities of each preset dimension identifier, the predicted dialog state of each sample dialog turn and the corresponding sample dialog state, satisfies the following relationship:

$$L_{joint} = L_{dst} + L_{stp}$$

where $L_{dst}$ represents the first loss value sum, corresponding to the predicted dialog state and the corresponding sample dialog state, and $L_{stp}$ represents the second loss value sum, corresponding to the predicted change probabilities and the sample change probabilities.
The word vector acquisition sub-model, the first attention layer, the second attention layer, the feature fusion layer, the dialog state determination sub-model and the state transition model are adjusted according to the determined total loss value $L_{joint}$ of the model.
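A minimal sketch of the second loss value sum and the total loss. The exact form of $L_{stp}$ is an assumption here: binary cross-entropy between each predicted change probability and its 0/1 sample change probability, a common choice for a sigmoid output. The function names are hypothetical.

```python
import math
from typing import Dict, List


def stp_loss(pred: Dict[str, List[float]], gold: Dict[str, List[int]],
             eps: float = 1e-12) -> float:
    """Assumed L_stp: binary cross-entropy summed over dimensions and adjacent turn pairs."""
    total = 0.0
    for s, probs in pred.items():
        for p, y in zip(probs, gold[s]):
            p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def joint_loss(l_dst: float, l_stp: float) -> float:
    """L_joint = L_dst + L_stp, as stated in the text."""
    return l_dst + l_stp
```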
It should be noted that, in this embodiment of the application, the feature vector acquisition sub-model includes the first attention layer, the second attention layer and the feature fusion layer. In another embodiment, the feature vector acquisition sub-model may also exist as a separate structure and be trained directly.
According to the method provided by this embodiment of the application, the word vector acquisition sub-model, the feature vector acquisition sub-model and the dialog state determination sub-model are adjusted using the sample dialog statements of a plurality of sample dialog turns, which enriches the training samples and improves the accuracy of the trained model. In addition, a state transition model is added during training and is trained jointly with the word vector acquisition sub-model, the feature vector acquisition sub-model and the dialog state determination sub-model. This strengthens the association among the plurality of sample dialog turns, improves the model's ability to obtain features from historical sample dialog turns, and thus improves the accuracy of the word vector acquisition sub-model, the feature vector acquisition sub-model and the dialog state determination sub-model.
Table 1 shows the differences between the dialog state determination model trained in the embodiment of the present application and other models in the prior art. According to the data in Table 1, the joint accuracy and the slot accuracy obtained by the dialog state determination model on different datasets are higher than those of the prior-art models.
TABLE 1
TABLE 2
On dialog domain dataset 2.1, the accuracy of the dialog state determination model trained jointly with the state transition model is compared with that of the dialog state determination model trained alone. As shown in Table 2, joint training of the dialog state determination model and the state transition model yields a more accurate dialog state determination model.
The dialog state determination model in the present application includes a word vector acquisition sub-model, a feature vector acquisition sub-model and a dialog state determination sub-model, and the feature vector acquisition sub-model includes a first attention layer, a feature combination layer, a third attention layer, a second attention layer and a feature fusion layer. The dialog state determination model is also trained with different sub-models or layers removed. As shown in Table 3, the complete dialog state determination model achieves the highest accuracy.
TABLE 3
| Model | Accuracy on dialog domain dataset 2.1 |
|---|---|
| Dialog state determination model | 57 |
| Without feature fusion layer | 56.76 (-0.24) |
| Without second attention layer | 56.85 (-0.15) |
| Without third attention layer | 55.28 (-1.72) |
| Without feature fusion layer, second attention layer and third attention layer | 50.28 (-6.72) |
In this embodiment of the application, the dialog statements of historical dialog turns are used when training the model. Comparing different training modes shows that training with the dialog statements of historical dialog turns yields a more accurate model, as shown in Table 4.
TABLE 4
| Improvement type | Proportion |
|---|---|
| Improvement from reasoning over historical information | 64.49% |
| Improvement from reasoning over current information | 34.86% |
| Other types of information | 0.65% |
Fig. 6 is a flowchart of a method for training a dialog state determination model according to an embodiment of the present application. The dialog state determination model includes a word vector acquisition sub-model, a feature vector acquisition sub-model and a dialog state determination sub-model, and the feature vector acquisition sub-model includes a first attention layer, a feature combination layer, a third attention layer, a second attention layer and a feature fusion layer. The other feature extraction models are trained models used to obtain the dimension vector of each preset dimension identifier and the word vectors of the plurality of preset keywords corresponding to each preset dimension identifier.
During training, the state transition model and the dialog state determination model are trained jointly.
The dialog statements of a plurality of sample dialog turns are input into the word vector acquisition sub-model to obtain the word vector set of each sample dialog turn. The word vector set of each sample dialog turn and the dimension vector of the preset dimension identifier are input into the first attention layer to obtain the first feature vector of each sample dialog turn. The first feature vectors of the sample dialog turns are input into the feature combination layer to obtain a combined feature vector of the sample dialog turns, in which the first feature vector of each sample dialog turn serves as one dimension. The combined feature vector and the position vector of each sample dialog turn are input into the third attention layer, which adjusts the first feature vector of each sample dialog turn to obtain an updated combined feature vector. The updated combined feature vector and the dimension vector of the preset dimension identifier are input into the second attention layer, which outputs the second sample fusion feature vector corresponding to the preset dimension identifier. The second sample fusion feature vector and the first feature vector of the t-th sample dialog turn are input into the feature fusion layer to obtain the first sample fusion feature vector corresponding to the preset dimension identifier.
The first sample fusion feature vector corresponding to the preset dimension identifier and the word vectors of the plurality of preset keywords of the preset dimension identifier are input into the dialog state determination sub-model to obtain the first loss value sum. The first sample fusion feature vector corresponding to the preset dimension identifier and the first sample fusion feature vector of the previous sample dialog turn are input into the state transition model to obtain the second loss value sum. The sum of the first loss value sum and the second loss value sum is taken as the total loss value of the state transition model and the dialog state determination model, and both models are adjusted according to the total loss value.
Fig. 7 is a flowchart of a method for training a dialog state determination model according to an embodiment of the present application. In the process of training the dialog state determination model, a state transition model and a first bidirectional coding representation model are used. The first bidirectional coding representation model is a trained model, and the dialog state determination model and the state transition model are trained jointly to obtain an accurate dialog state determination model. The first bidirectional coding representation model is used to obtain the dimension vector of each preset dimension identifier and the word vectors of the preset keywords.
The dialog state determination model comprises a bidirectional coding representation layer, a word-level multi-head attention layer, a feature combination layer, a context coding layer, a sentence-level multi-head attention layer, a gate control layer and a linear transformation layer. In this embodiment of the present application, the word vector obtaining sub-model is a bidirectional coding representation layer, the first attention layer is a word-level multi-head attention layer, the third attention layer is a context coding layer, the second attention layer is a sentence-level multi-head attention layer, the feature fusion layer is a gate control layer, and the dialog state determining sub-model includes a linear transformation layer.
Dialog statements of a plurality of sample dialog turns are obtained and processed by the bidirectional coding representation layer, the word-level multi-head attention layer, the feature combination layer, the context coding layer, the sentence-level multi-head attention layer and the gate control layer. The gate control layer outputs a first sample fusion feature vector, which is then input into both the linear transformation layer and the state transition model.
The linear transformation layer linearly transforms the first sample fusion feature vector, and the first loss value sum is obtained from the transformed first sample fusion feature vector and the word vectors of the preset keywords.
The state transition model obtains the second loss value sum according to the first sample fusion feature vector and the first sample fusion feature vector of the previous sample dialog turn.
The sum of the first loss value sum and the second loss value sum is taken as the total loss value of the state transition model and the dialog state determination model, and the two models are adjusted according to the total loss value.
Fig. 8 is a schematic structural diagram of a dialog state determination device according to an embodiment of the present application. As shown in fig. 8, the dialog state determination device includes:
a dialogue sentence acquisition module 801, configured to acquire dialogue sentences of multiple dialogue rounds, where a dialogue sentence of each dialogue round includes at least one of a question sentence or a reply sentence, and the multiple dialogue rounds include a target dialogue round and at least one historical dialogue round located before the target dialogue round;
a first set obtaining module 802, configured to obtain a word vector set of each conversation turn according to a conversation sentence of each conversation turn, where the word vector set of the conversation turn includes a word vector of each word in the conversation sentence of the conversation turn;
the first fusion processing module 803 is configured to perform weighted fusion processing on the word vector sets of the multiple dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in the word vector set of the multiple dialog turns, so as to obtain a first fusion feature vector corresponding to each preset dimension identifier;
a first determining module 804, configured to determine, according to a similarity between a word vector of a preset keyword corresponding to each preset dimension identifier and a corresponding first fusion feature vector, at least one target dimension identifier corresponding to a target dialog turn, and a target keyword corresponding to each target dimension identifier;
the first determining module 804 is further configured to determine at least one target dimension identifier and a target keyword corresponding to each target dimension identifier as a dialog state of the target dialog turn.
In one possible implementation, as shown in fig. 9, the first fusion processing module 803 includes:
a first fusion processing unit 8301, configured to perform, for each preset dimension identifier, weighted fusion processing on the word vectors in the word vector set of each conversation turn according to similarities between the dimension vector of the preset dimension identifier and the word vectors in the word vector set of each conversation turn, to obtain a first feature vector of each conversation turn;
the second fusion processing unit 8302 is configured to perform weighted fusion processing on the first feature vectors of the multiple dialog turns according to similarity between the dimension vector of the preset dimension identifier and the first feature vectors of the multiple dialog turns, so as to obtain a second fusion feature vector corresponding to the preset dimension identifier;
and a third fusion processing unit 8303, configured to perform fusion processing on the second fusion feature vector and the first feature vector of the target dialog turn, to obtain a first fusion feature vector corresponding to the preset dimension identifier.
In a possible implementation manner, the first fusion processing unit 8301 is configured to: for each preset dimension identifier and each dialog turn, determine a first similarity between the dimension vector of the preset dimension identifier and each word vector in the word vector set of the dialog turn; determine a first weight of each word vector according to the first similarity corresponding to that word vector, where the first weight of each word vector is positively correlated with the corresponding first similarity; and perform weighted fusion on the plurality of word vectors according to their first weights to obtain the first feature vector of the dialog turn.
In a possible implementation manner, the second fusion processing unit 8302 is configured to: determine a second similarity between the dimension vector of the preset dimension identifier and the first feature vector of each dialog turn; determine a second weight of each dialog turn according to the second similarity corresponding to that dialog turn, where the second weight of each dialog turn is positively correlated with the corresponding second similarity; and perform weighted fusion on the first feature vectors of the plurality of dialog turns according to their second weights to obtain the second fusion feature vector corresponding to the preset dimension identifier.
In one possible implementation, as shown in fig. 9, the apparatus further includes:
the first adjusting module 805 is configured to, for any dialog turn, perform weighted fusion processing on the first feature vectors of the multiple dialog turns according to third similarities between the first feature vector of that dialog turn and the first feature vectors of the multiple dialog turns, and use the feature vector obtained by the weighted fusion processing as the adjusted first feature vector of that dialog turn.
In one possible implementation, as shown in fig. 9, the first adjusting module 805 includes:
a weight determining unit 8501, configured to determine a third weight of each of the multiple dialog turns according to a third similarity between the first feature vector of any one dialog turn and the first feature vector of each of the multiple dialog turns, where the third weight of each dialog turn is positively correlated with the corresponding third similarity;
a fourth fusion processing unit 8502, configured to perform weighted fusion processing on the first feature vectors of the multiple conversation turns according to their corresponding third weights, and use the feature vector obtained by the fusion processing as the adjusted first feature vector of that dialog turn.
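The adjustment by the first adjusting module 805 resembles a single self-attention pass in which each turn's first feature vector serves as the query against all turns, itself included. A minimal sketch, assuming dot-product third similarities and softmax third weights, with hypothetical function names:

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Third similarity between two first feature vectors (assumed: dot product)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax, so weights rise with similarity and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    """Element-wise weighted sum of equally sized vectors."""
    size = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(size)]


def adjust_turn_vectors(turn_vectors: List[Vector]) -> List[Vector]:
    """Return the adjusted first feature vector of every turn."""
    adjusted = []
    for query in turn_vectors:
        # Third weights from the third similarities between this turn and every turn.
        third_weights = softmax([dot(query, v) for v in turn_vectors])
        adjusted.append(weighted_sum(third_weights, turn_vectors))
    return adjusted
```

Every turn is adjusted from the original first feature vectors rather than from already adjusted ones, so the result does not depend on the order in which turns are processed.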
In one possible implementation, as shown in fig. 9, the apparatus further includes:
a position vector obtaining module 806, configured to obtain a position vector of each conversation turn, where the position vector of a conversation turn indicates the position of that conversation turn among the multiple conversation turns;
the second fusion processing module 807 is configured to perform fusion processing on the first feature vector of each dialog turn and the corresponding position vector, and use the feature vector obtained by the fusion processing as the adjusted first feature vector of that dialog turn.
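The text does not specify how the position vectors are built or how they are fused with the first feature vectors. The sketch below assumes Transformer-style sinusoidal position vectors and element-wise addition; both are illustrative choices, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def position_vector(position: int, size: int) -> Vector:
    """Hypothetical sinusoidal position vector for the turn at index `position`."""
    vec = []
    for i in range(size):
        # Pairs of dimensions share a frequency; even indices use sin, odd indices use cos.
        angle = position / (10000 ** (2 * (i // 2) / size))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def add_turn_positions(turn_vectors: List[Vector]) -> List[Vector]:
    """Fuse each turn's first feature vector with its position vector by addition."""
    size = len(turn_vectors[0])
    return [
        [x + p for x, p in zip(vec, position_vector(pos, size))]
        for pos, vec in enumerate(turn_vectors)
    ]
```

Because the position vectors differ from turn to turn, two turns with identical first feature vectors become distinguishable after the fusion.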
In one possible implementation manner, the third fusion processing unit 8303 is configured to determine a fourth weight of the first feature vector of the target dialog turn and a fifth weight of the second fusion feature vector, where the sum of the fourth weight and the fifth weight is 1; and perform weighted fusion processing on the second fusion feature vector and the first feature vector of the target dialog turn according to the fourth weight and the fifth weight, to obtain the first fusion feature vector corresponding to the preset dimension identifier.
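One way to obtain a fourth weight and a fifth weight that sum to 1 is a sigmoid gate, with the fifth weight taken as one minus the fourth. In the sketch below, the gate parameters `gate_weights` and `gate_bias` are hypothetical learned values, not something defined in the text above.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(target_vec: Vector, second_fused_vec: Vector,
                 gate_weights: Vector, gate_bias: float = 0.0) -> Tuple[Vector, float, float]:
    """Return the first fusion feature vector plus the fourth and fifth weights."""
    # Hypothetical gate: a logistic score over the concatenation of both vectors.
    z = sum(w * x for w, x in zip(gate_weights, target_vec + second_fused_vec)) + gate_bias
    fourth_weight = sigmoid(z)          # weight of the target turn's first feature vector
    fifth_weight = 1.0 - fourth_weight  # weight of the second fusion feature vector
    fused = [fourth_weight * t + fifth_weight * h
             for t, h in zip(target_vec, second_fused_vec)]
    return fused, fourth_weight, fifth_weight
```

With all gate parameters at zero the two vectors are simply averaged; a trained gate can lean toward the current turn or toward the dialog history depending on the inputs.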
In one possible implementation, as shown in fig. 9, the first determining module 804 includes:
a similarity determining unit 8401, configured to determine, for each preset dimension identifier, a fourth similarity between word vectors of a plurality of preset keywords corresponding to the preset dimension identifier and the first fused feature vector;
the keyword selecting unit 8402 is configured to select an alternative keyword from the plurality of preset keywords, where the fourth similarity corresponding to the alternative keyword is greater than the fourth similarities corresponding to the other preset keywords in the plurality of preset keywords;
the target determining unit 8403 is configured to, in response to the alternative keyword not being the null keyword, determine the preset dimension identifier as a target dimension identifier and determine the alternative keyword as the target keyword.
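The keyword selection above can be sketched as follows, assuming that the fourth similarity is a dot product and that the null keyword is represented by a hypothetical token `NULL_KEYWORD` stored alongside the real preset keywords of each dimension.

```python
from typing import Dict, List, Optional

Vector = List[float]
NULL_KEYWORD = "<null>"  # hypothetical token standing in for the null keyword


def dot(a: Vector, b: Vector) -> float:
    """Fourth similarity, assumed here to be a dot product."""
    return sum(x * y for x, y in zip(a, b))


def select_target_keyword(fused_vec: Vector,
                          keyword_vectors: Dict[str, Vector]) -> Optional[str]:
    """Return the target keyword for one dimension, or None if the null keyword wins."""
    # Alternative keyword: the preset keyword with the largest fourth similarity.
    alternative = max(keyword_vectors, key=lambda kw: dot(keyword_vectors[kw], fused_vec))
    return None if alternative == NULL_KEYWORD else alternative


def dialog_state(fused_by_dim: Dict[str, Vector],
                 keywords_by_dim: Dict[str, Dict[str, Vector]]) -> Dict[str, str]:
    """Map each target dimension identifier to its target keyword."""
    state = {}
    for dim_id, fused_vec in fused_by_dim.items():
        keyword = select_target_keyword(fused_vec, keywords_by_dim[dim_id])
        if keyword is not None:  # the dimension becomes a target dimension identifier
            state[dim_id] = keyword
    return state
```

A dimension whose best-scoring keyword is the null keyword is simply left out of the dialog state, which matches the rule that only a non-null alternative keyword makes a preset dimension identifier a target dimension identifier.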
Fig. 10 is a schematic structural diagram of a dialog state determination device according to an embodiment of the present application. As shown in fig. 10, the dialog state determination device includes:
a dialogue sentence acquisition module 1001 configured to acquire dialogue sentences of a plurality of dialogue rounds, where a dialogue sentence of each dialogue round includes at least one of a question sentence or a reply sentence, and the plurality of dialogue rounds include a target dialogue round and at least one historical dialogue round located before the target dialogue round;
a first set obtaining module 1002, configured to invoke a word vector obtaining sub-model in the dialog state determination model, and obtain a word vector set of each dialog turn according to a dialog statement of each dialog turn;
the first fusion processing module 1003 is configured to invoke a feature vector obtaining sub-model in the dialog state determination model, and perform weighted fusion processing on the word vector sets of the multiple dialog turns according to the similarity between the dimension vector of each preset dimension identifier and each word vector in the word vector sets of the multiple dialog turns, to obtain a first fusion feature vector corresponding to each preset dimension identifier;
the first determining module 1004 is configured to invoke a dialog state determining sub-model in the dialog state determination model, determine at least one target dimension identifier corresponding to the target dialog turn and a target keyword corresponding to each target dimension identifier according to the similarity between the word vector of each preset keyword corresponding to each preset dimension identifier and the corresponding first fusion feature vector, and determine the at least one target dimension identifier and the target keyword corresponding to each target dimension identifier as the dialog state of the target dialog turn.
In one possible implementation manner, as shown in fig. 11, the first fusion processing module 1003 includes:
the first fusion processing unit 1031 is configured to invoke a first attention layer in the feature vector obtaining sub-model and, for each preset dimension identifier, perform weighted fusion processing on the word vectors in the word vector set of each dialog turn according to the similarity between the dimension vector of the preset dimension identifier and each word vector in that word vector set, so as to obtain a first feature vector of each dialog turn;
the second fusion processing unit 1032 is configured to invoke a second attention layer in the feature vector obtaining sub-model and perform weighted fusion processing on the first feature vectors of the multiple dialog turns according to the similarity between the dimension vector of the preset dimension identifier and the first feature vectors of the multiple dialog turns, so as to obtain a second fusion feature vector corresponding to the preset dimension identifier;
and a third fusion processing unit 1033, configured to invoke a feature fusion layer in the feature vector obtaining sub-model and perform fusion processing on the second fusion feature vector and the first feature vector of the target dialog turn, so as to obtain a first fusion feature vector corresponding to the preset dimension identifier.
In one possible implementation, as shown in fig. 11, the apparatus further includes:
the first adjusting module 1005 is configured to invoke a third attention layer in the feature vector obtaining sub-model, perform weighted fusion processing on the first feature vectors of the multiple dialog turns according to third similarities between the first feature vector of any one dialog turn and the first feature vectors of the multiple dialog turns, and use the feature vector obtained by the fusion processing as the adjusted first feature vector of that dialog turn.
In one possible implementation, as shown in fig. 11, the apparatus further includes:
a second adjusting module 1006, configured to invoke a third attention layer in the feature vector obtaining sub-model to obtain a position vector of each dialog turn, where the position vector of a dialog turn represents the position of that dialog turn among the multiple dialog turns; and perform fusion processing on the first feature vector of each dialog turn and the corresponding position vector, and use the feature vector obtained by the fusion processing as the adjusted first feature vector of that dialog turn.
In one possible implementation, as shown in fig. 11, the apparatus further includes:
the dialogue sentence acquisition module 1001 is further configured to acquire sample dialogue sentences of multiple sample dialogue rounds and a sample dialogue state corresponding to each sample dialogue round, where the sample dialogue sentences of each sample dialogue round include at least one of a sample user question sentence or a sample reply sentence, and the multiple sample dialogue rounds include a target sample dialogue round and at least one historical sample dialogue round located before the target sample dialogue round;
the first set obtaining module 1002 is further configured to invoke a word vector obtaining sub-model, and obtain, according to a sample dialogue statement of each sample dialogue turn, a word vector set of each sample dialogue turn, where the word vector set of the sample dialogue turn includes a word vector of each word in the sample dialogue statement of the sample dialogue turn;
the first fusion processing module 1003 is further configured to invoke a feature vector obtaining sub-model, and perform weighted fusion processing on the word vector set of the target sample conversation turn and the corresponding historical sample conversation turn according to the similarity between the dimension vector of each preset dimension identifier and each word vector in the word vector set of the target sample conversation turn and the corresponding historical sample conversation turn, so as to obtain a first sample fusion feature vector corresponding to each preset dimension identifier;
the first determining module 1004 is further configured to invoke the dialog state determining sub-model, determine at least one predicted dimension identifier corresponding to the target sample dialog turn and a predicted keyword corresponding to each predicted dimension identifier according to the word vectors of the multiple preset keywords corresponding to each preset dimension identifier and the corresponding first sample fusion feature vector, and determine the at least one predicted dimension identifier and the predicted keyword corresponding to each predicted dimension identifier as the predicted dialog state of the target sample dialog turn;
the model adjusting module 1007 is configured to adjust the word vector obtaining sub-model, the feature vector obtaining sub-model, and the dialogue state determining sub-model according to the predicted dialogue state of the target sample dialogue turn and the corresponding sample dialogue state.
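The text says the sub-models are adjusted according to the predicted and sample dialog states but does not name a loss. A common choice, assumed here, is a cross-entropy over the preset keywords of each dimension, with a dimension that is absent from the sample dialog state labelled by a hypothetical `NULL_KEYWORD` token. The sketch computes only the loss value; the parameter update itself would normally be handled by an automatic-differentiation framework and is omitted.

```python
import math
from typing import Dict, List

Vector = List[float]
NULL_KEYWORD = "<null>"  # hypothetical null-keyword token


def dot(a: Vector, b: Vector) -> float:
    """Similarity between a keyword vector and a fused vector (assumed: dot product)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over keyword scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def state_loss(fused_by_dim: Dict[str, Vector],
               keywords_by_dim: Dict[str, Dict[str, Vector]],
               sample_state: Dict[str, str]) -> float:
    """Summed cross-entropy between predicted keyword distributions and the sample state."""
    total = 0.0
    for dim_id, fused_vec in fused_by_dim.items():
        keywords = list(keywords_by_dim[dim_id])
        probs = softmax([dot(keywords_by_dim[dim_id][kw], fused_vec) for kw in keywords])
        # A dimension missing from the sample dialog state is labelled with the null keyword.
        gold = sample_state.get(dim_id, NULL_KEYWORD)
        total += -math.log(probs[keywords.index(gold)])
    return total
```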
In one possible implementation, as shown in fig. 11, the apparatus further includes:
a second determining module 1008, configured to use any sample conversation turn in the multiple sample conversation turns as a target sample conversation turn, and use a sample conversation turn before any sample conversation turn as a historical sample conversation turn of the target sample conversation turn.
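Treating every sample turn as a target and the turns before it as its history turns one annotated dialog into several training examples. A minimal sketch with a hypothetical helper name:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_training_examples(turns: Sequence[T]) -> List[Tuple[List[T], T]]:
    """Pair every sample turn (the target) with all turns before it (its history)."""
    return [(list(turns[:i]), turn) for i, turn in enumerate(turns)]
```

The first turn gets an empty history, so the model also learns to track the state of a dialog that has only just started.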
In one possible implementation, as shown in fig. 11, the apparatus further includes:
a change probability obtaining module 1009, configured to obtain a plurality of sample change probabilities of each preset dimension identifier, where the sample change probabilities represent differences between sample conversation states corresponding to every two adjacent sample conversation turns in the plurality of sample conversation turns;
the feature vector processing module 1010 is configured to invoke a state transition model for each preset dimension identifier, and respectively process a first fusion feature vector of every two adjacent sample conversation turns corresponding to the preset dimension identifier to obtain a plurality of predicted change probabilities corresponding to the preset dimension identifier, where the plurality of sample change probabilities corresponding to the same preset dimension identifier correspond to the plurality of predicted change probabilities one to one;
the model adjusting module 1007 includes:
the model adjusting unit 1071 is configured to adjust the word vector obtaining sub-model, the feature vector obtaining sub-model, the dialogue state determining sub-model, and the state transition model according to the obtained multiple predicted change probabilities of each preset dimension identifier, the multiple sample change probabilities of each preset dimension identifier, the predicted dialogue state of each sample dialogue turn, and the corresponding sample dialogue state.
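The auxiliary state-transition objective can be sketched as below. Several points are assumptions rather than details given above: a sample change probability is labelled 1.0 when the dimension's value differs between two adjacent sample turns and 0.0 otherwise, the state transition model is a logistic regression over the two concatenated first fusion feature vectors, and the total objective adds the transition losses to the dialog-state loss with a hypothetical weight `alpha`. All function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def sample_change_probabilities(states: List[Dict[str, str]], dim_id: str) -> List[float]:
    """Assumed labelling: 1.0 if the dimension's value differs between adjacent turns, else 0.0."""
    return [
        1.0 if prev.get(dim_id) != curr.get(dim_id) else 0.0
        for prev, curr in zip(states, states[1:])
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predicted_change_probability(prev_vec: Vector, curr_vec: Vector,
                                 weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical state transition model: logistic regression on the concatenated vectors."""
    return sigmoid(sum(w * x for w, x in zip(weights, prev_vec + curr_vec)) + bias)


def binary_cross_entropy(pred: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy with clipping to avoid log(0)."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def transition_loss(fused_vectors: List[Vector], states: List[Dict[str, str]],
                    dim_id: str, weights: Vector, bias: float = 0.0) -> float:
    """Compare predicted and sample change probabilities pair by pair for one dimension."""
    targets = sample_change_probabilities(states, dim_id)
    preds = [
        predicted_change_probability(prev, curr, weights, bias)
        for prev, curr in zip(fused_vectors, fused_vectors[1:])
    ]
    # One predicted probability per adjacent pair, matched one-to-one with the targets.
    return sum(binary_cross_entropy(p, t) for p, t in zip(preds, targets))


def joint_loss(state_loss: float, transition_losses: List[float], alpha: float = 1.0) -> float:
    """Hypothetical combined objective: state loss plus alpha times the transition losses."""
    return state_loss + alpha * sum(transition_losses)
```

The transition term gives the fused vectors an extra training signal about whether a dimension's value should change between turns, which the per-turn state loss alone does not make explicit.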
Fig. 12 shows a block diagram of an electronic device 1200 according to an exemplary embodiment of the present application. The electronic device 1200 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The electronic device 1200 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the electronic device 1200 includes: a processor 1201 and a memory 1202.
The processor 1201 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1201 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1201 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1202 is used to store at least one instruction for execution by processor 1201 to implement a dialog state determination method provided by method embodiments herein.
In some embodiments, the electronic device 1200 may further optionally include: a peripheral interface 1203 and at least one peripheral. The processor 1201, memory 1202, and peripheral interface 1203 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1203 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1204, display 1205, camera assembly 1206, audio circuitry 1207, positioning assembly 1208, and power supply 1209.
The peripheral interface 1203 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, memory 1202, and peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202 and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1204 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 1204 converts an electric signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1204 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1204 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1204 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 1205 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1205 is a touch display screen, the display screen 1205 can also acquire touch signals on or over its surface. The touch signal may be input to the processor 1201 as a control signal for processing. In this case, the display screen 1205 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1205, disposed on the front panel of the electronic device 1200; in other embodiments, there may be at least two display screens 1205, respectively disposed on different surfaces of the electronic device 1200 or in a folded design; in still other embodiments, the display screen 1205 may be a flexible display screen disposed on a curved or folded surface of the electronic device 1200. The display screen 1205 may even be arranged in a non-rectangular irregular shape, that is, a shaped screen. The display screen 1205 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1206 is used to capture images or video. Optionally, the camera assembly 1206 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1206 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuitry 1207 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electric signals, and input the electric signals to the processor 1201 for processing or to the radio frequency circuit 1204 for voice communication. For stereo capture or noise reduction, there may be multiple microphones disposed at different locations of the electronic device 1200. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electric signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electric signal not only into sound waves audible to humans but also into sound waves inaudible to humans, for example for distance measurement. In some embodiments, the audio circuitry 1207 may also include a headphone jack.
The positioning component 1208 is used to locate the current geographic location of the electronic device 1200 to implement navigation or LBS (Location Based Service). The positioning component 1208 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 1209 is used to supply power to various components in the electronic device 1200. The power source 1209 may be alternating current, direct current, disposable or rechargeable. When the power source 1209 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 1200 also includes one or more sensors 1210. The one or more sensors 1210 include, but are not limited to: acceleration sensor 1211, gyro sensor 1212, pressure sensor 1213, fingerprint sensor 1214, optical sensor 1215, and proximity sensor 1216.
The acceleration sensor 1211 may detect the magnitude of acceleration on the three coordinate axes of a coordinate system established with respect to the electronic device 1200. For example, the acceleration sensor 1211 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1201 may control the display screen 1205 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1211. The acceleration sensor 1211 may also be used to collect game or user motion data.
The gyro sensor 1212 may detect a body direction and a rotation angle of the electronic device 1200, and the gyro sensor 1212 may collect a 3D motion of the user on the electronic device 1200 in cooperation with the acceleration sensor 1211. The processor 1201 can implement the following functions according to the data collected by the gyro sensor 1212: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 1213 may be disposed on a side frame of the electronic device 1200 and/or in a lower layer of the display screen 1205. When the pressure sensor 1213 is disposed on a side frame of the electronic device 1200, it can detect the user's grip signal on the electronic device 1200, and the processor 1201 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1213. When the pressure sensor 1213 is disposed in a lower layer of the display screen 1205, the processor 1201 controls the operable controls on the UI according to the user's pressure operation on the display screen 1205. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1214 is used for collecting a fingerprint of the user, and the processor 1201 identifies the user according to the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 identifies the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 1201 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 1214 may be disposed on the front, back, or side of the electronic device 1200. When a physical button or vendor Logo is provided on the electronic device 1200, the fingerprint sensor 1214 may be integrated with the physical button or vendor Logo.
The optical sensor 1215 is used to collect the ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the display screen 1205 according to the ambient light intensity collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the display screen 1205 is decreased. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1215.
The proximity sensor 1216, also called a distance sensor, is typically disposed on the front panel of the electronic device 1200. The proximity sensor 1216 is used to collect the distance between the user and the front of the electronic device 1200. In one embodiment, when the proximity sensor 1216 detects that the distance between the user and the front of the electronic device 1200 gradually decreases, the processor 1201 controls the display screen 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1216 detects that the distance between the user and the front of the electronic device 1200 gradually increases, the processor 1201 controls the display screen 1205 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 12 is not limiting of electronic device 1200 and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
Fig. 13 is a schematic structural diagram of a server 1300 according to an embodiment of the present application. The server 1300 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 1301 and one or more memories 1302, where the memory 1302 stores at least one instruction that is loaded and executed by the processor 1301 to implement the methods provided by the foregoing method embodiments. The server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described herein again.
The server 1300 may be used to perform the above-described dialog state determination method.
The embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor, so as to implement the dialog state determination method according to the foregoing embodiment.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is loaded and executed by a processor to implement the dialog state determination method according to the foregoing embodiment.
The embodiment of the present application further provides a computer program, where at least one instruction is stored in the computer program, and the at least one instruction is loaded and executed by a processor, so as to implement the dialog state determination method according to the foregoing embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application and should not be construed as limiting the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.