CN111950733A

CN111950733A - Information flow sorting method, device and computer storage medium

Info

Publication number: CN111950733A
Application number: CN201910407187.XA
Authority: CN
Inventors: 王岳
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2019-05-15
Filing date: 2019-05-15
Publication date: 2020-11-17
Anticipated expiration: 2039-05-15
Also published as: CN111950733B

Abstract

Embodiments of the present invention disclose an information flow sorting method, a device, and a computer storage medium. The method includes: obtaining an information flow recommendation list according to user characteristics and the current-moment state; obtaining a score for each information flow material in the information flow recommendation list according to the information flow recommendation list and information flow characteristics; and sorting the information flows in the information flow recommendation list according to the scores. According to the embodiments of the present invention, long-term benefit can be taken into account during information flow sorting and the historical accumulation can be kept at a maximum, thereby improving user experience.

Description

Information flow sorting method, device and computer storage medium

Technical Field

The present invention relates to the technical field of machine learning, and more particularly to an information flow sorting method, device, and computer storage medium.

Background

In an information flow short-video scenario, each time the user refreshes the short-video list, many short videos are shown in order. Generally, the short videos a user is more likely to be interested in should be displayed nearer the top of the list, so as to attract clicks. If the short videos the user is most interested in are not ranked first, the user has to scroll down the list to find them, which increases the user's cost. The ordering of the information flow is therefore very important.

In the prior art, the sorting stage usually relies on a click-through rate (CTR) estimation model. A CTR estimation model mainly considers the maximum click probability in the current scenario. In other words, sorting with a CTR estimation model only maximizes the short-term outcome; the resulting order is not necessarily what the user is most interested in, and the user experience is poor.

Therefore, the inventor believes it is necessary to improve on at least one of the above problems in the prior art.

Summary of the Invention

An object of the embodiments of the present invention is to provide a new technical solution for sorting information flows.

According to a first aspect of the embodiments of the present invention, an information flow sorting method is provided, the method comprising:

obtaining an information flow recommendation list according to user characteristics and the current-moment state;

obtaining a score for each information flow material in the information flow recommendation list according to the information flow recommendation list and information flow characteristics;

sorting the information flows in the information flow recommendation list according to the scores.

Optionally, obtaining the information flow recommendation list according to the user characteristics and the current-moment state includes:

calculating an evaluation value according to the current-moment state, the reward value at the current moment, the next-moment state, the user action, and a first mapping function;

calculating the information flow recommendation list according to the user characteristics, the current-moment state, the evaluation value, and a second mapping function.

Optionally, the reward value reward at the current moment is calculated by the following formula:

[formula not reproduced in this text]

where click+β*read_time is the user action and β is a weighting factor; the formula also contains a position weighting factor, whose symbol is likewise not reproduced; N is the number of information flows.

Optionally, the loss function critic_loss of the first mapping function is expressed as:

critic_loss = reward + gamma * v_{t+1} - v_t;

where reward is the reward value at the current moment; gamma is a smoothing factor; v_t is the current-moment state; and v_{t+1} is the next-moment state.

Optionally, the loss function actor_loss of the second mapping function is expressed as:

actor_loss = reward_gain * td_error;

where td_error is the evaluation value; reward_gain = reward - origin_reward is the reward value gain at the current moment; origin_reward is the original reward value; and reward is the reward value at the current moment.

Optionally, the method further includes:

obtaining log information;

updating the first mapping function according to the current-moment state, the reward value at the current moment, the next-moment state, and the user action in the log information;

updating the second mapping function according to the user characteristics, the current-moment state, and the evaluation value in the log information.

According to a second aspect of the embodiments of the present invention, an information flow sorting apparatus is provided, the apparatus comprising:

an acquisition module, configured to obtain an information flow recommendation list according to user characteristics and the current-moment state;

a scoring module, configured to obtain a score for each information flow material in the information flow recommendation list according to the information flow recommendation list and information flow characteristics;

a sorting module, configured to sort the information flows in the information flow recommendation list according to the scores.

Optionally, the acquisition module is specifically configured to:

According to a third aspect of the embodiments of the present invention, an information flow sorting apparatus is provided, the apparatus comprising a memory and a processor, wherein the memory is configured to store instructions, and the instructions are used to control the processor to operate so as to execute the information flow sorting method according to any one of the first aspect of the embodiments of the present invention.

According to a fourth aspect of the embodiments of the present invention, a computer storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the information flow sorting method according to any one of the first aspect of the embodiments of the present invention.

One beneficial effect of the present invention is that long-term benefit can be taken into account during information flow sorting and the historical accumulation can be kept at a maximum, thereby improving user experience.

Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

FIG. 1 is a block diagram showing the hardware configuration of a client 1000 that can be used to implement an embodiment of the present invention.

FIG. 2 is a schematic flowchart of an information flow sorting method according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of constructing an AC model according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of adjusting the positions of information flows according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of updating the AC model according to an embodiment of the present invention.

FIG. 6 is a schematic structural diagram of an information flow sorting apparatus 600 according to the present invention.

FIG. 7 is a schematic diagram of the hardware structure of an information flow sorting apparatus 700 according to another embodiment.

FIG. 8 is a schematic diagram of an information flow ordering obtained without using the method of the embodiment of the present invention.

FIG. 9 is a schematic diagram of an information flow ordering obtained by the method according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.

The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or its uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be considered part of the specification.

In all examples shown and discussed herein, any specific value should be construed as merely illustrative and not as a limitation. Accordingly, other examples of the exemplary embodiments may have different values.

It should be noted that like numerals and letters denote like items in the following figures, so once an item is defined in one figure, it need not be discussed further in subsequent figures.

Hereinafter, various embodiments and examples of the present invention are described with reference to the accompanying drawings.

<Hardware configuration>

As shown in FIG. 1, the client 1000 of this embodiment may be a portable computer, a desktop computer, a mobile phone, a tablet computer, or the like.

As shown in FIG. 1, the client 1000 may include a processor 1010, a memory 1020, an interface device 1030, a communication device 1040, a display device 1050, an input device 1060, a speaker 1070, a microphone 1080, and so on. The processor 1010 may be a central processing unit (CPU), a microprocessor (MCU), or the like. The memory 1020 includes, for example, ROM (read-only memory), RAM (random access memory), and non-volatile memory such as a hard disk. The interface device 1030 includes, for example, a USB interface and an earphone interface. The communication device 1040 is capable of, for example, wired or wireless communication. The display device 1050 is, for example, a liquid crystal display or a touch display. The input device 1060 may include, for example, a touch screen and a keyboard. The user can output and input voice information through the speaker 1070 and the microphone 1080.

In this embodiment, the memory 1020 of the client 1000 is used to store instructions, and the instructions are used to control the processor 1010 to operate so as to execute at least the information flow sorting method according to any embodiment of the present invention. Those skilled in the art should understand that although FIG. 1 shows multiple devices of the client 1000, the present invention may involve only some of them; for example, the client 1000 may involve only the memory 1020, the processor 1010, and the display device 1050. A skilled person can design the instructions according to the solutions disclosed in the present invention. How instructions control a processor to operate is well known in the art and is not described in detail here.

<Method>

FIG. 2 is a schematic flowchart of an information flow sorting method according to an embodiment of the present invention. The method may be implemented by the client 1000.

As shown in FIG. 2, the information flow sorting method of this embodiment may include the following steps 2100 to 2300:

Step 2100: obtain an information flow recommendation list according to user characteristics and the current-moment state.

The user characteristics are the user's preference information; for example, they may be user interest information, user preference information, the user's browsing history, and the like. The current-moment state is the state of the user at the current moment, for example browsing the information flow list or watching a particular information flow.

The information flow recommendation list includes multiple information flows, which are calculated according to the user characteristics and the current-moment state.

Specifically, when calculating the information flow recommendation list, the client 1000 may take the AC model from reinforcement learning as the base model and build on it the model used in this embodiment to calculate the information flow recommendation list. In the AC model, A is the actor neural network, shown as Policy in FIG. 3, and C is the critic neural network, shown as Value in FIG. 3.

As shown in FIG. 3, the client 1000 may calculate an evaluation value t_d according to the current-moment state S_t, the reward value r_t at the current moment, the next-moment state S_{t+1}, the user action a_t, and the first mapping function Value. It then calculates the information flow recommendation list Environment according to the user characteristics, the current-moment state S_t, the evaluation value t_d, and the second mapping function Policy.

The second mapping function Policy is mainly used to make decisions, and the first mapping function Value is used to evaluate how good those decisions are; the evaluation is fed back to the second mapping function Policy for correction. Different user actions a_t lead to different orderings of the information flows in the recommendation list. After browsing the information flows, the user may click on them, and a reward value is defined accordingly; for example, the more the user clicks, the higher the reward value. After the user clicks or does not click an information flow, the user's current-moment state S_t becomes the next-moment state S_{t+1}. The first mapping function Value calculates an evaluation value t_d from the current-moment state S_t, the reward value r_t at the current moment, the next-moment state S_{t+1}, and the user action a_t. The evaluation value measures how well the reward value r_t at the current moment matches the expected reward value. In practice, the goal of the first mapping function Value is to minimize the evaluation value t_d; at the same time, t_d is passed to the second mapping function Policy so that Policy can be corrected according to it.
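The feedback loop described above can be sketched in a few lines. This is a minimal sketch, assuming a tabular value function and the usual one-step temporal-difference form for the evaluation value; the names `td_error`, `critic_step`, `values`, the learning rate, and the state labels are hypothetical and do not come from the patent, which uses neural networks for both Policy and Value.

```python
from collections import defaultdict

GAMMA = 0.8  # smoothing factor; 0.8 is the example value the description gives for gamma


def td_error(value_table, s_t, r_t, s_next, gamma=GAMMA):
    """Evaluation value: r_t + gamma * V(s_next) - V(s_t)."""
    return r_t + gamma * value_table[s_next] - value_table[s_t]


def critic_step(value_table, s_t, r_t, s_next, lr=0.1, gamma=GAMMA):
    """Nudge V(s_t) toward the one-step target and return the evaluation
    value that would be handed to the Policy (actor) for correction."""
    delta = td_error(value_table, s_t, r_t, s_next, gamma)
    value_table[s_t] += lr * delta
    return delta


values = defaultdict(float)  # unseen states start at value 0.0
# Hypothetical transition: the user was browsing the list, clicked (reward 1.0),
# and is now watching an item.
delta = critic_step(values, "browsing_list", 1.0, "watching_item")
```

Repeating `critic_step` on the same transition shrinks the returned value, which matches the stated goal that Value minimizes the evaluation value.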

In this embodiment, it is defined that each refresh recommends 8 information flows when the user accesses the information flow service. The reward value reward here takes into account whether the user clicks an information flow and how long the user watches it, and includes a position weighting factor: the nearer the top a clicked information flow is, the larger its position weighting factor. The sum of the reward values at each of the 8 positions is the overall return, which is defined here as the original reward value origin_reward. The client 1000 then needs to move, through calculation, the information flows that were clicked and watched for a long time to positions nearer the top. For example, as shown in FIG. 4, the information flow labelled R0.3 is moved from the third position to the first, the one labelled R0.2 from the first position to the second, and the one labelled R0 from the second position to the third.

Specifically, the loss function critic_loss used by the first mapping function Value is expressed as:

critic_loss = reward + gamma * v_{t+1} - v_t; where reward is the reward value at the current moment; gamma is a smoothing factor, whose value may be, for example, 0.8; v_t is the current-moment state; and v_{t+1} is the next-moment state. The loss function actor_loss used by the second mapping function Policy is expressed as: actor_loss = reward_gain * td_error; where td_error is the evaluation value; reward_gain = reward - origin_reward is the reward value gain at the current moment; origin_reward is the original reward value; and reward is the reward value at the current moment.
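The two expressions can be written directly as functions. This is a minimal sketch of the formulas exactly as stated, treating v_t and v_{t+1} as scalar inputs; the numeric inputs are hypothetical and chosen only to exercise the arithmetic.

```python
def critic_loss(reward, v_t, v_next, gamma=0.8):
    """critic_loss = reward + gamma * v_{t+1} - v_t, as written in the text."""
    return reward + gamma * v_next - v_t


def actor_loss(reward, origin_reward, td_error):
    """actor_loss = reward_gain * td_error, with reward_gain = reward - origin_reward."""
    reward_gain = reward - origin_reward
    return reward_gain * td_error


# Hypothetical numbers, not taken from the patent.
td = critic_loss(reward=1.5, v_t=1.0, v_next=0.5)   # 1.5 + 0.8 * 0.5 - 1.0
loss = actor_loss(reward=1.5, origin_reward=1.0, td_error=td)  # (1.5 - 1.0) * td
```

In gradient-based training the critic expression is commonly squared before it is minimized; the text does not say whether that is done here.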

The reward value reward at the current moment can be calculated by the following formula:

[formula not reproduced in this text]

where click+β*read_time is the user action, which includes the user's click on an information flow and the viewing duration; β is a weighting factor, whose value may be, for example, 0.1; the formula also contains a position weighting factor (its symbol is likewise not reproduced), whose value may be, for example, 0.9; N is the number of information flows, and pos is the position of the information flow in the information flow recommendation list. For example, if the position of an information flow in the information flow recommendation list is 1, the corresponding position weighting factor is 0.9; if the position is 2, the corresponding position weighting factor is 0.9² = 0.81.
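One reading consistent with the definitions and the worked values 0.9 and 0.9² = 0.81 is a position-weighted sum over the N items of one refresh: reward = Σ_{pos=1..N} alpha^pos * (click + β * read_time). The sketch below implements that reading. The summation form, the symbol `alpha` for the position weighting factor, and the units of `read_time` are assumptions, since the original formula is not available in this text.

```python
def position_weight(pos, alpha=0.9):
    """Weight for a 1-based list position: alpha ** pos (0.9, 0.81, ...)."""
    return alpha ** pos


def list_reward(items, beta=0.1, alpha=0.9):
    """Sum the position-weighted user action over one refresh.

    items: list of (click, read_time) pairs ordered by display position.
    The summation form is inferred from the surrounding text (assumption).
    """
    return sum(
        position_weight(pos, alpha) * (click + beta * read_time)
        for pos, (click, read_time) in enumerate(items, start=1)
    )


# Hypothetical refresh with three items; the third was clicked and watched.
before = list_reward([(1, 10.0), (0, 0.0), (1, 5.0)])
# Same items with the clicked third item moved up to second place.
after = list_reward([(1, 10.0), (1, 5.0), (0, 0.0)])
```

Under this reading, moving a clicked item nearer the top raises the total (`after > before`), which is the behaviour the position adjustment in FIG. 4 relies on.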

Step 2200: obtain a score for each information flow material in the information flow recommendation list according to the information flow recommendation list and information flow characteristics.

The information flow characteristics are feature information of an information flow, for example its topic information and content information. In this step, the client 1000 fuses the information flow recommendation list with the information flow characteristics and scores each information flow material in the recommendation list.

Step 2300: sort the information flows in the information flow recommendation list according to the scores.

In practice, the information flows in the recommendation list can be sorted from the highest score to the lowest and displayed on the display screen of the client 1000.

For example, when the method of this embodiment is not used, the information flows are displayed on the screen of the client 1000 as shown in FIG. 8: information flow 1, information flow 2, information flow 3, information flow 4, and information flow 5 in the recommendation list are displayed in that order. With the method of this embodiment, after the information flow recommendation list is obtained, each information flow material in it is scored according to the recommendation list and the information flow characteristics. As shown in FIG. 9, information flow 1 scores 0.1, information flow 2 scores 0.3, information flow 3 scores 0.8, information flow 4 scores 0.2, and information flow 5 scores 0.6. The client 1000 sorts the information flows by score, so the order displayed on the screen of the client 1000 is information flow 3, information flow 5, information flow 2, information flow 4, information flow 1. With the method of this embodiment, the information flows the user is most interested in are therefore ranked first, which attracts clicks and improves user experience.
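The FIG. 9 example reduces to a descending sort on the score. A minimal sketch using the five scores given above; the function name `rank_by_score` and the string labels are illustrative only.

```python
def rank_by_score(scores):
    """Return item names ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


# Scores from the FIG. 9 example in the text.
scores = {
    "information flow 1": 0.1,
    "information flow 2": 0.3,
    "information flow 3": 0.8,
    "information flow 4": 0.2,
    "information flow 5": 0.6,
}
ranked = rank_by_score(scores)
# ranked is: flow 3, flow 5, flow 2, flow 4, flow 1
```

Python's `sorted` is stable, so items with equal scores would keep their original relative order; the text does not specify a tie-breaking rule.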

Further, the information flow sorting method of this embodiment also includes: obtaining log information; updating the first mapping function according to the current-moment state, the reward value at the current moment, the next-moment state, and the user action in the log information; and updating the second mapping function according to the user characteristics, the current-moment state, and the evaluation value in the log information.

As shown in FIG. 5, the client 1000 obtains the log information Log. After extract, transform, load (ETL) processing of the log information, the first mapping function and the second mapping function are updated, thereby updating the AC model of this embodiment (AC Train). Once the AC model update is complete, the model is made available to the online part.
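The log-driven update in FIG. 5 can be sketched as an ETL pass followed by one update per logged transition. This is a minimal sketch under several assumptions: the JSON log format and its field names (`state`, `action`, `reward`, `next_state`) are hypothetical, both mapping functions are simplified to lookup tables rather than neural networks, and the Policy update omits the user characteristics and reward gain that the text also mentions.

```python
import json
from collections import defaultdict


def etl(log_lines):
    """Parse JSON log lines into (state, action, reward, next_state) tuples,
    dropping records that are malformed or incomplete."""
    transitions = []
    for line in log_lines:
        try:
            rec = json.loads(line)
            transitions.append(
                (rec["state"], rec["action"], float(rec["reward"]), rec["next_state"])
            )
        except (ValueError, KeyError, TypeError):
            continue  # skip bad records instead of failing the whole batch
    return transitions


def update_from_logs(log_lines, values, prefs, gamma=0.8, lr=0.1):
    """Replay logged transitions to update both tables."""
    for s, a, r, s_next in etl(log_lines):
        delta = r + gamma * values[s_next] - values[s]  # evaluation value
        values[s] += lr * delta        # first mapping function (Value), simplified
        prefs[(s, a)] += lr * delta    # second mapping function (Policy), simplified
    return values, prefs


logs = [
    '{"state": "s0", "action": "click", "reward": 1.0, "next_state": "s1"}',
    "not json",  # discarded by etl()
]
values, prefs = update_from_logs(logs, defaultdict(float), defaultdict(float))
```

Running the update offline and then publishing the refreshed tables mirrors the split in the text between model training and the online part that serves requests.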

According to the information flow sorting method of this embodiment, the client obtains an information flow recommendation list according to user characteristics and the current-moment state; obtains a score for each information flow material in the recommendation list according to the recommendation list and information flow characteristics; and sorts the information flows in the recommendation list according to the scores. According to the embodiments of the present invention, long-term benefit can be taken into account during information flow sorting and the historical accumulation can be kept at a maximum, thereby improving user experience.

<Apparatus>

As shown in FIG. 6, the information flow sorting apparatus 600 may include an acquisition module 610, a scoring module 620, and a sorting module 630.

The acquisition module 610 is configured to obtain an information flow recommendation list according to user characteristics and the current-moment state.

The scoring module 620 is configured to obtain a score for each information flow material in the information flow recommendation list according to the information flow recommendation list and information flow characteristics.

The sorting module 630 is configured to sort the information flows in the information flow recommendation list according to the scores.
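The way the three modules chain together can be sketched as a small class. This only illustrates the wiring of modules 610, 620, and 630; the class name `SortingApparatus`, the stub functions, and the `quality` feature are hypothetical, and the real acquisition and scoring logic is whatever the method embodiment supplies.

```python
from typing import Callable, Dict, List


class SortingApparatus:
    """Chains the three modules: acquire -> score -> sort."""

    def __init__(
        self,
        acquire: Callable[[dict, str], List[str]],
        score: Callable[[List[str], Dict[str, dict]], Dict[str, float]],
    ):
        self.acquire = acquire  # acquisition module 610 (behaviour supplied by caller)
        self.score = score      # scoring module 620 (behaviour supplied by caller)

    def run(self, user_features: dict, state: str, item_features: Dict[str, dict]) -> List[str]:
        candidates = self.acquire(user_features, state)
        scores = self.score(candidates, item_features)
        # sorting module 630: highest score first
        return sorted(candidates, key=lambda item: scores[item], reverse=True)


# Stub modules with made-up behaviour, only to show the wiring.
def stub_acquire(user_features: dict, state: str) -> List[str]:
    return ["a", "b", "c"]


def stub_score(candidates: List[str], item_features: Dict[str, dict]) -> Dict[str, float]:
    return {item: item_features[item]["quality"] for item in candidates}


apparatus = SortingApparatus(stub_acquire, stub_score)
order = apparatus.run(
    {"interest": "sports"},
    "browsing_list",
    {"a": {"quality": 0.2}, "b": {"quality": 0.9}, "c": {"quality": 0.5}},
)
```

Passing the acquisition and scoring behaviour in as callables keeps the sorting step independent of how the list and scores are produced.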

Specifically, the acquisition module 610 may be configured to: calculate an evaluation value according to the current-moment state, the reward value at the current moment, the next-moment state, the user action, and the first mapping function; and calculate the information flow recommendation list according to the user characteristics, the current-moment state, the evaluation value, and the second mapping function.

The loss function critic_loss of the first mapping function is expressed as:

critic_loss = reward + gamma * v_{t+1} - v_t; where reward is the reward value at the current moment; gamma is a smoothing factor; v_t is the current-moment state; and v_{t+1} is the next-moment state.

The loss function actor_loss of the second mapping function is expressed as:

actor_loss = reward_gain * td_error; where td_error is the evaluation value; reward_gain = reward - origin_reward is the reward value gain at the current moment; origin_reward is the original reward value; and reward is the reward value at the current moment.

In practice, the reward value reward at the current moment can be calculated by the following formula:

[formula not reproduced in this text]

where click+β*read_time is the user action and β is a weighting factor; the formula also contains a position weighting factor, whose symbol is likewise not reproduced; N is the number of information flows.

进一步的，所述获取模块610还可以用于：获取日志信息；根据所述日志信息中的所述当前时刻状态、所述当前时刻的回报值、所述下一时刻状态以及所述用户动作，更新所述第一映射函数；根据所述日志信息中的所述用户特征、所述当前时刻状态以及所述评估值，更新所述第二映射函数。Further, the obtaining module 610 can also be used to: obtain log information; according to the current moment state, the reward value at the current moment, the next moment status and the user action in the log information, The first mapping function is updated; the second mapping function is updated according to the user characteristics, the current moment state and the evaluation value in the log information.

根据图7所示，本实施例的信息流的排序装置700可以包括存储器710和处理器720。As shown in FIG. 7 , the apparatus 700 for sorting information flow in this embodiment may include a memory 710 and a processor 720 .

存储器710用于存储指令，该指令用于控制处理器720进行操作以执行本发明任意实施例的信息流的排序方法。The memory 710 is used to store instructions for controlling the operation of the processor 720 to perform the method for sorting information flow according to any embodiment of the present invention.

技术人员可以根据本发明所公开方案设计指令。指令如何控制处理器进行操作，这是本领域公知，故在此不再详细描述。A skilled person can design instructions according to the solutions disclosed in the present invention. How the instruction controls the processor to operate is well known in the art, so it will not be described in detail here.

本实施例的信息流的排序装置，可用于执行上述方法实施例的技术方案，其实现原理和技术效果类似，此处不再赘述。The information flow sorting apparatus in this embodiment can be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.

<计算机存储介质><Computer storage medium>

本实施例中，还提供一种计算机存储介质，其上存储有计算机程序，计算机程序在被处理器执行时实现如本发明任意实施例的信息流的排序方法。In this embodiment, a computer storage medium is also provided, on which a computer program is stored, and when the computer program is executed by a processor, the method for sorting an information flow according to any embodiment of the present invention is implemented.

本领域技术人员应当理解，在电子技术领域中，可以通过软件、硬件以及软件和硬件结合的方式，将上述方法体现在产品中本领域技术人员很容易基于上面发明实施例的方法，产生一种信息处理装置，所述信息处理装置包括用于执行根据上述实施例的信息处理方法中的各个操作的模块。Those skilled in the art should understand that in the field of electronic technology, the above method can be embodied in a product by means of software, hardware or a combination of software and hardware. Those skilled in the art can easily generate a An information processing apparatus including modules for performing respective operations in the information processing method according to the above-described embodiments.

It is well known to those skilled in the art that, with the development of electronic information technology such as large-scale integrated circuit technology and the trend toward implementing software functions in hardware, it has become difficult to draw a clear boundary between the software and hardware of a computer system, because any operation can be implemented in either software or hardware, and any instruction can be executed by either hardware or software. Whether a hardware or a software implementation is adopted for a given machine function depends on non-technical factors such as price, speed, reliability, storage capacity and change cycle. For a skilled person, software and hardware implementations are equivalent, and either may be chosen as needed to implement the above solutions. Therefore, no limitation is placed here on the specific software or hardware.

The present invention may be a device, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.

A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions so as to implement various aspects of the present invention.

Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus and/or other devices to operate in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus or other device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.

Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims

1. A method for sorting an information flow, the method comprising:

obtaining an information flow recommendation list according to the user characteristics and the current time state;

obtaining the score of each information flow material in the information flow recommendation list according to the information flow recommendation list and the information flow characteristics;

and sorting the information flows in the information flow recommendation list according to the scores.
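The scoring and sorting steps of claim 1 can be pictured with a minimal Python sketch. The function name `sort_feed`, the item identifiers and the toy score table are hypothetical; the scoring function stands in for whatever model produces a score from the recommendation list and the information flow characteristics, and the descending order is an assumption about how scores map to positions.

```python
from typing import Callable, List, Sequence, Tuple


def sort_feed(recommendation_list: Sequence[str],
              score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score every item in the recommendation list, then sort by score.

    Higher scores are placed first (an assumption made for this sketch).
    """
    scored = [(item, score_fn(item)) for item in recommendation_list]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy scores standing in for a real scoring model (illustrative values only).
toy_scores = {"video_a": 0.2, "video_b": 0.9, "video_c": 0.5}
ranked = sort_feed(list(toy_scores), toy_scores.get)
print([item for item, _ in ranked])
```

With these toy values the highest-scoring item, `video_b`, is placed first.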

2. The method of claim 1, wherein obtaining the information flow recommendation list according to the user characteristics and the current time state comprises:

calculating an evaluation value according to the current time state, a reward value at the current time, a next time state, a user action and a first mapping function;

and calculating the information flow recommendation list according to the user characteristics, the current time state, the evaluation value and a second mapping function.

3. The method of claim 2, wherein the reward value reward at the current time is calculated by the following formula:

wherein click + β * read_time is the user action, β is a weighting factor, a further factor in the formula is a position adjustment weight factor, and N is the number of information flows.

4. The method of claim 3, wherein the loss function critic_loss of the first mapping function is expressed as:

critic_loss = reward + gamma * v_{t+1} - v_t;

wherein reward is the reward value at the current time; gamma is a smoothing factor; v_t is the current time state; and v_{t+1} is the next time state.
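As a worked illustration of the expression in claim 4, the following Python sketch evaluates it for one step. It assumes that v_t and v_{t+1} are scalar values produced by the first mapping function for the current and next time states; the function name `critic_term` and the default gamma of 0.9 are hypothetical choices made for this sketch.

```python
def critic_term(reward: float, v_t: float, v_next: float, gamma: float = 0.9) -> float:
    """Evaluate reward + gamma * v_{t+1} - v_t for a single step.

    v_t and v_next are assumed to be scalar outputs of the first
    mapping function for the current and next time states.
    """
    return reward + gamma * v_next - v_t


# Example: reward = 1.0, v_t = 0.5, v_{t+1} = 2.0, gamma = 0.9
value = critic_term(reward=1.0, v_t=0.5, v_next=2.0, gamma=0.9)
print(round(value, 6))
```

Here the arithmetic is 1.0 + 0.9 * 2.0 - 0.5 = 2.3. In common actor-critic practice a quantity of this form is called the temporal-difference error and the critic is trained on its square; the claim itself states only the expression above.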

5. The method of claim 3, wherein the loss function actor_loss of the second mapping function is expressed as:

actor_loss = reward_gain * td_error;

wherein td_error is the evaluation value; reward_gain = reward - origin_reward is the reward value gain at the current time; origin_reward is the original reward value; and reward is the reward value at the current time.
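The expression in claim 5 can likewise be evaluated in a short Python sketch. It assumes that the reward gain is the difference between the reward value at the current time and the original reward value, which is how the claim wording reads; the function name `actor_loss` mirrors the claim, while the parameter names are chosen for this sketch.

```python
def actor_loss(reward: float, origin_reward: float, td_error: float) -> float:
    """Evaluate reward_gain * td_error for a single step.

    reward_gain is taken to be reward - origin_reward (an assumption
    based on the claim wording).
    """
    reward_gain = reward - origin_reward
    return reward_gain * td_error


# Example: reward = 1.5, origin_reward = 1.0, td_error = 2.0
print(actor_loss(reward=1.5, origin_reward=1.0, td_error=2.0))
```

With these values the reward gain is 0.5 and the resulting loss is 0.5 * 2.0 = 1.0. When the current reward equals the original reward, the gain and therefore the loss are zero.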

6. The method of claim 2, further comprising:

acquiring log information;

updating the first mapping function according to the current time state, the reward value at the current time, the next time state and the user action in the log information;

and updating the second mapping function according to the user characteristics, the current time state and the evaluation value in the log information.

7. An apparatus for sorting an information flow, the apparatus comprising:

an obtaining module, configured to obtain an information flow recommendation list according to user characteristics and a current time state;

a scoring module, configured to obtain a score of each information flow material in the information flow recommendation list according to the information flow recommendation list and information flow characteristics;

and a sorting module, configured to sort the information flows in the information flow recommendation list according to the scores.

8. The apparatus of claim 7, wherein the obtaining module is specifically configured to:

9. An apparatus for sorting an information flow, the apparatus comprising a memory and a processor, wherein the memory is configured to store instructions, and the instructions are used to control the processor to operate so as to perform the method for sorting an information flow according to any one of claims 1 to 6.

10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for sorting an information flow according to any one of claims 1 to 6.