Query auto-completion (QAC) aims to suggest plausible completions for a given query prefix. Recent QAC methods use natural language generation (NLG) to generate completions for the user input.
However, NLG methods usually output nonsensical or wrong words when left uncontrolled. Moreover, a serious drawback of generative methods is that they can produce a Matthew effect, which severely degrades their performance.
We propose a framework that controls query generation using prompt learning, thereby making generative methods controllable. The framework consists of three parts: the control module, the prompt module, and the generation module. The control module generates a prompt vector endowed with implicit features; the prompt module feeds the prompt vector and the user input into the generation module; and the generation module then generates the query under control.
We trained and tested our model on the Meituan dataset and the AOL dataset. The results show that the proposed framework improves the accuracy of generated queries while reducing their incoherence.
Query auto-completion is a technology that uses a user's minimal input to generate the results they are likely to want, thereby saving them time during search. Let $I$ denote the user input, $G$ the completed results, and $u$ the user information. Then $G=f(I)$ represents generation that relies on the user input alone, while a more personalized generation is $G_p=g(I,u)$.
In the QAC scenario, we convert the problem into a natural language processing problem. Both the user input and the generated results are natural language, namely $I=\{i_1,i_2,\cdots,i_n\}$ and $G=\{g_1,g_2,\cdots,g_n\}$, where $i$ and $g$ denote individual character tokens. Therefore $G=f(I)$ can be regarded as an autoregressive problem, namely $g_n=f(i_1,i_2,\cdots,g_{n-2},g_{n-1})$: each generated token is conditioned on the input tokens and the previously generated tokens.
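The autoregressive formulation can be illustrated with a toy sketch. This is not the paper's model: it replaces the neural $f$ with a frequency table over full prefixes, uses greedy decoding, and the names `train`, `complete`, and `END` are hypothetical. It only shows how each new character token is chosen from everything typed and generated so far.

```python
from collections import Counter, defaultdict

END = "$"  # hypothetical end-of-query marker (assumption, not from the paper)

def train(queries):
    """Count which character follows each full prefix in the query log."""
    counts = defaultdict(Counter)
    for query in queries:
        seq = query + END
        for k in range(len(seq)):
            counts[seq[:k]][seq[k]] += 1
    return counts

def complete(prefix, counts, max_new=20):
    """Greedy autoregressive completion: g_n depends on all earlier tokens."""
    out = prefix
    for _ in range(max_new):
        options = counts.get(out)
        if not options:          # unseen context: stop generating
            break
        token = options.most_common(1)[0][0]
        if token == END:         # the model predicts the query is finished
            break
        out += token
    return out

counts = train(["pizza", "pizza", "pasta"])
print(complete("p", counts))   # pizza
print(complete("pa", counts))  # pasta
```

Note that the prefix `p` completes to the more frequent query, which is a small-scale version of the frequency bias discussed for generative models.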
First, we introduce the overall architecture of the framework, which includes three parts: the generation module, the prompt module, and the control module. The generation module is a decoder architecture that predicts the next token and generates text; it typically uses a pre-trained model such as GPT. To control what the generation module produces, we use prompt learning: a prompt vector is added to the input of the generation module before the text is generated. Supplying this prompt vector as a hint to the model is the role of the prompt module. The prompt vector itself is produced by the control module, which learns from historical data labeled with the control objectives, so that the resulting vector carries the intended controlling effect.
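The hand-off between the three modules can be sketched as follows. This is a minimal illustration under stated assumptions: the text only says the prompt vector is added to the generator's input, so prepending it to the token embeddings (as in common soft-prompt methods) is an assumption, and `embed`, `build_generator_input`, and the toy embedding table are hypothetical names and values.

```python
def embed(tokens, table):
    """Look up an embedding vector for each input token."""
    return [table[t] for t in tokens]

def build_generator_input(prompt_vectors, token_embeddings):
    """Prompt module (assumed behavior): prepend the control module's
    prompt vectors to the embedded user input."""
    if token_embeddings:
        dim = len(token_embeddings[0])
        if any(len(v) != dim for v in prompt_vectors):
            raise ValueError("prompt vectors must match the embedding size")
    return list(prompt_vectors) + list(token_embeddings)

# Toy values: a 2-dimensional embedding table and one prompt vector that
# stands in for the control module's output.
table = {"p": [1.0, 0.0], "i": [0.0, 1.0]}
prompt = [[0.5, 0.5]]
inputs = build_generator_input(prompt, embed("pi", table))
print(len(inputs))  # 3
```

The generation module would then decode from `inputs` instead of from the raw token embeddings, so the same generator can be steered by swapping the prompt vector.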
The input to the control module is the user's personal feature information and historical data, and the output is the probability that the user performs a click. Whether the user clicks on the generated words reflects the user's preferences. The module uses BERT because of its deep semantic understanding, which allows it to extract features from the user's historical behavioral data. Beyond producing high-dimensional feature vectors that reflect user preferences, the control module also addresses the severe Matthew effect of generative models: because a generative model is essentially a word probability predictor, it assigns higher probability to tokens that appear frequently in the training set. The control module is therefore used both to steer generation toward the user's preferences and to counteract the Matthew effect. To this end, the output of BERT in the control module is fed to two multi-layer neural network classifiers for multi-task learning. One classifier predicts whether the user performs a click, and the other predicts whether the user is more likely to click on generated results containing low-frequency words. By modeling the user's personal information and historical behavior, the control module extracts a feature vector that represents the user's preferences.
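The two-classifier setup can be sketched as a shared feature vector feeding two heads whose losses are combined. This is an assumption-laden simplification: the text describes multi-layer classifiers on top of BERT, whereas the sketch uses single-layer logistic heads on a plain feature list, and the weighting `alpha`, the binary cross-entropy loss, and all function names are hypothetical choices not stated in the source.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def head(features, weights, bias):
    """One classifier head (simplified to a single logistic layer)."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one example; eps guards against log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def multitask_loss(features, click_head, lowfreq_head,
                   y_click, y_lowfreq, alpha=0.5):
    """Weighted sum of the click loss and the low-frequency-click loss.
    alpha is an assumed hyperparameter, not taken from the paper."""
    p_click = head(features, *click_head)
    p_low = head(features, *lowfreq_head)
    loss = alpha * bce(p_click, y_click) + (1 - alpha) * bce(p_low, y_lowfreq)
    return loss, p_click, p_low

# Shared features stand in for the encoder output for one user.
features = [0.0, 0.0, 0.0]
zero_head = ([0.0, 0.0, 0.0], 0.0)
loss, p_click, p_low = multitask_loss(features, zero_head, zero_head, 1, 0)
```

Because both heads read the same features, gradients from both tasks shape the shared representation, which is how a single feature vector can encode both the preference signal and the low-frequency signal.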
The authors offer several explanations for their experimental results, but the results themselves contradict those explanations.
In addition, the authors observed a contradictory result when comparing soft prompts and hard prompts, and their explanation was not convincing.
Selected architecture is not well motivated compared to similar more flexible approaches like RAG
missed an important span of work that would have been totally relevant for the presented use-case
Learning to Write with Cooperative Discriminators
This paper is not well written and is not presented in an appropriate format. There are some typos, and the presentation of figures (such as Figure 3) can be improved.
There are some typos in the paper, e.g. in the first paragraph of Section 3.2, a figure index is missing.
Since this is a framework, the authors should evaluate it on a wider variety of datasets to show that it works across domains; results on a single specific domain cannot persuade readers that the framework will work for other domains or tasks.
experiments section heavily focuses on different variations of the same method
designed architecture is not flexible and would require retraining given changes in base models
outdated generative model (GPT2) in the experiments makes the reader wonder what the results would be with more capable and versatile generative models
failed to showcase the flexibility of the approach: flexibility is cited as an advantage of the approach, but not highlighted in the experiments
More recent baselines can be chosen. The paper lacks comparisons and discussions with widely-known baselines in the field, which hinders the assessment of the novelty and performance of UCTG.
