q2d is an automatic data generation pipeline that turns questions into information-seeking dialogs. It replaces human-annotated data for training query-generation models and produces high-quality training and evaluation data across multiple domains.
We prompt PaLM to create conversational versions of question-answering datasets and use them to improve query-generation models for dialogs. Because the dialogs are generated automatically and grounded in queries, the method offers better control and scalability than collecting human-written dialogs.
Models trained on the auto-generated data reach 90-95% of the performance of models trained on human-annotated data, showing that q2d reduces annotation effort while producing synthetic training data close to human quality.
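As a rough sketch of the pipeline's core step (not the authors' released code), the example below expands a question into a dialog via few-shot prompting. The `llm` callable, the demonstration pair, and the prompt format are illustrative assumptions standing in for PaLM and the paper's actual prompts.

```python
# A minimal sketch of a q2d-style question-to-dialog step, assuming dialogs are
# produced by few-shot prompting a large language model. `llm` is any callable
# mapping a prompt string to generated text (e.g., a wrapper around PaLM);
# the demonstration below is illustrative, not taken from the paper.
from typing import Callable, List, Tuple

FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    (
        "who directed the movie that won best picture in 1998",
        "User: I'm trying to remember the 1998 Oscars.\n"
        "Assistant: Sure, what about them?\n"
        "User: Which film won Best Picture, and who directed it?",
    ),
]


def build_prompt(question: str) -> str:
    """Prefix the new question with (question, dialog) demonstrations."""
    parts = [f"Question: {q}\nDialog:\n{d}\n" for q, d in FEW_SHOT_EXAMPLES]
    parts.append(f"Question: {question}\nDialog:\n")
    return "\n".join(parts)


def question_to_dialog(question: str, llm: Callable[[str], str]) -> dict:
    """Return a labeled example: the generated dialog paired with the source query."""
    dialog = llm(build_prompt(question)).strip()
    # The source question doubles as the gold search query for query-generation training.
    return {"dialog": dialog, "target_query": question}
```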
We show that q2d is flexible and adapts to specific dialog styles without any annotated data, producing labeled query-generation datasets that are useful for both training and evaluation; we demonstrate this on multi-hop QA.
Naturalness, Factuality, and Correctness - Our in-depth analysis shows that humans have difficulty distinguishing synthetic dialogs from natural ones, and that auto-generated answers work about as well as human-annotated answers for query generation. We also assess the factuality and relevance of PaLM-generated answers with an NLI-based evaluation; the results confirm the high quality of the generated dialogs.
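A minimal sketch of such an NLI-based check is shown below: the generated answer is kept only if a natural language inference model judges it to be entailed by supporting evidence. The specific model (facebook/bart-large-mnli) and the entailment threshold are illustrative assumptions, not the exact setup reported in the paper.

```python
# NLI-based factuality check: does the evidence entail the generated answer?
# Model choice and threshold are assumptions for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)


def is_answer_grounded(evidence: str, generated_answer: str, threshold: float = 0.5) -> bool:
    """Return True if the NLI model says the evidence entails the generated answer."""
    inputs = tokenizer(evidence, generated_answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1)[0]
    # Label order for bart-large-mnli: 0=contradiction, 1=neutral, 2=entailment.
    return probs[2].item() >= threshold
```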
q2d's auto-generated dialogs let query-generation models adapt to and improve on specific dialog styles, providing labeled datasets for both training and evaluation.
Figure: T5 model predictions above and below the line show the impact of fine-tuning on MuSiQue dialogs.
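For concreteness, here is a minimal sketch of fine-tuning a T5 query-generation model on (dialog, query) pairs; the "generate query:" task prefix, model size, learning rate, and toy example are assumptions for illustration, not the paper's training recipe.

```python
# One gradient step of dialog-to-query fine-tuning on a toy example. A real run
# would iterate over a full auto-generated dataset with a DataLoader or Trainer.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)


def encode(dialog: str, query: str) -> dict:
    """Tokenize a dialog as the input and its gold search query as the label."""
    batch = tokenizer("generate query: " + dialog, return_tensors="pt", truncation=True)
    batch["labels"] = tokenizer(query, return_tensors="pt", truncation=True).input_ids
    return batch


batch = encode(
    "User: I watched a great space movie last night.\n"
    "Assistant: Nice! Which one?\n"
    "User: The Ridley Scott one about an astronaut stuck on Mars. Who starred in it?",
    "who starred in the martian",
)
loss = model(**batch).loss
loss.backward()
optimizer.step()
```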
@article{bitton2023q2d,
title={q2d: Turning Questions into Dialogs to Teach Models How to Search},
author={Bitton, Yonatan and Cohen-Ganor, Shlomi and Hakimi, Ido and Lewenberg, Yoad and Aharoni, Roee and Weinreb, Enav},
journal={arXiv preprint arXiv:2304.14318},
year={2023}
}