NVIDIA’s ChatQA Achieves GPT-4 Level Accuracy in Conversational QA Models

NVIDIA has introduced ChatQA, a family of conversational question-answering (QA) models that reach GPT-4-level accuracy without relying on synthetic data from OpenAI GPT models.

The ChatQA family comprises models ranging from 7B to 70B parameters. The top-performing model, ChatQA-70B, outperforms GPT-3.5-turbo and performs comparably to GPT-4, surpassing it in average score across ten conversational QA datasets.


A key innovation behind ChatQA is a two-stage instruction tuning method. In the first stage, the researchers apply supervised fine-tuning (SFT) on a combination of instruction-following and dialog datasets, so that the model learns to follow instructions as a conversational agent. The second stage, called context-enhanced instruction tuning, improves the model's ability to perform context-aware or retrieval-augmented generation in conversational QA.
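To make the second stage more concrete, here is a minimal sketch of how one context-enhanced training example might be assembled: a context passage is placed ahead of the dialog turns, and the model is trained to produce the final answer. The function name `build_stage2_example` and the "System / User / Assistant" layout are assumptions made for illustration; the article does not describe the exact template used by the researchers.

```python
from typing import Dict, List, Tuple


def build_stage2_example(
    system: str,
    context: str,
    turns: List[Tuple[str, str]],
    answer: str,
) -> Dict[str, str]:
    """Assemble one training example with the context placed before the dialog.

    The layout here is hypothetical, not the template from the paper.
    """
    lines = [f"System: {system}", "", context, ""]
    for role, text in turns:
        lines.append(f"{role}: {text}")
    # The prompt ends where the model is expected to start generating.
    lines.append("Assistant:")
    return {"prompt": "\n".join(lines), "target": " " + answer}


example = build_stage2_example(
    system="Answer using only the context provided.",
    context="ChatQA models range from 7B to 70B parameters.",
    turns=[
        ("User", "What is ChatQA?"),
        ("Assistant", "A family of conversational QA models."),
        ("User", "How large is the biggest one?"),
    ],
    answer="70B parameters.",
)
print(example["prompt"])
```

In a setup like this, a first-stage example would have the same shape but without the context passage, which is what distinguishes plain instruction following from context-aware answering.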

To handle retrieval in conversational QA, the researchers fine-tune a single-turn query retriever on a multi-turn QA dataset. This approach delivers results comparable to state-of-the-art query rewriting models at a lower deployment cost, since no separate rewriting model has to be run before retrieval.
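The sketch below illustrates why a retriever that sees the dialog history can stand in for query rewriting. It is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a dense encoder, no fine-tuning takes place, and concatenating the history with the latest question is one plausible way to form a multi-turn query, not necessarily the one used for ChatQA.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a hypothetical stand-in for a dense encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def conversational_query(history: List[str], question: str) -> str:
    # Feed the whole dialog to the retriever instead of rewriting the question.
    return " ".join(history + [question])


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


passages = [
    "ChatQA-7B is built on Llama2-7B.",
    "ChatQA-70B is built on Llama2-70B.",
]
history = ["Tell me about ChatQA-70B."]
question = "What is it built on?"

# On its own, the follow-up question scores both passages equally,
# because "it" carries no information about which model is meant.
bare_scores = [cosine(embed(question), embed(p)) for p in passages]

# With the history included, the passage about ChatQA-70B ranks first.
top_with_history = retrieve(conversational_query(history, question), passages)
print(top_with_history[0])
```

A query rewriting pipeline would instead generate a standalone question such as "What is ChatQA-70B built on?" before retrieval, which requires an extra model call for every turn.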


The researchers also introduce a new dataset, HumanAnnotatedConvQA, intended to significantly improve the language model's ability to integrate user-provided or retrieved context in zero-shot conversational QA tasks.

In their empirical study, the researchers built ChatQA models on Llama2-7B, Llama2-13B, Llama2-70B, and in-house GPT-8B and GPT-22B models. Across ten conversational QA datasets, ChatQA-70B outperforms both GPT-3.5-turbo and GPT-4 in terms of average score.

NVIDIA’s ChatQA is not the only contender in the race for GPT-4-level capabilities. Google is poised to launch Gemini Ultra, and Mistral CEO Arthur Mensch has announced plans to unveil an open-source GPT-4-level model in 2024. What sets ChatQA apart is that it reaches this level of accuracy without synthetic data from OpenAI GPT models.

For readers who want more detail, the research paper, titled “ChatQA: Building GPT-4 Level Conversational QA Models,” describes the methods and results in full.
