Semantic Search in QA: A Critical Look at Our Legacy in Automatic Question Answering

David Ferrucci
IBM Watson Research Center

Our approach to automatic question answering has focused on three elements. The first is deep question analysis to determine the expected answer type. The second is high-precision passage retrieval that requires passages to contain an instance of that answer type. The third is shallow answer selection to keep response times short. The approach relies heavily on a rich ontology of semantic types and a robust named-entity detector. It has maintained a relatively high standing in the TREC QA track while staying aligned with enterprise search applications, which favor high-precision passage retrieval and fast response times. In terms of absolute QA accuracy, however, the approach seems to have hit a ceiling: continued investment in enriching the type ontology and the associated named-entity detection failed to raise the system's accuracy beyond roughly 30% in the TREC QA track. Moreover, this strategy is labor-intensive and costly to port to new application domains. In this talk, I will give a critical overview of our QA system's baseline performance and a rough model of its errors. I will then present our initial steps in reversing direction, moving from a high-precision semantic search approach to a high-recall approach that invests in more complex answer-scoring techniques. These techniques include deeper inference, web reinforcement, and better modeling of syntactic relations, and they may rely on massive parallelization to achieve the desired response times.
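The contrast between the two strategies can be illustrated with a toy sketch. It is not the system's implementation: the `Candidate` fields, the `support` count (a hypothetical stand-in for web reinforcement, i.e. how many passages mention a candidate), the feature weights, and the example question are all assumptions made for illustration. In precision mode, the expected answer type acts as a hard filter, so a single named-entity detection miss can eliminate the correct answer. In recall mode, type agreement becomes just one weighted scoring feature.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    text: str             # candidate answer string
    entity_type: str      # label from a (hypothetical) named-entity detector
    passage_score: float  # retrieval score of the passage it came from
    support: int          # hypothetical: number of passages mentioning it


# A scoring feature is a (weight, function) pair.
Feature = Tuple[float, Callable[[Candidate], float]]


def precision_mode(cands: List[Candidate], expected_type: str) -> Optional[Candidate]:
    # Hard filter: drop every candidate whose detected type differs from the
    # expected answer type, then select shallowly by passage score.
    typed = [c for c in cands if c.entity_type == expected_type]
    return max(typed, key=lambda c: c.passage_score, default=None)


def recall_features(expected_type: str) -> List[Feature]:
    # Type agreement is demoted from a filter to one weighted feature.
    # These weights are illustrative assumptions, not tuned values.
    return [
        (1.0, lambda c: c.passage_score),
        (0.5, lambda c: 1.0 if c.entity_type == expected_type else 0.0),
        (0.2, lambda c: float(c.support)),
    ]


def recall_mode(cands: List[Candidate], features: List[Feature]) -> Optional[Candidate]:
    # Keep every candidate and rank it by a weighted sum of its feature values.
    return max(cands, key=lambda c: sum(w * f(c) for w, f in features), default=None)


if __name__ == "__main__":
    # Hypothetical question: "Who wrote Hamlet?"  Expected answer type: PERSON.
    cands = [
        Candidate("Shakespeare", "OTHER", 0.8, 5),  # detector missed the PERSON tag
        Candidate("Elizabeth", "PERSON", 0.4, 1),
        Candidate("Stratford", "LOCATION", 0.9, 2),
    ]
    print(precision_mode(cands, "PERSON").text)                 # Elizabeth
    print(recall_mode(cands, recall_features("PERSON")).text)   # Shakespeare
```

In this toy case, precision mode returns the only PERSON-tagged candidate, which is wrong. Recall mode scores Shakespeare at 0.8 + 0.0 + 1.0 = 1.8, ahead of Stratford (1.3) and Elizabeth (1.1). The price is that every candidate must be scored by every feature, which is where the parallelization concern comes in.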

