Publication Date

Spring 4-8-2009


This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal is to model that process via a generative statistical model.

In this article, we discuss current research on applying language modeling to information retrieval, the role of semantics in the language modeling framework, cluster-based language models, the use of language modeling for XML retrieval, and future trends.


The version of record is published by the Korean Institute of Information Scientists and Engineers (KIISE). Copyright © 2009, KIISE and the authors. Creative Commons Attribution License.