16-02-2011, 10:33 AM
TopCells: Supporting Keyword Search in Text Cube
TopCells: Keyword Search
Keyword Search
– Simple but popular (as we can see in Zhai’s course)
– A lot of studies in IR community
Recently, Keyword Search in DB (DB+IR)
– Finding the close connection between tuples
Motivation
Search a light weighted powerful laptop from customer reviews
Text Cube
Aggregating Multi-dimensional Text Data
TopCells: Keyword Query
Simple Keyword Query
– A set of keywords
Extended Keyword Query
– With dimension constraints
TopCells: Ranking Cells
Given a keyword query q = {t1, …, tm}
How to rank the cells?
Relevance of a cell C to the given query
– Average the relevance of documents in a cell
– s(d, q) could be ANY IR scoring formula, like Okapi
Algorithm 1 (one-scan of inverted index)
Naïve one-scan inverted-index-based algorithm
– Compute s(d, q) for all documents d’s
– For each d, update rel(C, q) for all cells C’s containing d
(There are 2dim cells containing d)
– Output top-k cells C’s with highest rel(C, q)’s
– Deficiency
– Given a query q, equivalent to scanning all cells once
– When the dimensionality is high, the number of cells is huge
– Solution (search space ordering)
– Explore as small number of cells as possible
Algorithm 2 (search-space ordering)
Monotonicity of relevance score
– Used to estimate upper bounds
– Search-Space-Ordering algorithm
– Starting from single documents
– Bottom-to-up computation: aggregating cells of small sizes into big ones
– Aggregating cells with higher scores first
– Partially-aggregated cell
– Output C, when C is fully-aggregated and rel(C, q) is no less than the upper bound of rel(C’, q) for all partially-aggregated cells C’
download full report
https://netfiles.uiuc.edu/bding3/www/pap...090508.ppt