Record 3404
Author(s): Aizawa A
Title: An information-theoretic perspective of tf-idf measures
Source: INFORMATION PROCESSING & MANAGEMENT 39(1):45-65
Date: 2003 JAN
Type: Journal article
LCR: 6   NCR: 45   LCS: 1   GCS: 1
Comment:
Address: Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan.
Reprint: Aizawa, A, Natl Inst Informat, Chiyoda Ku, 2-1-2 Hitotsubashi, Tokyo 1018430, Japan.
Abstract: This paper presents a mathematical definition of the "probability-weighted amount of information" (PWI), a measure of specificity of terms in documents that is based on an information-theoretic view of retrieval events. The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency-inverse document frequency measures that are commonly used in today's information retrieval systems. The mathematical definition of the PWI is shown, together with some illustrative examples of the calculation. (C) 2002 Elsevier Science Ltd. All rights reserved.
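The abstract's central claim, that a product of occurrence probability and amount of information lines up with tf-idf, can be illustrated with a minimal Python sketch. It assumes maximum-likelihood estimates P(t, d) = f(t, d) / F and P(t) = n_t / N, where f(t, d) is the count of term t in document d, F the total token count, n_t the number of documents containing t, and N the number of documents. These estimators and the function names `pwi_scores` and `tf_idf_scores` are hypothetical choices made for illustration; they are not necessarily the paper's exact definition of PWI. Under these assumptions, PWI differs from raw tf-idf only by the constant factor 1/F.

```python
import math
from collections import Counter


def pwi_scores(docs):
    """Per-document PWI(t, d) = P(t, d) * log(1 / P(t)), with ASSUMED estimates.

    Hypothetical estimators chosen for illustration only:
      P(t, d) = f(t, d) / F  (F = total number of term tokens in the collection)
      P(t)    = n_t / N      (n_t = documents containing t, N = number of documents)
    """
    counts = [Counter(doc) for doc in docs]              # f(t, d)
    total_tokens = sum(sum(c.values()) for c in counts)  # F
    n_docs = len(docs)                                   # N
    doc_freq = Counter(t for c in counts for t in c)     # n_t (each doc counted once)
    return [
        {t: (f / total_tokens) * math.log(n_docs / doc_freq[t]) for t, f in c.items()}
        for c in counts
    ]


def tf_idf_scores(docs):
    """Classic tf-idf: raw term frequency times log(N / n_t)."""
    counts = [Counter(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(t for c in counts for t in c)
    return [
        {t: f * math.log(n_docs / doc_freq[t]) for t, f in c.items()}
        for c in counts
    ]


docs = [
    ["tf", "idf", "weight"],
    ["tf", "entropy"],
    ["idf", "idf", "retrieval"],
]
pwi = pwi_scores(docs)
tfidf = tf_idf_scores(docs)
F = sum(len(d) for d in docs)  # total token count of the toy collection

# Under the assumed estimates, PWI * F should equal tf-idf for every (t, d).
for p_doc, w_doc in zip(pwi, tfidf):
    for term in p_doc:
        print(term, round(p_doc[term] * F, 6), round(w_doc[term], 6))
```

Because the factor 1/F is shared by every term and document, the two weightings rank terms identically in this toy setting; the paper's own derivation may rest on different probabilistic assumptions.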
Cited References (CR):
*NAT CTR SCI INF S, 1999, NTCIR WORKSH 1 P 1 N
AIZAWA A, 2000, P ACM SIGIR2000, P104
AIZAWA A, 2001, P 6 NAT LANG PROC PA, P307
AMATI G, 1998, INFORMATION RETRIEVA, P189
BAAYEN RH, 2001, WORD FREQUENCY DISTR
BAEZAYATES R, 1988, MODERN INFORMATION R
BROOKES BC, 1972, J DOC, V28, P160
CHURCH KW, 1990, COMPUTATIONAL LINGUI, V6, P22
CHURCH KW, 1999, NATURAL LANGUAGE PRO, P283
COVER TM, 1991, ELEMENTS INFORMATION
CRESTANI F, 2000, J INFORMATION RETRIE, V2, P23
CROFT WB, 1979, J DOC, V35, P285
DENNIS SF, 1964, MISCELLANEOUS PUBLIC, V269
DUNNING T, 1993, COMPUTATIONAL LINGUI, V19, P61
FUHR N, 1989, INFORM PROCESS MANAG, V25, P55
FUNG P, 1997, MACHINE TRANSLATION, V12, P53
GREFENSTETTE G, 1994, EXPLORATIONS AUTOMAT
GREIFF WR, 1998, P 21 ANN INT ACM SIG, P11
HIEMSTRA D, 2000, INT J DIGITAL LIB, V3, P131
JOACHIMS T, 1997, P 14 INT C MACH LEAR, P143
KAGEURA K, 1996, TERMINOLOGY, V3, P259
KAGEURA K, 1999, P 1 NTCIR WORKSH RES, P42
KITA K, 1999, PROBABILISTIC LANGUA
KOLLER D, 1996, P 13 INT C MACH LEAR, P284
KOLLER D, 1997, P 14 INT C MACH LEAR, P170
LEWIS DD, 1994, P 3 ANN S DOC AN INF, P81
LUHN HP, 1957, IBM J RES DEV, V1, P309
MANNING CD, 1999, FDN STAT NATURAL LAN
MATSUMOTO Y, 1999, NAISTISTR99012
MCCALLUM A, 1998, WS9805, P41
MLADENIC D, 1998, P 10 EUR C MACH LEAR, P95
NAGAO M, 1976, T IPSJ, V17, P110
ROBERTSON SE, 1976, J AM SOC INFORM SCI, V27, P129
ROBERTSON SE, 1990, J DOC, V46, P359
ROBERTSON SE, 1994, J DOC, V50, P233
SALTON G, 1983, INTRO MODERN INFORMA
SALTON G, 1988, INFORMATION PROCESSI, V24, P513
SLONIM N, 2000, P 23 ANN INT ACM SIG, P208
SMADJA F, 1993, COMPUTATIONAL LINGUI, V19, P143
SPARCKJONES K, 1972, J DOC, V28, P11
VANRIJSBERGEN CJ, 1981, INFORMATION PROCESSI, V17, P77
WIENER E, 1995, P 4 ANN S DOC AN INF, P317
WONG SKM, 1992, J AM SOC INFORM SCI, V43, P54
YANG Y, 1997, P 14 INT C MACH LEAR, P412
YANG Y, 1999, P ACM SIGIR C RES DE, P42