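Advances in Machine Learning Based Text Categorization
【Title】: Advances in Machine Learning Based Text Categorization 【Authors】: Su Jinshu (苏金树), Zhang Bofeng (张博锋), Xu Xin (徐昕)
(School of Computer, National University of Defense Technology, Changsha, Hunan 410073, China)
(School of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, China)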
【Format】: PDF
【Pages】: 12
【Language】: Chinese
【Abstract or table of contents】:
Advances in Machine Learning Based Text Categorization
SU Jin-Shu, ZHANG Bo-Feng, XU Xin
(School of Computer, National University of Defense Technology, Changsha 410073, China)
(School of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha 410073, China)
+ Corresponding author: Phn: +86-731-4513504, E-mail: bfzhang@nudt.edu.cn
Su JS, Zhang BF, Xu X. Advances in machine learning based text categorization. Journal of Software,
2006, 17(9): 1848-1859. http://www.jos.org.cn/1000-9825/17/1848.htm
Abstract: In recent years, there have been extensive studies of and rapid progress in automatic text categorization,
which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the
state-of-the-art challenges and research trends in content information processing for the Internet and other complex
applications, this paper presents a survey of up-to-date developments in text categorization based on machine
learning, including models, algorithms, and evaluation. It points out that nonlinearity, skewed
data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of
Web pages are the key problems in the study of text categorization. Possible solutions to each of these problems are also
discussed. Finally, some future directions of research are given.
Key words: automatic text categorization; machine learning; dimensionality reduction; kernel method; unlabeled
data set; skewed data set; hierarchical categorization; large-scale text categorization; Web page
categorization
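Abstract (translated from the Chinese): Automatic text categorization is a research hotspot and core technique in
information retrieval and data mining, and in recent years it has received wide attention and developed rapidly.
This paper sets out the challenges that machine learning based text categorization faces from complex applications
such as Internet content information processing, and surveys and comments on research progress in terms of models,
algorithms, and evaluation. It argues that nonlinearity, skewed data sets, the labeling bottleneck, hierarchical
categorization, scalability of algorithms, and Web page categorization are the key problems in current text
categorization research, and discusses possible approaches to them. Finally, it gives an outlook on research directions.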
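Key words (translated from the Chinese): automatic text categorization; machine learning; dimensionality reduction; kernel method; unlabeled set; skewed data set; hierarchical categorization; large-scale text categorization; Web page categorization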
Chinese Library Classification number: TP181    Document code: A