
This is a demo project for news text classification. Part of this work was submitted as homework for the Data Mining course (0521550X) at Hefei University of Technology.


Sensente/News_Text_Classification_Demo


News Text Classification Demo

This is a simple demo project for news text classification. Part of the code has been submitted as coursework for the Data Mining course (Data and Intelligent Engineering, 0521550X) at Hefei University of Technology. The code is provided for reference only; submitting it again as homework is strongly discouraged.

This project does not contain any dataset!

Please Star!



Background

News text classification is a classic task in Natural Language Processing (NLP). This project applies several classic machine learning methods to a public news corpus.

Usage

Data preprocessing

The dataset targeted by this project is in JSON format, and each record contains several fields: title, content, and other information. The dataset therefore has to be preprocessed before it is fed to a model: converting the JSON format, extracting the text content, extracting the titles, assigning labels, and so on.

All of the preprocessing code is in data_preprocessing.
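As a rough illustration of the preprocessing steps described above, here is a minimal sketch that reads JSON records, extracts the title and content, and assigns integer label ids. The field names (`title`, `content`, `category`), the one-record-per-line layout, and the helper names `parse_records` and `build_label_index` are all assumptions made for this example; they are not taken from the code in data_preprocessing or from the actual dataset.

```python
import json

# Hypothetical field names: the real dataset's keys are not given in this README.
TITLE_KEY = "title"
CONTENT_KEY = "content"
LABEL_KEY = "category"


def parse_records(lines):
    """Turn JSON strings (one record per line) into (title, content, label) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        title = record.get(TITLE_KEY, "").strip()
        content = record.get(CONTENT_KEY, "").strip()
        label = record.get(LABEL_KEY)
        if not content or label is None:
            continue  # drop records with no text or no label
        samples.append((title, content, label))
    return samples


def build_label_index(samples):
    """Map each distinct label string to an integer id, in sorted order."""
    labels = sorted({label for _, _, label in samples})
    return {label: i for i, label in enumerate(labels)}


# Made-up records, used only to show the expected shape of the input.
raw = [
    '{"title": "Team wins final", "content": "The home team won the cup.", "category": "sports", "url": "n/a"}',
    '{"title": "Markets rally", "content": "Stocks rose sharply today.", "category": "finance"}',
    '{"title": "No body", "content": "", "category": "sports"}',
]
samples = parse_records(raw)
label_to_id = build_label_index(samples)
```

Fields other than the three named keys are ignored, and the third record is dropped because its content is empty.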

Algorithm

The preprocessed text is classified with several classic machine learning algorithms. All the algorithms tested in this demo project are in algorithm.
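This README does not list which algorithms the algorithm folder contains. As one representative example of a classic method for text classification, the sketch below implements a multinomial Naive Bayes classifier with add-one smoothing using only the standard library. The class name `NaiveBayes`, the whitespace tokenizer, and the toy training sentences are assumptions for illustration, not the project's actual code or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization is an assumption made for this sketch;
    # Chinese news text would need a word segmenter instead.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, made up for this example.
texts = [
    "the team won the match",
    "the striker scored a goal",
    "stocks rose as markets rallied",
    "the bank raised interest rates",
]
labels = ["sports", "sports", "finance", "finance"]

model = NaiveBayes().fit(texts, labels)
pred_sports = model.predict("the team scored a late goal")
pred_finance = model.predict("markets and interest rates")
```

Working in log space avoids numeric underflow on long documents, and add-one smoothing keeps unseen words from driving a class probability to zero.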

Maintainers

@Sensente.

License

Apache 2.0 © Sensente
