An Efficient Unsavory Data Detection Method for Internet Big Data

Peige Ren; Xiaofeng Wang; Hao Sun; Fen Xu; Baokang Zhao; Chunqing Wu

Document record

Date

4 October 2015

Discipline

Information and communication sciences

Document type

Conferences and symposia

Scope

Publications

Language

English

Source

HAL-SHS: information, communication and library sciences

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.1007/978-3-319-24315-3_21

Collection

Open archives

Organization

Centre pour la communication scientifique directe

Licenses

http://creativecommons.org/licenses/by/ , info:eu-repo/semantics/OpenAccess

Keywords

High-dimensional feature space; Principal component analysis; Multi-dimensional index; Semantics-based similarity search

Related subjects

DARPA Internet; Internet (Computer network)

Cite this document

Peige Ren et al., "An Efficient Unsavory Data Detection Method for Internet Big Data", HAL-SHS: information, communication and library sciences, ID: 10.1007/978-3-319-24315-3_21

Abstract

With the explosion of information technologies, the volume and diversity of data in cyberspace are growing rapidly; meanwhile, unsavory data are harming the security of the Internet. Detecting unsavory data in Internet big data based on their inner semantic information is therefore of growing importance. In this paper, we propose the i-Tree method, an intelligent semantics-based unsavory data detection method for Internet big data. First, the Internet big data are mapped into a high-dimensional feature space, where they are represented as high-dimensional points. Second, to address the “curse of dimensionality” in the high-dimensional feature space, principal component analysis (PCA) is used to reduce the dimensionality of the feature space. Third, in the newly generated feature space, we cluster the data objects, transform the data clusters into regular unit hyper-cubes, and create a one-dimensional index for the data objects based on the idea of multi-dimensional indexing. Finally, we realize semantics-based data detection for a given unsavory data object using a similarity search algorithm; the experimental results show that our method achieves much better efficiency.
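
The indexing and search steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the i-Tree construction, so this is not the authors' method: it swaps in an iDistance-style key (cluster number times a stride, plus the distance to the cluster center) in place of the unit hyper-cube transformation, and it assumes that PCA and clustering have already produced the reduced points and the cluster centers. The names `OneDimIndex`, `range_search`, and `stride` are hypothetical.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OneDimIndex:
    """iDistance-style index (an assumption, not the paper's i-Tree).

    Each point gets the scalar key: cluster_id * stride + distance to its center.
    """

    def __init__(self, points, centers, stride=1000.0):
        self.centers = list(centers)
        self.stride = stride
        entries = []
        for p in points:
            # Assign the point to its nearest cluster center.
            cid = min(range(len(self.centers)),
                      key=lambda i: dist(p, self.centers[i]))
            d = dist(p, self.centers[cid])
            if d >= stride:
                raise ValueError("stride must exceed every point-to-center distance")
            entries.append((cid * stride + d, cid, p))
        entries.sort(key=lambda e: e[0])
        self.keys = [e[0] for e in entries]    # sorted one-dimensional keys
        self.cids = [e[1] for e in entries]    # cluster id of each entry
        self.points = [e[2] for e in entries]  # original points, in key order

    def range_search(self, query, radius):
        """Return all indexed points within `radius` of `query`."""
        hits = []
        for cid, center in enumerate(self.centers):
            d = dist(query, center)
            # By the triangle inequality, a hit in this cluster lies at a
            # distance from the center in [d - radius, d + radius].
            lo = cid * self.stride + max(0.0, d - radius)
            hi = cid * self.stride + d + radius
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            for k in range(start, end):
                # Skip keys that spilled over from another cluster's range,
                # then verify the candidate with the true distance.
                if self.cids[k] == cid and dist(self.points[k], query) <= radius:
                    hits.append(self.points[k])
        return hits


# Toy data standing in for PCA-reduced feature vectors (illustrative only).
points = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5), (4.0, 4.0), (9.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
index = OneDimIndex(points, centers)
print(sorted(index.range_search((0.0, 0.0), 1.2)))
```

The sorted key list stands in for a B+-tree: each cluster contributes one contiguous key interval per query, so only the candidates inside those intervals need a full distance computation.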