Supervised Data Extraction

Document record

Date

2005

Collection

Open archives

Licence

info:eu-repo/semantics/OpenAccess



Related subjects

Philosophy--Methodology

Cite this document

N. Georgiev et al., "Supervised Data Extraction", HAL-SHS: linguistique, ID: 10670/1.8o1srj



Abstract

Data extraction from internet sources has attracted the interest of the scientific community for the past several years. However, there are still no well-established standards because of the heterogeneous nature of the information in the Global Network. Nevertheless, the sources have one thing in common: all the data is available in HTML format for compatibility reasons. This article presents our methodology and the prototype system we have created to extract data from HTML pages. We use XPath as the data extraction language and have developed a methodology for visual wrapper generation. Our approach takes advantage of the implicit correlation between the data and the surrounding structure. Some evaluation tests are also given in order to justify our methods.
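The idea of using XPath expressions as a wrapper over HTML structure can be illustrated with a minimal sketch. This is not the authors' prototype: the page fragment, the `WRAPPER` dictionary, and the `extract` function are hypothetical names introduced here for illustration. The sketch assumes a well-formed XHTML fragment and relies on the limited XPath subset in Python's standard `xml.etree.ElementTree` module; real-world HTML is often not well-formed and would need a tolerant HTML parser first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment standing in for a fetched page.
PAGE = """
<html>
  <body>
    <table id="products">
      <tr><th>Name</th><th>Price</th></tr>
      <tr class="item"><td>Widget</td><td>9.99</td></tr>
      <tr class="item"><td>Gadget</td><td>24.50</td></tr>
    </table>
  </body>
</html>
"""

# Hypothetical wrapper: one XPath locating each repeating record,
# plus one relative XPath per field inside that record.
WRAPPER = {
    "record": ".//table[@id='products']/tr[@class='item']",
    "fields": {"name": "td[1]", "price": "td[2]"},
}


def extract(page, wrapper):
    """Apply the wrapper's XPath expressions and return a list of dicts."""
    root = ET.fromstring(page.strip())
    rows = []
    for record in root.findall(wrapper["record"]):
        # Each field path is evaluated relative to the record element,
        # so the data is located through its surrounding structure.
        rows.append(
            {name: record.findtext(path) for name, path in wrapper["fields"].items()}
        )
    return rows


records = extract(PAGE, WRAPPER)
for row in records:
    print(row)
```

Keeping the record path separate from the field paths mirrors the point in the abstract that data correlates with the surrounding structure: if the page layout changes, only the wrapper expressions need to be regenerated, not the extraction code.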
