Supervised Data Extraction

The process of data extraction from internet sources has attracted the interest of the scientific community in recent years. However, there are still no well-established standards because of the heterogeneous nature of the information on the Internet. Nevertheless, one thing is common to all sources: the data is available in HTML format for compatibility reasons. This article presents our methodology and the prototype system we have created to extract data from HTML pages. We use XPath as the data extraction language and have developed a methodology for visual wrapper generation. Our approach takes advantage of the implicit correlation between the data and the surrounding structure. Some evaluation tests are also given in order to justify our methods.
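To make the idea of XPath as a data extraction language concrete, the following is a minimal sketch and not the prototype system described above. It assumes a well-formed XHTML fragment and uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module. The sample page, the RECORD_PATH and FIELD_PATHS expressions, and the extract_records helper are all hypothetical names chosen for illustration. A wrapper here is simply one path that selects each record plus one relative path per field, which is how the correlation between the data and its surrounding structure can be exploited.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real-world HTML usually needs a
# tolerant HTML parser before XPath can be applied).
PAGE = """<html><body>
<table id="products">
  <tr class="item"><td class="name">Widget</td><td class="price">9.99</td></tr>
  <tr class="item"><td class="name">Gadget</td><td class="price">24.50</td></tr>
</table>
</body></html>"""

# Hypothetical wrapper: one path selecting every record node, and one
# path per field, evaluated relative to each record node.
RECORD_PATH = ".//table[@id='products']/tr[@class='item']"
FIELD_PATHS = {
    "name": "td[@class='name']",
    "price": "td[@class='price']",
}


def extract_records(page, record_path, field_paths):
    """Apply the wrapper paths to a page and return a list of field dicts."""
    root = ET.fromstring(page)
    records = []
    for node in root.findall(record_path):
        record = {}
        for field, path in field_paths.items():
            # findtext returns the text of the first match, or the default.
            record[field] = node.findtext(path, default="").strip()
        records.append(record)
    return records


records = extract_records(PAGE, RECORD_PATH, FIELD_PATHS)
for record in records:
    print(record)
```

Keeping the paths in plain data rather than in code means a visual tool could, in principle, generate them from a user's selections on a rendered page; that connection is an assumption of this sketch and not a description of the authors' implementation.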

