A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Article

Publication date:

2020

Citation:

A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario / Cauteruccio, F., LO GIUDICE, P., Musarella, L., Terracina, G., Ursino, D., Virgili, L.. - In: INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING. - ISSN 0219-6220. - 19:3(2020), pp. 849-889. [10.1142/s0219622020500182]

Abstract:

Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making over sources characterized by disparate formats. In the past, a wide variety of approaches to derive interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address the interschema property derivation issue in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.

CRIS type:

1.1 Journal article

Keywords:

Unstructured sources; interschema property derivation; structuring unstructured data; big data

Authors:

Cauteruccio, Francesco; LO GIUDICE, Paolo; Musarella, Lorenzo; Terracina, Giorgio; Ursino, Domenico; Virgili, Luca

Link to the full record:

https://iris.unirc.it/handle/20.500.12318/133754

Published in:

INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING

Journal