Journal articles
A Provenance-based Approach to Evaluate Data Quality in eScience
Recorded in:
International Journal of Metadata, Semantics and Ontologies, v. 9, n. 1, p. 15-28, 2014.
17442621
10.1504/IJMSO.2014.059124
2-s2.0-84893912250
Authors
Gonzales Malaverri J.E.
Santanche A.
Medeiros C.B.
Institution
Abstract
Data quality is growing in relevance as a research topic. Quality assessment has been progressively incorporated in many business environments, and in software engineering practices. eScience environments, however, because of the multiplicity and heterogeneity of data sources and scientific experts involved in a given problem, complicate data quality assessment. This paper deals with the evaluation of the quality of data managed by eScience applications. Our approach is based on data provenance, i.e. the history of the origins and transformations applied to a given data product. Our contributions include (a) the specification of a framework to track data provenance and use it to derive quality information, (b) a model for data provenance based on the Open Provenance Model, and (c) a methodology to evaluate the quality of data based on its provenance. Our proposal is validated experimentally by a prototype that takes advantage of the Taverna workflow system. Copyright © 2014 Inderscience Enterprises Ltd.
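As a rough illustration of deriving quality information from provenance, the sketch below records two Open Provenance Model relations ("used" and "wasGeneratedBy") and propagates a score from source data to derived data. The scoring rule, the class name `ProvenanceGraph`, and all artifact and process names are assumptions made for this example; the record does not describe the paper's actual model or methodology in enough detail to reproduce them.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical, minimal provenance store (not the paper's model)."""
    # artifact -> process that produced it (OPM "wasGeneratedBy")
    generated_by: dict = field(default_factory=dict)
    # process -> list of input artifacts (OPM "used")
    used: dict = field(default_factory=dict)
    # assumed scores in [0, 1], assigned by a domain expert
    source_quality: dict = field(default_factory=dict)
    process_reliability: dict = field(default_factory=dict)

    def quality(self, artifact):
        """Assumed rule: process reliability times the weakest input quality.

        Requires an acyclic graph; a source artifact returns its own score.
        """
        if artifact not in self.generated_by:
            return self.source_quality[artifact]
        process = self.generated_by[artifact]
        weakest = min(self.quality(a) for a in self.used[process])
        return self.process_reliability[process] * weakest


# Hypothetical workflow: two source readings feed one calibration step.
graph = ProvenanceGraph(
    generated_by={"calibrated_series": "calibrate"},
    used={"calibrate": ["raw_reading_a", "raw_reading_b"]},
    source_quality={"raw_reading_a": 0.9, "raw_reading_b": 0.8},
    process_reliability={"calibrate": 0.5},
)
score = graph.quality("calibrated_series")
```

Taking the minimum over inputs is one pessimistic choice among many; a weighted mean or a per-dimension score (accuracy, timeliness, completeness) would fit the same structure.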