Article title: A Comparison Study of Data Scrubbing Algorithms and Frameworks in Data Warehousing
حامد ابراهيم | Hamed Ibrahim | 5130
Publication type
Scientific journal
Authors
Hamed Ibrahim Housien, Zhang Zuping, Zainab Qays Abdulhadi
Abstract
Today, many organizations use a Data Warehouse (DW) to support their decision-making processes, achieve their goals more effectively, and satisfy their customers. A DW gives executives timely access to the information they need to make the right decisions. A Decision Support System (DSS) is one of the means applied in data mining. Robust, sound decisions depend on a decisive factor called Data Quality (DQ), and high data quality is obtained through Data Scrubbing (DS), one of the Extraction, Transformation and Loading (ETL) tools. Data Scrubbing is essential in the DW, and achieving high DQ and effective DS are increasingly interrelated. DS algorithms address the constraints that limit DQ, which otherwise lead to weak decisions and high financial costs. These constraints are: dirty data, noisy data, missing values, inconsistency, uncertain data, ambiguous, conflicting and duplicated records, and similar columns. They have many sources and causes, including input errors, merging data from different sources, and differences in representing the same information. In addition, more than 35 sources and causes of poor-quality data arise at the ETL stage. This paper presents a comparison and analysis of DS algorithms, covering the pros and cons, accuracy and time complexity of each algorithm. It also presents a comparative analysis of Data Scrubbing frameworks and determines the best framework.
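The paper's own algorithms are not reproduced here. As a minimal illustrative sketch, assuming records arrive as Python dictionaries, the code below shows two scrubbing steps named in the abstract: unifying differing representations of the same information, and flagging duplicated records with a sorted-neighbourhood window. The sorted-neighbourhood approach is a widely used duplicate-detection technique and is not necessarily one of the algorithms the paper compares. The function names, the window size of 3 and the 0.85 similarity threshold are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Unify representation: collapse whitespace and lowercase every non-missing value."""
    return {k: (" ".join(str(v).split()).lower() if v is not None else None)
            for k, v in record.items()}


def similarity(a, b, fields):
    """Average string similarity (0.0 to 1.0) over the compared fields."""
    scores = []
    for f in fields:
        x, y = a.get(f), b.get(f)
        if x is None or y is None:
            continue  # missing values are skipped instead of counted as mismatches
        scores.append(SequenceMatcher(None, x, y).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def sorted_neighborhood(records, key_field, fields, window=3, threshold=0.85):
    """Return index pairs of likely duplicate records.

    Records are sorted by a key and each record is compared only with the
    next (window - 1) records, instead of with all n - 1 other records.
    """
    cleaned = [normalize(r) for r in records]
    order = sorted(range(len(cleaned)),
                   key=lambda i: cleaned[i].get(key_field) or "")
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            if similarity(cleaned[i], cleaned[j], fields) >= threshold:
                pairs.append(tuple(sorted((i, j))))
    return pairs


# Hypothetical example data: records 0 and 1 differ only in representation.
records = [
    {"name": "John  Smith", "city": "Boston"},
    {"name": "john smith", "city": "boston "},
    {"name": "Mary Jones", "city": None},
    {"name": "Alice Brown", "city": "Denver"},
]
print(sorted_neighborhood(records, "name", ["name", "city"]))  # [(0, 1)]
```

Sorting first and comparing only inside the window reduces the number of comparisons from quadratic to roughly linear in the number of records. The trade-off is that duplicates whose sort keys land far apart can be missed, which is one reason accuracy and time complexity need to be weighed together when choosing a scrubbing algorithm.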
Publication date
23/04/2013
Publisher
International Journal of Computer Applications IJCA(0975-8887)
External link
http://www.ijcaonline.org/archives/volume68/number25/11752-7406
Keywords
Data scrubbing, Data warehousing, Data Quality, Extract-Transform-Load (ETL).