Data Science Challenges of Automated Quality Verification Process in Product Data Catalogues


Abstract. Product master data are a key component of purchasing processes and support the smooth running of business operations within companies. Because no single, complete, worldwide information system stores reference data, most companies must build quality assurance teams to manage the data and maintain its quality, reliability, and timeliness. Product data contain numerous errors, and identifying and correcting them is time-consuming, especially for large data sets containing many millions of products. These errors stem from the so-called human factor as well as from technical errors and limitations of IT systems. In this paper, we therefore propose a number of solutions, organized by category and group, that can automate, simplify, and shorten the master data management process. We also present examples of data validation using rule-based, dictionary-based, and machine learning techniques that enable mass verification of images, textual parameters, digital parameters, and classifiers. These techniques indicate the probability of errors in specific attributes and in their combinations and, in some cases, correct records or propose corrected ones. Tests performed on a sample dataset illustrate the magnitude of the problems and the potential of these techniques.
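
To make the rule-based and dictionary-based validation mentioned in the abstract more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the attribute names (name, net_content, unit), the ALLOWED_UNITS dictionary, and the validate_product function are hypothetical and chosen only for illustration. The machine learning and error-probability parts of the approach are not covered here.

```python
from typing import Dict, List

# Hypothetical dictionary of accepted unit spellings (an assumption, not from the paper).
ALLOWED_UNITS = {"g", "kg", "ml", "l"}


def validate_product(record: Dict[str, str]) -> List[str]:
    """Return the names of attributes that fail a rule or dictionary check."""
    flagged: List[str] = []

    # Rule-based check: the product name must not be blank.
    if not record.get("name", "").strip():
        flagged.append("name")

    # Rule-based check: net content must parse as a positive number.
    try:
        value = float(record.get("net_content", ""))
        if not value > 0:
            flagged.append("net_content")
    except ValueError:
        flagged.append("net_content")

    # Dictionary-based check: the unit must appear in the allowed set.
    if record.get("unit", "").strip().lower() not in ALLOWED_UNITS:
        flagged.append("unit")

    return flagged
```

Calling validate_product on each record of a catalogue yields, per product, the list of attributes that need review. In a fuller pipeline, such flags could be combined with model-based scores, but that step is beyond this sketch.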

Keywords. Product Catalogues, Product Data Quality Management, Master Data Synchronization, Machine Learning in Data Quality, GPT-3 in Product Catalogues

Published online 9/1/2023, 10 pages
Copyright © 2023 by the author(s)
Published under license by Materials Research Forum LLC., Millersville PA, USA

Citation: NIEMIR Maciej, MRUGALSKA Beata, Data Science Challenges of Automated Quality Verification Process in Product Data Catalogues, Materials Research Proceedings, Vol. 34, pp 390-399, 2023


The article was published as article 45 of the book Quality Production Improvement and System Safety

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
