AI-Driven Data Quality and DataOps Management

Published: 30 Oct 2024, Last Modified: 09 Nov 2024, Venue: ACM ICAIF P2P Workshop 2024 (Oral), License: CC BY 4.0
Keywords: DataOps in Finance, AI for Data QC
Abstract: Data Operations (DataOps), a crucial field in AI, has also been adopted in finance to automate routine processes and improve operations. We highlight the importance of data quality management in financial services, where data is often incomplete or inconsistent and contains duplicates, outliers, and missing values. We also show applications of two types of data Quality Control (QC): Rule-Based QC, which involves setting predefined rules for data validation, and Machine Learning (ML) based QC, which uses predictive algorithms and anomaly detection to identify inaccurate or incorrect data points. DataOps further enhances QC by automating and streamlining data pipelines, enabling continuous monitoring and quick identification of errors, and thus improving overall operational efficiency. We propose a platform-agnostic, scalable, and customizable Python-based Advanced QC framework for performing data quality checks and anomaly detection. We illustrate how the Advanced Data QC framework can be used on publicly available financial datasets and showcase anomaly detection algorithms using ADBench data. The framework is designed to ensure data accuracy, consistency, and completeness, which are essential for meaningful analytics, predictive modelling, and decision-making. This paper is intended as a reference for anyone looking to implement this framework on internal organizational data and to enhance data quality within their organization or processes from the ground up.
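The two QC styles named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's framework: the record fields, the rule names, and the functions `rule_based_qc` and `mad_outliers` are all hypothetical, and the outlier check is a simple median/MAD statistical detector standing in for the ML-based anomaly detection algorithms the paper evaluates.

```python
from statistics import median

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "ticker": "AAA", "price": 101.2},
    {"id": 2, "ticker": "BBB", "price": 99.8},
    {"id": 2, "ticker": "BBB", "price": 99.8},   # duplicate id
    {"id": 3, "ticker": "CCC", "price": None},   # missing value
    {"id": 4, "ticker": "DDD", "price": -5.0},   # violates price > 0
    {"id": 5, "ticker": "EEE", "price": 100.5},
    {"id": 6, "ticker": "FFF", "price": 950.0},  # suspicious outlier
    {"id": 7, "ticker": "GGG", "price": 100.9},
]


def rule_based_qc(rows):
    """Apply predefined rules; return {rule_name: [ids of offending rows]}."""
    issues = {"missing_price": [], "non_positive_price": [], "duplicate_id": []}
    seen_ids = set()
    for row in rows:
        if row["price"] is None:
            issues["missing_price"].append(row["id"])
        elif row["price"] <= 0:
            issues["non_positive_price"].append(row["id"])
        if row["id"] in seen_ids:
            issues["duplicate_id"].append(row["id"])
        seen_ids.add(row["id"])
    return issues


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be flagged by this method
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Run the rule checks first, then look for outliers among rule-passing prices.
issues = rule_based_qc(records)
clean_prices = [r["price"] for r in records
                if r["price"] is not None and r["price"] > 0]
outlier_positions = mad_outliers(clean_prices)
print(issues)
print(outlier_positions)
```

Running the rule checks before the statistical check keeps missing and invalid values from distorting the median and MAD. In a real pipeline the detector would likely be replaced by one of the anomaly detection algorithms the paper benchmarks.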
Submission Number: 10