Data Quality
Organisations need to be able to rely on the information in their
primary business applications. Inaccurate or inconsistent data can
prevent users from understanding the organisation's current and future
business problems. This leads to poor decisions, negative results, lost
profits, operational delays, customer dissatisfaction and much more.
Data quality is one of the most important elements in any business
intelligence application, and it is a prime change management element.
However, data quality is only one part of a total data strategy.
Data Strategy
An effective data quality strategy helps an organisation better
understand its business environment, improve operational efficiency
and make better decisions to maximize profitability.
The goal of data quality management is to provide the infrastructure
to transform raw data into consistent, accurate and reliable corporate
information. There are five components of data management technology:
- Data Profiling – inspecting data for
errors, inconsistencies, redundancies and incomplete information
- Data Quality – ensuring data is correct,
standardized and verified
- Data Integration – matching, merging
or linking data from a variety of disparate sources
- Data Augmentation – enhancing data using
information from internal and external data sources
- Data Monitoring – checking and controlling
data integrity over time
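As a rough illustration of the first component, data profiling, the sketch below counts incomplete fields and redundant records in a small set of records. The `profile` function, the field names and the sample customers are all hypothetical, made up for this example; real profiling tools check far more than this.

```python
from collections import Counter


def profile(rows, required):
    """Return simple profiling counts for a list of dict records."""
    # Incomplete information: required fields that are empty or missing.
    missing = Counter()
    for row in rows:
        for field in required:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    # Redundancy: identical records that appear more than once.
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


# Hypothetical sample data, for illustration only.
customers = [
    {"id": 1, "name": "Acme Ltd", "country": "UK"},
    {"id": 2, "name": "", "country": "UK"},
    {"id": 3, "name": "Borealis", "country": None},
    {"id": 1, "name": "Acme Ltd", "country": "UK"},
]

report = profile(customers, required=["name", "country"])
print(report)
```

Even a report this simple gives the business something concrete to look at when it is assured that there are no problems with the data.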
Data Quality During Build & Implementation
The best time to ensure data quality is during the build and implementation
phase of any new application. Unfortunately, too often this is where
the end user problems begin. No matter how well designed a data
model is, programmers are often tempted to corrupt the model and
schema [and do] to fit an application model.
Data cleansing, validation and modelling are laborious, time-consuming
jobs, and are extremely difficult to estimate on a project plan.
A one-week project to map data between legacy and new data models
can end up taking six months. There is no quick way to glance at
corporate data and immediately detect outdated data models, poor
programming practices, dirty data and unbelievable complexity. If
there is a lack of automation and tools, things get even uglier.
Project after project, I am assured that the organisation's data
is 'bang on'; 'no problems with our data'; 'been managing our data
for 20 years and never found a problem'. Not being a data modeler,
architect, programmer or anything else technical, I operate from
the business end, and it doesn't take me long to figure out that
something just doesn't add up. The real problem often starts with
the next step: getting the business to believe that the data is
incorrect, and that it is imperative to take the time now to fix
it.
With little support from the business for delaying the project
to clean and validate data, data programmers are forced into
quick-fix solutions that cause havoc for years to come.
Case Study
A typical scenario might be this: several years after implementation
of a new enterprise system, a business user notices that a certain
record is flagged, when the data does not correspond to the logic
for which the flag is programmed to display.
Let's return to the time just before this application went into
production: tension is high, the deadline is only 24 hours away,
the pressure for delivery on time is intense... and then a
bug is found!
To solve the bug in time for the release, a new field is required.
Programmers identify a field which can be used "just for this
special case" and only temporarily until a real fix can be
implemented [which it rarely is]. The bug is 'fixed' and the contractors
proudly deliver the new system in record time. The next two years
are spent discovering and fixing all the shortcuts taken.
Typical programming shortcuts and quick fixes lead to overloaded
fields, corrupted models, undocumented features and convoluted usage
patterns. Flags appear in the system that cannot be traced back
to any 'real' source problem.
Could it be that some data was just dirty, that the flag did not
get set or changed correctly, or that it was manually set by an end
user without going through the application logic?
And just to complicate the audit path taken in an attempt to resolve
the issue, it is noted that three months earlier, 300,000 records
were loaded into the application database [from a company that was
acquired] that did not comply with this flag rule at all.
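An audit like the one in this case study can be sketched as recomputing the flag from its documented rule and counting the disagreements by where each record came from. Everything below is hypothetical: the rule in `expected_flag`, the `overdue_flag` and `source` fields, and the sample records are invented for illustration, since the case study does not say what the real flag logic was.

```python
from collections import Counter


def expected_flag(record):
    # Hypothetical business rule, invented for illustration only.
    return record["balance"] > 0 and record["days_past_due"] > 30


def audit_flag(records):
    """Count records whose stored flag disagrees with the rule, per source."""
    mismatches = Counter()
    for record in records:
        if record["overdue_flag"] != expected_flag(record):
            mismatches[record["source"]] += 1
    return dict(mismatches)


# Hypothetical sample records; "source" says how each record entered the system.
records = [
    {"id": 1, "balance": 120.0, "days_past_due": 45, "overdue_flag": True, "source": "application"},
    {"id": 2, "balance": 0.0, "days_past_due": 0, "overdue_flag": True, "source": "manual_edit"},
    {"id": 3, "balance": 80.0, "days_past_due": 60, "overdue_flag": False, "source": "acquisition_load"},
    {"id": 4, "balance": 15.0, "days_past_due": 90, "overdue_flag": False, "source": "acquisition_load"},
]

mismatch_by_source = audit_flag(records)
print(mismatch_by_source)
```

Grouping the mismatches by source is what separates a manual override from a bulk load that never followed the rule. It only works if the origin of each record was captured in the first place.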
Fortunately, the problem of understanding and mapping the data
is finally appearing at the forefront of businesses, largely driven
by new compliance requirements and business needs. Many large enterprises
are embarking on data governance programs, establishing data governance
councils and appointing data stewards to provide a single point
of decision-making and responsibility.
Data Quality Tools
The pain of discovering the business rules governing a single
field in a single application grows exponentially when a project
has to uncover the data rules and lineage across hundreds of applications
and millions of fields!
Fortunately, today there are tools for data relationship discovery
and management to deal with exactly this problem.
Data Discovery Tools focus on data analysis rather
than metadata. They help discover the patterns and rules hidden in
data and then, by reverse-engineering, identify the various rules
and exceptions.
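One simple way to picture this kind of reverse-engineering is to reduce every value in a field to its character "shape", treat the most common shape as the implied rule, and list everything else as an exception. This is only a minimal sketch of the idea, not how any particular discovery tool works; the `shape` and `discover` functions and the sample account codes are hypothetical.

```python
from collections import Counter


def shape(value):
    """Reduce a value to its character shape: letters -> A, digits -> 9."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation such as "-" as-is
    return "".join(out)


def discover(values):
    """Return the most common shape and the values that do not match it."""
    shapes = Counter(shape(v) for v in values)
    dominant, _ = shapes.most_common(1)[0]
    exceptions = [v for v in values if shape(v) != dominant]
    return dominant, exceptions


# Hypothetical field contents, for illustration only.
account_codes = ["AB-1001", "CD-2040", "EF-3377", "1234", "GH-90X1"]
dominant, exceptions = discover(account_codes)
print(dominant, exceptions)
```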
Collaboration Tools such as wikis, which provide
a searchable, editable forum, are ideal places to capture the collective
knowledge of the data in the enterprise and allow analysts from
different groups to collaborate on defining business rules and business
terms.
Validation and Remediation Tools validate
data consistency and manage the remediation of data inconsistencies
that exist between distributed systems.
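The validation half of that job can be sketched as a reconciliation between two systems that hold the same entities. In this example, the `reconcile` function, the `crm` and `billing` system names and the customer records are all hypothetical. It finds records present in only one system and fields whose values disagree.

```python
def reconcile(system_a, system_b):
    """Compare two {key: record} mappings and list the inconsistencies."""
    only_a = sorted(system_a.keys() - system_b.keys())
    only_b = sorted(system_b.keys() - system_a.keys())
    conflicts = {}
    for key in sorted(system_a.keys() & system_b.keys()):
        a, b = system_a[key], system_b[key]
        # Collect every field whose value differs between the two systems.
        diff = {
            field: (a.get(field), b.get(field))
            for field in sorted(a.keys() | b.keys())
            if a.get(field) != b.get(field)
        }
        if diff:
            conflicts[key] = diff
    return only_a, only_b, conflicts


# Hypothetical extracts from two systems, keyed by customer number.
crm = {
    "C001": {"name": "Acme Ltd", "postcode": "AB1 2CD"},
    "C002": {"name": "Borealis", "postcode": "EF3 4GH"},
    "C003": {"name": "Cygnus", "postcode": "IJ5 6KL"},
}
billing = {
    "C001": {"name": "Acme Ltd", "postcode": "AB1 2CD"},
    "C002": {"name": "Borealis Inc", "postcode": "EF3 4GH"},
    "C004": {"name": "Delphi", "postcode": "MN7 8OP"},
}

only_in_crm, only_in_billing, conflicts = reconcile(crm, billing)
print(only_in_crm, only_in_billing, conflicts)
```

Deciding which system is right for each conflict is the remediation step, and that remains a business decision rather than something the code can settle.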
While these tools are not totally automatic, they do at least
discover the meaning of things such as flags in days rather than
months.
There is now no rest until organisations have consistent data models,
documented data rules, clean data ... and confident decisions!