You need profiling? We are profiling.

Data profiling pays off when it leads to more efficient data preparation. Our profiling is designed to give you quick insight into data quality issues and cleansing opportunities that can yield more and better matches. We put a lot of thought into this feature so that you can use it with little or no training, and we always welcome feedback. Profiling helps you quickly understand the content and structure of your data set, and all of the statistics are included in a report for you!

Data Quality

We score your data on 7 metrics and focus on 4 dimensions of data quality: Accuracy, Uniqueness, Conformity, and Precision.

Accuracy

  1. Pattern Detection – Regular expression (RegEx) matching that tells you whether the expected pattern was detected (Valid Data) or not detected (Invalid Data) for the values in a column.
  2. Counts – Give you an idea of how complete your data is and whether the maximum length falls within the expected range.
  3. Characters – Different columns should contain different kinds of data. A phone number column should not contain letters, and a state code should not contain numbers. Punctuation should be normalized.
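
To make the three accuracy checks concrete, here is a minimal Python sketch of how they could be computed for a single column. It is an illustration only, not how our profiler is implemented: the function name `profile_accuracy`, the phone number pattern, and the sample values are all assumptions made up for this example.

```python
import re

# Hypothetical pattern for a 10-digit US phone number (an assumption for this
# example, not a rule shipped with the product).
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")


def profile_accuracy(values, pattern=PHONE_PATTERN):
    """Return pattern, count, and character statistics for one column."""
    # Counts: treat None and blank strings as empty cells.
    filled = [v for v in values if v is not None and v.strip()]
    # Pattern detection: a value is valid only if the whole value matches.
    valid = [v for v in filled if pattern.fullmatch(v.strip())]
    return {
        "total": len(values),
        "filled": len(filled),
        "empty": len(values) - len(filled),
        "valid": len(valid),
        "invalid": len(filled) - len(valid),
        "max_length": max((len(v) for v in filled), default=0),
        # Characters: how many values contain each class of character.
        "with_letters": sum(any(c.isalpha() for c in v) for v in filled),
        "with_digits": sum(any(c.isdigit() for c in v) for v in filled),
        "with_punctuation": sum(
            any(not c.isalnum() and not c.isspace() for c in v) for v in filled
        ),
    }


# Made-up sample column.
phones = ["(555) 123-4567", "555.123.4567", "555-CALL-NOW", "", None, "5551234567"]
report = profile_accuracy(phones)
print(report)
```

In this sample, the value containing letters is the one that fails the pattern, which is the kind of signal the Characters metric is meant to surface.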

Uniqueness

  1. Distinct row counts quickly tell you how much duplication a column contains.
  2. Histograms also make it easy to see the repeated values in a column.
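
As a rough illustration of the two uniqueness measures, the Python sketch below counts distinct values in a column and prints a simple text histogram. The function names and sample data are hypothetical and chosen only for this example; the product's own report is graphical.

```python
from collections import Counter


def profile_uniqueness(values):
    """Return distinct/duplicate counts and a value histogram for one column."""
    counts = Counter(values)
    total = len(values)
    distinct = len(counts)
    return {
        "total": total,
        "distinct": distinct,
        # Rows beyond the first occurrence of each value.
        "duplicates": total - distinct,
        # (value, count) pairs, most frequent first.
        "histogram": counts.most_common(),
    }


def print_histogram(histogram, width=20):
    """Print one bar per value, scaled so the most frequent value fills `width`."""
    top = histogram[0][1] if histogram else 0
    for value, count in histogram:
        bar = "#" * max(1, round(width * count / top))
        print(f"{value!s:<10} {bar} {count}")


# Made-up sample column of state codes.
states = ["CA", "NY", "CA", "TX", "CA", "NY"]
result = profile_uniqueness(states)
print_histogram(result["histogram"])
```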

Conformity

  1. Does the data set, including the data in each field or column, match requirements? How many different ways do dates and phone numbers appear? This module looks at syntax and field types.
  2. Valid and invalid values can be easily flagged, visualized and corrected as needed.
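
One common way to answer "how many different ways do dates appear?" is to reduce every value to a format shape and count the shapes. The Python sketch below shows that idea under stated assumptions: the shape notation (9 for a digit, A for a letter), the function names, and the sample dates are all invented for illustration and do not describe the module's internals.

```python
from collections import Counter


def shape(value):
    """Reduce a value to its format shape: digits become 9, letters become A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def profile_formats(values):
    """Count how many values share each format shape, most common first."""
    return Counter(shape(v) for v in values).most_common()


def flag_nonconforming(values, expected_shape):
    """Return the values whose shape differs from the expected one."""
    return [v for v in values if shape(v) != expected_shape]


# Made-up sample column of dates in mixed formats.
dates = ["2023-01-15", "01/15/2023", "2023-02-01", "Jan 15, 2023"]
print(profile_formats(dates))
print(flag_nonconforming(dates, "9999-99-99"))
```

The flagged values are the ones you would then review and correct.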

Precision

  1. Statistical profiling for numeric values.
  2. Easily identify the minimum, maximum, mean, median, mode, and extreme values in each column.
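
For readers who want to see what these statistics look like in practice, here is a small Python sketch using the standard `statistics` module. How our profiler defines an extreme value is not stated here, so the sketch assumes the widely used 1.5 x IQR rule; the function name `profile_numeric` and the sample amounts are likewise hypothetical.

```python
import statistics


def profile_numeric(values, k=1.5):
    """Return summary statistics for a numeric column (needs at least 2 values)."""
    ordered = sorted(values)
    # Quartiles; the middle cut point is the median and is not needed here.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    # Assumption: values outside k * IQR from the quartiles count as extreme.
    low, high = q1 - k * iqr, q3 + k * iqr
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "extremes": [v for v in ordered if v < low or v > high],
    }


# Made-up sample column with one obvious outlier.
amounts = [10, 12, 12, 13, 14, 15, 200]
print(profile_numeric(amounts))
```

Note how a single extreme value pulls the mean well above the median, which is why it helps to see both in the report.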