Blog Articles

Posted on May 31, 2024

Using AI to cleanse data?

Do we trust AI to clean data? Maybe…? Sometimes…? It depends…? True data cleansing means doing a lot of granular work across entire files or entire databases: across some, or potentially all, rows, and potentially across all columns. 50,000 rows of data and 25 columns may sound like a pretty ‘small’ […]

Posted on April 25, 2024

Complete Guide to Fuzzy/Probabilistic Data Matching and Entity Resolution

Fuzzy or probabilistic data matching and entity resolution are fundamental processes in data management and analytics. They involve identifying and linking records that refer to the same entity but may have variations due to errors, abbreviations, or inconsistencies. This comprehensive guide delves into the various […]
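To make the idea concrete, here is a minimal sketch of one simple way to link records despite abbreviations and formatting differences: normalize each value, then keep pairs whose string similarity clears a threshold. This is a deterministic similarity-threshold approach built on Python’s standard `difflib`, not a full probabilistic model, and the abbreviation map, `normalize`, `similarity`, `link_records`, and the 0.85 threshold are all hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation map; a real one would be larger and domain-specific.
ABBREVIATIONS = {"inc": "incorporated", "corp": "corporation", "co": "company", "st": "street"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each record in `left` with its best match in `right`, if the score clears `threshold`."""
    if not right:
        return []
    links = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            links.append((a, best, round(score, 2)))
    return links


links = link_records(["Acme Corp.", "Globex Inc"], ["ACME Corporation", "Globex Incorporated", "Initech"])
print(links)
```

Normalizing before comparing is the design choice doing most of the work here: “Acme Corp.” and “ACME Corporation” only score as an exact match once case, punctuation, and the abbreviation are handled.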

Posted on April 4, 2024

Duplicate and Fragmented Relational Data

Here’s a pretty typical scenario that makes end users feel like they’re wasting time, creates a ton of waste, and kills targeted business outcomes: some relational information is added to your business systems and no one notices the relationships. This could be different people in the same household or at the same company, […]

Posted on March 9, 2024

Data Matching in Different Areas of Business

Data matching means different things to different people. To people in the financial world, it’s often joining or matching ‘mismatching’ data that describes general financial records like payroll, purchases, expenses, revenue, payments, and P&L. To people in supply chain operations, it might be matching ‘mismatching’ data that describes supplier details, items purchased, purchase orders, invoices, and payment […]

Posted on March 7, 2024

A Path to Cleaner Data

Want cleaner data? Start by asking for the data upfront. Everybody wants cleaner data, but what does it mean to have cleaner data? The best way to have that conversation is with examples of the inputs and the desired outputs. A lot of people ask for cleaner data without really knowing what they need. […]

Posted on February 28, 2024

Are Data Matching, Entity Resolution, and Systems Integration better minimum requirements for data quality?

Data quality can also spell scope creep, so you’d better spell out your requirements. Data management, master data management, and systems integration, however, are much more critical priorities, ensuring a minimum level of ‘data quality’ and interoperability. It’s also arguably much easier to achieve. It’s nearly impossible to guarantee that our data will always be […]

Posted on February 26, 2024

Industrial Strength Data Match and Entity Resolution Systems

Python users probably know data matching by the name “string matching”. Excel users probably use the VLOOKUP function and the Fuzzy Lookup add-in. Business people will just tell you that there are too many duplicates and that they can’t find the same data in other business systems. Almost every system has its own version of […]
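For readers coming from Excel, a rough Python analogue of a tolerant lookup can be built from the standard library’s `difflib.get_close_matches`. The sketch below is hypothetical (the `fuzzy_vlookup` name, the 0.8 cutoff, and the sample table are illustrative assumptions, not a replica of Excel’s Fuzzy Lookup add-in): it tries an exact key first, like VLOOKUP, then falls back to the closest case-insensitive key above the cutoff.

```python
from difflib import get_close_matches


def fuzzy_vlookup(key, table, cutoff=0.8):
    """Return the value for `key`, or for the closest key above `cutoff`, else None.

    Hypothetical helper for illustration only.
    """
    # Exact match first, like a plain VLOOKUP.
    if key in table:
        return table[key]
    # Fall back to case-insensitive fuzzy matching.
    # Note: keys that differ only by case collapse to one entry here.
    lowered = {k.lower(): k for k in table}
    matches = get_close_matches(key.lower(), list(lowered), n=1, cutoff=cutoff)
    return table[lowered[matches[0]]] if matches else None


customers = {"Jonathan Smith": "C-001", "Maria Garcia": "C-002"}
print(fuzzy_vlookup("Jonathon Smith", customers))
print(fuzzy_vlookup("Zed", customers))
```

The cutoff is the key trade-off: set it too low and unrelated names start matching, set it too high and real typos slip through as “not found”.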

Posted on February 23, 2024

Same Customer, Different Customer Profiles

The same customer has been coming into your local stores for years, but each time they’re visiting different locations, using different phone numbers and different forms of identification, and paying with different payment methods. Just recently they’ve started purchasing items from your online store under the eighth customer account in your systems. Sometimes they […]
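One common way to start collapsing scattered profiles like these is to treat any shared identifier (a phone number, an ID document, an email) as a link and group linked profiles with a union-find structure. The sketch below assumes hypothetical profile IDs and `(type, value)` identifier tuples, and it only handles exact matches; a real system would also normalize phone numbers, fuzzy-match names, and guard against identifiers legitimately shared by several people, such as a household phone.

```python
from collections import defaultdict


def resolve_profiles(profiles):
    """Group profile IDs that share any identifier into one entity (hypothetical helper)."""
    parent = {pid: pid for pid in profiles}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # identifier -> first profile ID that used it
    for pid, identifiers in profiles.items():
        for ident in identifiers:
            if ident in first_seen:
                union(first_seen[ident], pid)
            else:
                first_seen[ident] = pid

    clusters = defaultdict(set)
    for pid in profiles:
        clusters[find(pid)].add(pid)
    return sorted(sorted(c) for c in clusters.values())


profiles = {
    "store-A-17": {("phone", "555-0101"), ("id", "DL-998")},
    "store-B-42": {("phone", "555-0199"), ("id", "DL-998")},
    "online-8": {("phone", "555-0199"), ("email", "pat@example.com")},
    "store-C-03": {("phone", "555-0300")},
}
print(resolve_profiles(profiles))
```

Union-find handles chains naturally: “online-8” never shares an identifier with “store-A-17” directly, but both share one with “store-B-42”, so all three end up as one customer.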

Posted on February 22, 2024

Simplifying Big Data Management Technology to Scale the Most Common Data Requirements

When mergers and acquisitions happen, there’s usually a rush to standardize, synchronize, and merge at least some of the data (information), systems, processes, and resources. The goal is to capitalize on strategic synergies and hit aggressive ROI targets, which requires quick action and implementation. The data is usually the first priority because […]

Posted on February 16, 2024

Fuzzy Matching and Entity Resolution

It’s probably pretty clear by now that the names of people and companies often don’t match from one record or system to the next, and that mismatching data makes it very difficult to find duplicates (or to link/sync/relate the same records within and across different data sources). This is really the core of master data management, but it’s also data quality, and it’s part of everyday […]