Do we trust AI to clean data? Maybe…? Sometimes…? It depends…? True data cleansing basically means doing a lot of really granular work across entire files or entire databases, which means across some, or potentially all, rows and potentially across all columns. 50,000 rows of data and 25 columns may sound like a pretty ‘small’ […]

Complete Guide to Fuzzy/Probabilistic Data Matching and Entity Resolution. Introduction: Fuzzy or probabilistic data matching and entity resolution are fundamental processes in data management and analytics. They involve identifying and linking records that refer to the same entity but may have variations due to errors, abbreviations, or inconsistencies. This comprehensive guide delves into the various […]

Here’s a pretty typical scenario that makes end users feel like they’re wasting time, and also creates a ton of waste and kills targeted business outcomes: Some relational information is added to your business systems and no one notices the relationships. This could be different people in the same household or at the same company, […]

Data matching means different things to different people. To people in the financial world it’s often joining or matching ‘mismatching’ data that describes general financial records like payroll, purchases, expenses, revenue, payments, and P&L. To people in supply chain operations it might be matching ‘mismatching’ data that describes supplier details, items purchased, purchase orders, invoices, and payment […]

Want cleaner data? Start by asking for the data up-front. Everybody wants cleaner data but what does it mean to have cleaner data? The best way to have that conversation is with examples of the inputs and the desired outputs. A lot of people ask for cleaner data without really knowing what they really need. […]

Data quality can also spell scope creep, so you’d better spell out your requirements. Data management, master data management and systems integration, however, are much more critical priorities, ensuring a minimum level of ‘data quality’ and interoperability. It’s also arguably much easier to achieve. It’s nearly impossible to guarantee that our data will always be […]

Python users probably know data matching by the name “string matching”. Excel users probably use the VLOOKUP function and the Fuzzy Lookup add-in. Business people will just tell you that there are too many duplicates and that they can’t find the same data in other business systems. Almost every system has its own version of […]
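For readers wondering what “string matching” looks like in Python, here is a minimal sketch using only the standard library’s `difflib`. The helper names `similarity` and `best_match` and the 0.8 threshold are assumptions made for illustration, not something taken from the post, and real matching work usually needs more careful normalization than this.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def best_match(query: str, candidates: list[str], threshold: float = 0.8):
    """Return the candidate most similar to query, or None if none reaches threshold.

    The threshold is an illustrative assumption; tune it against your own data.
    """
    scored = [(similarity(query, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None


if __name__ == "__main__":
    names = ["John Smith", "Jane Doe"]
    print(best_match("Jon Smith", names))
    print(best_match("Zzz", names))
```

A lookup like this behaves roughly like an approximate VLOOKUP: instead of requiring an exact key, it returns the closest candidate, and returns nothing when the closest candidate is still too different.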

The same customer has been coming into your local stores for years, but each time they’re visiting different locations, using different phone numbers and different forms of identification, and paying with different forms of payment. Just recently they’ve started purchasing items from your online store under the 8th customer account in your systems. Sometimes they […]

When mergers and acquisitions happen there’s usually a rush to standardize, synchronize and merge at least some of the data (information), systems, processes and resources. This is in an effort to capitalize on strategic synergies aiming to achieve aggressive ROI targets, which requires quick action and implementation. The data is usually the first priority because […]

It’s probably pretty clear by now that people and company names don’t match, and that mismatching data makes it very difficult to find duplicates (or to link/sync/relate the same records within and across different data sources). This is really the core of master data management but it’s also data quality, and it’s part of everyday […]