
How to Deduplicate Contact Lists Without Losing Data

You have three spreadsheets from three different events. You know there is overlap, but you are not sure how much. The last thing you want to do is accidentally delete a record that has a phone number your other copies do not.

Deduplication sounds simple. In practice, it is one of the trickiest data operations to get right.

The naive approach (and why it fails)

The simplest method is to sort by email and remove exact duplicates. This works if your data is perfectly consistent. It almost never is.

Real-world problems include:

  • john@company.com vs j.smith@company.com — same person, different email
  • John Smith at Acme Corp vs Jonathan Smith at ACME — same person, different formatting
  • One record has a phone number, the other has a LinkedIn URL — you need both

If you just delete duplicates based on one field, you lose the unique data from the other records.
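To make the failure concrete, here is a minimal sketch of the naive approach in Python. The sample rows and field names (`email`, `phone`, `linkedin`) are made up for illustration.

```python
# Hypothetical sample rows; the field names are assumptions for illustration.
rows = [
    {"email": "john@company.com", "phone": "555-0100", "linkedin": ""},
    {"email": "john@company.com", "phone": "", "linkedin": "linkedin.com/in/jsmith"},
]

def naive_dedupe(rows):
    """Keep only the first row seen for each email address."""
    seen = set()
    kept = []
    for row in rows:
        if row["email"] not in seen:
            seen.add(row["email"])
            kept.append(row)
    return kept

result = naive_dedupe(rows)
# One row survives, and the LinkedIn URL from the second row is gone.
print(result)
```

The second row is dropped whole, so the only copy of the LinkedIn URL goes with it.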

A better approach: merge, do not delete

The key principle is to merge duplicate records instead of deleting them. When two records represent the same person, combine the fields so you keep the most complete version.

Here is a practical workflow:

  1. Normalize first: standardize phone formats, trim whitespace, fix capitalization
  2. Match on multiple fields: compare email, name, and company together, since no single field is reliable on its own
  3. Merge intelligently: keep the most recent email, the most complete phone number, and all tags from both records
  4. Flag uncertain matches: when the system is not sure, let a human decide
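The four steps above can be sketched in a few small Python functions. This is one possible shape under stated assumptions, not a definitive implementation: the field names, the "two or more matching fields" threshold, and the "longer value wins" merge rule are all illustrative choices. In particular, the sketch has no timestamps, so "longer" stands in for "most recent" and "most complete".

```python
import re

def normalize(record):
    """Step 1: trim whitespace, fix capitalization, keep only digits in the phone."""
    return {
        # str.title() is a crude fix; it mangles names like "McDonald".
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "company": record.get("company", "").strip().lower(),
        "phone": re.sub(r"\D", "", record.get("phone", "")),
        "tags": set(record.get("tags", [])),
    }

def match_score(a, b):
    """Step 2: count how many of email, name, and company agree (0 to 3)."""
    return sum(1 for f in ("email", "name", "company") if a[f] and a[f] == b[f])

def merge(a, b):
    """Step 3: keep the longer value for each field and union the tags."""
    merged = {f: max(a[f], b[f], key=len) for f in ("name", "email", "company", "phone")}
    merged["tags"] = a["tags"] | b["tags"]
    return merged

def classify(a, b):
    """Step 4: merge on 2+ matching fields, send 1 match to a human (assumed thresholds)."""
    score = match_score(a, b)
    if score >= 2:
        return "merge"
    if score == 1:
        return "review"
    return "distinct"

# Hypothetical records for illustration.
a = normalize({"name": "  john smith ", "email": "John@Company.com ",
               "company": "Acme", "phone": "(555) 010-0100", "tags": ["event-jan"]})
b = normalize({"name": "John Smith", "email": "john@company.com",
               "company": "ACME", "tags": ["partner"]})
c = normalize({"name": "Jonathan Smith", "email": "j.smith@company.com",
               "company": "ACME"})

decision_ab = classify(a, b)   # all three fields agree after normalization
decision_ac = classify(a, c)   # only the company agrees
combined = merge(a, b)
```

Here `a` and `b` merge automatically, while `a` and `c` share only a company and go to review. A real matcher would likely use fuzzy name comparison as well, so that "John" and "Jonathan" count as a partial match.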

Cross-file deduplication

Things get more interesting when you are comparing contacts across multiple files. An event list from January might overlap with a partner list from March and a CRM export from today.

This is exactly what Scrub.cx is built for. Upload all your files, and the system finds duplicates across every list. You see exactly who appears in multiple files and can export a single clean master list.
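To illustrate the general idea of cross-file matching, here is a minimal Python sketch that records which files each email appears in. It is not how Scrub.cx works internally. The file names and rows are hypothetical, and matching on email alone is a simplification.

```python
from collections import defaultdict

def find_cross_file_overlap(files):
    """Map each normalized email to the set of file names it appears in."""
    sources = defaultdict(set)
    for filename, rows in files.items():
        for row in rows:
            email = row.get("email", "").strip().lower()
            if email:
                sources[email].add(filename)
    # Keep only contacts that appear in more than one file.
    return {email: names for email, names in sources.items() if len(names) > 1}

# Hypothetical file contents for illustration.
files = {
    "january_event.csv": [{"email": "john@company.com"}, {"email": "ana@example.org"}],
    "march_partner.csv": [{"email": "John@Company.com "}],
    "crm_export.csv": [{"email": "ana@example.org"}, {"email": "lee@example.net"}],
}
overlap = find_cross_file_overlap(files)
```

With this data, `overlap` holds two contacts, each mapped to the two files it was found in. The contact that appears only in the CRM export is left out.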

Preserving data integrity

The golden rule: never throw away data during deduplication. Merge records, flag conflicts, and keep an audit trail. Your future self will thank you when someone asks where a contact came from.
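One way to keep that audit trail is to record the source of every merged value and log conflicts instead of silently picking a winner. The sketch below assumes each record arrives paired with the name of the file it came from. The function name and data are hypothetical.

```python
def merge_with_audit(records):
    """Merge (source, record) pairs, tracking where each value came from."""
    merged, provenance, conflicts = {}, {}, {}
    for source, record in records:
        for field, value in record.items():
            if not value:
                continue  # empty values never overwrite anything
            if field not in merged:
                merged[field] = value
                provenance[field] = source
            elif merged[field] != value:
                # Keep the first value, but log the disagreement for review.
                conflicts.setdefault(field, []).append({"source": source, "value": value})
    return merged, provenance, conflicts

# Hypothetical records for illustration.
records = [
    ("january_event.csv", {"email": "john@company.com", "phone": "5550100", "linkedin": ""}),
    ("crm_export.csv", {"email": "john@company.com", "phone": "5550199",
                        "linkedin": "linkedin.com/in/jsmith"}),
]
merged, provenance, conflicts = merge_with_audit(records)
```

The merged record keeps the first phone number and picks up the LinkedIn URL from the CRM export. The differing phone number ends up in `conflicts` rather than being discarded, and `provenance` answers the question of where each value came from.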

Start with a small test. Take two overlapping lists, run them through a deduplication tool, and compare the results with what you would get manually. Once the two agree, you can trust the tool with larger lists, and the time savings add up fast.