Duplicate content you never wrote is ranking against you
Duplicate content rarely arrives intentionally. A product import ran twice. A blog post was duplicated for editing and the original was never deleted. A WooCommerce variation ended up with the same description as the parent. Google sees two URLs with identical content and has to pick one. It often picks neither and buries both.
You can't fix what you can't find. A 2,000-post catalog has no obvious "find duplicates" button in the WordPress admin. You either pay for a crawl tool that checks the rendered page, which misses draft and private posts, or you query the database directly with SQL that does not normalize whitespace, shortcodes, or block markup before comparing.
What most people do
A direct string comparison on the wp_posts table misses duplicates where one post has a trailing space, a different block comment, or a shortcode the other lacks. You get false negatives and false positives in the same run.
A better way: hash-based comparison across every post type
TrueCommander's find duplicate content command strips HTML tags, Gutenberg block markers, shortcodes, and extra whitespace before comparing. It normalizes to lowercase and hashes the result, so two posts are only flagged as duplicates when their actual readable content is identical, not just similar.
This command is completely read-only. It scans and reports. It never deletes, redirects, or modifies any post. Use the results to decide which posts to merge, redirect, or remove, then act with other commands or the editor.
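The normalize-then-hash idea described above can be sketched in a few lines of Python. This is an illustration only, not TrueCommander's actual code: the regular expressions, the function names `normalize` and `content_hash`, and the choice of SHA-256 are all assumptions, since the text above only says that markup is stripped, text is lowercased, and the result is hashed.

```python
import hashlib
import re

# Assumed patterns for illustration; the plugin's real rules may differ.
BLOCK_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)    # Gutenberg markers are HTML comments
SHORTCODE = re.compile(r"\[/?[A-Za-z_][\w-]*[^\]]*\]")  # e.g. [gallery ids="1"] or [/caption]
HTML_TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def normalize(raw: str) -> str:
    """Reduce post markup to lowercase readable text."""
    text = BLOCK_COMMENT.sub(" ", raw)   # remove comments before generic tags
    text = SHORTCODE.sub(" ", text)
    text = HTML_TAG.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip().lower()


def content_hash(raw: str) -> str:
    """Hash the normalized text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(normalize(raw).encode("utf-8")).hexdigest()


a = "<!-- wp:paragraph -->\n<p>Blue cotton T-shirt, machine washable.</p>\n<!-- /wp:paragraph -->"
b = "<p>Blue  cotton t-shirt, machine washable.</p> [related_products] "

print(a == b)                              # False: the raw strings differ
print(content_hash(a) == content_hash(b))  # True: the readable content is identical
```

The two sample posts differ in block markers, spacing, letter case, and a trailing shortcode, which is exactly the kind of noise that defeats a raw string comparison.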
How it works
Any post whose normalized content is shorter than -min_length characters (default 100) is skipped in content mode.
-by=both requires both the title hash and the content hash to match before two posts are grouped.
Use find and replace, redirect, or another command to act on the duplicates it reports.

| Parameter | Details |
|---|---|
| -types | CSV of post types to scan, e.g. post,page,product. Revisions, attachments, and WooCommerce variations are always excluded even if listed. |
| -post | Boolean shortcut alias to include the post type. Default true. |
| -page | Boolean shortcut alias to include the page type. Default true. |
| -product | Boolean shortcut alias to include the product type. Default true. |
| -by | content (default): hash normalized body text. title: hash normalized titles only. both: require both title and content to match. |
| -post_status | Default publish. Set to publish,draft,private to catch unpublished duplicates too. |
| -min_length | Skip posts whose normalized content is shorter than N characters. Default 100. Applies in content and both modes only. |
| -limit | Max posts to scan per run. Default 1000, max 10000. |
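To make the -by and -min_length behavior concrete, here is a minimal Python sketch of how posts could be grouped by hash. It is a model of the documented parameters, not the plugin's implementation: the `Post` class, the `digest` and `find_duplicates` functions, and the use of SHA-256 are hypothetical, and the sample titles and bodies are assumed to be already normalized.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:  # hypothetical record, not a WordPress type
    id: int
    title: str    # assumed to be already normalized
    content: str  # assumed to be already normalized


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_duplicates(posts, by="content", min_length=100, limit=1000):
    """Return lists of post IDs that share the same hash key."""
    if by not in ("content", "title", "both"):
        raise ValueError(f"unknown mode: {by}")
    groups = defaultdict(list)
    for post in posts[:limit]:
        # The length filter applies in content and both modes only.
        if by != "title" and len(post.content) < min_length:
            continue
        if by == "content":
            key = (digest(post.content),)
        elif by == "title":
            key = (digest(post.title),)
        else:  # both: title AND content must match
            key = (digest(post.title), digest(post.content))
        groups[key].append(post.id)
    # Only keys shared by two or more posts are duplicate groups.
    return [ids for ids in groups.values() if len(ids) > 1]


body = "soft blue cotton t-shirt " * 5  # 125 characters, above the default minimum
posts = [
    Post(1, "blue t-shirt", body),
    Post(2, "blue t-shirt", body),
    Post(3, "blue tee (test import)", body),
    Post(4, "blue t-shirt", "short text"),
]

print(find_duplicates(posts, by="content"))  # [[1, 2, 3]]
print(find_duplicates(posts, by="title"))    # [[1, 2, 4]]
print(find_duplicates(posts, by="both"))     # [[1, 2]]
```

Post 4 shows why the mode matters: its short body is skipped in content and both modes, but its title still matches in title mode, where the length filter does not apply.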
Real example
An agency migrated a client's WooCommerce store from one platform to another. The import script ran twice: once as a test and once for real. Nobody noticed the test run populated the database. Six months later, organic traffic to the product catalog has dropped 18% and Google Search Console shows dozens of "duplicate content" hints in the coverage report.
You run tp find duplicate content -product=true -by=content -post_status=publish,draft. The command comes back with 34 duplicate groups covering 68 products. Every group is a test-import product paired with its live counterpart. You take the list of IDs for the test-run copies, trash them in bulk from the editor, and set up 301 redirects for any that had already been indexed.
Two weeks later the coverage warnings are gone and traffic starts to recover from the 18% drop. The whole diagnosis took under a minute.
Go further with TrueCommander
Run find duplicate content weekly on autopilot so new duplicates from imports, syncs, or accidental republishing are caught before Google indexes them.
Use tp find and replace -post_ids=... to correct the content in one run.