A soft 404 is a page that looks broken to Google but returns a 200 status code. It wastes crawl budget, blocks indexing, and confuses users. This guide walks you through detection using server logs, content quality triage, and the redirect versus 200 decision.
A soft 404 is a page that returns a 200 HTTP status code but delivers thin, empty, or error-like content. Google sees a 'successful' response, then determines that the page has no real value. It classifies the URL as a soft 404 and keeps it out of the index, or drops it if it was already indexed. This is worse than a hard 404 because Google keeps recrawling the URL, so you spend crawl budget and never send a clear signal that the page is gone. A common situation we see is an e-commerce site with 10,000 discontinued product pages: the CMS returns a 200 with a generic 'This product is no longer available' message. Google treats all 10,000 as soft 404s, wasting 80% of the crawl budget. The fix requires analyzing server logs, assessing content quality, and deciding whether to redirect, return a 410, or rebuild the page.
Google Search Console will flag soft 404s, but it only shows a sample. For a complete picture, you must parse server access logs. Filter for URLs that returned 200 but had a response body under 150 bytes, or pages with a single line of text like 'Not found.' Use a log analyzer (e.g., Screaming Frog Log File Analyzer) to extract these. Cross-reference with an index status checker to confirm which URLs are actually indexed. A real edge case: a client had a custom 404 handler that returned 200 with a 1KB page containing a redirect meta tag. Google ignored the meta tag and classified the page as a soft 404. Logs showed 200, but the page was never indexed. You need to look at the actual response body, not just the status code.
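As a rough illustration of the log filter described above, here is a minimal Python sketch. It assumes the access log uses the common or combined log format (request line in quotes, followed by status and body size); the function name and the sample log lines are hypothetical.

```python
import re

# Matches the quoted request line, then the status code and body size,
# as laid out in the common/combined log format (an assumption about your logs).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def suspect_soft_404s(lines, max_bytes=150):
    """Return sorted paths of GET requests answered with 200 and a body under max_bytes."""
    suspects = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match["method"] != "GET" or match["status"] != "200":
            continue
        # A "-" size means no body was sent; treat it as zero bytes.
        size = 0 if match["size"] == "-" else int(match["size"])
        if size < max_bytes:
            suspects.add(match["path"])
    return sorted(suspects)


# Hypothetical sample lines for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /product/123 HTTP/1.1" 200 98 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Oct/2024:13:55:40 +0000] "GET /blog/guide HTTP/1.1" 200 48213 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Oct/2024:13:55:41 +0000] "GET /old-page HTTP/1.1" 404 120 "-" "Googlebot/2.1"',
]
print(suspect_soft_404s(sample))
```

A size filter is only a first pass: the 1KB meta-redirect page from the edge case above would slip through, so the flagged list still needs a check of the actual response bodies.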
| Page Type | Content Status | Recommended Action | Implementation Detail | Risk / Failure Mode |
|---|---|---|---|---|
| Product page (discontinued) | Zero inventory, no substitute | 410 Gone | Return 410 and remove the URL from the sitemap; old inbound links will then resolve to the 410 | If you redirect to the homepage, Google is likely to treat the redirect itself as a soft 404 |
| Category page (empty results) | No products match filters | 200 with helpful content | Show 'No results' but add curated alternatives or search suggestions | Thin content (under 200 words) still triggers soft 404 |
| Article page (deleted) | Content removed, no redirect | 301 redirect to related article or topic hub | Map old URL to semantically similar page; avoid redirecting to homepage | Redirecting everything to the homepage tends to be treated as a soft 404 and dilutes authority |
| Thin affiliate page | Low word count, no unique value | 200 with enriched content or noindex | Add 500+ words of original analysis, comparison table, user reviews | Publishing thin pages with noindex still wastes crawl budget |
| Event page (past date) | Event ended, no content | 301 redirect to archive or future events | Use dynamic redirect based on event date; update sitemap | Hardcoded redirects break if event is rescheduled |
| Dynamic filter URL (no results) | Filter combination yields zero products | 410 or redirect to parent category | Return 410 for combinations with no results, or block the filter parameters in robots.txt (not both: a blocked URL is never fetched, so Google never sees its 410) | Google may crawl a near-infinite number of filter combinations; set crawl rate limits |
Export GSC soft 404 report. Parse server logs for 200 responses with body < 150 bytes.
Use an index checker to see each URL's current status. Check whether the URL has backlinks or traffic.
Review page content. Is it helpful? Does it have at least 200 words of unique text? Are the meta titles free of duplicates?
If content is salvageable, enrich and keep 200. If page is useless, 410. If related content exists, 301.
Update server config or CMS. Resubmit sitemap. Monitor GSC for 3 weeks for reindexing.
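The decision step in this workflow can be sketched as a small function. This is an illustration only: the function name, its parameters, and the order in which the rules are checked are assumptions layered on the rules above, not a definitive policy.

```python
from typing import Optional


def triage(salvageable: bool,
           related_url: Optional[str],
           has_backlinks_or_traffic: bool) -> str:
    """Pick an action for one soft 404 URL.

    Assumed precedence: keep and enrich if the content is salvageable,
    redirect only when a related page exists AND the URL carries
    backlinks or traffic, otherwise return 410.
    """
    if salvageable:
        return "200: enrich content"
    if related_url and has_backlinks_or_traffic:
        return f"301 -> {related_url}"
    return "410"


# Hypothetical inputs for illustration.
print(triage(True, None, False))
print(triage(False, "/blog/related-topic", True))
print(triage(False, None, True))
```

Requiring backlinks or traffic before issuing a 301 is a stricter reading than "if related content exists, 301"; relax that condition if you prefer to redirect whenever a close match exists.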
A SaaS client had 500 blog posts that were unpublished but still returning 200 with a single sentence: 'This content has been removed.' Google Search Console showed 498 soft 404 errors. Crawl budget was 15,000 pages/day; 40% went to these dead URLs.
Step 1: Exported all soft 404 URLs from GSC and cross-referenced with server logs. Filtered for URLs with response body under 200 bytes.
Step 2: Checked each URL using index coverage report to confirm which were already deindexed.
Step 3: 120 posts had good backlinks (total 45 referring domains). Those got 301 redirects to the nearest relevant article. 380 posts had zero backlinks and zero traffic. Those got a 410 status code.
Step 4: Updated the CMS to return 410 for the 380 URLs. Implemented 301 redirects for the 120 URLs using a redirect map.
Result: Within 4 weeks, soft 404s dropped to 2. Crawl budget waste reduced from 6,000 to 150 URLs/day. Indexing of new content improved by 30% because Google now crawls fresh URLs.
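A redirect map plus a list of permanently removed URLs, as used in this case study, might look like the sketch below. The paths, the names `REDIRECTS`, `GONE`, `resolve`, and `chained_sources` are all hypothetical; a real CMS or web server would express the same idea in its own configuration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entries: old URL -> nearest relevant article.
REDIRECTS: Dict[str, str] = {
    "/blog/old-pricing-guide": "/blog/pricing-guide",
}

# Hypothetical entries: unpublished posts with no backlinks and no traffic.
GONE = {
    "/blog/2019-release-notes",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location header) for a path; (200, None) means serve normally."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    if path in GONE:
        return 410, None
    return 200, None


def chained_sources(redirects: Dict[str, str]) -> List[str]:
    """List old URLs whose target is itself redirected, i.e. the start of a chain."""
    return sorted(src for src, dst in redirects.items() if dst in redirects)


print(resolve("/blog/old-pricing-guide"))
print(resolve("/blog/2019-release-notes"))
print(chained_sources({"/a": "/b", "/b": "/c"}))
```

Running `chained_sources` over the map before deployment is a cheap way to catch redirect chains, which the triage table lists as a failure mode.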
Export the soft 404 report from Google Search Console (Indexing > Pages > Soft 404; formerly the Coverage report)
Parse server access logs for 200 responses with body size under 150 bytes
Check each URL with a live index checker to see if it is still indexed
Review page content: is there any unique, helpful text beyond 200 words?
Check for duplicate meta titles, missing H1, or generic 'page not found' text
Identify URLs with external backlinks: these are candidates for 301 redirects
For URLs without backlinks or traffic: implement 410 status code
Update sitemap to exclude removed URLs; resubmit to GSC
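For the sitemap item in this checklist, here is a minimal sketch that drops removed URLs from a standard XML sitemap using only the Python standard library. It assumes a plain `<urlset>` sitemap (not a sitemap index); the function name and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def prune_sitemap(xml_text: str, removed_urls: set) -> str:
    """Return the sitemap XML with every <url> whose <loc> is in removed_urls deleted."""
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.find(f"{{{NS}}}loc")
        if loc is not None and (loc.text or "").strip() in removed_urls:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sitemap for illustration.
sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/blog/keep</loc></url>"
    "<url><loc>https://example.com/blog/removed</loc></url>"
    "</urlset>"
)
print(prune_sitemap(sitemap, {"https://example.com/blog/removed"}))
```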
Not all soft 404s are obvious. Some real operational failures we have seen:
1. JavaScript-rendered empty states: A React app returned 200 with a blank div. Google rendered it and found zero text. Soft 404. Fix: ensure server-side rendering or dynamic rendering returns meaningful content for empty states.
2. Pagination with no results: A site had pagination URLs like /page/5 that Google discovered via internal links, but the category had only 4 pages. Page 5 returned 200 with 'No more products.' This created hundreds of soft 404s. Fix: return 404 for pagination beyond the last page.
3. Mobile vs. desktop discrepancies: A mobile page returned 200 with content, but the desktop version returned 200 with a redirect meta tag. Googlebot desktop saw a soft 404. Fix: consistent response across user agents.
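The pagination fix in item 2 comes down to a bounds check before rendering. A minimal sketch follows, assuming the application knows the item count and page size; the function name and the choice to keep page 1 of an empty category at 200 (so it can show helpful content) are assumptions.

```python
import math


def pagination_status(page: int, total_items: int, per_page: int) -> int:
    """Return 200 for pages that exist and 404 for pages past the last one."""
    # An empty category still has a page 1, which can show alternatives.
    last_page = max(1, math.ceil(total_items / per_page))
    return 200 if 1 <= page <= last_page else 404


# A category with 96 products at 24 per page has exactly 4 pages.
print(pagination_status(4, 96, 24))
print(pagination_status(5, 96, 24))
```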
For large sites with crawl budget issues, use this crawl budget waste guide to prioritize which soft 404s to fix first.
Start with server logs. Filter for 200 responses with body under 150 bytes. For products with backlinks, set 301 redirects to relevant alternatives. For products without backlinks or traffic, return 410 status code. Remove these URLs from your sitemap. Use GSC to monitor reindexing. Expect 3-6 weeks for full recovery.
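Partly. The Search Console API does not offer a bulk export of the page indexing report; its URL Inspection endpoint returns the coverage state for one URL at a time and is subject to a daily quota. For the full list, filter the report to the soft 404 reason in the GSC interface, download the results as CSV, and cross-reference them with server logs. For agencies, automate the cross-reference with a script that flags URLs with low response body size. Set up weekly checks to catch new soft 404s early.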
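A script for the CSV-to-log cross-reference could be as small as the sketch below. It assumes the exported CSV has a column named `URL` (check your export and adjust `url_column`) and that you already have a mapping of path to response body size from your logs; the function name and sample data are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit


def flag_confirmed(gsc_csv_text, body_sizes, max_bytes=150, url_column="URL"):
    """Return paths that GSC lists as soft 404 AND whose logged body is under max_bytes."""
    flagged = []
    for row in csv.DictReader(io.StringIO(gsc_csv_text)):
        # Logs record paths, so reduce the exported absolute URL to its path.
        path = urlsplit(row[url_column]).path or "/"
        size = body_sizes.get(path)
        if size is not None and size < max_bytes:
            flagged.append(path)
    return flagged


# Hypothetical export and log-derived sizes for illustration.
export = (
    "URL,Last crawled\n"
    "https://example.com/a,2024-01-01\n"
    "https://example.com/b,2024-01-02\n"
)
sizes = {"/a": 90, "/b": 5000}
print(flag_confirmed(export, sizes))
```

URLs that GSC flags but whose bodies are large (like `/b` here) are the ones worth reviewing by hand, since their problem is content quality rather than an empty response.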
A hard 404 (status code 404 or 410) tells Google to stop crawling that URL quickly, conserving budget. A soft 404 returns 200, so Google continues crawling, wastes resources, and may discover more thin pages. Hard 404s are efficient; soft 404s are a crawl budget sink. Always prefer a real 410 over a soft 404.
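Use awk to filter access logs. Run: awk '$9 == 200 && $10 < 150' access.log to find 200 responses with body size under 150 bytes. This assumes the common or combined log format, where field 9 is the status code and field 10 is the body size; matching on the status field avoids the false hits that grep ' 200 ' produces when some other field happens to equal 200. Then check those URLs manually. This is a rough filter but catches 80% of soft 404s. Combine with GSC data for higher accuracy. Free log parsers like GoAccess can visualize the data.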
Avoid redirecting to the homepage. Always redirect to a closely related page (e.g., discontinued product to a similar product, deleted article to a category hub). Homepage redirects dilute link equity, confuse Google, and are often treated as soft 404s themselves. If no related page exists, use 410. For pages with backlinks, 301 to the most relevant topic.
Yes. If Google renders your page and sees no visible text, it flags it as a soft 404. Fix by either server-side rendering a message like 'No results found' with suggestions, or implementing dynamic rendering to serve static content to Googlebot. Ensure the response body contains at least 200 characters of relevant text.
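To check the 200-character guideline against what your server actually sends, a sketch like the following counts visible text in the raw HTML. It inspects the server response only and does not execute JavaScript, so it approximates what a crawler sees before rendering; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring content inside non-visible elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text_length(html: str) -> int:
    """Return the number of visible characters after collapsing whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(" ".join(parser.chunks).split()))


def has_enough_text(html: str, minimum: int = 200) -> bool:
    return visible_text_length(html) >= minimum


# A client-rendered shell with no server-side content.
shell = '<html><body><div id="root"></div><script>var a = "hello";</script></body></html>'
print(visible_text_length(shell))
print(has_enough_text(shell))
```

An empty shell like the one above is exactly the case where server-side rendering of a 'No results found' message with suggestions makes the difference.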
Typically 2-6 weeks. Google recrawls the URL, sees the new 200 content (or 410), and updates the index. You can speed this up by submitting the URL via GSC URL Inspection tool and requesting indexing. For large batches, wait for the regular crawl cycle. Monitor the Coverage report for changes.
Indirectly. If Google finds two pages with nearly identical thin content, it may pick one as canonical and treat the other as a soft 404, especially if the non-canonical version has no added value. Fix by consolidating duplicate pages via 301 redirects or adding unique content to each page. Use canonical tags correctly.
Screaming Frog SEO Spider can crawl and flag pages with low word count (under 100 words) and 200 status. Combine with a log file analyzer for server-side detection. For ongoing monitoring, use a tool like Sitebulb or DeepCrawl. Check <a href="https://checkurlindexstatus8.vercel.app">index status</a> periodically to confirm fixes.
Use 410 Gone. It tells Google the page is permanently removed and to stop crawling. 301 redirect is overkill for pages with no traffic or backlinks and can create redirect chains. 410 is cleaner and preserves crawl budget. Only use 301 if the page has valuable inbound links that you want to pass to another page.