Stop guessing why Google won't index your page. This checklist forces you to check every common failure point: robots.txt blocking, noindex directives, canonical misdirection, server flakiness, and crawl budget waste. Work through it in order, and you will either fix the issue or know exactly why the page is stuck.
Ninety percent of 'URL not indexed' problems fall into one of three buckets: blocked access, explicit exclusion, or invisible to the crawler. The remaining ten percent are subtle, like a canonical pointing to a different domain or a server that flakes only for Googlebot. This checklist is opinionated. You do not start with the sitemap. You do not start with content quality. You start with the Google Crawling and Indexing documentation as your technical reference, then you eliminate access and exclusion problems first.
In practice, when you open Google Search Console and see 'URL is not on Google', most people click 'Request Indexing' and wait. That is a mistake. You need to know why the URL is excluded before you ask Google to recrawl. Otherwise you are just resubmitting a page that will be ignored again. The checklist below forces the right order.
1. Check the URL Inspection Tool. Copy the exact verdict. Is it 'Submitted URL marked noindex' or 'Blocked by robots.txt' or 'Crawled - currently not indexed'?
2. Open robots.txt. Does it contain a rule that disallows the URL path? Use the live test in GSC, not a cached version.
3. View the page source. Is there a meta robots tag with content='noindex' or content='none'? Check the HTTP response header for X-Robots-Tag: noindex.
4. Inspect the canonical link element. Does the rel='canonical' point to a different URL? Google treats the canonical as a strong hint and will usually index the target, not this URL.
5. Verify the HTTP status. Is the page returning 200, 301, 302, 404, or 5xx? Use curl -I with a Googlebot user-agent.
6. Check for server errors specific to Googlebot. Does the server rate-limit or block suspicious IPs? Look at access logs for 429 or 503 responses.
7. Review the sitemap. Is the URL present? Is the <lastmod> date recent? Are there conflicting entries with different URLs?
8. Evaluate crawl budget. For large sites, does this URL have enough internal links? Is it buried 5+ clicks from the homepage?
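
Step 3 of the checklist can be sketched in a few lines of Python. This is a minimal illustration, not a complete implementation: the function name `has_noindex` and its inputs (the raw HTML and a dict of response headers you have already fetched) are assumptions made for the example.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers):
    """Return True if a meta tag or the X-Robots-Tag header excludes the page.

    Simplification: header values scoped to one crawler, such as
    "googlebot: noindex", are not parsed here.
    """
    parser = _RobotsMetaParser()
    parser.feed(html)
    directives = list(parser.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives += [d.strip().lower() for d in value.split(",")]
    # 'none' is shorthand for 'noindex, nofollow'.
    return any(d in ("noindex", "none") for d in directives)
```

Checking the header and the meta tag in one pass matters because either one alone is enough to exclude the page.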
Inspect: Enter the URL. Read the exact status. Do not guess.
Blocked by robots.txt? Fix the disallow rule, then rerun the live test.
Noindex present? Remove the directive or change it to index. Check both the header and the meta tag.
Canonical points elsewhere? Point it at the correct URL or make it self-referencing.
Status not 200? Fix the status code. Ensure 200. Eliminate redirect chains.
All technical checks pass? Improve content quality, internal links, or crawl budget.
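
The order of that flow is the whole point, so it may help to see it as code. The sketch below assumes you have already collected four signals for a page; `PageSignals` and `first_blocker` are hypothetical names for this example, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    blocked_by_robots: bool
    has_noindex: bool
    canonical_points_elsewhere: bool
    status_code: int


def first_blocker(s):
    """Return the first failing check, in the same order as the checklist."""
    if s.blocked_by_robots:
        return "robots.txt: fix the disallow rule"
    if s.has_noindex:
        return "noindex: remove the directive from the meta tag or header"
    if s.canonical_points_elsewhere:
        return "canonical: point it at the correct URL"
    if s.status_code != 200:
        return f"status {s.status_code}: return 200 and remove redirect chains"
    return "technical checks pass: look at content quality, internal links, crawl budget"
```

Returning only the first blocker is deliberate: a page blocked by robots.txt cannot be judged on its noindex or canonical until the block is lifted, because Google never fetches it.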
Scenario: An e-commerce product page with 500 words, 3 images, and no internal links from the category page. The URL is in the sitemap and returns 200. The URL Inspection Tool shows 'Crawled - currently not indexed'.
Diagnostic steps applied:
1. Robots.txt? No blocks.
2. Noindex? None.
3. Canonical? Self-referencing.
4. Server logs? 200 for Googlebot, no errors.
5. Internal links? Zero links from any other page on the site.
6. Content quality? Thin: 500 words, no unique value, same description as 10 other products.
7. Crawl budget? The site has 50,000 products, crawl budget is limited.
Action taken: Added the product to the category page navigation (1 link). Enriched the content to 1,200 words with original photos and a comparison table. Reduced total product count by removing 5,000 duplicate items. After re-submission, the URL was indexed within 3 days.
Numbers: From 0 internal links to 1, from 500 words to 1,200, from 50,000 products to 45,000. The change in indexation rate was immediate.
| Tool / Check | How to Use | Expected Outcome | Hidden Risk / Failure Mode |
|---|---|---|---|
| URL Inspection Tool | Enter URL in GSC. Read the 'coverage' status and 'discovery' info. | Clear verdict: 'URL is on Google' or specific exclusion reason. | 'Crawled - currently not indexed' does not tell you why Google chose not to index the page. Low quality is a common cause, but the tool will not say so. You must infer. |
| Live robots.txt test | In GSC, run 'Test live URL' in the URL Inspection Tool and read the 'Crawl allowed?' field. The robots.txt report under Settings shows the file Google last fetched. The standalone robots.txt tester has been retired. | Shows whether crawling is allowed or blocked by robots.txt. | The last fetched version may be outdated. Always run the live test. Also check for X-Robots-Tag in headers, which is not visible in the robots.txt file. |
| cURL with Googlebot user-agent | Run: curl -I -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' [URL] | HTTP status code and response headers, including any X-Robots-Tag. | Many servers treat Googlebot differently. You might get a 200 for Chrome but a 503 for Googlebot due to rate limiting or bot detection. |
| Sitemap audit | Check if the URL is in the sitemap. Check the `<lastmod>` date. | URL present with recent timestamp. | Duplicate URLs in the sitemap (http vs https, with/without trailing slash) confuse Google. Also, a sitemap with 50,000 URLs but low crawl budget means many pages never get fetched. |
| Internal link analysis | Use a crawler (Screaming Frog, Sitebulb) or grep logs to find all internal links to the target URL. | At least 1-2 internal links from pages that are themselves indexed. | Links from noindexed pages do not count. Links from pages with very low PageRank (like a footer link on every page) have minimal effect. Focus on contextual links from content pages. |
| Content quality check | Compare the page to the top 5 results for the target keyword. Measure word count, uniqueness, and entity coverage. | Page should have unique value beyond what is already indexed. | Auto-generated content, thin affiliate pages, and pages with zero original research are systematically ignored by Google, even if technically perfect. |
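
The sitemap audit row can be partly automated. The following is a rough sketch that assumes a standard sitemaps.org `<urlset>` file already loaded as a string; `normalize` and `audit_sitemap` are illustrative names, and the normalization rule (ignore scheme and trailing slash) is an assumption chosen to surface the duplicates the table warns about.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def normalize(url):
    """Collapse scheme and trailing-slash variants into one comparison key."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    key = parts.netloc.lower() + path
    return key + ("?" + parts.query if parts.query else "")


def audit_sitemap(xml_text):
    """Return (duplicate groups, URLs missing <lastmod>) for one sitemap file."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    missing_lastmod = []
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue
        groups[normalize(loc)].append(loc)
        if not url_el.findtext("sm:lastmod", default="", namespaces=NS).strip():
            missing_lastmod.append(loc)
    duplicates = {k: v for k, v in groups.items() if len(v) > 1}
    return duplicates, missing_lastmod
```

Any key in the duplicates result means the sitemap lists the same page under more than one URL, which is worth fixing before looking at crawl budget.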
Blocked by robots.txt but no disallow rule visible. A common situation we see: a site uses a wildcard disallow in robots.txt like Disallow: /*? which blocks every URL that contains a question mark. A URL you consider clean is still blocked if it carries any query string, such as a tracking or session parameter. Another edge: the robots.txt file itself returns a 5xx error. Google then treats the whole site as disallowed, at least initially. A 404, by contrast, is read as no restrictions at all.
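
To see why a rule like /*? catches more than expected, here is a simplified sketch of wildcard matching in the style Google documents for robots.txt: `*` matches any run of characters and a trailing `$` anchors the end. The function name `rule_matches` is made up for this example, and the sketch deliberately ignores how Allow and Disallow rules are weighed against each other.

```python
import re


def rule_matches(pattern, path):
    """Check one robots.txt path pattern against a URL path plus query string."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives robots.txt prefix semantics.
    return re.match(regex, path) is not None
```

With this, `rule_matches("/*?", "/shoes?utm_source=mail")` is true while `rule_matches("/*?", "/shoes")` is false, which is exactly the trap described above.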
Wrong filters in the sitemap. You generated a sitemap using a plugin, but it included only posts with a specific meta key. If the meta key is missing, the URL is silently excluded from the sitemap. We have seen entire sections of a site vanish from indexation because of a filter setting that was too restrictive.
Duplicate lists and empty results. If your site has faceted navigation, Google may see 10,000 filtered URLs that all lead to the same 10 products. Those filtered URLs will often sit in 'Discovered - currently not indexed' or be excluded as duplicates. Either block crawling of them with robots.txt or consolidate them with a canonical that points to the unfiltered listing.
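
One way to size the faceted navigation problem is to collapse crawled URLs onto their unfiltered listing and count the variants. This sketch assumes every query parameter is a filter, which will not hold for sites that paginate or route through the query string; `strip_filters` and `facet_report` are hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def strip_filters(url):
    """Drop the query string and fragment so filtered variants share one key."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def facet_report(urls):
    """Count how many crawled URLs collapse onto each unfiltered listing URL."""
    return Counter(strip_filters(u) for u in urls)
```

Feed it the URL column from a crawler export or from your access logs. Listings with very high counts are the first candidates for a robots.txt rule or a canonical.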
Slow vendors. If you use a CDN or server that occasionally returns 503 for Googlebot, the URL may be repeatedly crawled but never indexed because Google cannot reliably access the content. Check your server logs for googlebot and look for any 5xx responses.
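
A log scan for this can be quite small. The sketch below assumes the common Apache/Nginx combined log format and matches on the user-agent string only, so it also counts fake Googlebots; the regex and the function name `googlebot_errors` are assumptions for the example and will need adjusting to your own log layout.

```python
import re
from collections import Counter

# Combined Log Format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_errors(lines):
    """Count 429 and 5xx responses per (path, status) for Googlebot user-agents."""
    errors = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "googlebot" not in m["agent"].lower():
            continue
        status = int(m["status"])
        if status == 429 or 500 <= status <= 599:
            errors[(m["path"], status)] += 1
    return errors
```

Run it over an open log file, for example `googlebot_errors(open("access.log"))`, where the file name is a placeholder for your own log path.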
First, check if the URL is in your sitemap. Then verify robots.txt does not block the path. Ensure no noindex tag is present. If the post is thin (< 300 words) or duplicates existing content, Google may choose not to index it. Add at least 1 internal link from an indexed page and request indexing via GSC.
It means Googlebot has fetched the page but decided not to add it to the index. Common reasons: low content quality, duplicate content, or the page is considered thin. It can also happen if the page is part of a large site with limited crawl budget. Fix by improving content uniqueness and adding internal links.
Run 'Test live URL' in the URL Inspection Tool in Google Search Console and read the 'Crawl allowed?' field. If it reports a block, open the robots.txt report and find the rule that matches your path. Do not rely on curl for this check: curl -I -A 'Googlebot' https://example.com/page ignores robots.txt entirely, so a 200 only proves that the server answers that user-agent, not that crawling is allowed.
Yes. If the rel='canonical' points to a different URL, Google may index the canonical instead of the original URL. Even if you request indexing, Google will follow the canonical hint. Ensure the canonical tag is self-referencing or points to the preferred URL. Check both the HTML source and HTTP headers.
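
The HTML half of that check is easy to script. This is a sketch under stated assumptions: it reads only the first `<link rel="canonical">` in the markup, it does not look at the HTTP Link header, and the names `canonical_target` and `points_elsewhere` are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        rels = attr.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rels and self.href is None:
            self.href = attr.get("href", "")


def canonical_target(html, page_url):
    """Return the absolute canonical URL declared in the HTML, or None."""
    parser = _CanonicalParser()
    parser.feed(html)
    return urljoin(page_url, parser.href) if parser.href else None


def points_elsewhere(html, page_url):
    """True when a canonical exists and differs from the page's own URL."""
    target = canonical_target(html, page_url)
    return target is not None and target != page_url
```

The comparison is an exact string match on purpose. A trailing slash or a leftover tracking parameter is enough for the canonical to count as a different URL.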
Check your server access logs for requests from Googlebot IPs. Look for 429 and 5xx responses, especially 503. Use the URL Inspection Tool in GSC to see the last crawl date and any errors. If you use a CDN, verify it is not blocking Googlebot. Temporary server errors can cause repeated crawl failures.
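
Because anyone can send a Googlebot user-agent, it helps to confirm that a logged IP really belongs to Google before drawing conclusions. Google documents a reverse DNS lookup followed by a forward lookup for this. The sketch below follows that idea with the standard `socket` module; the function names are illustrative, the forward lookup shown covers IPv4 only, and the `reverse` and `forward` parameters exist only so the logic can be exercised without network access.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_google_hostname(hostname):
    """True if the hostname sits under one of Google's crawler domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse(ip)
        # The forward lookup must return the original IP, otherwise the
        # reverse record could simply be spoofed.
        return is_google_hostname(hostname) and ip in forward(hostname)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
```

Only 5xx and 429 responses served to verified addresses say anything about how Google experiences your server.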
The fastest reliable method: 1) Ensure the page is technically clean (no robots.txt block, no noindex, correct canonical, 200 status). 2) Get at least 1 internal link from a page that is already indexed and receives traffic. 3) Submit the URL via GSC URL Inspection Tool. Indexation can happen within hours if the page has high quality and internal link equity.
For large sites (> 10,000 pages), crawl budget is limited. Prioritize indexation by adding internal links from high-authority pages, removing low-value URLs from the sitemap, and using robots.txt to block thin pages. Use the <a href="https://googlenotindexingsiteb.vercel.app/fix-crawl-budget-waste-large-sites">crawl budget waste fix workflow</a> to identify and remove wasted crawl paths.
Open Google Search Console and go to Indexing > Pages. Use the search filter to find your URL. The report groups pages as Indexed or Not indexed and gives a reason for each excluded URL. You can also use the <a href="https://checkurlindexstatus8.vercel.app">URL index status checker</a> for a quick check without logging into GSC.
A URL in the sitemap may still not be indexed if: the page is blocked by robots.txt, has a noindex tag, has a canonical pointing elsewhere, returns a server error for Googlebot, or is considered low quality. Also check if the sitemap itself has errors or if the URL was added too recently (new pages often wait). Use the <a href="https://googleindexcheckerc.vercel.app/index-coverage-report">index coverage report tool</a> to get a detailed breakdown.