Why isn't my page indexed on Google?
If you've checked Google Search Console (GSC) and your page shows up under 'Not indexed' instead of 'Indexed' in the Page indexing report (formerly the Coverage report), this guide does four things. It explains the four most common GSC bucket types in plain language, describes what causes each one, covers what WarmySender does on the platform side to prevent them, and gives you a 5-step debug checklist to run on your own site.
The four GSC bucket types you're most likely to see
1. 'Discovered – currently not indexed': Google found the URL (usually from your sitemap or an internal link) but hasn't crawled it yet. The URL is in Google's crawl queue and has been deprioritized. Google's own help text adds that it sometimes reschedules a crawl because fetching the page was expected to overload your server. This is the most common bucket on programmatic SEO sites. Google decides what to crawl next based on signals that suggest a page is worth the fetch: internal-link weight, anchor-text relevance, proximity to high-authority pages, and content uniqueness. Sitemap inclusion alone is a *discovery* signal, not a *quality* signal. Suppose a URL sits in the sitemap with no inbound internal links from the homepage, footer, or a sibling content cluster. Google reads it as a low-priority candidate and defers it indefinitely.
2. 'Crawled – currently not indexed': Google fetched the page, looked at it, and decided not to add it to the index. There are three common causes:
- Thin or templated content: the page looks like a near-duplicate of others on your site or the wider web.
- A low unique-text ratio: most of the page is boilerplate chrome such as the header, nav, and footer, with very little body content.
- A body that extracts to fewer than ~300 words. This is a rule of thumb; Google publishes no minimum word count.

This bucket is harder to clear than 'Discovered' because Google has already formed an opinion of the page. You need to substantively improve it (more unique content, better structure, examples, FAQs, internal links to siblings) and then ask Google to re-evaluate.
3. 'Soft 404': The page returned HTTP 200 (success), but its content reads as 'this page doesn't really exist.' Common causes:
- A placeholder template that says something like 'No items in this category yet.'
- An empty `<article>` body that renders only the header and footer.
- A translated page where the translation pipeline left literal placeholder markers ('content would be here', `COMMENT_0`).
- A catch-all SPA route that returns the homepage body for any unmatched URL, along with a canonical tag pointing to '/'.

The fix is to return a real HTTP 404 or to fix the content. Google's soft-404 guidance recommends a 404 or 410 status for pages that don't exist. For JavaScript apps it also accepts adding noindex. On an empty server-rendered page, though, we treat 200 + noindex as the weaker option. It keeps the page out of the index, but the URL stays known to Google, keeps getting recrawled, and clutters your Page indexing report.
4. 'Excluded by noindex tag': The page returned HTTP 200, but a `<meta name="robots" content="noindex">` tag (or an `X-Robots-Tag: noindex` HTTP header) told Google not to index it. This is correct for pages you genuinely don't want indexed, such as authenticated app pages, search result pages, paginated tail pages, and gated content previews. It's a problem when the noindex was added accidentally. Two examples:
- A global SSR template emits noindex on every fallback render path.
- Google discovers a locale URL through hreflang, the locale has no translated content, and the SSR template falls through to 200 + noindex instead of returning 404.
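For bucket 1, you can check for the orphan pattern once you have a crawl of your own site. The sketch below is a minimal illustration, not WarmySender's tooling. It assumes you've exported your crawl as a mapping from each page URL to the set of internal URLs it links to, and it flags sitemap URLs that no page links to. The `find_orphans` function and the example.com URLs are hypothetical.

```python
from collections import defaultdict


def find_orphans(sitemap_urls, link_graph):
    """Return sitemap URLs that no crawled page links to.

    link_graph maps each crawled page URL to the internal URLs it links to.
    Self-links are ignored because they add no path for discovering the page.
    """
    inbound = defaultdict(int)
    for source, targets in link_graph.items():
        for target in set(targets):  # count each linking page once
            if target != source:
                inbound[target] += 1
    return sorted(url for url in sitemap_urls if inbound[url] == 0)


# Hypothetical three-URL site: /glossary/warmup is in the sitemap,
# but no page links to it.
sitemap = [
    "https://example.com/",
    "https://example.com/blog/a",
    "https://example.com/glossary/warmup",
]
graph = {
    "https://example.com/": {"https://example.com/blog/a"},
    "https://example.com/blog/a": {"https://example.com/", "https://example.com/blog/a"},
}
print(find_orphans(sitemap, graph))
```

On this sample it prints `['https://example.com/glossary/warmup']`. That is the one URL Google can discover only through the sitemap.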
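For bucket 2, you can estimate the unique-text ratio by counting visible words inside page chrome versus everywhere else. This minimal sketch uses only Python's standard-library `html.parser`. It assumes your templates wrap chrome in `header`, `nav`, `footer`, or `aside` elements. `TextSplitter` and `body_stats` are hypothetical names. The ~300-word figure above is a heuristic, not a Google rule.

```python
from html.parser import HTMLParser

SKIP = {"head", "script", "style", "noscript", "template"}  # not visible body text
CHROME = {"header", "nav", "footer", "aside"}  # site-wide boilerplate regions


class TextSplitter(HTMLParser):
    """Count visible words inside page chrome versus everywhere else."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0    # > 0 while inside a SKIP element
        self.chrome_depth = 0  # > 0 while inside a CHROME element
        self.body_words = 0
        self.chrome_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in CHROME:
            self.chrome_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in CHROME and self.chrome_depth:
            self.chrome_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        words = len(data.split())
        if self.chrome_depth:
            self.chrome_words += words
        else:
            self.body_words += words


def body_stats(html):
    """Return (body word count, body share of all visible words)."""
    parser = TextSplitter()
    parser.feed(html)
    parser.close()
    total = parser.body_words + parser.chrome_words
    return parser.body_words, (parser.body_words / total if total else 0.0)


page = (
    "<html><head><title>Acme</title></head><body>"
    "<nav>Home Pricing Blog Login</nav>"
    "<article><p>Short body text here.</p></article>"
    "<footer>Copyright 2024 Example Inc</footer></body></html>"
)
print(body_stats(page))
```

For this sample page it reports 4 body words and a body share of one third. A real page with numbers like that is exactly the templated shape Google tends to leave in 'Crawled – currently not indexed'.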
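For bucket 3, the fix belongs in your server's routing: decide the status code from the content before you serve the page. This is a minimal sketch that assumes your handler already has the page's rendered body text. The placeholder patterns mirror the bucket 3 examples above. `choose_status`, `resolve`, and the `min_words=50` cutoff are hypothetical choices you'd tune for your own templates.

```python
import re

# Hypothetical markers; extend with whatever your own templates or translation
# pipeline can leak into rendered pages.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\bCOMMENT_\d+\b"),
    re.compile(r"content would be here", re.IGNORECASE),
    re.compile(r"no items in this category yet", re.IGNORECASE),
]


def choose_status(body_text, min_words=50):
    """Pick the HTTP status for a rendered page body.

    Empty, near-empty, or placeholder bodies get a real 404 instead of
    200 (with or without noindex), so Google sees a not-found status.
    """
    if any(p.search(body_text) for p in PLACEHOLDER_PATTERNS):
        return 404
    if len(body_text.split()) < min_words:
        return 404
    return 200


def resolve(path, published):
    """Catch-all route: unmatched paths 404 instead of serving the homepage body.

    published maps URL paths to their rendered body text.
    """
    body = published.get(path)
    if body is None:
        return 404, "Not found"
    status = choose_status(body)
    return status, (body if status == 200 else "Not found")
```

Keeping the check in one function means the publish pipeline and the SSR catch-all can share it. A page that passes before publishing is judged the same way at render time.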
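For bucket 4, an accidental noindex can hide in either the HTML or the response headers, so check both. This minimal sketch assumes you already have the response body and headers from your own fetch. `RobotsMetaParser` and `is_noindexed` are hypothetical names. It deliberately simplifies two cases, noted in the docstring.

```python
import re
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect content values of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(values.get("content", ""))


def is_noindexed(html, headers):
    """True if a robots meta tag or an X-Robots-Tag header contains noindex.

    Simplifications: the 'none' directive is not handled, and a header scoped
    to another crawler (e.g. 'otherbot: noindex') is still counted.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    values = list(parser.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            values.append(value)
    return any(re.search(r"\bnoindex\b", v, re.IGNORECASE) for v in values)
```

Running this over every URL your SSR fallback paths can produce is one way to catch a template that emits noindex globally.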
Common causes (across all four bucket types)
- Thin content (fewer than ~300 words of unique body text per page; a rough rule of thumb, since Google publishes no minimum)
- Weak internal-link signals (no inbound links from the homepage, footer, nav, or sibling content cluster)
- Duplicate content (near-identical pages with the same intros, CTAs, or boilerplate paragraphs across many URLs)
- Slow render or render-budget exhaustion (Googlebot times out before it gets your content)
- Broken canonical tags (the URL canonicalizes to a different URL, so Google indexes the canonical instead — sometimes this is correct, sometimes it's an accident)
- Sitemap pollution (the sitemap lists URLs that 404, redirect, or are noindex'd, which suggests the sitemap is poorly maintained and can make Google rely on it less)
- Hreflang mismatches (your hreflang alternates point to URLs that 404 or to pages that aren't real translations)
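Several of these causes surface as sitemap pollution. Once you've audited your URLs (for example, with the curl checks in the checklist below), you can filter the sitemap mechanically. This is a minimal sketch, not a WarmySender tool. `PageAudit` and `sitemap_problems` are hypothetical, and the audit data is assumed to come from your own crawl. Canonicals are compared as exact strings, so normalize them first if your site mixes trailing slashes or hosts.

```python
from dataclasses import dataclass


@dataclass
class PageAudit:
    status: int       # final HTTP status after following redirects
    redirected: bool  # True if the sitemap URL redirected somewhere else
    noindex: bool     # meta robots or X-Robots-Tag said noindex
    canonical: str    # href of <link rel="canonical">, or "" if absent


def sitemap_problems(sitemap_urls, audits):
    """Map each problematic sitemap URL to the reasons it should not be listed."""
    problems = {}
    for url in sitemap_urls:
        audit = audits.get(url)
        if audit is None:
            problems[url] = ["not audited"]
            continue
        reasons = []
        if audit.redirected:
            reasons.append("redirects")
        if audit.status != 200:
            reasons.append(f"status {audit.status}")
        if audit.noindex:
            reasons.append("noindex")
        if audit.canonical and audit.canonical != url:
            reasons.append(f"canonicalizes to {audit.canonical}")
        if reasons:
            problems[url] = reasons
    return problems
```

Any URL this returns should either be fixed or dropped from the sitemap. A sitemap should list only URLs you want indexed as themselves.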
What WarmySender does on the platform side
WarmySender runs a programmatic SEO site (we publish hundreds of comparison, glossary, feature, and guide pages), so we hit these patterns at scale. The platform applies several automatic protections to keep its crawlable surface clean. Most are built into the SSR templates, the sitemap generator, and the publish pipeline:
- Related-articles auto-injection on every blog post and content cluster (so the long tail isn't orphaned in the link graph)
- Placeholder content detection in the publish pipeline (translation-pipeline artifacts like `COMMENT_0` or 'content would be here' are caught pre-publish AND blocked at SSR-render time)
- Canonical hygiene (each public URL has a self-canonical; legacy paths 301-redirect to the current location; the SSR catch-all returns 404 to crawlers for unmatched URLs instead of serving the homepage body)
- Sitemap database pre-flight (every URL in our sitemap is backed by a real published row for that exact slug and locale, with no speculative locale × slug cross-products)
- Localized index pages 404 when empty (rather than returning 200 + noindex, which pollutes the Page indexing report)
- Hreflang per-page discipline (we emit hreflang only for locales that actually have a published translation row for that exact URL)
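The sitemap pre-flight and hreflang rules above come down to one idea: generate entries from rows that exist, never from a cross-product. The sketch below illustrates that idea in Python; it is not WarmySender's implementation. It assumes published content arrives as `(slug, locale)` pairs from your database, and it uses a hypothetical `/<locale>/<slug>` URL scheme on example.com. Each `<url>` lists its alternates, including itself, using the `xhtml:link` sitemap syntax Google documents for hreflang.

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr


def build_sitemap(published_rows, base="https://example.com"):
    """Build sitemap XML from (slug, locale) rows that actually exist.

    Only real rows become <url> entries (no locale x slug cross-product), and
    each entry lists hreflang alternates only for locales that have a row for
    that same slug.
    """
    locales_by_slug = defaultdict(set)
    for slug, locale in published_rows:
        locales_by_slug[slug].add(locale)

    def loc(slug, locale):
        # Hypothetical URL scheme: /<locale>/<slug>
        return f"{base}/{locale}/{slug}"

    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ]
    for slug in sorted(locales_by_slug):
        locales = sorted(locales_by_slug[slug])
        for locale in locales:
            lines.append("  <url>")
            lines.append(f"    <loc>{escape(loc(slug, locale))}</loc>")
            if len(locales) > 1:  # alternates only matter when a translation exists
                for alt in locales:
                    lines.append(
                        f'    <xhtml:link rel="alternate" hreflang={quoteattr(alt)} '
                        f"href={quoteattr(loc(slug, alt))}/>"
                    )
            lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


print(build_sitemap([("pricing", "en"), ("pricing", "de"), ("faq", "en")]))
```

Here `/de/faq` never appears, because no German FAQ row exists. As a result, the sitemap can't advertise a URL that would 404 or fall through to noindex.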
These are general principles that apply to any programmatic SEO site — feel free to adopt them on your own.
For the canonical reference on Google's bucket types, see the official Search Console help center: https://support.google.com/webmasters/answer/7440203 (Page indexing report).
5-step debug checklist (for users running their own SEO)
If you're seeing pages in 'Discovered – not indexed', 'Crawled – not indexed', 'Soft 404', or 'Excluded by noindex tag' on your own site, run this checklist before assuming it's broken on Google's side.
1. Inspect the URL in GSC. Open Search Console → URL Inspection → paste the URL. Read what Google says about the URL's last-crawled date, the canonical Google chose, the indexed/not-indexed reason, and any errors. This is the highest-fidelity signal you'll get; everything below is supporting evidence.
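2. Curl the URL with a Googlebot user agent. Start with the response headers:
- Run `curl -sIL -A "Googlebot/2.1 (+http://www.google.com/bot.html)" <URL>`.
- Confirm the final HTTP status is 200.
- Confirm there's no redirect chain. With `-L`, each hop prints its own status line.
- Confirm no `X-Robots-Tag: noindex` header is present.

`-I` fetches headers only, so check the HTML next:
- Fetch it with `curl -s -A "Googlebot/2.1 (+http://www.google.com/bot.html)" <URL>`.
- Confirm the `<meta name="robots">` tag, if any, doesn't say noindex.
- Confirm the `<link rel="canonical">` points to the URL itself (or to the URL you intend Google to index).

Then pipe that output through `wc -w` for a rough word count. Aim for more than 500 words on a content page; fewer than 100, even on a thin index or category page, is a warning sign. `wc -w` also counts markup and inline scripts, so the true body count is lower.

A curl request neither runs JavaScript nor comes from Google's IPs. It approximates what Googlebot sees rather than reproducing it, so URL Inspection's live test remains the authoritative view.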
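3. Check the link graph. Run `curl -s -A "Googlebot/2.1" <YOUR_HOMEPAGE> | grep -oE 'href="[^"]*"' | grep -c <YOUR_URL_PATH>`. This counts how many links on the homepage, including its nav and footer, point to the URL. It's a substring match, so `/blog/foo` also counts `/blog/foo-bar`. Links injected by client-side JavaScript won't show up at all. Repeat the check for any high-authority hub or category page that should link to the URL. A page doesn't need a homepage link, but it does need at least one inbound internal link from some crawlable page. If it's truly orphaned (zero inbound internal links), it is likely to sit in 'Discovered – not indexed' indefinitely.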
4. Check sibling content for near-duplicate boilerplate. If you have many pages with similar structure (comparison pages, glossary terms, listicles), check whether they share long verbatim passages. Google groups near-duplicate pages into a cluster and typically indexes only one representative (the canonical), leaving the rest out. Diff two of your most similar pages and look for paragraphs that repeat verbatim across both. Then rewrite the shared sections to be page-specific.
5. Request indexing after fixing. Once you've made changes, open GSC URL Inspection and click 'Request Indexing'. For 'Discovered – not indexed' and 'Crawled – not indexed' specifically, expect roughly 2-12 weeks for the bucket to clear, even after fixes. Google indexes on its own schedule. What speeds it up is a sustained quality and link-graph signal, not repeated re-crawl requests. Re-submit your sitemap.xml in GSC after major content changes. Then use the Page indexing report's 'Validate Fix' button on each affected reason to ask Google to re-evaluate those URLs.
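You can script the canonical check from step 2 above. This minimal sketch assumes you've already saved the HTML that curl returned. It uses Python's standard-library `html.parser` to find `<link rel="canonical">` and classify it as self-referencing, missing, conflicting, or pointing elsewhere. `check_canonical` and `normalize` are hypothetical helpers. The normalization is deliberately minimal: it lowercases the scheme and host and drops the fragment, but it doesn't reconcile trailing slashes or `http` vs `https`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class CanonicalParser(HTMLParser):
    """Record the href of every <link rel="canonical"> on the page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        values = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in values.get("rel", "").lower().split():
            self.canonicals.append(values.get("href", ""))


def normalize(url):
    """Lowercase scheme and host, default an empty path to '/', drop the fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def check_canonical(page_url, html):
    """Classify the page's canonical as 'self', 'missing', 'conflicting', or 'points to <url>'."""
    parser = CanonicalParser()
    parser.feed(html)
    parser.close()
    if not parser.canonicals:
        return "missing"
    if len(set(parser.canonicals)) > 1:
        return "conflicting"
    # Resolve relative hrefs against the page URL, as a browser would.
    target = urljoin(page_url, parser.canonicals[0])
    return "self" if normalize(target) == normalize(page_url) else f"points to {target}"


# The catch-all SPA case from bucket 3: an unmatched URL canonicalizing to '/'.
print(check_canonical("https://example.com/no-such-page", '<link rel="canonical" href="/">'))
```

This prints `points to https://example.com/`, which is the pattern that turns a catch-all route into a soft 404.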
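For step 3, a parser-based count avoids the substring and cross-domain false positives of the grep pipeline. This minimal sketch resolves each `href` against the page URL and counts exact same-host path matches, ignoring query strings. `count_inbound` is a hypothetical helper. Like curl, it only sees links present in the server-rendered HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in server-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def count_inbound(source_url, html, target_path):
    """Count links in html (fetched from source_url) to exactly target_path on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlsplit(source_url).netloc.lower()
    count = 0
    for href in collector.hrefs:
        target = urlsplit(urljoin(source_url, href))  # resolve relative hrefs
        if target.netloc.lower() == host and target.path == target_path:
            count += 1
    return count


homepage_html = (
    '<nav><a href="/blog/foo">Foo</a></nav>'
    '<a href="/blog/foo-bar">Foo Bar</a>'
    '<a href="https://example.com/blog/foo?ref=footer">Foo again</a>'
    '<a href="https://other.example/blog/foo">External</a>'
)
print(count_inbound("https://example.com/", homepage_html, "/blog/foo"))
```

On this sample it prints 2, while the grep pipeline would report 4. The grep count also includes `/blog/foo-bar` and the external link.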
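For step 4, `diff` works for two pages, but across a whole content cluster it helps to score overlap numerically. The sketch below uses word shingling with Jaccard similarity, a standard near-duplicate technique swapped in for illustration. Google doesn't publish how its own duplicate clustering works, so treat the scores as a relative signal, not a prediction. `shingles`, `jaccard`, `shared_paragraphs`, the 5-word window, and the 8-word paragraph minimum are all hypothetical choices.

```python
import re


def shingles(text, k=5):
    """Set of overlapping k-word windows, lowercased, punctuation ignored."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def shared_paragraphs(a, b, min_words=8):
    """Blank-line-separated paragraphs that appear verbatim on both pages."""
    def paragraphs(text):
        return {p.strip() for p in text.split("\n\n") if len(p.split()) >= min_words}
    return sorted(paragraphs(a) & paragraphs(b))


cta = "Start your free trial today and see why thousands of teams trust us."
page_a = "Acme vs Beta compares pricing tiers for small sales teams.\n\n" + cta
page_b = "Acme vs Gamma focuses on deliverability and inbox placement.\n\n" + cta
print(round(jaccard(page_a, page_b), 2), shared_paragraphs(page_a, page_b))
```

Run it over pairs within one template family and start rewriting from the highest-scoring pairs. `shared_paragraphs` points at the exact text to make page-specific.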
When to ask support
Some of your pages on WarmySender may stay stuck in one of the four GSC buckets for more than 4 weeks after you've completed the 5-step checklist. This covers your authored blog posts, custom-domain landing pages, and any other content you publish through the platform. If that happens, email hello@warmysender.com with the affected URLs and your GSC screenshots. We'll dig into the specific render path and confirm where the issue lies. If it's on the platform side, it's something we should fix. If it's on your content side, the 5-step checklist is the right path forward.