SEO for Beginners: How Google Actually Finds Your Pages

How Googlebot crawls, indexes, and ranks pages — crawl budget, mobile-first indexing, title tags, Core Web Vitals, and the myths that refuse to die.

Rafael Duarte

TECHNICAL EDITOR

Published

Jun 18, 2026

Reading time

8 min


You launched the site. It's live. A week later, Google still hasn't found it. You search for your product name and nothing comes up. You open Search Console and see "URL is not on Google." That's not a penalty — it's the normal pipeline. Google doesn't automatically index everything that exists on the internet. There's a process.

How Google discovers new pages

Googlebot is Google's crawler — a program that navigates the web by following links, much like a user would, but at massive scale and in a systematic way. It starts from a set of known URLs (seeds) and discovers new pages through links it finds on each page it visits.
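As a rough illustration of discovery by following links, the sketch below runs a breadth-first traversal over a tiny in-memory link graph. The graph, the URLs, and the `discover` function are hypothetical, and Googlebot's real scheduling is far more elaborate and not public. The point is only that a page nobody links to is never reached.

```python
from collections import deque

# Hypothetical link graph: each URL maps to the URLs it links to.
LINKS = {
    "https://known.example/": ["https://known.example/blog", "https://new.example/"],
    "https://known.example/blog": ["https://known.example/"],
    "https://new.example/": ["https://new.example/pricing"],
    "https://new.example/pricing": [],
    "https://new.example/orphan": [],  # exists, but no page links to it
}

def discover(seeds, links):
    """Return the URLs reachable from the seeds, in breadth-first order."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

found = discover(["https://known.example/"], LINKS)
```

Here `found` contains the four linked pages and leaves out `/orphan`, which is exactly the situation a sitemap is meant to cover.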

There are three main ways Googlebot reaches a new page:

External link. Someone links to your site from a page Google already knows. This is the most natural path and historically the most reliable.
Sitemap. You submit a sitemap.xml in Google Search Console telling Google which URLs exist on your site. Google doesn't promise to index everything in the sitemap, but submitting one is the most direct way to say "these pages exist."
Manual request. In Search Console, the URL Inspection tool lets you request indexing for a specific URL. Useful for new or recently updated pages, but not a guarantee of immediate indexing.
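For the sitemap route, a minimal sketch of generating a sitemap.xml with Python's standard library might look like this. The `build_sitemap` helper and the example URLs are assumptions for illustration. The only fixed part is the sitemaps.org namespace, and a real sitemap may also carry optional fields such as `lastmod`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # Serialize as UTF-8 bytes with an XML declaration, then decode for printing.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True).decode("utf-8")

sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/tools/qr-code-generator",
])
```

Write the result to /sitemap.xml on your server and submit that address in Search Console.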

One important detail: Google finished moving sites to mobile-first indexing in October 2023. Googlebot crawls and indexes primarily the mobile version of your page. If the mobile version leaves out content that the desktop version shows, that content may never reach the index. A responsive layout that serves the same content to every device avoids the problem.

What happens after Googlebot visits

A visit is not the same as indexation. Googlebot can pass through a URL and decide not to index it for various reasons: duplicate content, perceived low quality, a noindex directive in the HTML, or simply because the crawler prioritized other URLs at that moment.
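One of those reasons, a noindex directive in the HTML, is easy to check locally. The sketch below uses Python's built-in HTML parser to look for noindex in a robots meta tag. The class and function names are hypothetical, and it does not cover the `X-Robots-Tag` HTTP header, which can carry the same directive.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def has_noindex(html):
    """True if the page asks not to be indexed ("none" implies noindex)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "none"})
```

For example, `has_noindex('<meta name="robots" content="noindex, nofollow">')` returns `True`.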

The process has three distinct stages:

1. Crawling: Googlebot sends an HTTP request to the URL, downloads the HTML (up to ~2 MB), and then renders the page by executing JavaScript to see the final DOM. Rendering can happen hours or days after the initial crawl — Google uses a separate rendering queue.

2. Indexing: Google processes the rendered content, extracts text, links, and metadata, and stores it in the index. This is where the page starts to exist for Google. Without indexing, there's no ranking — no matter how good the page is.

3. Ranking: among all indexed pages that are candidates for a given search, Google applies hundreds of signals to decide the order. Content, relevance, authority, user experience — all of it goes into the calculation.

What is crawl budget and when does it matter

Crawl budget is the number of URLs Googlebot is willing to crawl on your site within a given period. For small sites (a few hundred pages), this rarely matters because Google crawls everything quickly. For large sites (e-commerce stores with tens of thousands of SKUs, for example), managing crawl budget is critical.

Two factors determine crawl budget: crawl rate limit (Google's documentation now calls it the crawl capacity limit: how fast Googlebot can fetch pages without overloading your server) and crawl demand (how much Google thinks it's worth crawling your pages based on popularity and freshness).

If you have low-quality pages, duplicate URLs without canonical tags, or redirect chains, those consume crawl budget that could go to the pages that actually matter.
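Redirect chains are the easiest of these to spot offline. The sketch below follows a URL through a redirect map and returns every hop. The `redirect_chain` function, the map, and the `max_hops` cap are all hypothetical. In practice the map would come from your server config or a crawl export.

```python
def redirect_chain(url, redirects, max_hops=10):
    """Follow url through the redirects mapping and return the URLs visited."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        chain.append(nxt)
        if nxt in chain[:-1]:  # we have been here before: redirect loop
            break
    return chain

# Hypothetical redirect map: /old-page -> /page -> /tools/page
REDIRECTS = {"/old-page": "/page", "/page": "/tools/page"}
chain = redirect_chain("/old-page", REDIRECTS)
hops = len(chain) - 1
```

Any chain with more than one hop is a candidate for flattening: point the first URL straight at the final destination.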

What influences rankings

The question everyone asks — and nobody answers honestly — is: what makes a page rank?

The honest answer is: relevance to search intent, domain authority, content quality, and user experience. All at the same time. There's no single magic factor.

But there are concrete on-page elements you control:

Title tag

This is the title that appears in the search result. Straight to the point: it's one of the strongest on-page signals you control. Practical limit in 2026: 50–60 characters to avoid truncation in the SERP. Google cuts titles by rendered width rather than by a fixed character count, so treat that range as a guideline. Put the primary keyword near the beginning. Every page needs a unique title.

<title>Free QR Code Generator — Quick Tools</title>
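A quick way to enforce the two title rules (length and uniqueness) across a site is a small audit script. This is a sketch under assumptions: `audit_titles` is a hypothetical helper, the titles are made up, and the 60-character cap is the guideline from this article, not a limit Google publishes.

```python
from collections import Counter

def audit_titles(titles_by_url, max_len=60):
    """Return {url: [issues]} for titles that are too long or reused."""
    counts = Counter(title.strip().lower() for title in titles_by_url.values())
    report = {}
    for url, title in titles_by_url.items():
        issues = []
        if len(title.strip()) > max_len:
            issues.append("too long")
        if counts[title.strip().lower()] > 1:
            issues.append("duplicate")
        if issues:
            report[url] = issues
    return report

report = audit_titles({
    "/tools/qr": "Free QR Code Generator | Quick Tools",
    "/tools/qr-v2": "Free QR Code Generator | Quick Tools",
    "/tools/password": "Password Generator | Quick Tools",
})
```

Here `report` flags the two QR pages as duplicates and leaves the password page alone.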

Meta description

Not a direct ranking factor; Google has been clear about this. But it's the text that usually appears below the title in the SERP, and it has a direct impact on CTR (click-through rate). Whether CTR itself feeds into ranking is disputed, but more clicks are worth having either way. Safe limit: 120–160 characters. If you don't write one, Google builds the snippet from whatever on-page text it judges closest to the query, which is rarely the excerpt you would have chosen.

<meta name="description" content="Create QR codes for URLs, WhatsApp, Wi-Fi, or plain text in seconds. No signup required." />

Headings (H1, H2, H3)

The <h1> should appear once per page and clearly communicate the main topic. <h2> tags organize sections. This isn't just SEO — it's structure for both the reader and the crawler to understand what's most important on the page.
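To check the one-h1 rule on a page, a minimal sketch with Python's built-in HTML parser is enough. `HeadingCounter` and `count_headings` are hypothetical names, and the sample markup is invented.

```python
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    """Count how many <h1>, <h2>, and <h3> tags a page contains."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.counts[tag] = self.counts.get(tag, 0) + 1

def count_headings(html):
    parser = HeadingCounter()
    parser.feed(html)
    return parser.counts

counts = count_headings("<h1>QR Code Generator</h1><h2>How it works</h2><h2>FAQ</h2>")
```

A page where `counts.get("h1", 0)` is anything other than 1 deserves a second look.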

Clean, descriptive URLs

/tools/qr-code-generator is a better URL than /page?id=482&cat=tools. The URL appears in the SERP, Google treats the words in it as a (minor) content signal, and it's easier to read, share, and link to.
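If your CMS doesn't generate readable slugs, a small helper can. The sketch below is one common approach, not a standard: `slugify` is a hypothetical function that lowercases the text, strips accents, and joins words with hyphens.

```python
import re
import unicodedata

def slugify(text):
    """Lowercase, strip accents, and join the remaining words with hyphens."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops the marks.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

slug = slugify("QR Code Generator")
```

The resulting slug is `qr-code-generator`, which you would then place under a path such as /tools/.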

Core Web Vitals

Since 2021, Google has used performance metrics as a ranking factor. The three that matter in 2026:

LCP (Largest Contentful Paint): time until the largest visible element renders. Target: 2.5 seconds or less.
INP (Interaction to Next Paint): response time to user interactions. Target: 200 ms or less. It replaced FID in March 2024.
CLS (Cumulative Layout Shift): how much the layout jumps while loading. Target: 0.1 or less.

These thresholds are what Google considers "good" in 2026. The ranking impact is real but proportional: Core Web Vitals act as a tiebreaker between pages with equivalent content, not a substitute for quality content.
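To make the thresholds concrete, here is a small sketch that flags which metrics miss the "good" mark. The function name, the dictionary keys, and the sample measurements are assumptions. Real numbers would come from field data such as the Core Web Vitals report in Search Console.

```python
# "Good" thresholds from the list above: LCP in seconds, INP in ms, CLS unitless.
GOOD_THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}

def failing_vitals(measurements):
    """Return the metrics whose value exceeds the 'good' threshold.

    A value exactly on the threshold is treated as good.
    """
    return [
        name
        for name, limit in GOOD_THRESHOLDS.items()
        if measurements[name] > limit
    ]

# Hypothetical measurements for one page.
failing = failing_vitals({"lcp_s": 3.1, "inp_ms": 180, "cls": 0.1})
```

With these sample values only `lcp_s` is reported, so the largest element is where the work should go.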

What is not a ranking factor (common myths)

There's a lot of misinformation circulating about SEO. Some persistent myths:

"Posting every day improves rankings." Publishing frequency is not a direct factor. One excellent page ranks better than ten mediocre ones. Google prefers useful content over frequent content.

"Meta keywords still work." Google confirmed in 2009 that it ignores the keywords meta tag for ranking. Using it doesn't hurt, but it doesn't help either.

"Having SSL guarantees a higher position." Google has treated HTTPS as a lightweight ranking signal since 2014, and Chrome has labeled plain HTTP pages "Not secure" since 2018. A site without HTTPS loses user trust and a small ranking edge, but having HTTPS doesn't put you ahead of anyone. It's the baseline expectation.

"SEO takes 6 months to work." This depends entirely on keyword competitiveness and domain authority. A well-built page targeting a low-competition keyword can rank within days. For competitive keywords, yes, it can take months.

How to check if your pages are indexed

The quickest method: search site:yourdomain.com on Google. The results give a rough idea of which pages Google has indexed from your domain; the count is an estimate, not an exact figure. If far fewer pages appear than you have published, there's a problem somewhere in the crawling or indexing pipeline.

For a more precise diagnosis, Google Search Console is the authoritative source. It shows which URLs are indexed, which have errors, and why some were excluded from the index.

Before pushing a new page live, I use the Google SERP Preview tool to see how the title and description will appear in results — especially useful to confirm the title isn't cutting off mid-word and the description stays within the character limit.
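If you want a rough local approximation of that check, a character-count truncation gets you most of the way. This sketch is an assumption-heavy stand-in: `preview` is a hypothetical helper, and Google's real truncation depends on rendered width, so the output is only a hint.

```python
def preview(text, limit):
    """Approximate SERP truncation: keep whole words within limit characters."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut lands inside a word, back up to the previous space.
    # With no space available, this falls back to a hard cut.
    if not text[limit].isspace():
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip(" ,.;:") + " ..."

title_preview = preview("Free QR Code Generator | Quick Tools", 60)
description_preview = preview(
    "Create QR codes for URLs, WhatsApp, Wi-Fi, or plain text in seconds. "
    "No signup required.",
    160,
)
```

Both sample strings fit their limits and come back unchanged. A longer title would come back shortened with a trailing ellipsis.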

Frequently asked questions

How long does it take for Google to index a new page?

It varies widely. Pages on established domains with healthy crawl budgets can be indexed within hours. New sites with no external links can take weeks. Submitting the URL in Google Search Console (URL Inspection → "Request indexing") speeds things up, but guarantees nothing.

What is robots.txt and when should I use it?

robots.txt is a file in your domain root that tells Googlebot which URLs it should not crawl. Useful for excluding admin areas, login pages, or parameter-based URLs that generate duplicate content. Important caveat: robots.txt blocks crawling, not indexing. If a blocked URL receives external links, Google may index it anyway, without being able to read the content. To keep a page out of the index, leave it crawlable and add a noindex directive instead, since Googlebot can only obey a noindex it is allowed to fetch.
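You can test rules before deploying them with Python's standard `urllib.robotparser`. The robots.txt content, the example.com domain, and the `can_crawl` helper below are hypothetical. The standard-library parser uses simple prefix matching, so it may not mirror every matching rule Googlebot applies (wildcards in particular).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the admin area and the login page for everyone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def can_crawl(path, agent="Googlebot"):
    """True if the given user agent may fetch the path under these rules."""
    return parser.can_fetch(agent, "https://example.com" + path)

allowed = can_crawl("/tools/qr-code-generator")
blocked = can_crawl("/admin/users")
```

With these rules `allowed` is `True` and `blocked` is `False`.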

Do I need backlinks to rank?

For competitive keywords, backlinks (links from other domains pointing to yours) remain one of the strongest authority signals. For low-competition keywords, well-structured content is often enough. Domain authority accumulated through links affects the ranking of all pages on your site, not just the ones directly receiving links.

What does the canonical tag do?

When you have duplicate or very similar content at multiple URLs, the canonical tag tells Google which is the "official" version. Example: /product?color=blue and /product?color=red with nearly identical content. You point both to /product via canonical. Google treats the tag as a strong hint rather than a command, but without it Google picks the version to rank entirely on its own, and often not the one you want.
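The color example can be sketched in a few lines: strip the parameters that don't change the content and emit the tag. Everything here is illustrative. `canonical_url`, `canonical_tag`, and the choice of `color` as an ignorable parameter are assumptions, and which parameters are safe to drop depends on your site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_url(url, ignored_params=("color",)):
    """Drop the listed query parameters and any fragment from url."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url):
    """Build the <link rel="canonical"> element for the page at url."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'

tag = canonical_tag("https://example.com/product?color=blue")
```

Both color variants produce the same tag pointing at https://example.com/product, which goes in the `<head>` of each variant.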

Indexing isn't automatic: what to do now

Most visibility problems on Google aren't about content. They're about crawling and indexing: pages with no links pointing to them, outdated sitemaps, noindex tags that should have been removed after launch, canonical tags pointing to the wrong URL.

The minimum checklist for any new site: verify in Search Console that the main pages are indexed, make sure the sitemap is submitted and up to date, and check that no noindex tag was accidentally left over from the staging environment. After that, on-page SEO is optimization — not the core problem.
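The "sitemap is up to date" item can be checked mechanically by comparing the sitemap against the list of pages you actually publish. A minimal sketch, assuming a hypothetical `sitemap_gaps` helper and made-up URLs:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_gaps(sitemap_xml, published_urls):
    """Return (missing_from_sitemap, stale_in_sitemap) as sorted lists."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }
    published = set(published_urls)
    return sorted(published - listed), sorted(listed - published)

SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/old-page</loc></url>"
    "</urlset>"
)
missing, stale = sitemap_gaps(SITEMAP, ["https://example.com/", "https://example.com/new-page"])
```

In this sample, `/new-page` is published but missing from the sitemap, and `/old-page` is still listed even though it is gone.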

Author

Rafael Duarte

Backend developer with experience in fintech and B2B SaaS. He has worked on teams that scaled APIs from zero to millions of requests. He carries enough production scars to hold strong opinions about tools, patterns, and architecture decisions. Not an academic: he read the UUID RFC when he had to choose between v4 and v7 for a write-heavy table.
