
Crawlability, indexation and ranking are often confused, and while they're closely related, they mean very different things. Before ranking can happen, crawling and indexing have to occur, and that's why technical SEO is so important.

Crawlability

Crawlability is at the core of SEO. When a search engine crawler like Googlebot or Bingbot accesses a website, it crawls the site to find all of its pages, images, links, CSS and JavaScript files, etc. By definition, crawlability is the ability of search engines to find and access website content.
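
For reference, crawlers identify themselves with a user-agent string on every request, which is how servers and robots.txt rules can tell a search engine bot apart from a regular visitor. Googlebot's desktop user-agent looks roughly like this (shown purely for illustration; exact strings vary by crawler type):

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)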

If a search engine can't access website content, then neither indexation nor ranking is possible. Both rely on search engines being able to crawl the content in the first place.

Changing Crawlability with Robots.txt

Robots.txt is the most common method of preventing search engines from crawling a website. If a specific URL or subfolder is blocked in robots.txt, search engines will not crawl it. That means links on that page won't be discovered, and entire site sections might go undiscovered (that's why it's important to have an XML and/or HTML sitemap as a redundant path for discovery).

In this example, robots.txt is being used to block search engines from crawling pages that fall in the /properties/listing/ subfolder on a real estate website.

User-agent: *
Disallow: /properties/listing/
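
As a related note, robots.txt can also point crawlers directly to the XML sitemap with a Sitemap line, reinforcing the redundancy mentioned above (the URL below is only a placeholder):

Sitemap: https://www.example.com/sitemap.xml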

Blocking Crawlability

Sometimes content is prevented from being crawled inadvertently. Content rendered with JavaScript can be problematic in this respect, although updates to Googlebot allow it to crawl and render JavaScript better than ever before.

Additionally, links that use a nofollow attribute may present issues, since nofollow instructs Googlebot NOT to follow them. However, there's debate about how Google really handles nofollow links, with some claiming that Google does crawl them but doesn't pass any link equity through them.
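
For context, nofollow is applied as a rel attribute on the link itself. A nofollowed link looks like this in HTML (the URL and anchor text are placeholders):

<a href="https://www.example.com/page/" rel="nofollow">anchor text</a>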

Indexation

Once content has been crawled, it's up to search engines to decide whether to index it. If the content is duplicated or plagiarized, search engines may crawl it but choose not to index it. Similarly, thin or low-quality pages may fall victim to the same fate.

Preventing Indexation

You may choose to intentionally prevent search engines from indexing content using a noindex directive. However, search engines still need to crawl a page in order to see the noindex tag, so if you want to noindex a page, you have to let it be crawled first. Sometimes a noindexed page gets stuck in the index because the page is also blocked by robots.txt, which prevents Google from ever seeing the noindex directive after it's been added.
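
For reference, the noindex directive is most commonly added as a meta robots tag in the page's <head> (it can also be sent as an X-Robots-Tag HTTP response header, which is useful for non-HTML files like PDFs):

<meta name="robots" content="noindex">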

Common types of pages that might be noindexed include tag & category pages (common on WordPress). These pages are valuable in that they automatically create internal links, but they aren't great for search: their content is largely dynamic and they usually aren't well optimized for organic search. I typically recommend noindexing tag & category pages.

Ranking

Ranking is the last step in the process. A page can be crawled and indexed, yet not rank well at all. In a competitive space like retail and eCommerce, there are hundreds of websites trying to rank for the same keywords, and only 10 will end up with page-one visibility.

Improving rankings relies heavily on A) the on-page content and B) overall site authority and off-site (link building) efforts. Ranking is usually the most difficult part; assuming there aren't glaring technical issues on the site, crawling and indexation are comparatively easy.

Summary

To summarize, rankings rely on crawling and indexation happening first, in that order: Crawling > Indexing > Ranking.
