Crawlability is a fundamental aspect of technical SEO and refers to the ability of search engines to crawl the pages of a website and understand their content. Closely related to it is the crawl budget: the number of pages that a search engine crawler crawls on a website within a certain period of time. Good crawlability is crucial for ensuring that search engines can find and index all the important pages of a website - and that the available crawl budget is used in the best possible way.
In this article, I explain exactly what crawlability and crawl budget mean for your website, why they are important and which factors influence them.
Definition of crawlability
Crawlability describes the ability of search engine crawlers (also known as bots or spiders) to access the content of a website, crawl it and process the information it contains. A search engine crawler is an automated program that searches the Internet to discover and index websites. The best-known crawlers include Googlebot, Bingbot and Yahoo Slurp.
Why is crawlability important?
Only pages that can be crawled by search engines are indexed and can therefore appear in the search results. If important pages cannot be crawled, they lose the opportunity to rank well in search results, resulting in a loss of visibility and traffic.
In this example, a client uploaded sitemaps to Google Search Console and made some mistakes in the process. Only years later, after an SEO audit by our team, was the sitemap submitted correctly.
Factors that influence crawlability
- An XML sitemap is a file that lists all the important pages of a website and helps search engines find these pages. It is particularly useful for large websites or those with complex structures.
- The robots.txt file gives search engines instructions on which areas of the website they may and may not crawl. An incorrectly configured robots.txt file can inadvertently exclude important pages from crawling (a minimal example follows below).
- A clear and logical URL structure makes it easier for search engines to understand the relationships between different pages. Short, descriptive URLs are an advantage here.
- Well thought-out internal linking helps search engine crawlers to discover all pages of a website and understand their meaning. Pages that are hidden deep within the website structure and have few internal links can be difficult to find.
- Slow server response times can mean that search engine crawlers cannot crawl all the pages of a website. Fast and reliable server performance is therefore important for good crawlability.
- Error pages (e.g. 404 errors) and poorly configured redirects can hinder the crawl process. It is important to perform regular checks and ensure that all links on the website are working.
- Duplicate content can confuse search engines and cause them to not know which version of a page to index. The use of canonical tags can help to solve this problem.
Google Search Console also flags duplicate content without canonical tags. In this case, the customer has been cleaning up the affected pages piece by piece since our SEO audit.
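To make the robots.txt factor from the list above more tangible, here is a minimal sketch of a robots.txt file; the blocked paths and the domain are purely hypothetical and need to be adapted to your own site:

```
# Hypothetical example - adapt the paths and domain to your own website
User-agent: *
Disallow: /internal-search/
Disallow: /cart/

# Point crawlers to the XML sitemap
Sitemap: https://www.example.com/sitemap.xml
```

Everything that is not explicitly disallowed remains crawlable, and the Sitemap line helps crawlers find the sitemap even without a Search Console submission.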
The crawl budget and its importance for crawlability
The crawl budget is an important term in technical SEO and refers to the number of pages that a search engine crawler crawls on a website within a certain period of time. It is directly related to the crawlability of a website: efficient management of the crawl budget ensures that search engine crawlers can find and index the most important pages of a website.
The crawl budget is made up of two main components:
- Crawl Rate Limit: This is the number of requests a search engine crawler can send to a website without impairing server performance. Google automatically adjusts this rate to ensure that the server is not overloaded.
- Crawl Demand: This depends on the popularity and topicality of the pages. Frequently updated or particularly relevant pages have a higher crawl demand and are crawled more frequently.
Why is the crawl budget important?
Efficient management of the crawl budget is crucial because search engines crawl a limited number of pages per website within a certain period of time. Especially for large websites or websites with frequent updates, it is important that the most relevant pages are prioritized. An inefficient crawl budget can result in important pages not being crawled or indexed, which has a negative impact on search engine visibility.
The project above already has over 1,200 duplicate pages, which use up the crawl budget unnecessarily. Even worse is the impact of the 404 pages recorded by the Search Console. With this high number, it quickly becomes clear that many irrelevant pages are being crawled and thus the crawl budget is being used very inefficiently. See here:
Relationship between crawl budget and crawlability
There are two very practical approaches to improving both crawlability and the crawl budget:
1. Optimizing crawlability to maximize the crawl budget by avoiding duplicate content and improving internal linking.
Duplicate content wastes the crawl budget because crawlers process the same content multiple times. By using canonical tags and avoiding redundant pages, the crawl budget can be used more efficiently. Well-structured internal linking helps search engine crawlers find and crawl the most important pages quickly. This ensures that the crawl budget is not wasted on unimportant or hard-to-reach pages.
Technical errors such as 404 pages or slow loading times can hinder crawling and make inefficient use of the crawl budget. Regular checks and optimization of website performance are therefore crucial for your success.
2. Efficient management of the crawl budget to improve crawlability
XML sitemaps help search engines to find and prioritize the most important pages of a website. This helps to ensure that the crawl budget is used efficiently and that the most important content is crawled. By configuring the robots.txt file correctly, unnecessary pages can be excluded from crawling so that the crawl budget is concentrated on the relevant pages. Regular updates and the merging of similar pages also improve the relevance and topicality of the content, which in turn increases crawl demand and makes more efficient use of the crawl budget.
Perform regular checks to identify and fix crawl obstacles such as 404 errors, slow load times and other technical issues.
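One lightweight way to run such checks is a small script that requests a list of known URLs and reports everything that does not answer with HTTP 200. The following Python sketch uses the requests library; the URL list is a placeholder that you would fill with the URLs from your own sitemap:

```python
# Minimal sketch: flag URLs that do not answer with HTTP 200.
# The URLs are placeholders - in practice, feed in the URLs from your sitemap.
import requests

urls = [
    "https://www.example.com/",
    "https://www.example.com/old-product/",
]

for url in urls:
    try:
        response = requests.get(url, timeout=10, allow_redirects=False)
        if response.status_code != 200:
            # 404s waste crawl budget; redirect chains and 5xx errors hinder crawling
            print(f"{response.status_code}  {url}")
    except requests.RequestException as error:
        print(f"ERROR  {url}: {error}")
```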
Server response times, website performance and their relation to the crawl budget
The server response times and performance of a website have a direct impact on the crawl budget. A slow website can have a negative impact on the crawl budget and thus reduce the efficiency of indexing by search engines.
The server response time is the time it takes a web server to respond to a request from a user or search engine crawler. It is an important indicator for the performance of a website and can influence the crawl rate.
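For a quick spot check outside of Search Console, the server response time can be approximated with a few lines of Python: the elapsed attribute of a requests response measures the time from sending the request until the response headers arrive, which roughly corresponds to the time to first byte (the URL is a placeholder):

```python
# Rough TTFB spot check - no replacement for the crawl stats in Search Console.
import requests

url = "https://www.example.com/"  # placeholder
response = requests.get(url, stream=True, timeout=10)  # stream=True: stop after the headers
ttfb_ms = response.elapsed.total_seconds() * 1000
response.close()
print(f"Approximate server response time: {ttfb_ms:.0f} ms")
```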
You can find information on your average server response time in the Google Search Console under Settings => Crawl stats. Here is an example with a very good average server response time:
What influence do server response times have on the crawl budget?
- If a website has slow server response times, search engine crawlers need more time to crawl each page. This can result in fewer pages being crawled within a given time period because your crawl budget is limited.
- Google dynamically adjusts the crawl rate to the server response times. If server responses are slow, Google reduces the number of requests so as not to affect server performance. This means that fewer pages are crawled and the crawl budget is used inefficiently.
- Search engines favor websites with fast load times and fast server response times. A slow website can therefore be given a lower priority when crawling, which has a negative impact on the crawl budget.
- The overall performance of a website encompasses several aspects, including loading speed, time to first byte (TTFB), and overall usability. A well-optimized website not only provides a better user experience, but also more efficient crawling.
So what can you do? Sure, improve load times and reduce server load! Websites that load quickly allow search engine crawlers to crawl more pages in less time. This maximizes the crawl budget and ensures that important pages are crawled and indexed. A well-optimized website reduces server load and ensures that search engine crawlers can work efficiently without impacting server performance. This leads to better use of the crawl budget.
A fast and smooth user experience increases user dwell time and reduces the bounce rate. Search engines take these factors into account when assessing the relevance and quality of a website, which can have a positive effect on the ranking.
Google itself says in its help article on the crawl budget: "If the website responds very quickly for a while, the limit is increased so that more connections can be used for crawling. If the website slows down or responds with server errors, the limit is reduced and Googlebot crawls less."
Here is another screenshot from another project. The average server response time is over 800 milliseconds, which, from our observations, is unfortunately almost the norm. Many projects barely manage average values below 500 milliseconds.
If you notice a relatively high server response time value, you can take some practical measures to optimize server response times and website performance:
- A CDN distributes the load of content delivery across multiple servers worldwide, which reduces loading times and improves server response times. This is particularly useful for multilingual websites with international visitors.
- Compress and optimize your images and other media content to reduce loading times. In particular, use modern image formats for the web such as AVIF or WebP.
- Implement browser caching and server-side caching to reduce repeated requests and increase loading speed.
- Reduce the number of HTTP requests by merging CSS and JavaScript files and removing unnecessary plugins.
- Minimize the use of third-party scripts that can negatively impact website load time.
- Use tools such as Google PageSpeed Insights, Lighthouse and WebPageTest to regularly monitor the performance of your website and identify optimization potential.
- Make sure that your server has sufficient resources and is regularly maintained. Use modern web server technologies such as NGINX and HTTP/2 (see the configuration sketch after this list).
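To illustrate the caching and HTTP/2 points from the list above, here is a minimal sketch of an NGINX server block; the domain, certificate paths, compression types and cache lifetimes are assumptions that need to be adapted to your own setup:

```nginx
# Sketch of an NGINX server block with HTTP/2, compression and browser caching.
# Domain, paths and cache lifetimes are placeholders.
server {
    listen 443 ssl http2;
    server_name www.example.com;
    root /var/www/example;  # placeholder document root

    ssl_certificate     /etc/ssl/example.com.crt;
    ssl_certificate_key /etc/ssl/example.com.key;

    # Compress text-based assets to reduce transfer size
    gzip on;
    gzip_types text/css application/javascript application/json image/svg+xml;

    # Let browsers cache static assets so repeat visits cause fewer requests
    location ~* \.(css|js|woff2|avif|webp|png|jpg|svg)$ {
        expires 30d;
        add_header Cache-Control "public";
    }
}
```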
The last point in particular solved the bottleneck in one of our customer projects. In June, the problems with the Core Web Vitals were solved in this project, and at the beginning of July the server was upgraded to a modern, well-performing setup. All of a sudden, things started to pick up.
Despite all our efforts, some sites can have difficulties with crawlability. Here are some common problems and possible solutions:
Faulty robots.txt file
Problem: An incorrectly configured robots.txt file can prevent search engine crawlers from crawling important pages.
Solution: Check the robots.txt file to ensure that no relevant pages are inadvertently excluded. Use the Robots.txt tester tool from Google.
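In addition to Google's tooling, you can check individual URLs against the live robots.txt with Python's standard library; the domain, URL and user agent below are only examples:

```python
# Check whether a specific URL may be crawled according to robots.txt.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://www.example.com/robots.txt")  # placeholder domain
robots.read()

allowed = robots.can_fetch("Googlebot", "https://www.example.com/important-page/")
print(allowed)  # False would mean the page is blocked for Googlebot
```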
Missing or incomplete XML sitemaps
Problem: Without an XML sitemap, or with a faulty one, it can be difficult for search engines to discover all the pages of a website. See the example screenshot above.
Solution: Create and submit a complete XML sitemap to the Google Search Console. And make sure it is verified as correct. Update this regularly.
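For orientation, a minimal XML sitemap with two placeholder URLs and dates looks like this; the file is then submitted under "Sitemaps" in the Google Search Console:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Placeholder URLs and dates - list your real, indexable pages here -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-07-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/important-page/</loc>
    <lastmod>2024-07-01</lastmod>
  </url>
</urlset>
```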
Deeply nested pages
Problem: Pages that are many clicks away from the homepage may not be crawled. See also the Audisto screenshot above: most of the pages there were at level 6 to 8.
Solution: Optimize internal linking to ensure that all important pages are reachable in a few clicks.
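If you do not have a crawler such as Audisto at hand, a very simplified breadth-first crawl can already show how many clicks each page is away from the homepage. The following Python sketch assumes a small site, a placeholder start URL and a low page limit; it is deliberately not a production crawler (no politeness delays, no robots.txt handling):

```python
# Simplified sketch: measure click depth of internal pages via breadth-first crawling.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

import requests

class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

start = "https://www.example.com/"  # placeholder start URL
domain = urlparse(start).netloc
depth = {start: 0}
queue = deque([start])

while queue and len(depth) < 200:  # small page limit for the sketch
    url = queue.popleft()
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue
    parser = LinkExtractor()
    parser.feed(html)
    for href in parser.links:
        absolute = urljoin(url, href).split("#")[0]
        if urlparse(absolute).netloc == domain and absolute not in depth:
            depth[absolute] = depth[url] + 1  # one click deeper than the linking page
            queue.append(absolute)

# Pages sorted by click depth; anything deeper than level 3-4 deserves a closer look
for url, level in sorted(depth.items(), key=lambda item: item[1]):
    print(level, url)
```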
Duplicate content
Problem: Duplicate content can make it difficult for search engines to identify the most relevant version of a page.
Solution: Use canonical tags to identify the main version of a page and avoid duplicate content.
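A canonical tag is a single line in the <head> of each duplicate or variant page that points to the preferred version; the URL here is a placeholder:

```html
<!-- Placed in the <head> of every variant of the page -->
<link rel="canonical" href="https://www.example.com/product/blue-shirt/">
```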
Missing or incorrect redirects
Problem: Broken links and misdirected redirects can hinder the crawl process.
Solution: Use 301 redirects for permanently moved content and avoid 302 redirects for permanent changes. Check regularly for broken links.
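In NGINX, for example, a permanent redirect for a moved page can be declared as in the sketch below; the paths are placeholders (in Apache, the same is achieved with a Redirect 301 rule in the .htaccess file):

```nginx
# Permanently moved content: send users and crawlers to the new URL with a 301
location = /old-page/ {
    return 301 /new-page/;
}
```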
Excessive parameters in URLs
Problem: URLs with many parameters can be difficult for search engines to crawl. This is often the case with store pages, for example, where product variants (weight, size, color, etc.) are created via parameters.
Solution: Only use URL parameters if they are absolutely necessary and structure URLs as simply and readably as possible.
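To illustrate the difference with a hypothetical store URL:

```
# Hard to crawl: many parameter combinations for one and the same product
https://www.example-store.com/product?id=123&color=blue&size=m&ref=navigation

# Easier to crawl: one short, readable URL, with variants handled via canonical tags
https://www.example-store.com/blue-shirt/
```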
Server problems
Problem: Server errors such as 5xx errors can result in search engine crawlers not being able to reach the pages.
Solution: Monitor server performance and fix any errors immediately. Ensure that your server has high availability.
Conclusion on crawlability and crawl budget
The crawl budget is a factor that you should be aware of for the efficient indexing and visibility of a website in search engines. By optimizing crawlability and efficiently managing the crawl budget, you as a website operator can ensure that your most important pages are regularly crawled and indexed. This leads to better visibility in the search engines and ultimately to more organic traffic and a better user experience.
The server response times and performance of a website play a crucial role in the efficient use of the crawl budget. Slow server response times and a poorly optimized website can make the crawl budget inefficient and reduce the number of crawled pages. By optimizing the loading speed, implementing caching strategies and reducing HTTP requests, the performance of the website can be improved. This leads to faster server response times, more efficient use of the crawl budget and ultimately better visibility and user experience.
Good crawlability is the basis for successful indexing and visibility in search engines. By implementing the measures described, it can be ensured that search engine crawlers can find and index all the important pages of a website. This not only improves the ranking in search results, but also the user experience and overall performance of the website.
To conclude with the words of Google itself: "A faster website is more user-friendly and at the same time enables a higher crawling frequency. For the Googlebot, a fast website is a sign of well-functioning servers: it can retrieve more content via the same number of connections."
If you have a lot of pages on your website but too many are not crawled or indexed, get in touch with us. We can help you!