All You Need To Know About the X-Robots-Tag



Introduction

Search engine optimization (SEO) is an important part of web page optimization that ultimately results in higher rankings on search engine results pages.

At its most fundamental level, SEO relies heavily on search engine spiders crawling and indexing your website.

That being said, there may be pages that webmasters do not want to show in their search engine results.

By opting to exclude certain pages or content from being indexed, webmasters can ensure that only relevant and wanted information is presented in Google’s search results.

However, guiding search engines to crawl and index your website the way you want can be a challenging task.

When it comes to instructing search engine bots on how to do their work, robots.txt and meta robots tags are the way to go.

Robots.txt instructs crawlers about the entire site, whereas a meta robots tag includes directions about specific pages.

With meta robots tags, you can tell search engines whether or not they should index a page or follow its links, using directives such as index, noindex, follow, and nofollow.
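As a quick illustration, here is a minimal sketch of such a tag (the noindex, follow combination is just one common choice); it sits in the page's <head>:

<!-- Ask all crawlers not to index this page, but still follow its links -->
<meta name="robots" content="noindex, follow">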

Another useful tool at your disposal is the X-Robots-Tag, which allows you to tell bots not to index or not to follow web pages.

However, these tags are often used incorrectly, resulting in conflicting directives that fail to achieve the desired result.

Through this guide, we aim to provide you with the knowledge necessary to use this flexible tag effectively. Let’s dive in.

Say Hello to X-Robots-Tag

Google added support for the X-Robots-Tag directive in 2007.

Using the X-Robots-Tag to control how search engines crawl and index web pages is an alternative to using meta robots tags.

The robots meta tag works well for implementing noindex directives on HTML pages. However, to prevent search engines from indexing non-HTML files such as images or PDFs, you need the X-Robots-Tag.

It is an HTTP response header rather than an HTML tag, and it is a bit more involved to set up.

According to Google,

Any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag.
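In practice, the tag travels with the other response headers. Below is an illustrative sketch of a raw HTTP response for a PDF that should stay out of the index (the date and other header values are placeholders):

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2022 21:42:43 GMT
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow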

When Are X-Robots-Tags The Right Choice?

Both the meta robots tag and the X-Robots-Tag can carry the same indexing directives; the difference is that the meta robots tag is placed in a page’s HTML, while the X-Robots-Tag is sent in the headers of an HTTP response.


There are certain situations, however, where the X-Robots-Tag should be utilized:

  1. When you want to provide specific instructions related to non-HTML files such as images and PDFs.
  2. When you want to deindex a large number of pages with certain parameters or even an entire subdomain.

X-Robots-Tags are the only option in the first case, while in the second they offer an efficient alternative to the laborious implementation of individual robots meta tags, saving time and effort.
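For the second case, a single server-level rule can cover many URLs at once. As a sketch for Apache 2.4+ with mod_headers enabled (the sessionid query parameter is hypothetical), you could send noindex for every URL that carries it:

# Hypothetical sketch: noindex every URL whose query string contains "sessionid="
<If "%{QUERY_STRING} =~ /sessionid=/">
Header set X-Robots-Tag "noindex"
</If>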

When you want to block an HTML page from being indexed, the meta tag approach is a reliable, easy-to-implement method.

However, when an image needs to be kept out of Google’s index, for example, you could use the HTTP response approach instead.

You can always use the latter method if you don’t feel like adding additional HTML to your website.

The X-Robots-Tag header is a great tool for webmasters, as it lets them specify several instructions within a single HTTP response.

Combining several X-Robots-Tag headers within an HTTP response, or listing comma-separated directives in one header, allows for great customization and fine control over how search engine bots interact with your website.

For example, if you don’t want a certain page to be cached and want it removed from search results after a given date, you can combine the noarchive and unavailable_after directives.
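In the raw response, that combination could look like the sketch below (the date is illustrative and should be written in a widely adopted format such as RFC 822):

X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 25 Jun 2025 15:00:00 PST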

By giving you precise control over the instructions sent to bots scanning the web, the X-Robots-Tag allows much more customization than the meta robots tag.

Understanding X-Robots-Tag directives

There are two types of directives: crawler directives and indexer directives. We’ll explain the distinction in more detail below.

Crawler Directives

The robots.txt file only contains ‘crawler directives’, which tell search engines where they may and may not go. It does this with the User-agent, Allow, Disallow, and Sitemap directives.
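A minimal illustrative robots.txt (the path and sitemap URL are hypothetical):

# Keep all crawlers out of /private/ but allow the rest of the site
User-agent: *
Disallow: /private/
Allow: /
Sitemap: https://www.example.com/sitemap.xml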

Indexer Directives

The meta robots tag, like the X-Robots-Tag, is an indexer directive: it can be used to effectively prevent search engines from displaying pages that should remain out of the search results.

Where Do You Put The X-Robots-Tag?

To block specific file types, it is recommended to add the X-Robots-Tag to an Apache configuration or a .htaccess file.

On an Apache server, the X-Robots-Tag can be added to a site’s HTTP responses via the main configuration or a .htaccess file.

As described earlier, the X-Robots-Tag gives you greater control over how specific files and file types are indexed.

Below are some examples of the X-Robots-Tag in action.

Examples Of The X-Robots-Tag In Use

The theory is nice, but let’s see how you could use the X-Robots-Tag in the real world! Imagine you run a website with some .pdf files that, for whatever reason, you do not want search engines to index.

On Apache servers, this configuration would look something like this:

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>

If you’re using Nginx instead of Apache, you can achieve a similar result by adding the following in your server configuration:

location ~* \.pdf$ {
add_header X-Robots-Tag "noindex, nofollow";
}

Now, let us consider another scenario for the X-Robots-Tag.

We want to use it to prevent image files such as .jpg, .gif, .png, etc. from being indexed. To accomplish this, the X-Robots-Tag should be applied as follows:

<Files ~ "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</Files>
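Directives can also be scoped to a single crawler by prefixing the header value with a user agent token. For example, to apply the noindex rule above to Googlebot only, leaving other crawlers unaffected, the Apache block would look like this:

<Files ~ "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "googlebot: noindex"
</Files>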

The X-Robots-Tag HTTP header is an invaluable tool for keeping certain pages out of indexing and serving, but it is crucial to understand how such directives work and how they relate to one another.

For instance, what happens when crawlers discover a URL that carries both an X-Robots-Tag and a meta robots tag?

If that URL is blocked by robots.txt, crawlers never fetch it, so any indexing and serving directives in those tags cannot be discovered or followed.

In other words, for directives to be followed, the URLs that contain them must not be blocked from crawling.

Check For An X-Robots-Tag

Checking for an X-Robots-Tag on a site can be accomplished through various methods.

A browser extension is a quick and convenient way to view the X-Robots-Tag information for a URL.

The Web Developer plugin can be utilized to determine if an X-Robots-Tag is being implemented.

Its View Response Headers feature lets you review each of the HTTP headers being returned.
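If you prefer the command line, a quick check with curl works just as well (the URL is a placeholder):

# Fetch only the response headers and filter for the tag
curl -sI https://www.example.com/document.pdf | grep -i x-robots-tag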

Screaming Frog is another useful method for identifying issues at scale, even on websites with millions of pages.

After crawling the site with Screaming Frog, you can open the “X-Robots-Tag” column to see which sections of the site implement the tag, along with the specific directives they set.

Making Use of X-Robots-Tags on Your Site

Understanding and controlling how search engines interact with your website is an essential part of search engine optimization that can be accomplished effectively by using the X-Robots-Tag.


Despite its power, there are certain precautions you must take to ensure it’s used correctly.

It requires a keen eye and wise decision-making to make sure you don’t accidentally deindex your entire site.

If you do your due diligence though, it can be a useful addition to any SEO’s toolkit.

FAQs

Where do you put the X-Robots-Tag?

To block certain file types, you should add the X-Robots-Tag to a .htaccess file or an Apache configuration. In an Apache server configuration, the X-Robots-Tag can be added to the site’s HTTP responses via the .htaccess file.

The X-Robots-Tag gives you greater control over how certain files and file types are indexed by search engine bots.

How is the X-Robots-Tag different from the meta robots tag?

Meta robots tags allow you to control indexing behavior at the page level, whereas the X-Robots-Tag is added as part of the HTTP header. This allows you to control the indexing of a page as well as specific elements and file types.

Meta robots tags enable you to stop specific search engines from displaying certain pages of a website in the SERPs, while X-Robots-Tags let you control how specified file types get indexed.