Before crawling a site, automated search engine robots (such as Googlebot) request the robots.txt file located at the root of the site. This file contains directives stating which sections may be crawled and which may not. It is important to understand that robots.txt is advisory: it is not an access-control or security mechanism, which work differently.
To limit robot access to certain sections of your site, create a robots.txt file in the site root with rules for each robot. Example of the structure:
User-agent: *
Disallow: /includes/

User-agent: Googlebot
Allow: /includes/

Sitemap: https://example.com/sitemap.xml
This file regulates which parts of the site can be crawled and which cannot. If you are learning from scratch, start by familiarizing yourself with the basic principles of how robots.txt works and the recommendations for writing it.
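As an illustration, Python's standard urllib.robotparser module can evaluate rules like those in the example above. This is a minimal sketch; the URLs and paths are placeholders:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the file contents as a list of lines,
# so this example needs no network access
rp.parse("""\
User-agent: *
Disallow: /includes/

User-agent: Googlebot
Allow: /includes/
""".splitlines())

# /includes/ is disallowed for generic robots...
print(rp.can_fetch("*", "https://example.com/includes/app.js"))         # False
# ...but explicitly allowed for Googlebot
print(rp.can_fetch("Googlebot", "https://example.com/includes/app.js")) # True
```

The same parser is normally pointed at a live file with `rp.set_url(...)` followed by `rp.read()`.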
The file must be located strictly in the root of the site (for example, https://your_site/robots.txt). It applies only to the exact combination of domain, protocol, and port where it is placed. Subdomains, other ports, and other protocols require their own robots.txt file.
- https://example.com/robots.txt applies to https://example.com/ but not to http:// or other subdomains.
- https://www.example.com/robots.txt covers only the www subdomain.
- ftp://example.com/robots.txt applies exclusively to the FTP protocol.

Search robot behavior also depends on the response code returned when robots.txt is requested.
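The per-origin scoping described above can be sketched in Python: given any page URL, the applicable robots.txt lives at the root of the same scheme, host, and port. The function name here is illustrative:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin (scheme + host + port) of page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc, replace the path, drop query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com:8443/shop/item?id=1"))
# https://www.example.com:8443/robots.txt
```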
Robots may cache the contents of the file for up to 24 hours, but if problems occur (timeouts, server errors), the caching period may be extended. Cache-Control headers also affect caching behavior.
The file must be UTF-8 encoded, with CR, CR/LF, or LF line separators. Invalid characters and malformed content are ignored. The maximum permissible file size is 500 KiB; anything beyond that is ignored.
Each line consists of a field name, a colon, and a value. Comments may be added after the # character. The following fields are supported:
All paths are case-sensitive and must begin with /.
User-agent: the value is case-insensitive. Use the robot's exact name to target it with specific rules; otherwise the global * template applies.
Disallow: denies robots access to the specified paths. However, a disallowed URL can still appear in search results, just without a page snippet.
Allow: grants access to specific paths, even if they partially match disallow rules.
Sitemap: points to a sitemap file; several such lines are allowed. The address must be absolute and valid. Sitemap lines apply to all robots unless restricted separately.
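Sitemap lines can also be read programmatically. A small sketch using the standard library (the `site_maps()` method is available in Python 3.8+; the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
""".splitlines())

# site_maps() returns all Sitemap URLs found in the file
print(rp.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```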
One set of rules can apply to several robots at once: repeat the User-agent lines one after another directly before the shared rules.
The most specific matching user-agent group is selected: if several groups match, the one with the longest, most precise name is taken. The general rules under * are not combined with robot-specific ones.
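The selection logic can be sketched as follows. This is an illustration under the assumptions stated in the text (longest matching name wins, * is only a fallback); the function and group names are hypothetical:

```python
def pick_group(groups: dict[str, list[str]], robot: str) -> list[str]:
    """Pick the group whose user-agent token the robot name starts with,
    preferring the longest token; fall back to the '*' group."""
    robot = robot.lower()
    candidates = [ua for ua in groups if ua != "*" and robot.startswith(ua.lower())]
    if candidates:
        return groups[max(candidates, key=len)]  # most specific name wins
    return groups.get("*", [])

groups = {
    "*": ["Disallow: /private/"],
    "googlebot": ["Disallow: /nogoogle/"],
    "googlebot-image": ["Disallow: /noimages/"],
}
print(pick_group(groups, "Googlebot-Image/1.0"))  # ['Disallow: /noimages/']
print(pick_group(groups, "Bingbot"))              # ['Disallow: /private/']
```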
If several groups target the same robot, they are automatically merged. Lines outside groups, such as Sitemap, are not affected by grouping.
The path in a rule is compared against the path of the URL. Two wildcards are supported: * (any sequence of characters, zero or more) and $ (end of the URL). Examples:
- / matches the site root and all nested URLs.
- /fish matches all paths beginning with /fish.
- /fish/ matches only paths with an explicit trailing slash.
- /*.php matches all files with the .php extension.
- /*.php$ matches only URLs that end in .php.

If conflicting rules match at the same time, the rule with the longer path and fewer restrictions is used. That is, in disputed cases, priority is given to the most specific allowing rule.
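A rough sketch of how these wildcards can be translated into regular expressions. This is an illustration of the matching semantics described above, not the exact algorithm search engines use:

```python
import re

def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt rule path into a regex:
    '*' matches any character sequence, '$' anchors the end of the URL."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    # Escape literal parts, join them with '.*' where '*' appeared
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

print(bool(rule_to_regex("/*.php").match("/index.php?x=1")))   # True: prefix match
print(bool(rule_to_regex("/*.php$").match("/index.php?x=1")))  # False: must end in .php
print(bool(rule_to_regex("/fish/").match("/fish")))            # False: trailing slash required
```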
If you would like help creating, checking, or configuring a robots.txt file for your site, contact our SEO agency. Email: info@seo.computer, or WhatsApp: +7 920 204 44 61.