Before crawling a site, automated search engine robots (such as Googlebot) request the robots.txt file located at the root of the site. This file contains directives stating which sections may be crawled and which may not. It is important to understand that robots.txt is advisory: it is not an access-control or security mechanism, which work differently.
To limit robot access to certain sections of your site, create a robots.txt file in the site root with rules for each robot. Example of the structure:
User-agent: *
Disallow: /includes/

User-agent: Googlebot
Allow: /includes/

Sitemap: https://example.com/sitemap.xml
This file regulates which parts of the site can be crawled and which cannot. If you are learning from scratch, start by familiarizing yourself with the basic principles of how robots.txt works and the recommendations for writing it.
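As an illustration, Python's standard urllib.robotparser module can evaluate rules like those in the example above. This is a minimal sketch; the URLs and paths are placeholders:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the file contents as a list of lines,
# so this example needs no network access
rp.parse("""\
User-agent: *
Disallow: /includes/

User-agent: Googlebot
Allow: /includes/
""".splitlines())

# /includes/ is disallowed for generic robots...
print(rp.can_fetch("*", "https://example.com/includes/app.js"))         # False
# ...but explicitly allowed for Googlebot
print(rp.can_fetch("Googlebot", "https://example.com/includes/app.js")) # True
```

The same parser is normally pointed at a live file with `rp.set_url(...)` followed by `rp.read()`.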
The file must be located strictly in the root of the site (for example, https://your_site/robots.txt). It applies only to the exact combination of domain, protocol, and port where it is placed. Subdomains, other ports, and other protocols require their own robots.txt file.
- https://example.com/robots.txt applies to https://example.com/ but not to http:// or other subdomains.
- https://www.example.com/robots.txt covers only the www subdomain.
- ftp://example.com/robots.txt applies exclusively to the FTP protocol.

Search robot behavior also depends on the response code returned when robots.txt is requested.
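The per-origin scoping described above can be sketched in Python: given any page URL, the applicable robots.txt lives at the root of the same scheme, host, and port. The function name here is illustrative:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin (scheme + host + port) of page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc, replace the path, drop query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com:8443/shop/item?id=1"))
# https://www.example.com:8443/robots.txt
```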
Robots may cache the contents of the file for up to 24 hours, but if problems occur (timeouts, server errors), the caching period may be extended. Cache-Control headers also affect caching behavior.
The file must be UTF-8 encoded, with CR, CR/LF, or LF line separators. Invalid characters and malformed content are ignored. The maximum permissible file size is 500 KiB; anything beyond that is ignored.
Each line consists of a field name, a colon, and a value. Comments may be added after the # character. The following fields are supported:
All paths are case-sensitive and must begin with /.
User-agent: the value is case-insensitive. Use the robot's exact name to target it with specific rules; otherwise the global * template applies.
Disallow: denies robots access to the specified paths. However, a disallowed URL can still appear in search results, just without a page snippet.
Allow: grants access to specific paths, even if they partially match disallow rules.
Sitemap: points to a sitemap file; several such lines are allowed. The address must be absolute and valid. Sitemap lines apply to all robots unless restricted separately.
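Sitemap lines can also be read programmatically. A small sketch using the standard library (the `site_maps()` method is available in Python 3.8+; the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
""".splitlines())

# site_maps() returns all Sitemap URLs found in the file
print(rp.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```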
One set of rules can apply to several robots at once: repeat the User-agent lines one after another directly before the shared rules.
The most specific matching user-agent group is selected: if several groups match, the one with the longest, most precise name is taken. The general rules under * are not combined with robot-specific ones.
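The selection logic can be sketched as follows. This is an illustration under the assumptions stated in the text (longest matching name wins, * is only a fallback); the function and group names are hypothetical:

```python
def pick_group(groups: dict[str, list[str]], robot: str) -> list[str]:
    """Pick the group whose user-agent token the robot name starts with,
    preferring the longest token; fall back to the '*' group."""
    robot = robot.lower()
    candidates = [ua for ua in groups if ua != "*" and robot.startswith(ua.lower())]
    if candidates:
        return groups[max(candidates, key=len)]  # most specific name wins
    return groups.get("*", [])

groups = {
    "*": ["Disallow: /private/"],
    "googlebot": ["Disallow: /nogoogle/"],
    "googlebot-image": ["Disallow: /noimages/"],
}
print(pick_group(groups, "Googlebot-Image/1.0"))  # ['Disallow: /noimages/']
print(pick_group(groups, "Bingbot"))              # ['Disallow: /private/']
```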
If several groups target the same robot, they are automatically merged. Lines outside groups, such as Sitemap, are not affected by grouping.
The path in a rule is compared against the path of the URL. Two wildcards are supported: * (any sequence of characters, zero or more) and $ (end of the URL). Examples:
- / matches the site root and all nested URLs.
- /fish matches all paths beginning with /fish.
- /fish/ matches only paths with an explicit trailing slash.
- /*.php matches all files with the .php extension.
- /*.php$ matches only URLs that end in .php.

If conflicting rules match at the same time, the rule with the longer path and fewer restrictions is used. That is, in disputed cases, priority is given to the most specific allowing rule.
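A rough sketch of how these wildcards can be translated into regular expressions. This is an illustration of the matching semantics described above, not the exact algorithm search engines use:

```python
import re

def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt rule path into a regex:
    '*' matches any character sequence, '$' anchors the end of the URL."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    # Escape literal parts, join them with '.*' where '*' appeared
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

print(bool(rule_to_regex("/*.php").match("/index.php?x=1")))   # True: prefix match
print(bool(rule_to_regex("/*.php$").match("/index.php?x=1")))  # False: must end in .php
print(bool(rule_to_regex("/fish/").match("/fish")))            # False: trailing slash required
```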
If you would like help creating, checking, or configuring a robots.txt file for your site, contact our SEO agency. Email: info@seo.computer, or WhatsApp: +7 920 204 44 61.