Automatic search engine robots follow the rules of the Robots Exclusion Protocol (REP): before crawling a site, a search engine reads the robots.txt file to determine which sections of the site are permitted or prohibited for indexing. The protocol does not apply to tools controlled directly by users, or to tools used for security purposes (for example, scanning for malware).
This material explains in detail how the directives of REP are interpreted. The original specification can be found in RFC 9309.
If you do not want some parts of your site to be indexed by search engines, create a robots.txt file with the necessary rules. This is a simple text document that indicates which search bots are allowed access and which are denied. An example of the file structure:

```
User-agent: *
Disallow: /includes/

User-agent: Googlebot
Allow: /includes/

Sitemap: https://yourdomain.ru/sitemap.xml
```
If you are working with robots.txt for the first time, start by studying the basics and practical tips for creating it.
The robots.txt file must be located in the root directory of the site and served over a supported protocol. The search engine takes the protocol, port, and domain name into account: the file applies only to the host where it is located, including the same protocol and port.
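This scoping can be illustrated with a short Python sketch (the helper name robots_url is ours, not part of any standard library) that derives the robots.txt location for an arbitrary page URL:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    # robots.txt always sits at the root of the same scheme://host:port;
    # the path, query, and fragment of the original URL are irrelevant.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com:8080/shop/item?id=1"))
# https://www.example.com:8080/robots.txt
```

Note that a different port or scheme yields a different robots.txt: the rules fetched from one host do not carry over to another.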
For example, a file located at https://www.example.com/robots.txt applies only to https://www.example.com/ and not to example.com or other subdomains. The behavior of the search robot also depends on the HTTP status code received when the file is requested: a 2xx response is parsed as usual; redirects (3xx) are followed; a 4xx response is treated as the absence of any restrictions; a 5xx response is treated as a temporary prohibition on crawling the entire site.
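The status-code handling can be sketched as a small Python helper (classify_status is a hypothetical name used for illustration; real crawlers implement this logic internally):

```python
def classify_status(code: int) -> str:
    # Per RFC 9309: 2xx -> parse the body; 3xx -> follow the redirect;
    # 4xx -> no robots.txt exists, crawling is unrestricted;
    # 5xx -> temporary failure, treat the whole site as disallowed.
    if 200 <= code < 300:
        return "parse"
    if 300 <= code < 400:
        return "follow_redirect"
    if 400 <= code < 500:
        return "allow_all"
    return "disallow_all"

print(classify_status(404))  # allow_all
print(classify_status(503))  # disallow_all
```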
The contents are cached for up to 24 hours, and sometimes longer when the file cannot be re-fetched due to errors. The Cache-Control header may affect how long the cached copy is kept.
The file must be plain text in UTF-8 encoding. Any line-break convention is acceptable (CR, LF, or CRLF). Invalid lines are ignored, as are a byte order mark (BOM) and unsupported characters.
The maximum file size is 500 KiB; everything beyond this limit is ignored.
Each line consists of a field, a colon, and a value. The following fields are supported:
- user-agent — determines which bot the rules apply to;
- disallow — prohibits access to a certain path;
- allow — permits access to a path (even if prohibiting rules exist);
- sitemap — indicates the location of the site's XML sitemap.

user-agent is the name of the search bot to which a group of rules applies. The value is case-insensitive.
disallow prohibits access to the specified paths. If no path is given, the rule is ignored. The value is case-sensitive.
allow permits access to a URL. It works in conjunction with other rules; in case of a conflict, the least restrictive rule is chosen.
sitemap takes the full URL of the sitemap. The field may be repeated and may point to a different domain. It is not tied to a specific bot.
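All four fields can be tried out with Python's standard urllib.robotparser module. The file contents below reuse the example from the beginning of this article (the site_maps() method requires Python 3.8 or later):

```python
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /includes/

User-agent: Googlebot
Allow: /includes/

Sitemap: https://yourdomain.ru/sitemap.xml
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot has an explicit Allow rule; every other bot
# falls under the general * group, which disallows /includes/.
print(rp.can_fetch("Googlebot", "/includes/page.html"))      # True
print(rp.can_fetch("SomeOtherBot", "/includes/page.html"))   # False
print(rp.site_maps())  # ['https://yourdomain.ru/sitemap.xml']
```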
You can specify several groups with different or identical user-agent values. For example:
```
user-agent: a
disallow: /private

user-agent: b
disallow: /temp

user-agent: c
user-agent: d
disallow: /files
```
Each bot follows only one group of rules: the one with the most specific matching user-agent name. The general rules under * are used only when no more specific group exists.
```
user-agent: bot-news
disallow: /news-private

user-agent: *
disallow: /

user-agent: bot
disallow: /all
```
The bot bot-news uses the first group, bot uses the third, and all other bots use the second.
Matching a path against a URL is case-sensitive and supports special characters:
- * — matches any number of characters;
- $ — marks the end of the URL;
- / — matches all pages;
- /$ — matches only the root;
- /fish — matches everything that begins with /fish;
- /*.php$ — matches URLs ending in .php.

When rules with paths of different lengths conflict, the longer path is used; with equal lengths, the less restrictive rule wins.
Examples:
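The wildcard matching and longest-path precedence described above can be sketched in Python. This is an illustrative model of the rules, not a production parser; the function names matches and decide are ours:

```python
import re

def matches(pattern: str, url_path: str) -> bool:
    # Translate a robots.txt path pattern into a regex:
    # '*' matches any run of characters, a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.match("^" + regex + ("$" if anchored else ""), url_path) is not None

def decide(url_path: str, allows, disallows) -> bool:
    # The longest matching path wins; on a tie, the less
    # restrictive rule (allow) takes precedence.
    hits = [(len(p), True) for p in allows if matches(p, url_path)]
    hits += [(len(p), False) for p in disallows if matches(p, url_path)]
    if not hits:
        return True  # no rule matches: crawling is allowed by default
    return max(hits)[1]  # tuple order: longer first, then True > False

print(matches("/fish", "/fishing"))       # True
print(matches("/*.php$", "/index.php"))   # True
print(matches("/$", "/page"))             # False
print(decide("/includes/page.html", ["/includes/"], ["/"]))  # True
```

The last call mirrors the example file from the beginning of this article: allow /includes/ (length 10) beats disallow / (length 1), so the page may be crawled.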
For any questions about configuring robots.txt on your site, as well as other aspects of SEO, you can contact the team of the SEO company "seo.computer" by email at info@seo.computer or via WhatsApp: +79202044461.