Robots.txt is a text file that tells search engine bots which areas of your website they may or may not crawl.
What is its role exactly? How do you create a robots.txt file? And how can you use it for your SEO?
What is the robots.txt file?
A robots.txt file is a plain text file located at the root of your website. Its role is to prevent search engine bots from crawling certain areas of your website. The robots.txt file is one of the first files requested by robots (also called spiders) when they visit a site.
What is it used for?
The robots.txt file sends instructions to the search engine bots that analyze your website; it implements the Robots Exclusion Protocol. With this file, you can block crawling of:
- your entire website for certain robots (also called 'spiders' or 'user agents');
- some pages of your website for all robots, or some pages for some robots, as in the example after this list.
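For instance, the rules below block one hypothetical bot from the whole site and a single page from everyone else; the bot name ExampleBot and the page path are placeholders, not real values from any particular site:

# Block ExampleBot from the entire site
User-agent: ExampleBot
Disallow: /

# Block a single page for all other robots
User-agent: *
Disallow: /private-page.html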
To fully understand the value of a robots.txt file, take for example a website made up of a public area for communicating with visitors and an intranet reserved for employees and customers. In this case, bots are allowed to access the public area, while the private area is off-limits.
This file also tells search engines the address of the website's sitemap file.
Where can I find the robots.txt file?
You can find the robots.txt file at the root of your website. To check whether it exists on your site, simply type its URL into your browser's address bar, like this example: "https://www.yourwebsiteaddress.com/robots.txt" (or use the short script after the list below).
If the file is:
- Absent, a 404 error is displayed and bots assume that no content is off-limits.
- Present, it is displayed and bots follow the instructions in your file.
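To automate that check, here is a minimal sketch in Python using only the standard library; the domain is a placeholder to replace with your own:

# Check whether a site serves a robots.txt file (placeholder domain).
import urllib.error
import urllib.request

url = "https://www.yourwebsiteaddress.com/robots.txt"
try:
    with urllib.request.urlopen(url) as response:
        print("Found, HTTP", response.status)    # file is present
        print(response.read().decode("utf-8"))   # show its rules
except urllib.error.HTTPError as e:
    print("Not found, HTTP", e.code)             # e.g. 404: bots assume nothing is off-limits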
How to create a robots.txt file?
To create your own robots.txt file, you must first be able to access the root of your domain.
You can create a robots.txt file manually, or rely on a CMS such as WordPress, most of which generate one by default at installation. It is also possible to build the file with one of the many online generators available.
For manual creation, you can use a simple text editor such as Notepad or Visual Studio Code (a scripted example follows this list) while respecting the following:
- A file name: robots.txt.
- Instructions and syntax.
- Structure: one directive per line, with blank lines used only to separate groups of rules.
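As an illustration, here is a minimal sketch in Python that writes such a file; the rules are placeholders to adapt before uploading the file to your domain root:

# Write a minimal robots.txt (placeholder rules) to the current directory.
rules = [
    "User-agent: *",
    "Disallow: /intranet/",
    "Sitemap: https://www.yourwebsiteaddress.com/sitemap_index.xml",
]
with open("robots.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(rules) + "\n")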
The instructions and syntax of the robots.txt file
Robots.txt files use the following directives:
- User-agent: designates the search engine robot the rules apply to, for example Bingbot for Bing or Googlebot for Google.
- Disallow: prevents the matching user agents from accessing a specific URL or folder.
- Allow: grants access to a specific URL located inside a folder that is otherwise disallowed.
Example of a robots.txt file:
# robots.txt file for the site https://www.yourwebsiteaddress.com/
User-agent: *        # rules apply to all robots
Disallow: /intranet/        # block crawling of the /intranet/ folder
Disallow: /login.php        # block crawling of https://www.yourwebsiteaddress.com/login.php
Allow: /*.css?*        # allow access to all CSS resources
Sitemap: https://www.yourwebsiteaddress.com/sitemap_index.xml        # link to the sitemap for SEO
In the example above, the asterisk (*) after User-agent makes the rules apply to all crawlers. The hash mark (#) introduces a comment, which bots do not take into account.
You can find resources for specific search engines and content management systems at robots-txt.com.
Robots.txt and SEO
In terms of SEO, robots.txt allows you to:
- Submit your sitemap to the bots, giving them an accurate list of the URLs to index.
- Save crawl budget by excluding low-quality pages of your website from crawling (see the example after this list).
- Prevent bots from crawling duplicate content.
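For example, here is a hypothetical set of rules that keeps internal search results and parameter-based duplicates out of the crawl; the paths and the sort parameter are placeholders to adapt to your own site:

# Keep internal search result pages out of the crawl
User-agent: *
Disallow: /search/
# Avoid crawling parameter-based duplicates of the same pages
Disallow: /*?sort=
# Point bots at the canonical list of URLs to index
Sitemap: https://www.yourwebsiteaddress.com/sitemap_index.xml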
How to test your robots.txt file?
All you need to do is create an account and verify your site with Google Search Console to test your robots.txt file. Once your account is set up, open the Crawl section in the menu, then the robots.txt Tester tool.
The robots.txt test checks whether all your important URLs can be crawled by Google and notifies you if there is an error.
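You can also sanity-check rules locally with Python's standard-library urllib.robotparser, as in this minimal sketch based on the example file shown earlier (note that this parser implements the original exclusion protocol and does not handle wildcard patterns such as /*.css?*):

# Parse robots.txt rules and check which URLs a given bot may fetch.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /intranet/
Disallow: /login.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "https://www.yourwebsiteaddress.com"
print(parser.can_fetch("*", base + "/blog/"))      # True: not disallowed
print(parser.can_fetch("*", base + "/intranet/"))  # False: disallowed folder
print(parser.can_fetch("*", base + "/login.php"))  # False: disallowed URL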
Finally, if you want to control the indexing of your website, you need to create a robots.txt file. If there is no file, all the URLs found by the bots can be crawled and may end up in the search engine results.