The robots.txt Protocol and Its Importance for Website SEO

Let's start with how search engine spiders work. A spider, also known as a web crawler or web robot, is a program or script that automatically crawls information on the web according to certain rules. Less common names include ant, automatic indexer, and worm.

If search engine spiders do not visit your website, or visit it only infrequently, the impact on your pages is considerable and directly affects the site's ranking.

The asterisk (*) is a wildcard symbol. In robots.txt it refers to all search engine spiders.

Do you know how many spiders crawl the web? Some of the best known are:

1. Googlebot, Google's spider, which identifies itself with the string "compatible; Googlebot/2.1"; its WAP counterpart is Googlebot-Mobile. Google is the world's largest search engine company.
2. Baiduspider, Baidu's common spider, along with its specialized variants Baiduspider-mobile (WAP), Baiduspider-image (image search), Baiduspider-video (video search), and Baiduspider-news (news search). Baidu is the strongest Chinese search engine company.
3. 360Spider, the spider of 360 Search.
4. Sosospider, the SOSO spider.
5. "Yahoo! Slurp China" or Yahoo! Slurp, the Yahoo spiders.
6. YoudaoBot (also YodaoBot), the Youdao spider.
7. Sogou Spider, including Sogou News Spider.

There are many more, which I will not enumerate here.
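Knowing a spider's exact name lets you write rules for it alone in robots.txt. As an illustration, here is a sketch of a rule that keeps only Baidu's image spider out of one directory (the /private-images/ path is hypothetical):

User-agent: Baiduspider-image
Disallow: /private-images/

Every other spider, including the general Baiduspider, would still be free to crawl that directory.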

Each directive in robots.txt must begin with a capital letter, followed by an English (half-width) colon and then a space.
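Applied to the Disallow directive, the conventional formatting looks like this (the /admin/ path is just an illustration):

Disallow: /admin/

A variant such as "disallow:/admin/" (lowercase, no space after the colon) does not follow the convention and risks being misread by some spiders.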

A spider's defining trait is catching prey with its web, and search engine spiders treat your website the same way: if the site's content is fresh, unique, and frequently updated, spiders will visit often. At the same time, you should not let spiders crawl everything at will, such as the backend admin address. That is what the robots.txt protocol is for, and it matters: a well-written robots.txt can be very effective. Here is how to write a robots.txt that caters to the spiders' tastes:

User-agent: *

A directive that gives many people the biggest headaches is:

Disallow: /

which forbids search engines from crawling any page or directory on the site.
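As a sanity check, Python's standard-library robots.txt parser can show how a spider interprets Disallow: /. This is a minimal sketch using urllib.robotparser; the example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt that forbids everything for every spider.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /",
])

# With "Disallow: /" no URL may be fetched, whatever the spider's name.
print(rules.can_fetch("Googlebot", "https://example.com/"))       # False
print(rules.can_fetch("Baiduspider", "https://example.com/faq"))  # False
```

If you publish this file by mistake, every compliant spider will stop crawling the entire site, which is why this one line causes so much trouble.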

