Setting up the crawler
Getting started with the Searchine crawler.
Searchine uses a cloud-based crawler to create an index of your website for fast and accurate searches.
Find the right content
The Searchine website crawler indexes your website by crawling every page it can find, based on:
- Inspecting robots.txt
- Sitemap(s) (added manually or discovered through robots.txt)
- Links found on crawled pages
If your robots.txt does not contain a reference to your sitemap or sitemap index, you need to specify it on the Searchine.net Portal.
Searchine crawls your website just like any other search engine does. This means we pay attention to common SEO rules:
- The crawler will not crawl or index pages that are disallowed by robots.txt or that have a noindex tag in place (see the example below).
- The crawler takes canonical URLs into account.
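For reference, a page can be excluded from indexing with the standard robots meta tag placed in the page's <head>. This is a minimal example; it applies to all crawlers unless you target a specific user agent:
Example:
<meta name="robots" content="noindex">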
Robots.txt file
The best option is to reference your sitemap in the robots.txt file and place the robots.txt file in the root of your website.
Example file:
User-agent: *
Disallow: /admin/
Disallow: /etc/
Sitemap: https://www.yourdomain.com/sitemap.xml
Sitemap
The sitemap should contain (almost) every page or document on your website that you want search engines to be aware of.
You can also include extra information such as priority and change frequency; Searchine takes this into account to determine which pages should be crawled with extra priority. See the example sitemap below.
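Example sitemap (the URLs, priorities, and change frequencies are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.yourdomain.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.yourdomain.com/about/</loc>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>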
About query string parameters
Searchine ignores all query string parameters by default.
If your website uses certain query string parameters (such as "?page=detail") and you want Searchine to index those pages, you need to specify the names of those query string parameters.
In the portal, go to Websites and open the website details by clicking on the row. On this page you can add the parameters.
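For example (the URL and parameter name below are placeholders, not part of your configuration):
Without "page" configured: https://www.yourdomain.com/products?page=detail is crawled as https://www.yourdomain.com/products
With "page" added as a query string parameter: https://www.yourdomain.com/products?page=detail is indexed as its own page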
Canonical URLs
We suggest adding a canonical tag to your pages so that duplicate content does not show up in the search results.
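Example canonical tag, placed in the <head> of the page (the URL is a placeholder):
<link rel="canonical" href="https://www.yourdomain.com/products/" />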