Technical SEO - Indexing and Crawlability
This is the first of a series of blogs we are writing to explain the various elements of the SEO work we do.
We are very aware that the technical, behind-the-scenes work we do needs to be transparent for our clients, so we have broken it down into bite-size pieces that are more easily digested!
And we don’t just get on and do this – we will give you a full report of before and after, so you can see that the work has been done and we can start measuring the impact.
It is true that SEO work rarely produces instant results, but measured over time you can see the difference it makes to the traffic and interaction on your website. Conversions are the main aim: once we get people to the site, we must then encourage them to take the plunge and engage with you.
This is a brief overview of what we look for when we cover technical SEO work – in this blog we look at the factors important to getting your page indexed in the first place. The blog is technical but we have tried to de-jargonise it as much as possible!
4xx errors often point to a problem on a website. For example, if you have a broken link on a page and visitors click it, they may see a 4xx error. It’s important to monitor these errors regularly and investigate their causes, because they lower the site’s authority in users’ eyes: they annoy visitors and can give the impression that the site is old, poorly maintained and perhaps not worth looking at. None of us want that!
5xx error messages are sent when the server itself knows it has a problem or error. It’s important to monitor these errors regularly and investigate their causes, because they may have a negative impact and lower site authority in search engines’ eyes, and it is very annoying to arrive at a site only to be told you can’t view it because of a server problem. Sometimes these problems last only seconds; sometimes much longer, if the web developer or hosting platform needs to repair a fault.
The robots.txt file is automatically read by robots when they arrive at your website. It should contain directives for robots, such as which pages or directories should not be crawled and indexed, and it must be well formatted so that search engines can read and follow it. If it is missing or badly formatted, search engines may index things you don’t want them to, such as admin areas, hidden files and other sensitive content. We have seen many files that should never be in the public domain exposed thanks to a badly formatted robots.txt file.
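As an illustration, a simple robots.txt might look like the sketch below – the blocked paths are invented examples, and yours will depend entirely on your own site:

```
User-agent: *
Disallow: /admin/
Disallow: /private-files/

Sitemap: https://www.site.com/sitemap.xml
```

The `Sitemap` line is optional but useful: it points robots straight at your XML sitemap, which we cover next.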
The XML sitemap is not a page your visitors see, and it is very different from the ‘sitemaps’ that used to be listed as pages on websites.
An XML sitemap should contain all the website pages that you want indexed and should sit in the root of the website (e.g. https://www.site.com/sitemap.xml). Its purpose is to aid indexing. It needs to be correctly coded and should be updated whenever pages are added or removed. The sitemap also tells search engines how pages relate to each other, giving them an idea of the flow of the website and which related page to index next. On most modern websites the .xml file is populated automatically when you create or remove a page.
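To show what “correctly coded” means in practice, here is a minimal sitemap sketch – the URLs and dates are placeholders, not real pages:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.site.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.site.com/about/</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
</urlset>
```

Each page you want indexed gets its own `<url>` entry, and the optional `<lastmod>` date tells search engines when the page last changed.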
Usually, websites are reachable both with and without “www” in the domain name. This issue is quite common, and people link to both the www and non-www versions. Fixing it prevents search engines from indexing two versions of the website.
Although such indexation won’t cause a penalty, setting one version as the priority is best practice, especially because it consolidates the link juice from links with and without www into one common version.
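One common way to enforce a single version is a permanent (301) redirect in the site’s .htaccess file. This is only a sketch, assuming an Apache server, and the domain is a placeholder:

```apache
RewriteEngine On
# Send requests for the bare domain to the www version with a 301 redirect
RewriteCond %{HTTP_HOST} ^site\.com$ [NC]
RewriteRule ^(.*)$ https://www.site.com/$1 [R=301,L]
```

If you prefer the non-www version, the same idea works the other way round – the important thing is to pick one and redirect the other to it.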
Using secure encryption (HTTPS) is highly recommended for websites. However, in many cases webmasters face technical issues when installing SSL certificates and setting up the HTTP and HTTPS versions of the website.
If you are using an invalid SSL certificate (e.g. an untrusted or expired one), most web browsers will prevent users from visiting your site by showing them an “insecure connection” warning.
If the HTTP and HTTPS versions of your website are not set up properly, both of them can get indexed by search engines and cause duplicate content issues that may undermine your rankings.
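The usual fix is to redirect all HTTP traffic to HTTPS, again with a 301. On an Apache server this can go in .htaccess – treat this as a sketch and check with your host or developer before applying it:

```apache
RewriteEngine On
# Send any request that arrives over plain HTTP to the HTTPS version
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]
```

With this in place only the HTTPS version is left for search engines to index, which removes the duplicate content problem.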
302 redirects are temporary, so they don’t pass any link juice: Google knows that you may reinstate the page, so it keeps the old URL in its database. If you use them instead of 301s, search engines might continue to index the old URL and disregard the new one as a duplicate, or they might divide the link popularity between the two versions, hurting search rankings.
That’s why it is not recommended to use 302 redirects when you are permanently moving a page or a website. Instead, stick to a 301 redirect to preserve link juice and avoid duplicate content issues.
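For a single page that has moved permanently, the .htaccess entry is a one-liner – the paths here are invented for illustration:

```apache
# Permanently redirect the old page to its new home
Redirect 301 /old-page.html https://www.site.com/new-page/
```

Using `Redirect 302` in the same place would mark the move as temporary, which is exactly the behaviour described above that you want to avoid.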
In certain cases, either due to a bad .htaccess setup or to deliberately taken measures, a page may end up behind two or more redirects. It is strongly recommended to avoid redirect chains longer than two redirects, since they can cause multiple issues:
- There is a risk that the page will not be indexed at all, as Google’s bots only follow a limited number of redirects in a chain before giving up.
- Too many redirects slow your page down. Every extra redirect can add seconds to the load time, because the browser has to go back to the server each time to ask where to go next. Think of each redirect as a detour: you reach it, and if yet another redirect is in place you must set off again, because you are still not at the end of your journey.
- A high bounce rate: users are not willing to stay on a page that takes more than 3 seconds to load – we are all intolerant of slow-loading sites now.
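If you spot a chain, the fix is to point every old URL straight at the final destination rather than at each other. In .htaccess that might look like the sketch below, with the URLs invented for illustration:

```apache
# Bad: a chain – /old-page hops via /interim-page before reaching the final page
# Redirect 301 /old-page /interim-page
# Redirect 301 /interim-page /new-page

# Good: both old URLs go straight to the final page in a single hop
Redirect 301 /old-page https://www.site.com/new-page
Redirect 301 /interim-page https://www.site.com/new-page
```

Each visitor (and each search engine bot) then makes one hop at most, which is faster and keeps the page safely indexable.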