Technical SEO – Indexing and Crawlability

This is the first in a series of blog posts explaining the various elements of the SEO work we do.

We are very aware that the technical, behind-the-scenes work we do needs to be transparent for our clients, so we have broken it down into bite-sized pieces that are easier to digest!

And we don’t just get on and do this: we will give you a full before-and-after report, so you can see that the work has been done and we can start measuring the impact.

It is true that SEO work rarely delivers instant results, but measured over time you can see the difference it makes to your website’s traffic and engagement. Conversions are the main aim: once we get people to the site, we must then encourage them to take the plunge and engage with you.

This is a brief overview of what we look for when we carry out technical SEO work. In this blog we look at the factors that are important to getting your pages indexed in the first place. The blog is technical, but we have tried to de-jargonise it as much as possible!

Technical SEO

1. 4xx errors - pages not found or set up incorrectly

4xx errors often point to a problem on a website. For example, if a page contains a broken link and a visitor clicks it, they will usually see a 404 “page not found” error. It’s important to monitor these errors regularly and investigate their causes, because they can lower the site’s authority in users’ eyes. They certainly annoy visitors and can give the impression that the site is old, poorly maintained and maybe not worth looking at. None of us want that!
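
As a rough illustration of how broken links can be spotted, the Python sketch below pulls every link out of a page’s HTML and reports the ones that came back with a 4xx status. The `status_by_url` lookup is an assumption standing in for whatever crawler or link checker supplies the status codes, and the page and paths are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(html, status_by_url):
    """Return (url, status) pairs for links that answered with a 4xx code.

    status_by_url is a hypothetical lookup of URL -> HTTP status code,
    e.g. filled in by a crawler. Links missing from it are skipped.
    """
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for url in collector.links:
        status = status_by_url.get(url)
        if status is not None and 400 <= status <= 499:
            broken.append((url, status))
    return broken


# Made-up page and crawl results, purely for illustration.
page = '<p><a href="/about">About</a> <a href="/old-offer">Offer</a></p>'
statuses = {"/about": 200, "/old-offer": 404}
print(find_broken_links(page, statuses))  # [('/old-offer', 404)]
```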

2. 5xx status codes

5xx error messages are sent when the server is aware that it has encountered a problem. It’s important to monitor these errors regularly and investigate their causes, because they may have a negative impact and lower the site’s authority in search engines’ eyes. It is also very annoying to arrive at a site only to be told you can’t view it because of a server problem. Sometimes these problems last only seconds; sometimes they last much longer, if the web developer or hosting provider needs to repair something.
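
Because some server errors clear up within seconds, a monitor can re-check a page a few times before treating a 5xx as a real outage. The Python sketch below shows one way this might look. The `check_status` callable is a hypothetical stand-in for a real HTTP request, and the number of attempts and the delay are arbitrary assumptions.

```python
import time


def is_server_error(status):
    """True for any 5xx status code."""
    return 500 <= status <= 599


def is_persistent_server_error(check_status, attempts=3, delay_seconds=0.0):
    """Re-check a page a few times before treating a 5xx as a real outage.

    check_status is a hypothetical callable that returns the current HTTP
    status code of the page being monitored.
    """
    for attempt in range(attempts):
        if not is_server_error(check_status()):
            return False  # the page answered normally, so the error was transient
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return True


# Simulated responses: one brief 503, then the page is fine again.
responses = iter([503, 200])
print(is_persistent_server_error(lambda: next(responses)))  # False
```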

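3. Robots.txt file

The robots.txt file is automatically requested by search engine robots when they arrive at your website. It should contain instructions for robots, such as which pages or folders they should not crawl. It must be well formatted so that search engines can read and follow it. If it is missing or badly formatted, search engines may crawl and index things you don’t want them to, such as internal files or admin pages. We have seen many files appear in search results that should never have been in the public domain, thanks to a badly formatted robots.txt file. Bear in mind that robots.txt is itself publicly readable and only asks robots to stay away, so anything truly sensitive, such as passwords, needs proper access protection as well.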
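
To give a feel for what such a file looks like and how a robot reads it, here is a small Python sketch using the standard library’s `urllib.robotparser`. The rules and the `/admin/` and `/checkout/` folders are made up for the example; a real file depends entirely on how your site is structured.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every robot is asked to stay out of two folders.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask the parser, as a robot would, whether each page may be crawled.
for path in ("/", "/services/", "/admin/settings"):
    allowed = parser.can_fetch("*", "https://www.site.com" + path)
    print(path, "crawlable" if allowed else "blocked")
```

Running a check like this against your own file is a quick way to confirm that the rules block what you expect and nothing more.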

4. XML sitemap

This file is not visible to visitors and is very different from the ‘sitemap’ pages that used to be listed on websites.

An XML sitemap should contain all the website pages that you want to get indexed and is usually located in the root directory of the website (e.g. https://www.site.com/sitemap.xml). In general, it serves to aid indexing. It should be updated when new pages are added to the website and needs to be correctly coded. The sitemap gives search engines a list of the pages on the site, and optionally when each was last updated, which helps them discover pages they might not reach by following links alone. On most websites the .xml file is populated automatically whenever you create or remove a page.
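
For the curious, the Python sketch below shows roughly what “populating the .xml file automatically” involves: it turns a list of page addresses into sitemap XML using the standard sitemaps.org namespace. The page list is made up, and a real plugin or CMS would add more detail, such as last-modified dates.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Made-up list of pages we want search engines to index.
pages = ["https://www.site.com/", "https://www.site.com/services/"]
print(build_sitemap(pages))
```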

6. Fixed www and non www versions

Usually, websites are available both with and without “www” in the domain name. This issue is quite common, and people end up linking to both the www and non-www versions. Fixing it prevents search engines from indexing two versions of the same website.

Although indexing both versions won’t cause a penalty, setting one version as the preferred one is best practice, especially because it consolidates the link juice from links with and without www into one common version.
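
In practice this is normally handled with a permanent redirect set up on the server. The Python sketch below only illustrates the underlying rule, mapping either form of an address onto one preferred host. Choosing the www version as the preferred one is an assumption made for the example; the non-www version works just as well as long as you pick one and stick to it.

```python
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "site.com"
PREFERRED_HOST = "www." + BARE_HOST  # assumption: the www version is preferred


def preferred_url(url):
    """Map the www and non-www forms of a URL onto the preferred host.

    This simple sketch ignores ports and login details in the address.
    """
    parts = urlsplit(url)
    if parts.hostname in (BARE_HOST, "www." + BARE_HOST):
        parts = parts._replace(netloc=PREFERRED_HOST)
    return urlunsplit(parts)


print(preferred_url("https://site.com/contact"))  # https://www.site.com/contact
```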

7. Issues with HTTP/HTTPS site versions

Using secure encryption is highly recommended for websites. However, in many cases webmasters run into technical issues when installing SSL certificates and setting up the HTTP and HTTPS versions of the website.

If you are using an invalid SSL certificate (e.g. an untrusted or expired one), most web browsers will warn users away from your site by showing them an “insecure connection” notification.

If the HTTP and HTTPS versions of your website are not set up properly, both of them can get indexed by search engines and cause duplicate content issues that may undermine your website’s rankings.
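
Expired certificates are easy to catch in advance. The Python sketch below works out how many days a certificate has left from its expiry date, written in the text format that Python’s `ssl` module reports (for example in the `notAfter` field returned by `getpeercert()`). The sample date and the 30-day warning threshold are assumptions for illustration.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires (negative if already expired).

    not_after is the certificate's expiry date as text, such as
    "Jan  5 09:34:43 2018 GMT". now is seconds since the epoch and
    defaults to the current time.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


sample_not_after = "Jan  5 09:34:43 2018 GMT"  # made-up sample value
remaining = days_until_expiry(sample_not_after)
if remaining < 0:
    print("Certificate has expired")
elif remaining < 30:  # assumed warning threshold
    print("Certificate expires soon")
else:
    print("Certificate is valid")
```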
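
A simple way to check this is to confirm that every address the site can be reached at ends up on one and the same final URL. The Python sketch below lists the four usual variants of a domain and checks a set of crawl results. The `final_urls` data is hypothetical; in a real audit it would come from actually requesting each variant and following its redirects.

```python
def site_variants(domain):
    """The four addresses a site can usually be reached at."""
    return [
        f"{scheme}://{host}/"
        for scheme in ("http", "https")
        for host in (domain, "www." + domain)
    ]


def is_consolidated(final_url_by_variant):
    """True when every variant ends up at one and the same final URL."""
    return len(set(final_url_by_variant.values())) == 1


# Hypothetical crawl result: where each variant ended up after redirects.
final_urls = {variant: "https://www.site.com/" for variant in site_variants("site.com")}
print(is_consolidated(final_urls))  # True
```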

8. Issues with redirects

302 redirects are temporary, so search engines may not pass the old page’s link juice on to the new URL: Google knows that you may reinstate the page, so it keeps the old URL in its index. If you use them instead of 301s, search engines might continue to index the old URL and disregard the new one as a duplicate, or they might divide the link popularity between the two versions, thus hurting search rankings.

That’s why it is not recommended to use 302 redirects if you are permanently moving a page or a website. Instead, stick to a 301 redirect to preserve link juice and avoid duplicate content issues.
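
When auditing a site, this boils down to finding redirects that are marked temporary even though the move is meant to be permanent. The Python sketch below shows that check on a made-up list of redirects. The tuple layout, including the `is_permanent_move` flag recording the site owner’s intent, is an assumption for the example.

```python
from http import HTTPStatus


def redirects_to_review(redirects):
    """Return redirects that use a 302 even though the move is permanent.

    Each item is a hypothetical (old_url, new_url, status, is_permanent_move)
    tuple; is_permanent_move records what the site owner actually intends.
    """
    return [
        (old_url, new_url)
        for old_url, new_url, status, is_permanent_move in redirects
        if is_permanent_move and status == HTTPStatus.FOUND  # FOUND is 302
    ]


# Made-up audit data.
sample = [
    ("/summer-sale", "/sale", 302, False),      # temporary campaign: 302 is fine
    ("/old-services", "/services", 302, True),  # permanent move: should be a 301
    ("/about-us", "/about", 301, True),         # permanent move, correctly a 301
]
print(redirects_to_review(sample))  # [('/old-services', '/services')]
```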

9. Pages with long redirect chains

In certain cases, either because of a badly set up .htaccess file or because of deliberate measures, a page may end up with two or more redirects. It is strongly recommended to avoid redirect chains longer than 2 redirects, since they can cause several issues:

There is a risk that a page will not be indexed, because search engine bots only follow a limited number of redirects in a row before giving up (Google’s documentation mentions up to 10 hops for Googlebot, and other crawlers may stop sooner).
Too many redirects will slow down your page. Every extra redirect may add up to several seconds to the page load time, because the browser has to go back to the server each time to ask where to go next. Imagine a redirect as a dead end: you reach it, and if there is yet another redirect in place you must set off again because you have not reached the end of your journey.
High bounce rate: users are not willing to wait for a page that takes more than 3 seconds to load. We are all intolerant of a slow-loading site now.
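
To show what measuring a redirect chain involves, the Python sketch below follows a page from redirect to redirect and counts the hops. The `redirect_map` lookup is a hypothetical stand-in for real server responses, the URLs are made up, and the cap of 10 hops is an assumption chosen for the example.

```python
def redirect_chain(start_url, redirect_map, max_hops=10):
    """Follow redirects from start_url and return the list of URLs visited.

    redirect_map is a hypothetical lookup of URL -> URL it redirects to.
    Following stops at a URL with no redirect, at a loop, or after max_hops.
    """
    chain = [start_url]
    while chain[-1] in redirect_map and len(chain) - 1 < max_hops:
        next_url = redirect_map[chain[-1]]
        if next_url in chain:
            break  # redirect loop: stop rather than go round forever
        chain.append(next_url)
    return chain


# Made-up example: three redirects before the visitor reaches the real page.
redirects = {
    "http://site.com/old": "https://site.com/old",
    "https://site.com/old": "https://www.site.com/old",
    "https://www.site.com/old": "https://www.site.com/new",
}
chain = redirect_chain("http://site.com/old", redirects)
hops = len(chain) - 1
print(hops, "redirects:", " -> ".join(chain))
if hops > 2:
    print("Chain is longer than 2 redirects; point the first URL straight at the last.")
```

The fix is usually to update the first redirect so it points directly at the final address, cutting out the steps in between.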

Next Month...

Next month we will look at ‘On-Page SEO’ and what that means. In the meantime, if you have any burning questions that you think we can help with, please do get in touch.

