How to Use Machine Learning for SEO Competitor Research

With SEO professionals' ever-increasing appetite for learning Python, there has never been a better or more exciting time to take advantage of machine learning's (ML) capabilities and apply them to SEO.

This is especially true for your competitor research.

In this column, you’ll learn how machine learning helps address common challenges in SEO competitor research, how to set up and train your ML model, how to automate your analysis, and more.

Let’s do this!

Why We Need Machine Learning in SEO Competitor Research

Most if not all SEO pros working in competitive markets will analyze the SERPs and their business competitors to find out what it is that competing sites are doing to achieve a higher rank.

Back in 2003, we used spreadsheets to collect data from SERPs, with columns representing different aspects of the competition, such as the number of links to the home page, the number of pages, etc.

In hindsight, the idea was right, but the execution was hopeless due to the limits of Excel in performing a statistically robust analysis in the short time required.

And if the limits of spreadsheets weren’t enough, the landscape has moved on quite a bit since then, as we now have:

  • Mobile SERPs.
  • Social media.
  • A much more sophisticated Google Search experience.
  • Page Speed.
  • Personalized search.
  • Schema.
  • JavaScript frameworks and other new web technologies.

The above is by no means an exhaustive list of trends, but it serves to illustrate the ever-increasing range of factors that can explain the advantage of your higher-ranked competitors in Google.

Machine Learning in the SEO Context

Thankfully, with tools like Python/R, we’re no longer subject to the limits of spreadsheets. Python/R can handle millions to billions of rows of data.

If anything, the limit is the quality of the data you can feed into your ML model and the intelligent questions you ask of your data.

As an SEO professional, you can make the decisive difference to your SEO campaign by cutting through the noise and using machine learning on competitor data to uncover:

  • Which ranking factors can best explain the differences in rankings between sites.
  • What the winning benchmark is.
  • How much a unit change in the factor is worth in terms of rank.

Like any (data) science endeavor, there are a number of questions to be answered before we can start coding.

What Type of ML Problem Is Competitor Analysis?

ML solves a number of problems, whether it’s categorizing things (classification) or predicting a continuous number (regression).

In our particular case, since the quality of a competitor’s SEO is denoted by its rank in Google, and that rank can be treated as a continuous number, the ML problem is one of regression.

Outcome Metric

Given that we know the ML problem is one of regression, the outcome metric is rank. This makes sense for a number of reasons:

  • Rank won’t suffer from seasonality; an ice cream brand’s rankings for searches on [ice cream] won’t depreciate because it’s winter, unlike the “users” metric.
  • Competitor rank is third-party data and is available from commercial SEO tools, unlike competitors’ user traffic and conversions.

What Are the Features?

Knowing the outcome metric, we must now determine the independent variables or model inputs, also known as features. The data types of the features will vary, for example:

  • First paint measured in seconds would be numeric.
  • Sentiment with the categories positive, neutral, and negative would be a factor (a categorical variable).

Naturally, you want to cover as many meaningful features as possible, including technical, content/UX, and offsite, for the most comprehensive competitor research.

What Is the Math?

Given that rankings are numeric, and that we want to explain the difference in rank, then in mathematical terms:

$$\text{rank} \sim w_1 \cdot \text{feature}_1 + w_2 \cdot \text{feature}_2 + \dots + w_n \cdot \text{feature}_n$$

~ (known as the “tilde”) means “explained by.”

n is the number of features, so feature_n is the nth and last feature.

w_i is the weighting of feature_i.
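To make the formula concrete, here is a minimal sketch that estimates the weights w_1 … w_n by ordinary least squares with NumPy on hypothetical, synthetic data. It also fits an intercept, which the formula leaves implicit. A linear fit is only one way to estimate such weights; tree models like XGBoost don’t produce a single weight per feature.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical data: 200 ranked pages with 3 features each
features = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.5, 0.5])
intercept = 10.0
rank = intercept + features @ true_w + rng.normal(scale=0.1, size=200)

# prepend a column of ones so the intercept is estimated alongside w_1..w_n
design = np.column_stack([np.ones(len(features)), features])
coef, *_ = np.linalg.lstsq(design, rank, rcond=None)

w0, w = coef[0], coef[1:]
print("intercept:", round(w0, 2))
print("weights:", np.round(w, 2))
```

Because the synthetic ranks were generated from known weights, the recovered values should sit very close to 2.0, -1.5, and 0.5.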

Using Machine Learning to Uncover Competitor Secrets

With the answers to these questions in hand, we’re ready to see what secrets machine learning can reveal about your competition.

At this point, we’ll assume that your data (identified in this example as “serps_data”) has been joined, transformed, and cleaned, and is now ready for modeling.

At a minimum, this data will contain the Google rank and the feature data you want to test.

For example, your columns might include:

  • Google_rank.
  • Page_speed.
  • Sentiment.
  • Flesch_kincaid_reading_ease.
  • Amp_version_available.
  • Site_depth.
  • Internal_page_rank.
  • Referring_domains count.
  • avg_domain_authority_backlinks.
  • title_keyword_string_distance.
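Models such as XGBoost expect numeric inputs, so factor and true/false columns usually need encoding first. Below is a minimal sketch, assuming Sentiment holds the text labels positive, neutral, and negative and Amp_version_available holds True/False; the three-row frame is hypothetical and uses only a subset of the columns above.

```python
import pandas as pd

# hypothetical three-row sample using a subset of the columns listed above
serps_data = pd.DataFrame({
    "Google_rank": [1, 2, 3],
    "Page_speed": [1.2, 2.8, 3.5],
    "Sentiment": ["positive", "neutral", "negative"],
    "Amp_version_available": [True, False, True],
})

# treat Sentiment as a factor with a fixed set of levels
serps_data["Sentiment"] = pd.Categorical(
    serps_data["Sentiment"], categories=["positive", "neutral", "negative"])

# one-hot encode the factor into 0/1 indicator columns (Sentiment_positive, ...)
serps_data = pd.get_dummies(serps_data, columns=["Sentiment"], dtype=int)

# store the boolean flag as 0/1
serps_data["Amp_version_available"] = serps_data["Amp_version_available"].astype(int)

print(serps_data.columns.tolist())
```

One-hot encoding avoids implying an order between sentiment levels, which a single 0/1/2 code would do.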

Training Your ML Model

To train your model, we’re using XGBoost because it tends to deliver better results on tabular data like this than many other ML models.

Alternatives you may wish to trial in parallel are LightGBM (especially for much larger datasets), RandomForest, and AdaBoost.

Try using the following Python code for XGBoost on your SERPs dataset:

```python
# import the libraries
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error

serps_data = pd.read_csv('serps_data.csv')

# set the model variables
# your SERPs data with everything but the Google_rank column
serp_features = serps_data.drop(columns=['Google_rank'])

# your SERPs data with just the Google_rank column
rank_actual = serps_data['Google_rank']

# instantiate the model
# ("reg:squarederror" replaces the deprecated "reg:linear" objective)
serps_model = xgb.XGBRegressor(objective='reg:squarederror', random_state=1231)

# fit the model
serps_model.fit(serp_features, rank_actual)

# generate the model predictions
rank_pred = serps_model.predict(serp_features)

# evaluate the model accuracy (mean squared error)
mse = mean_squared_error(rank_actual, rank_pred)
```

Note that the above is very basic: the MSE is calculated on the same data the model was trained on, so it will flatter the model. In a real client scenario, you’d want to trial a number of model algorithms on a training data sample (about 80% of the data), evaluate them on the remaining 20%, and select the best model.
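Here is a minimal sketch of that 80/20 comparison. It uses scikit-learn’s GradientBoostingRegressor as a stand-in for XGBoost so it runs without extra installs, alongside RandomForestRegressor and AdaBoostRegressor; xgb.XGBRegressor or lightgbm.LGBMRegressor could be added to the candidates dictionary the same way. The helper name compare_models and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def compare_models(features, rank, seed=1231):
    """Fit each candidate on 80% of rows; return its MSE on the held-out 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, rank, test_size=0.2, random_state=seed)
    candidates = {
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(random_state=seed),
        "adaboost": AdaBoostRegressor(random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(X_test))
    # lowest held-out MSE (best model) first
    return dict(sorted(scores.items(), key=lambda item: item[1]))


# hypothetical synthetic data standing in for serps_data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)
print(compare_models(X, y))
```

Scoring on rows the models never saw gives a fairer basis for picking the winner than the training-set MSE in the snippet above.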

So what secrets can this machine learning model tell us?

The Most Predictive Drivers of Rank

The chart shows the most influential SERP features, or ranking factors, in descending order of importance.

In this particular case, the most important factor was “title_keyword_dist,” which measures the string distance between the title tag and the target keyword. Think of this as the title tag’s relevance to the keyword: the smaller the distance, the closer the match.
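The article doesn’t say which string distance title_keyword_dist uses. As an assumption, here is a minimal sketch based on Levenshtein (edit) distance; the lowercasing and the scaling to a 0 to 1 score in the hypothetical title_keyword_dist function are illustrative choices, not the author’s method.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def title_keyword_dist(title: str, keyword: str) -> float:
    """Hypothetical feature: edit distance scaled to 0 (identical) .. 1 (maximally different)."""
    title, keyword = title.lower().strip(), keyword.lower().strip()
    longest = max(len(title), len(keyword)) or 1  # avoid dividing by zero
    return levenshtein(title, keyword) / longest


print(levenshtein("kitten", "sitting"))  # → 3
print(title_keyword_dist("Ice Cream", "ice cream"))  # → 0.0
```

Scaling by the longer string’s length keeps long and short titles comparable in the model.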

No surprise there for the SEO practitioner; however, the value here is in providing empirical evidence to a non-expert business audience that doesn’t understand the need to optimize title tags.

Other factors of note in this industry are:

  • no_cookies: The number of cookies.
  • dom_ready_time_ms: A measure of page speed.
  • no_template_words: The number of words outside the main body content section.
  • link_root_domains_links: Count of links to root domains.
  • no_scaled_images: Count of images that need to be scaled by the browser to render.

Every market or industry is different, so the above is not a general result for the whole of SEO!
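A ranked list like the one in the chart can be read from a fitted tree ensemble’s feature_importances_ attribute, which XGBRegressor also exposes. Below is a minimal sketch using scikit-learn’s GradientBoostingRegressor on hypothetical synthetic data that borrows three feature names from this section; its output is not the article’s result.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
n = 400
# hypothetical data: rank depends strongly on title_keyword_dist, weakly on no_cookies,
# and not at all on dom_ready_time_ms
X = pd.DataFrame({
    "title_keyword_dist": rng.uniform(0, 1, n),
    "no_cookies": rng.integers(0, 30, n),
    "dom_ready_time_ms": rng.normal(1500, 300, n),
})
rank = 1 + 40 * X["title_keyword_dist"] + 0.2 * X["no_cookies"] + rng.normal(0, 1, n)

model = GradientBoostingRegressor(random_state=7).fit(X, rank)

# importances sum to 1; sort descending to mirror the chart
importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importance)
```

Impurity-based importances say which features the model leans on, not the direction of the effect; that is what the next section addresses.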

How Much Rank a Ranking Factor Is Worth

In another market, we can also see how much rank each factor would deliver.

Forecast rank change.

In the chart above, we have a list of factors and the rank change for every positive unit change in each factor.

For example, for every 1-character increase in meta description length, there’s a corresponding decrease in Google rank of 0.1.

Taken out of context, this sounds ridiculous. However, given that most meta descriptions are populated, it could mean that a unit change away from the average meta description length would result in a decrease in Google Search ranking.
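One model-agnostic way to estimate a “rank change per unit change” figure is to raise a single feature by one unit on every row, re-predict, and average the difference. Below is a minimal sketch, assuming predict is any function that maps a feature matrix to predicted ranks (for example serps_model.predict). The helper average_unit_effect and the linear stand-in model are hypothetical; for a linear model the result simply equals that feature’s weight.

```python
import numpy as np


def average_unit_effect(predict, X, column, step=1.0):
    """Mean change in predicted rank when feature `column` rises by `step` on every row."""
    X = np.asarray(X, dtype=float)
    bumped = X.copy()
    bumped[:, column] += step
    return float(np.mean(predict(bumped) - predict(X)))


def predict(X):
    """Hypothetical linear model: rank = 20 - 0.1 * meta_description_length + 2 * page_speed."""
    return 20 + X @ np.array([-0.1, 2.0])


rng = np.random.default_rng(3)
# column 0: meta description length in characters; column 1: page speed in seconds
X = np.column_stack([rng.integers(50, 160, 100), rng.uniform(1, 5, 100)])
print(round(average_unit_effect(predict, X, column=0), 3))  # → -0.1
```

With a tree model such as XGBoost, predictions change in steps, so a one-unit nudge can show zero effect for many rows; averaging over the whole dataset smooths this out somewhat.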

The Winning Benchmark for a Ranking Factor

Below is a graph plotting the average title tag length for a different industry from the one above, which also includes a line of best fit:

Graph plotting the average title tag length.

Despite the SEO best-practice recommendation of using up to 70 characters for title tag length, the data plotted above shows the actual optimal length in this industry to be 60 characters.
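The article doesn’t state what form its line of best fit takes. If you fit a quadratic curve, the benchmark is the curve’s turning point, x = -b / (2a). Below is a minimal sketch with numpy.polyfit on hypothetical data built so that average rank is best (lowest) near 60 characters; it does not reproduce the chart above.

```python
import numpy as np

rng = np.random.default_rng(11)
# hypothetical data: average rank is best (lowest) near 60-character titles
title_length = rng.uniform(20, 100, 300)
avg_rank = 3 + 0.002 * (title_length - 60) ** 2 + rng.normal(0, 0.3, 300)

# fit avg_rank ≈ a*x^2 + b*x + c; polyfit returns the highest power first
a, b, c = np.polyfit(title_length, avg_rank, deg=2)

# the turning point of the parabola is the benchmark length
optimum = -b / (2 * a)
print(f"optimal title length ≈ {optimum:.0f} characters")
```

A positive a means the curve has a minimum, which is what you want when lower rank numbers are better.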

Thanks to machine learning, we’re not only able to surface the most important factors but, when taking a deep dive, can also see the winning benchmark.

Automating Your SEO Competitor Analysis with Machine Learning

The above application of machine learning is great for generating ideas to A/B split test and for improving the SEO program with evidence-driven change requests.

It’s also important to recognize that this analysis is made all the more powerful when it’s ongoing.

That’s because the ML analysis is just a snapshot of the SERPs at a single point in time.

Having a continuous stream of data collection and analysis means you get a truer picture of what’s really happening in the SERPs for your industry.

This is where purpose-built SEO data warehouse and dashboard systems come in handy, and these products are available today.

These systems:

  • Ingest your data from your favorite SEO tools daily.
  • Combine the data.
  • Use ML to surface insights like the above in a front end of your choice, such as Google Data Studio.

To build your own automated system, you would deploy what’s called ETL (extract, transform, and load) into a cloud infrastructure like Amazon Web Services (AWS) or Google Cloud Platform (GCP).

To clarify:

  • Extract – Calling your SEO tools’ APIs daily.
  • Transform – Cleaning and analyzing your data using ML as described above.
  • Load – Depositing the finished result in your data warehouse.
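The three stages can be sketched in plain Python. This is a minimal, hypothetical skeleton: extract() returns hard-coded rows where a real pipeline would call your SEO tool’s API, transform() only cleans the rows (the model fitting from earlier would slot in here), and load() writes to an in-memory SQLite database standing in for a cloud data warehouse. Scheduling the daily run is left out.

```python
import sqlite3
from datetime import date


def extract():
    """Hypothetical stand-in for a daily call to an SEO tool's API."""
    return [
        {"url": "https://example.com/a", "keyword": "ice cream", "rank": "3"},
        {"url": "https://example.com/b", "keyword": "ice cream", "rank": None},
        {"url": "https://example.com/c", "keyword": "ice cream", "rank": "1"},
    ]


def transform(rows, snapshot_date):
    """Drop rows with no rank, cast rank to int, and stamp each row with the snapshot date."""
    return [
        (snapshot_date.isoformat(), r["url"], r["keyword"], int(r["rank"]))
        for r in rows
        if r["rank"] is not None
    ]


def load(conn, records):
    """Append the cleaned snapshot to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS serps "
        "(snapshot_date TEXT, url TEXT, keyword TEXT, google_rank INTEGER)"
    )
    conn.executemany("INSERT INTO serps VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(), date(2021, 6, 1)))
print(conn.execute("SELECT url, google_rank FROM serps ORDER BY google_rank").fetchall())
```

Stamping every row with its snapshot date is what lets repeated daily runs build up the history that a single snapshot lacks.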

Thus your data collection, analysis, and visualization are automated in one place.


Competitor research and analysis in SEO is hard because there are so many ranking factors to control for.

Spreadsheet tools aren’t up to it, due to the amounts of data involved (not to mention the statistical capabilities that data science languages like Python offer).

When conducting SEO competitor research using machine learning, it’s important to understand that this is a regression problem, that the target variable is Google rank, and that the hypotheses are the ranking factors.

Using ML on your competitors can tell you what the key drivers are, identify winning benchmarks among them, and tell you just how much lift in rank your optimizations can potentially deliver.

The analysis is only a snapshot, so to stay on top of the competition, automate this process using Extract, Transform, Load (ETL).

Image Credits

All screenshots taken by author, June 2021
