Python Script for SEO Content Analysis of Your Competitors

Analyzing your competitors' content gives you valuable insights into your own operations and targets. This simple Python script can give you data on n-grams in seconds.

This Python script is an elementary version of a competitor content analysis. The main idea is to get a quick overview of what the content focus looks like. A lean approach is to fetch all URLs in the sitemap, extract the URL slugs, and run an n-gram analysis on them. If you want to learn more about n-gram analysis, have a look at our Free N-Gram Tool. You can apply it not only to URLs but also to keywords, titles, and so on.

As a result, you get a list of the n-grams used in the URL slugs together with the number of pages that use each n-gram. The analysis takes only a couple of seconds, even on large sitemaps, and runs with fewer than fifty lines of code.
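To make this concrete, here is a minimal sketch, with made-up example slugs, of how advertools' word_frequency counts one- and two-word n-grams. The full script below does the same thing on real sitemap data:

import advertools as adv

# Hypothetical slugs, as they would look after splitting the sitemap URLs
slugs = [
    "python seo content analysis",
    "free n gram tool",
    "seo content audit checklist",
]

# One-word and two-word n-gram counts; stop words are removed automatically
onegrams = adv.word_frequency(slugs)
bigrams = adv.word_frequency(slugs, phrase_len=2)

print(onegrams.head())
print(bigrams.head())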

Additional approaches

If you want to get deeper insights, I recommend continuing with these approaches:

  • Fetch the content of every URL in the sitemap
  • Create n-grams from the headlines (see the sketch after this list)
  • Create n-grams from the page content
  • Extract keywords with TextRank or RAKE
  • Extract named entities relevant to your SEO business
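As a taste of the first two points, here is a hedged sketch that fetches each page and builds two-word n-grams from its h1/h2 headlines. It is not part of the original script; the use of requests and BeautifulSoup, the 50-page limit, and the function name headline_ngrams are my own choices for illustration:

import advertools as adv
import requests
from bs4 import BeautifulSoup

def headline_ngrams(site):
    # Reuse the sitemap as the list of URLs to fetch
    sitemap = adv.sitemap_to_df(site).dropna(subset=["loc"])
    headlines = []
    for url in sitemap["loc"].head(50):  # limit requests while experimenting
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")
        headlines += [h.get_text(" ", strip=True).lower()
                      for h in soup.find_all(["h1", "h2"])]
    # Same n-gram counting as for the slugs
    return adv.word_frequency(headlines, phrase_len=2).sort_values(
        "abs_freq", ascending=False)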

But let's start simple and take a first look at the basics with this script. Based on your feedback, I may add more sophisticated approaches later. Before you run the script, you only need to enter the sitemap URL you want to analyze. After running the script, you will find your results in sitemap_ngrams.csv. Open it in Excel or Google Sheets and have fun analyzing the data.

Here is the Python code:

# Pemavor.com Sitemap Content Analyzer
# Author: Stefan Neefischer

import advertools as adv
import pandas as pd


def sitemap_ngram_analyzer(site):
    sitemap = adv.sitemap_to_df(site)
    sitemap = sitemap.dropna(subset=["loc"]).reset_index(drop=True)

    # Some sitemaps keep URLs with a trailing "/", some without it.
    # If there is a trailing "/", we take the second-to-last part as the slug;
    # otherwise, the last part is the slug.
    slugs = sitemap['loc'].dropna()[sitemap['loc'].dropna().str.endswith('/')].str.split('/').str[-2].str.replace('-', ' ')
    slugs2 = sitemap['loc'].dropna()[~sitemap['loc'].dropna().str.endswith('/')].str.split('/').str[-1].str.replace('-', ' ')

    # Merge the two series
    slugs = list(slugs) + list(slugs2)

    # adv.word_frequency automatically removes the stop words
    word_counts_onegram = adv.word_frequency(slugs)
    word_counts_twogram = adv.word_frequency(slugs, phrase_len=2)

    output_csv = (pd.concat([word_counts_onegram, word_counts_twogram], ignore_index=True)
                  .rename({'abs_freq': 'Count', 'word': 'Ngram'}, axis=1)
                  .sort_values('Count', ascending=False))

    # Save the output CSV with the counts
    output_csv.to_csv('sitemap_ngrams.csv', index=False)
    print("csv file saved")


# Provide the sitemap that should be analyzed
site = "https://searchengineland.com/sitemap_index.xml"
sitemap_ngram_analyzer(site)

# The results will be saved to the sitemap_ngrams.csv file
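If you would rather inspect the results in Python instead of Excel or Google Sheets, a quick way to look at the top n-grams from the generated file is:

import pandas as pd

df = pd.read_csv("sitemap_ngrams.csv")
print(df.head(20))  # top 20 n-grams by Count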
