How to Use Google Sheets for Web Scraping & Campaign Building

We've all been in a situation where we had to extract data from a website at some point.

When working on a new account or campaign, you might not have the data or the information available for the creation of the ads, for example.

In an ideal world, we would have been provided with all of the content, landing pages, and relevant information we need, in an easy-to-import format such as a CSV, Excel spreadsheet, or Google Sheet. (Or at the very least, provided what we need as tabbed data that can be imported into one of the aforementioned formats.)

But that's not always the way it goes.

Those lacking the tools for web scraping, or the coding knowledge to use something like Python to help with the task, may have had to resort to the tedious job of manually copying and pasting possibly hundreds or thousands of entries.

In a recent job, my team was asked to:

Go to the client's website.
Download more than 150 new products spread across 15 different pages.
Copy and paste the product name and landing page URL for each product into a spreadsheet.

Now, you can imagine how lengthy the task would have been if we had done just that and executed it manually.

Not only is it time-consuming, but with someone manually going through that many items and pages and physically having to copy and paste the data product by product, the chances of making a mistake or two are quite high.

It would then require even more time to review the document and make sure it was error-free.

There has to be a better way.

Good news: There is! Let me show you how we did it.

What is IMPORTXML?

Enter Google Sheets. I'd like you to meet the IMPORTXML function.

According to Google's support page, IMPORTXML "imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds."

Essentially, IMPORTXML is a function that allows you to scrape structured data from webpages, with no coding knowledge required.

For example, it's quick and easy to extract data such as page titles, descriptions, or links, but also more complex information.

How Can IMPORTXML Help Scrape Elements of a Webpage?

The function itself is pretty simple and only requires two values:

The URL of the webpage we intend to extract or scrape the information from.
And the XPath of the element in which the data is contained.

XPath stands for XML Path Language and can be used to navigate through elements and attributes in an XML document.

For example, to extract the page title from https://en.wikipedia.org/wiki/Moon_landing, we would use:

=IMPORTXML("https://en.wikipedia.org/wiki/Moon_landing", "//title")

This will return the value: Moon landing – Wikipedia.

Or, if we're looking for the page description, try this:

=IMPORTXML("https://www.searchenginejournal.com/","//meta[@name='description']/@content")

Here is a shortlist of some of the most common and useful XPath queries:

Page title: //title
Page meta description: //meta[@name='description']/@content
Page H1: //h1
Page hyperlinks: //@href
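
For readers who want to see what these four queries select, here is a minimal Python sketch run against a small, hand-written sample page. The page content is hypothetical and exists only for illustration. The sketch uses the standard library's ElementTree, which supports only a subset of XPath, so attribute values are read with .get() instead of a trailing /@content or /@href. It is an approximation of what IMPORTXML returns, not a description of how Google Sheets works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written page used only for illustration.
SAMPLE_PAGE = """<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short summary of the page." />
  </head>
  <body>
    <h1>Main heading</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>"""

root = ET.fromstring(SAMPLE_PAGE)

# Equivalent of //title
page_title = root.find(".//title").text

# Equivalent of //meta[@name='description']/@content:
# match the element first, then read its attribute.
description = root.find(".//meta[@name='description']").get("content")

# Equivalent of //h1
heading = root.find(".//h1").text

# Equivalent of //@href: every element that carries an href attribute.
links = [el.get("href") for el in root.findall(".//*[@href]")]

print(page_title)
print(description)
print(heading)
print(links)
```

Real webpages are rarely well-formed XML, so this parser would reject most of them; the sample is deliberately tidy so the queries stay in focus.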

See IMPORTXML in Action

Since discovering IMPORTXML in Google Sheets, it has truly become one of our secret weapons in the automation of many of our daily tasks, from campaign and ads creation to content research, and more.

Moreover, the function combined with other formulas and add-ons can be used for more advanced tasks that otherwise would require sophisticated solutions and development, such as tools built in Python.

But in this instance, we will look at IMPORTXML in its most basic form: scraping data from a web page.

Let's take a look at a practical example.

Imagine that we've been asked to create a campaign for Search Engine Journal.

They would like us to advertise the last 30 articles that have been published under the PPC section of the website.

A pretty simple task, you might say.

Unfortunately, the editors are not able to send us the data and have kindly asked us to refer to the website to source the information required to set up the campaign.

As mentioned at the beginning of our article, one way to do this would be to open two browser windows, one with the website and the other with Google Sheets or Excel. We would then start copying and pasting the information over, article by article, and link by link.

But using IMPORTXML in Google Sheets, we can achieve the same output with little to no risk of making mistakes, in a fraction of the time.

Here's how.

Step 1: Start with a Fresh Google Sheet

First, we open a new, blank Google Sheets document.

Step 2: Add the Content You Need to Scrape

Add the URL of the page (or pages) we want to scrape the information from.

In our case, we begin with https://www.searchenginejournal.com/category/pay-per-click/:

Step 3: Find the XPath

We find the XPath of the element whose content we want to import into our data spreadsheet.

In our example, let's start with the titles of the latest 30 articles.

Head to Chrome. Hover over the title of one of the articles, then right-click and select Inspect.

This will open the Chrome Dev Tools window:

Make sure that the article title is still selected and highlighted, then right-click again and choose Copy > Copy XPath.

Step 4: Extract the Data Into Google Sheets

Back in your Google Sheets document, introduce the IMPORTXML function as follows:

=IMPORTXML(B1,"//*[starts-with(@id, 'title')]")

A few things to note:

First, in our formula, we have replaced the URL of the page with the reference to the cell where the URL is stored (B1).

Second, when copying the XPath from Chrome, the ID value will always be enclosed in double quotes.

(//*[@id="title_1"])

However, in order to make sure that it doesn't break the formula, the double quotes will need to be changed to single quotes.

(//*[@id='title_1'])

Note that in this instance, because the element ID changes for each article (title_1, title_2, etc.), we must slightly modify the query and use "starts-with" in order to capture all elements on the page with an ID that begins with 'title'.
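
The two adjustments above (swapping the quotes and widening the ID match) can be sketched as a small Python helper that produces the formula text to paste into a cell. This is only an illustration under stated assumptions: the function names to_sheets_xpath, match_id_prefix, and build_importxml are hypothetical, and the pattern handles only the simple [@id='...'] form that Chrome's Copy XPath produces for elements with an ID.

```python
import re


def to_sheets_xpath(copied_xpath: str) -> str:
    """Swap double quotes for single quotes so the XPath can sit
    inside the double-quoted argument of the formula."""
    return copied_xpath.replace('"', "'")


def match_id_prefix(xpath: str, prefix: str) -> str:
    """Replace an exact ID match such as [@id='title_1'] with a
    starts-with() test on the given prefix."""
    return re.sub(r"\[@id='[^']*'\]", f"[starts-with(@id, '{prefix}')]", xpath)


def build_importxml(cell_ref: str, xpath: str) -> str:
    """Assemble the formula text for a cell that holds the page URL."""
    return f'=IMPORTXML({cell_ref},"{xpath}")'


# XPath exactly as copied from Chrome Dev Tools.
copied = '//*[@id="title_1"]'

formula = build_importxml("B1", match_id_prefix(to_sheets_xpath(copied), "title"))
print(formula)  # =IMPORTXML(B1,"//*[starts-with(@id, 'title')]")
```

Doing the same edits by hand in the formula bar works just as well; the helper only makes the order of the steps explicit.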

Here is what that looks like in the Google Sheets document:

And in just a few moments, this is what the results look like after the query has loaded the data onto the spreadsheet:

As you can see, the list returns all articles that are featured on the page that we have just scraped (including my previous piece about automation and how to use Ad Customizers to Improve Google Ads campaign performance).

You can apply this to scraping any other piece of information needed to set up your ad campaign, as well.

Let's add the landing page URLs, the featured snippet of each article, and the name of the author into our Sheets document.

For the landing page URLs, we need to tweak the query to specify that we are after the HREF attribute attached to the article title.

Therefore, our query will look like this:

=IMPORTXML(B1,"//*[starts-with(@id, 'title')]/@href")

In other words, we append '/@href' to the end of the XPath.

Voila! Straight away, we have the URLs of the landing pages.

You can do the same for the featured snippets and author names.
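
To see why the same XPath gives titles in one column and URLs in the next, here is a minimal Python sketch over a hypothetical fragment of markup. The element IDs and links are invented for illustration and are not taken from the real page. ElementTree has no starts-with() function, so the prefix test is written in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two article titles and one unrelated link.
SAMPLE_LISTING = """<div>
  <a id="title_1" href="/post-one">Post one</a>
  <a id="title_2" href="/post-two">Post two</a>
  <a id="author_1" href="/author/jane">Jane</a>
</div>"""

root = ET.fromstring(SAMPLE_LISTING)

# Equivalent of //*[starts-with(@id, 'title')]
matches = [el for el in root.iter() if el.get("id", "").startswith("title")]

# Without /@href, the query yields each element's text (the article title).
titles = [el.text for el in matches]

# With /@href appended, it yields the attribute value instead (the URL).
urls = [el.get("href") for el in matches]

print(titles)  # ['Post one', 'Post two']
print(urls)    # ['/post-one', '/post-two']
```

The element with the ID author_1 is skipped in both lists because its ID does not begin with 'title', which is the same filtering the starts-with query performs in the sheet.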

Troubleshooting

One thing to watch out for is that in order to be able to fully expand and fill in the spreadsheet with all data returned by the query, the column in which the data is populated must have enough cells free and no other data in the way.

This works in a similar way to when we use an ARRAYFORMULA: for the formula to expand, there must be no other data in the same column.

Conclusion

And there you have a fully automated, error-free way to scrape data from (potentially) any webpage, whether you need the content and product descriptions, or ecommerce data such as product price or shipping costs.

In a time when information and data can be the advantage required to deliver better than average results, the ability to scrape web pages and structured content in an easy and quick way can be priceless. Besides, as we have seen above, IMPORTXML can help to cut execution times and reduce the chances of making mistakes.

Additionally, the function is not just a great tool that can be used exclusively for PPC tasks, but can instead be really useful across many different projects that require web scraping, including SEO and content tasks.

Image Credits

All screenshots taken by author, August 2021
