
Using Python to recover SEO site traffic (Part one)


Helping a client recover from a bad redesign or site migration is probably one of the most critical jobs you can face as an SEO.

The traditional approach of conducting a full forensic SEO audit works well most of the time, but what if there was a way to speed things up? You could potentially save your client a lot of money in opportunity cost.

Last November, I spoke at TechSEO Boost and presented a technique my team and I regularly use to analyze traffic drops. It allows us to pinpoint this painful problem quickly and with surgical precision. As far as I know, there are no tools that currently implement this technique. I coded this solution using Python.

This is the first part of a three-part series. In part two, we will manually group the pages using regular expressions, and in part three we will group them automatically using machine learning techniques. Let’s walk through part one and have some fun!

Winners vs losers

[Image: SEO traffic after a switch to Shopify; traffic takes a hit]

Last June we signed up a client that moved from Ecommerce V3 to Shopify, and the SEO traffic took a big hit. The owner set up 301 redirects between the old and new sites but made a number of unwise changes, like merging a large number of categories and rewriting titles during the move.

When traffic drops, some parts of the site underperform while others don’t. I like to isolate them in order to 1) focus all efforts on the underperforming parts, and 2) learn from the parts that are doing well.

I call this analysis the “Winners vs Losers” analysis. Here, winners are the parts that do well, and losers are the ones that do badly.

[Image: visual analysis of winners and losers to figure out why traffic changed]

A visualization of the analysis looks like the chart above. I was able to narrow down the issue to the category pages (Collection pages) and found that the main issue was caused by the site owner merging and eliminating too many categories during the move.

Let’s walk through the steps to put this kind of analysis together in Python.

You can reference my carefully documented Google Colab notebook here.

Getting the data

We want to programmatically compare two separate time frames in Google Analytics (before and after the traffic drop), and we’re going to use the Google Analytics API to do it.

The Google Analytics Query Explorer provides the simplest way to set this up for use in Python.

  1. Head on over to the Google Analytics Query Explorer.
  2. Click on the button at the top that says “Click here to Authorize” and follow the steps provided.
  3. Use the dropdown menu to select the website you want to get data from.
  4. Fill in the “metrics” parameter with “ga:newUsers” in order to track new visits.
  5. Complete the “dimensions” parameter with “ga:landingPagePath” in order to get the page URLs.
  6. Fill in the “segment” parameter with “gaid::-5” in order to track organic search visits.
  7. Hit “Run Query” and let it run.
  8. Scroll down to the bottom of the page and look for the text box that says “API Query URI.”
    1. Check the box below it that says “Include current access_token in the Query URI (will expire in ~60 minutes).”
    2. At the end of the URL in the text box you should now see access_token=string-of-text-here. You’ll use this string of text in the code snippet below as the variable called token (make sure to paste it inside the quotes).
  9. Now, scroll back up to where we built the query, and look for the parameter that was filled in for you called “ids.” You’ll use this in the code snippet below as the variable called “gaid.” Again, it should go inside the quotes.
  10. Run the cell once you’ve filled in the gaid and token variables to instantiate them, and we’re good to go!

First, let’s define placeholder variables to pass to the API.

metrics = ",".join(["ga:users", "ga:newUsers"])
dimensions = ",".join(["ga:landingPagePath", "ga:date"])
segment = "gaid::-5"

# Required, please fill in with your own GA information. Example: ga:23322342
gaid = "ga:23322342"

# Example: string-of-text-here from step 8.2
token = ""

# Example: https://www.example.com or http://example.org
base_site_url = ""

# You can change the start and end dates as you like
start = "2017-06-01"
end = "2018-06-30"

The first function combines the placeholder variables we filled in above with an API URL to get Google Analytics data. We make additional API requests and merge them in case the results exceed the 10,000-row limit.

import requests

def GAData(gaid, start, end, metrics, dimensions,
           segment, token, max_results=10000):
  """Creates a generator that yields GA API data
     in chunks of size `max_results`"""
  # build uri w/ params
  api_uri = ("https://www.googleapis.com/analytics/v3/data/ga?ids={gaid}&"
             "start-date={start}&end-date={end}&metrics={metrics}&"
             "dimensions={dimensions}&segment={segment}&"
             "access_token={token}&max-results={max_results}")
  # insert uri params
  api_uri = api_uri.format(
      gaid=gaid,
      start=start,
      end=end,
      metrics=metrics,
      dimensions=dimensions,
      segment=segment,
      token=token,
      max_results=max_results
  )
  # Using yield to make a generator in an
  # attempt to be memory efficient, since data is downloaded in chunks
  r = requests.get(api_uri)
  data = r.json()
  yield data
  # Keep following "nextLink" until the API stops returning one
  while data.get("nextLink"):
    new_uri = data.get("nextLink")
    new_uri += "&access_token={token}".format(token=token)
    r = requests.get(new_uri)
    data = r.json()
    yield data
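To see the paging logic in isolation, here is a minimal sketch that runs without network access. The `iter_pages` helper and the `FAKE_API` dictionary are hypothetical stand-ins (they are not part of the notebook); the only thing borrowed from the real API is the idea that each response may carry a `nextLink` pointing at the next chunk.

```python
def iter_pages(fetch, first_uri):
    """Yield each response page, following `nextLink` until it is absent.

    `fetch` is any callable that takes a URI and returns a dict; here it
    stands in for something like `requests.get(uri).json()`.
    """
    uri = first_uri
    while uri:
        page = fetch(uri)
        yield page
        uri = page.get("nextLink")


# Made-up responses keyed by URI, standing in for the real API.
FAKE_API = {
    "page-1": {"rows": [["/a", "3"], ["/b", "5"]], "nextLink": "page-2"},
    "page-2": {"rows": [["/c", "1"]]},
}

pages = list(iter_pages(FAKE_API.get, "page-1"))
total_rows = sum(len(p["rows"]) for p in pages)
print(len(pages), total_rows)
```

Because the pages are yielded one at a time, the caller decides whether to hold them all in memory or process each chunk as it arrives.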

In the second function, we load the Google Analytics Query Explorer API response into a pandas DataFrame to simplify our analysis.

import pandas as pd

def to_df(gadata):
  """Takes in a generator from GAData()
     and creates a dataframe from the rows"""
  frames = []
  total = 0
  for data in gadata:
    # Each response page carries its rows plus the column headers
    newdf = pd.DataFrame(
        data.get("rows", []),
        columns=[x["name"] for x in data["columnHeaders"]]
    )
    frames.append(newdf)
    total += len(newdf)
    print("Gathered {} rows".format(total))
  # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
  return pd.concat(frames, ignore_index=True)

Now, we can call the functions to load the Google Analytics data.

data = GAData(gaid=gaid, metrics=metrics, start=start,
              end=end, dimensions=dimensions, segment=segment,
              token=token)

data = to_df(data)

Analyzing the data

Let’s start by just getting a look at the data. We’ll use the .head() method of DataFrames to take a look at the first few rows. Think of this as glancing at only the top few rows of an Excel spreadsheet.

data.head()

This displays the first five rows of the data frame.

Most of the data is not in the right format for proper analysis, so let’s perform some data transformations.

First, let’s convert the date to a datetime object and the metrics to numeric values.

data['ga:date'] = pd.to_datetime(data['ga:date'])
data['ga:users'] = pd.to_numeric(data['ga:users'])
data['ga:newUsers'] = pd.to_numeric(data['ga:newUsers'])

Next, we will need the landing page URLs, which are relative and include URL parameters, in two additional formats: 1) as absolute URLs, and 2) as relative paths (without the URL parameters).

from urllib.parse import urlparse, urljoin

data['path'] = data['ga:landingPagePath'].apply(lambda x: urlparse(x).path)

# urljoin works on one string at a time, so apply it row by row
data['url'] = data['path'].apply(lambda x: urljoin(base_site_url, x))
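As a quick illustration of what these two helpers do to a single value, here is a small sketch. The landing page path and the base URL are made-up placeholders, not values from the client's site.

```python
from urllib.parse import urlparse, urljoin

landing_page = "/collections/shoes?page=2"  # made-up landing page path
site = "https://www.example.com"            # placeholder base URL

# .path drops the query string, leaving only the relative path
path = urlparse(landing_page).path

# urljoin combines the base URL with the relative path
url = urljoin(site, path)

print(path)
print(url)
```

Stripping the parameters means that paginated or tracked variants of a page collapse into a single path, which keeps the later grouping cleaner.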

Now the fun part begins.

The goal of our analysis is to see which pages lost traffic after a particular date, compared to the period before that date, and which gained traffic after that date.

The example date chosen below corresponds to the exact midpoint of the start and end variables used above to gather the data, so that the periods before and after the date are equally sized.
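If you change the date range, the midpoint can be recomputed rather than worked out by hand. This is a minimal sketch using only the standard library; it assumes the same start and end strings as the placeholder variables above, and the variable names are illustrative.

```python
from datetime import date

start_date = date.fromisoformat("2017-06-01")
end_date = date.fromisoformat("2018-06-30")

# Half of the elapsed time, added to the start, gives the midpoint
midpoint = start_date + (end_date - start_date) / 2

print(midpoint.isoformat())
```

In practice you may prefer the actual date of the migration or redesign over the arithmetic midpoint, since that is the event you are trying to measure.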

We begin the analysis by grouping the rows by URL path and adding up the newUsers for each URL. We do this with the built-in pandas method .groupby(), which takes a column name as an input and groups together each unique value in that column.

The .sum() method then takes the sum of every other column in the data frame within each group.

For more information on these methods, please see the pandas documentation for groupby.

For those who might be familiar with SQL, this is analogous to a GROUP BY clause with a SUM in the select clause.

# Change this depending on your needs
MIDPOINT_DATE = "2017-12-15"

before = data[data['ga:date'] < pd.to_datetime(MIDPOINT_DATE)]
after = data[data['ga:date'] >= pd.to_datetime(MIDPOINT_DATE)]

# Traffic totals before Shopify switch
totals_before = before[["ga:landingPagePath", "ga:newUsers"]]\
                .groupby("ga:landingPagePath").sum()
totals_before = totals_before.reset_index()\
                .sort_values("ga:newUsers", ascending=False)

# Traffic totals after Shopify switch
totals_after = after[["ga:landingPagePath", "ga:newUsers"]]\
               .groupby("ga:landingPagePath").sum()
totals_after = totals_after.reset_index()\
               .sort_values("ga:newUsers", ascending=False)
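To see what the group-and-sum step does, here is a small sketch on a made-up data frame. The paths and counts are invented for illustration; only the column names match the ones used in this article.

```python
import pandas as pd

# Made-up daily rows: the same path appears on several dates
toy = pd.DataFrame({
    "ga:landingPagePath": ["/a", "/b", "/a", "/b", "/c"],
    "ga:newUsers": [10, 4, 5, 1, 7],
})

totals = (
    toy.groupby("ga:landingPagePath")
       .sum()
       .reset_index()
       .sort_values("ga:newUsers", ascending=False)
)
print(totals)
```

Each path ends up on a single row with its new users added together, sorted so the biggest traffic earners come first.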

You can check the totals before and after with this code and double-check them against the Google Analytics numbers.

print("Traffic Totals Before: ")
print("Row count: ", len(totals_before))

print("Traffic Totals After: ")
print("Row count: ", len(totals_after))

Next up, we merge the two data frames, so that we have a single column corresponding to the URL, and two columns corresponding to the totals before and after the date.

We have different options when merging, as illustrated above. Here, we use an “outer” merge, because even if a URL didn’t show up in the “before” period, we still want it to be a part of this merged dataframe. We’ll fill in the blanks with zeros after the merge.

# Comparing pages from before and after the switch
change = totals_after.merge(totals_before,
                            left_on="ga:landingPagePath",
                            right_on="ga:landingPagePath",
                            suffixes=["_after", "_before"],
                            how="outer")

change.fillna(0, inplace=True)
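The effect of the outer merge is easiest to see on a tiny example. The two data frames below are made up: `/a` only has traffic before the date, `/c` only after, and `/b` appears in both.

```python
import pandas as pd

before_toy = pd.DataFrame({
    "ga:landingPagePath": ["/a", "/b"],
    "ga:newUsers": [15, 5],
})
after_toy = pd.DataFrame({
    "ga:landingPagePath": ["/b", "/c"],
    "ga:newUsers": [2, 7],
})

# how="outer" keeps URLs that appear in only one of the two periods
merged = after_toy.merge(
    before_toy,
    on="ga:landingPagePath",
    how="outer",
    suffixes=("_after", "_before"),
).fillna(0)

by_path = merged.set_index("ga:landingPagePath")
print(by_path)
```

With an inner merge, `/a` and `/c` would be dropped, which would hide exactly the removed and newly created pages this analysis is meant to surface.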

Difference and percentage change

Pandas dataframes make simple calculations on whole columns easy. We can take the difference of two columns and divide two columns, and pandas will perform that operation on every row for us. We will take the difference of the two totals columns, and divide by the “before” column to get the percent change before and after our midpoint date. Note that a URL with no traffic in the “before” period is divided by zero, so its percent change is infinite and it lands among the winners.

Using this percent_change column we can then filter our dataframe to get the winners, the losers and those URLs with no change.

change['difference'] = change['ga:newUsers_after'] - change['ga:newUsers_before']
change['percent_change'] = change['difference'] / change['ga:newUsers_before']

winners = change[change['percent_change'] > 0]
losers = change[change['percent_change'] < 0]
no_change = change[change['percent_change'] == 0]
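Pages that are new after the midpoint have zero traffic in the "before" column, so the division produces an infinite percent change. The sketch below, on made-up numbers, shows how those rows can be spotted; separating them out is a suggestion on my part rather than something the notebook does.

```python
import numpy as np
import pandas as pd

toy_change = pd.DataFrame({
    "ga:landingPagePath": ["/a", "/b", "/c"],
    "ga:newUsers_after": [0.0, 2.0, 7.0],
    "ga:newUsers_before": [15.0, 5.0, 0.0],
})

toy_change["difference"] = (
    toy_change["ga:newUsers_after"] - toy_change["ga:newUsers_before"]
)
toy_change["percent_change"] = (
    toy_change["difference"] / toy_change["ga:newUsers_before"]
)

# Rows with no "before" traffic divide by zero and come out as inf
new_pages = toy_change[np.isinf(toy_change["percent_change"])]
print(toy_change)
print(new_pages["ga:landingPagePath"].tolist())
```

Infinite values still compare as greater than zero, so these pages are counted as winners; sorting by the `difference` column rather than `percent_change` avoids them dominating the ranking.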

Sanity check

Finally, we do a quick sanity check to make sure that all the traffic from the original data frame is still accounted for after all of our analysis. To do this, we simply take the sum of all traffic for both the original data frame and the two columns of our change dataframe.

# Checking that the total traffic adds up
data['ga:newUsers'].sum() == change[['ga:newUsers_after', 'ga:newUsers_before']].sum().sum()

It should be True.


Sorting by the difference in our losers data frame, and taking the .head(10), we can see the top 10 losers in our analysis. In other words, these pages lost the most total traffic between the two periods before and after the midpoint date.

losers.sort_values("difference").head(10)

You can do the same to review the winners and try to learn from them.

winners.sort_values("difference", ascending=False).head(10)

You can export the losing pages to a CSV or Excel file with the data frame’s .to_csv() or .to_excel() methods.
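A minimal sketch of the CSV export follows. The file name and the sample rows are hypothetical, chosen only so the example runs on its own; in the real analysis you would call the same method on the `losers` data frame.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the `losers` data frame built above
losers_toy = pd.DataFrame({
    "ga:landingPagePath": ["/b", "/a"],
    "difference": [-3.0, -15.0],
})

# Hypothetical output location; change it to wherever you want the file
out_path = os.path.join(tempfile.gettempdir(), "losing-pages.csv")

# Biggest losses first; index=False leaves the row numbers out of the file
losers_toy.sort_values("difference").to_csv(out_path, index=False)

exported = pd.read_csv(out_path)
print(exported)
```

For Excel output, `to_excel()` works the same way but needs an Excel writer package such as openpyxl to be installed.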


This seems like a lot of work to analyze just one site, and it is!

The magic happens when you reuse this code on new clients and simply need to replace the placeholder variables at the top of the script.

In part two, we will make the output more useful by grouping the losing (and winning) pages by their types to get the chart I included above.

