Pagination vs. Incremental Page Loading For Search Engine Crawlability

By Technical SEO No Comments

In this episode of SEO With Chris, I’m talking about pagination vs. incremental page loading. Specifically, how incremental page loading (load more and infinite scroll) poses crawlability issues that pagination does not.

In order for a website’s pages to show up in search results, search engines have to be able to find them, and that’s where pagination and incremental page loading come into play.


What Is Pagination?

When an e-commerce page, blog page or resources section has too many products, blog posts or articles to show them all on the first page, the page will usually display 10, 20 or 30 items, and you’ll click over to page two to see the next batch.

Here’s an example of what pagination looks like. The hiking shirts category page from REI shows 30 items per page by default. To see more, you can scroll down to the bottom and click to page two to see 30 more shirts, and so on until you reach page 8. Each time you click to another page, the URL will change to ?page=2, ?page=3, etc.

[Screenshot: traditional pagination on REI]

Historically, all websites used this method of making website content available. It’s easy to implement, easy for humans to use and really good for search engines as well. Search engines have no problem crawling traditional forms of pagination.
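Under the hood, traditional pagination is nothing more than plain HTML links. Here’s a minimal sketch (the markup and URLs are illustrative, not REI’s actual code):

<nav class="pagination">
  <a href="/c/hiking-shirts?page=1">1</a>
  <a href="/c/hiking-shirts?page=2">2</a>
  <a href="/c/hiking-shirts?page=3">3</a>
  <a href="/c/hiking-shirts?page=2">Next</a>
</nav>

Because every page in the series is referenced by a real <a href> link in the HTML, crawlers can follow the whole set without executing any JavaScript.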

Incremental Page Loading

As technology has marched on, we saw the introduction of incremental page loading, which comes in two forms: Load more buttons and infinite scroll.

Load More Buttons

The first type of incremental page loading is the load more or the show more button. In this configuration, you’ll see a default number of products/articles, and a button that you have to click in order to see more. The button usually says “Show More” or “Load More”, but it can be labeled anything. Once you click the button, more articles are dynamically loaded with JavaScript.
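To see why that matters, here’s a simplified sketch of how a load more button is typically wired up (illustrative code, not any particular site’s implementation; the endpoint is hypothetical):

<div id="article-list">
  <!-- the first 20 articles are rendered in the HTML -->
</div>
<button id="load-more">Show More</button>
<script>
  // Clicking the button fetches the next batch with JavaScript.
  // None of these articles exist in the HTML until the click happens.
  document.getElementById('load-more').addEventListener('click', async () => {
    const response = await fetch('/api/articles?offset=20'); // hypothetical endpoint
    document.getElementById('article-list').innerHTML += await response.text();
  });
</script>

Note that there is no <a href> pointing to the additional content anywhere in the markup.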

Here’s an example from NBCPhiladelphia.com. They have a tag page for the Philadelphia Phillies, which displays 20 articles above a “Show More” button. Clicking “Show More” triggers JavaScript to dynamically load another 20 articles. Those articles are not present in the HTML until the Show More button is clicked. This is problematic for search engines, and I’ll show you why in a minute.

[Screenshot: “Show More” incremental page loading on NBC Philadelphia]

At the time of publishing, the next article revealed by clicking “Show More” is about a man who met his kidney donor at a Phillies game. But when we inspect the page and search for it, that link doesn’t appear…until after you click “Show More.” Without additional configuration, search engines cannot find that link.

Infinite Scroll

The second type of incremental page loading is called infinite scroll. Infinite scroll is similar to a load more button, except that you don’t have to click a button, and more content is loaded automatically as you scroll. This configuration also uses JavaScript to dynamically load content which is not present in the HTML prior to scrolling.

Infinite scroll is challenging to show in a screenshot, so check out this example to see what I mean: https://htmx.org/examples/infinite-scroll/

Problems For Search Engines

Both methods of incremental page loading cause issues for search engines, and prevent them from comprehensively crawling the site. Here’s what Google says about crawling load more and infinite scroll:

“Once you’ve selected the most appropriate UX strategy for your site and SEO, make sure the Google crawler can find all of your content.

For example, you can implement pagination using links to new pages on your ecommerce site, or using JavaScript to update the current page. Load more and infinite scroll are generally implemented using JavaScript. When crawling a site to find pages to index, Google only follows page links marked up in HTML with <a href> tags. The Google crawler doesn’t follow buttons (unless marked up with <a href>) and doesn’t trigger JavaScript to update the current page contents.”

These two technologies are problematic because Google clearly states that its crawler will not trigger the JavaScript needed to load the additional content.

Finding Crawlability Issues From Incremental Page Loading

We know that traditional pagination is crawlable. Load more and show more buttons are not crawlable, and neither is infinite scroll. Here’s how to check for them…

First, you can crawl the site with Screaming Frog, Sitebulb or another crawler and look for URLs containing ?page=2, ?page=3, etc. Finding those pages indicates that pagination is present and you may not have an issue. If you crawl the site and don’t find them, that may indicate a load more or infinite scroll setup.

If it’s a small site and you know it has e-commerce pages or blog/resources pages, you can go look at those manually and see if incremental loading is present.

Another thing you can do is crawl the site normally and then also crawl the XML sitemap. If the URL counts differ between the sitemap crawl and the regular crawl, that may indicate you have pages that are present in the sitemap but can’t be discovered because of incremental page loading. (You might also find that you have island pages, which is a separate issue.)

Fixing Crawlability Issues From Incremental Page Loading

There are a few ways to effectively fix crawlability issues created by incremental page loading.

Change To Traditional Pagination

One option is to implement a traditional-style pagination setup. It’s time-tested, we know it works, it’s pretty lightweight and it’s crawlable. I’m not a UX person, so I can’t speak to the user experience impact of traditional pagination vs. incremental loading, but I do know that traditional pagination is crawlable.

Hidden Pagination

Another option is to implement hidden pagination on pages with an incremental loading configuration. It’s a version of pagination that search engines can see but visitors can’t, because it’s hidden in the HTML.

Remember the NBC Philadelphia page from earlier? Here’s the hidden pagination they’re using on their incremental loading pages:

[Screenshot: hidden pagination on NBC Philadelphia]
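In practice, hidden pagination can look something like this (a simplified, illustrative sketch, not NBC’s exact markup):

<!-- Visible to users: the Show More button -->
<button class="load-more">Show More</button>

<!-- Present in the HTML but hidden from users: plain pagination links -->
<div style="display:none">
  <a href="/tag/philadelphia-phillies/page/2/">Page 2</a>
  <a href="/tag/philadelphia-phillies/page/3/">Page 3</a>
</div>

Crawlers parse the <a href> tags in the raw HTML regardless of the CSS, so they can reach page two even though visitors only ever see the button.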

Hiding Content With CSS 

A third option is to load all of the content in the HTML and then use CSS to hide it underneath a load more/show more button. In that situation, clicking the button simply unhides the content – you’re not dynamically loading it with JavaScript. Search engines can crawl it because they don’t need to click the button – it’s right there in the HTML the whole time.
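As a rough sketch, that configuration might look like this (simplified, illustrative code):

<style>
  .extra-articles { display: none; }
  .extra-articles.revealed { display: block; }
</style>

<div class="articles">
  <!-- first batch of articles -->
</div>
<div class="extra-articles">
  <!-- remaining articles, fully present in the HTML with real <a href> links -->
</div>
<button onclick="document.querySelector('.extra-articles').classList.add('revealed')">
  Show More
</button>

All of the links are in the source from the start; the button merely toggles a CSS class.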

Wrap-Up

Whereas traditional pagination is crawlable, incremental loading by default is not, and to fix it you need to implement one of several different configurations. Questions? Comment here or find me on Twitter/X.

 

Need SEO help? Contact me!





    Virtual Event Schema Sample

    By Technical SEO No Comments

    This is a sample of Event Schema for Virtual Events. The goal is to craft Schema that *may* associate the Virtual Event with an Organization’s Google My Business listing.

    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Event",
      "name": "Indy Hall Event Number One",
      "description": "This is the first Indy Hall virtual event.",
      "url": "https://www.indyhall.org/events/number-one/",
      "image": "https://www.indyhall.org/media/events/number-one.jpg",
      "startDate": "2020-04-23T14:00-04:00",
      "endDate": "2020-04-23T15:00-04:00",
      "eventAttendanceMode": "https://schema.org/OnlineEventAttendanceMode",
      "eventStatus": "https://schema.org/EventScheduled",
      "performer": {
        "@type": "Person",
        "name": "Alex Hillman"
      },
      "offers": {
        "@type": "Offer",
        "price": "0.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "url": "https://www.indyhall.org/tickets/",
        "validFrom": "2020-04-23T12:00-05:00"
      },
      "organizer": {
        "@type": "Organization",
        "name": "Indy Hall",
        "url": "https://www.indyhall.org/",
        "address": {
          "@type": "PostalAddress",
          "streetAddress": "399 Market Street Suite 360",
          "addressLocality": "Philadelphia",
          "addressRegion": "PA",
          "postalCode": "19106"
        }
      },
      "location": {
        "@type": "VirtualLocation",
        "url": "https://www.indyhall.org/zoom/"
      },
      "sameAs": ["https://meetingplace.io/indyhall/events/0001"]
    }
    </script>

     

    How Does Google Treat Subdomains For SEO?

    By SEO, Technical SEO 14 Comments

    Time and time again, Google has shown that they treat subdomains very differently from root domains, in some cases treating them as completely different sites. For SEO purposes, it’s generally recommended to use a subfolder instead of a subdomain.

    Subdomain vs. Subfolder

    A subdomain is a string of characters that precedes the root domain and uses a period to separate them. A subfolder comes after the domain suffix and is separated by a forward slash. You can have multiple subdomains or subfolders, and you’ll frequently see them combined.

    Examples:

    • Blog.chrisberkley.com is a subdomain
    • Chrisberkley.com/posts/ is a subfolder
    • Blog.chrisberkley.com/posts/ is a subdomain with a subfolder.
    • First.blog.chrisberkley.com is two subdomains (“first” and “blog”)
    • First.blog.chrisberkley.com/posts/recent/ is two subdomains (“first” and “blog”) with two subfolders (“posts” and “recent”).

    Did You Know?

    In the URL www.chrisberkley.com, “www” is technically a subdomain. It’s true!

    Why Use Subdomains?

    There are legitimate reasons that necessitate the use of subdomains, and they are not always avoidable.

    Technical Limitations

    Sometimes there are technical infrastructure limitations that prevent the use of a subfolder. In large organizations with big sites, it’s common for access to the root domain to be limited, with subdomains used instead for ease of use.

    This may include piecing together multiple CMSs. If the core site is hosted on one CMS like Magento or Sitecore, but the blog is hosted on WordPress, it can be difficult (or impossible) to make them work together on the root domain.

    Organizational Control

    Large organizations often have multiple divisions that operate independently. Such is the case with universities, where individual colleges need to have edit access to their own sites (School of Nursing, School of Engineering, etc.). The same is true for other national organizations like banking institutions.

    It’s a lot easier to spin up a separate site on a subdomain and grant a team of people edit access to that particular subdomain. You wouldn’t want the School of Nursing making edits that ended up taking down the root domain for the whole university.

    International

    Sometimes organizations will create international subdomains like fr.chrisberkley.com or en.chrisberkley.com. There’s no inherent SEO benefit to including a country code in the subdomain, but it may come down to organizational structure or technical limitations. In a perfect world, you’d place those in subfolders (chrisberkley.com/fr/ or chrisberkley.com/en/) and implement hreflang. Alas, that isn’t always possible.
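    For reference, hreflang on subfolders would look something like this in the <head> of each page (a sketch using my domain as a stand-in):

    <link rel="alternate" hreflang="en" href="https://chrisberkley.com/en/" />
    <link rel="alternate" hreflang="fr" href="https://chrisberkley.com/fr/" />
    <link rel="alternate" hreflang="x-default" href="https://chrisberkley.com/" />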

    How Google Treats Subdomains

    Working with subdomain-heavy clients, my firsthand experience is that Google treats subdomains as separate sites. A client of mine with two divisions of their company had one set up on a subdomain and the other on the root domain. They had some content overlap, and we sometimes saw their pages swap places in search results.

    It’s my belief that subdomains don’t inherit domain authority or site equity from the root domain. WordPress.com has a domain authority of 94. If subdomains inherited that value, wouldn’t it make sense to set up free blogs on their platform (which uses subdomains) and immediately benefit from the SEO value?

    Secondly, Google’s own Search Console requires you to set up separate profiles for subdomains. That’s another good indicator that they value subdomains differently.

    That doesn’t mean subdomains inherit ZERO equity from their root domains. They may inherit a greatly reduced amount, or Google may adjust the amount of equity they inherit on a case-by-case basis. Since WordPress.com has thousands of low-authority blogs on subdomains, Google may devalue its subdomains more than those of sites that only have a handful.

    Google has stated that their search engine is indifferent to subdomains vs. subfolders, but the SEO community has repeatedly found that to be false. Industry thought-leader Moz moved their content from a subdomain to a subfolder and saw measurable increases just as a result of that move.

    Questions? Comments? Leave them here or reach out to me on Twitter: @BerkleyBikes.

    How Long For Content To Rank?

    By Content Marketing, SEO One Comment

    The number one struggle I face when pitching clients and showing them the value of SEO is that it takes time. Whereas pay-per-click advertising and social media can be spun up and provide a return on investment relatively quickly, SEO is an annuity investment.

    To make a relevant analogy: you can’t invest money in the stock market today and expect dividends tomorrow. You invest today with the understanding that it will provide value later. SEO is similar.

    Nevertheless, that’s a real problem because when clients are making a significant investment in SEO, they want to see results. That’s why I prepare clients by telling them “some of the work we do isn’t going to yield results right away. It’s going to take 6-12 months.”

    This is especially true with publishing new content. Ahrefs did a study on how long it takes to rank in Google. They looked at the average age of pages ranking in positions 1-10, and the overall takeaway was that higher positions typically featured pages that had been live for several years. They also noted that higher-authority sites took less time to rank well, which is a no-brainer. If there’s one single graph that shows their findings best, it’s this one:

    [Graph: Ahrefs study of page age vs. ranking position]

    That’s helpful, but does their large-scale study align with actual firsthand findings? Sure, there’s value in a larger data sample, but anecdotal data would certainly help reinforce those findings.

    Fortunately, I have that data. Across multiple clients in multiple industries, I can highlight examples of pages that rank well for target keywords, but didn’t reach full potential until months after they were published. I’m sharing these examples so that both consultants and clients can form realistic expectations for SEO campaigns, which is something I believe this industry can and should do a much better job of.

    Example #1

    Client Industry: Construction

    Type of page: WordPress blog post

    This particular page targeted “rental cost” keywords which are fairly low volume but highly relevant in the client’s industry. The client was hesitant to discuss pricing, but competitors were doing it, so we pushed them to create their own page. Not only does it drive meaningful traffic, but it has resulted in ~3 leads per month since it was published 16 months ago.

    Example #2

    Client Industry: Web hosting

    Type of page: Resource center pages

    These two pages were both created as part of a large content initiative – more than 120 pages of long form content over a one year period. Notably, they both saw steady growth and then marked increases in January 2018, possibly as a result of an algorithm update.

     

    Example #3

    Client Industry: Healthcare

    Type of page: Core site page

    This page saw long periods of inactivity in the very competitive healthcare space, before eventually moving into ranking positions that drive meaningful amounts of traffic (this is also a result of other improvements made to the site during that time).

    Example #4

    Client Industry: Local retail

    Type of page: WordPress blog post

    This example comes from a mom & pop retail store. A blog post that I wrote eventually moved into top ranking positions for some industry head terms, outranking even the brands that the retailer sold in their store. Unfortunately, the business owners did not continue digital marketing efforts after I left my position there, and the content did not retain its visibility in search results.

    Example #5

    Client Industry: Digital marketing

    Type of page: WordPress blog post

    The last example comes from my own website (which has lower site authority than any of my clients). While not initially a large traffic source, an analytics blog post I wrote moved into top positions (including the answer box) over a period of one year.

    Summary

    The key takeaway here is that firsthand data supports the study that Ahrefs did – that content may take months or more to move into top ranking positions, especially for competitive keywords. Site authority absolutely helps – two of the sites included here had domain authority ratings between 50 and 80, which is a rough indicator that they’re authoritative, especially in their respective industries.

    With some of the examples, we did employ other tactics like building internal and external links. All pages were submitted to Google Search Console after publishing to make sure they got crawled as soon as possible. It’s also obvious that none of these pages existed in a vacuum: there were other marketing (and SEO) initiatives that could’ve contributed to better rankings. Nevertheless, there is a clear pattern showing that even highly optimized content on authoritative sites doesn’t always achieve top rankings immediately, and SEO continues to require patience.

    How To: Optimize WordPress Posts & Pages For SEO

    By Content Marketing, SEO One Comment

    WordPress is a brilliant CMS that offers a plethora of SEO functionality out of the box. But like any piece of technology, default settings won’t be enough to truly maximize its potential. This post will show you how to optimize a WordPress post (or page) for SEO purposes.

    The WordPress SEO Plugins

    While WordPress is good out of the box, it needs an SEO plugin to take it to the next level. The gold standards are either Joost de Valk’s Yoast SEO Plugin or All In One SEO Pack by Michael Torbert. Both add critical functionality for SEO purposes, so make sure you have one installed.

    Content

    No amount of optimization will help if you’re targeting topics with low or non-existent search volume. The same can be said for high volume (and high competition) topics. You have to pick topics and themes that are realistic and within your wheelhouse to achieve SEO success.

    First we’ll start with the post content itself, focusing on how to structure the page with H headings and overall content length.

    H Headings & Page Structure

    Start by adding a post title. In many WordPress themes, the post title will also be present on the page as an H1 heading. Pages should only have one H1 heading and it needs to be keyword-rich and descriptive of the post’s content. The H1 is the first text a visitor sees when they hit the page.

    In addition to H1 headings, it’s increasingly important to structure pages with additional, nested H headings like H2s, H3s, H4s, etc. These should also be keyword-rich and describe the subsequent paragraph. On this very page you’ll see a clear structure where paragraphs are ordered and grouped by similarity and marked up with a clear hierarchy of H headings.
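    As an illustration, the heading structure of a post like this one boils down to a simple HTML skeleton (indented here only to show the hierarchy):

    <h1>How To: Optimize WordPress Posts & Pages For SEO</h1>
      <h2>Content</h2>
        <h3>H Headings & Page Structure</h3>
        <h3>Content Length</h3>
      <h2>Video, Images & Media</h2>
        <h3>Image Optimization</h3>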

    If you know your subject matter and audience well, developing a hierarchy of H headings may be second nature to you. If not, performing keyword research can typically reveal different subtopics and then you can apply common sense to order them in the method that makes the most sense for visitors.

    Ordered and Unordered Lists (Bullet Points)

    To break up content and make it more digestible, use ordered lists (numbered lists) and unordered lists (bullet points) where applicable. Using these with a keyword-rich H heading may result in securing a featured snippet (answer box) in search results.

    • Anytime you’re describing steps, consider using an ordered list.
    • If you’re listing several things using commas, try bullet points instead.

    This is not only helpful for SEO, it helps readers digest a page more easily.
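    In HTML terms, these are just <ol> and <ul> elements:

    <ol>
      <li>Step one of a process</li>
      <li>Step two of a process</li>
    </ol>

    <ul>
      <li>First related item</li>
      <li>Second related item</li>
    </ul>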

    Content Length

    Content length is much debated and the honest answer to “what’s the right length” is that there isn’t one. If the content is engaging, people will read it. Know your audience, write quality content and you’ll succeed.

    With that being said, 250-300 words is commonly considered the absolute minimum for SEO purposes. Less than that and search engines may deem the content thin. It will be incredibly difficult to add a meaningful structure of H headings to a page with 300 words.

    I recommend content that’s a minimum of 500-700 words. In many cases, long form content can do wonders for SEO and when I say long form I mean 1,000 words or more. Most of my successful posts are detailed how-tos in excess of 1,000 words. Your mileage may vary – put your focus on writing good content and worry less about the length.

    Video, Images & Media

    Video, images and media are also great ways to break up text-based content and provide additional value for visitors. Would the topic you’re discussing be more easily understood if a visual were added? In many cases, yes.

    Here I’ll discuss ways to optimize media for SEO, and also for visitors with disabilities or impairments, who may not be able to consume images, video or audio.

    Image Optimization

    Images can be improved for SEO by using filenames, alt text and by optimizing image sizes (for site speed). Because search engines can’t visually determine the contents of an image, these optimizations allow them to understand image content, helping the page rank better and helping images to rank in image search results. Additionally, visitors with visual impairments may not be able to see images, so these optimizations help them consume and understand multimedia content.

    Image Filenames

    Including keywords in filenames can have an impact. It’s not huge, but every bit helps. Use descriptive keywords in filenames when possible, but don’t start keyword stuffing – keep them descriptive and methodical.

    Image Alt Text

    Include image alt text when possible. The alt text is never seen by visitors unless A) the image doesn’t load or B) the visitor is impaired and the alt text is read to them.

    Both of these scenarios help visitors understand the content of the image, even if it can’t be seen. For that reason, make your image alt text descriptive of what’s in the image and avoid keyword stuffing.

    [Image demonstrating alt text]

    The alt text for the image immediately above: wordpress seo image alt text
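    In the HTML, that looks like this (the filename is hypothetical, but follows the descriptive-filename advice above):

    <img src="wordpress-seo-image-alt-text.jpg" alt="wordpress seo image alt text" />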

    Image Size Optimization

    Your images should only be as large as they need to be. Often, GIANT images are scaled down to a much smaller size with HTML. The problem is, if you have a giant image with an enormous file size, browsers have to load the entire image, even if it’s being displayed at a much smaller size. That slows down page speed, especially if there are multiple large images on the page.

    Make the image as big as it needs to be. If the image will be displayed at 900 pixels wide, then make it 900 pixels wide. Secondly, use JPG images instead of PNGs – JPGs are significantly smaller in file size. If you don’t have an image editing program, you can do it right in WordPress from the Media Library menu.

    Featured Images

    Add a featured image. The featured image will be used as the default image when a page or post is shared on social media, although this can be changed for different social networks.

    Video

    Similar to images, video content also has opportunities for on-page optimization. Video content is equally hard for search engines to understand, so we optimize by adding context in other ways.

    Embedding

    Embedding video content on WordPress posts or pages is quite easy, especially for YouTube, Wistia and Vimeo. With any of these three, you can simply drop the URL into WordPress’ WYSIWYG editor and it will automatically embed the video. Embedding videos on-site is a great way to get more views and provide a superior user experience.

    Schema

    When you do embed video content, make sure you add Schema as well. If you’re using Wistia, you’re in luck, because Wistia embeds Video Schema by default using Javascript (read more about Wistia videos & schema here).

    YouTube and Vimeo users are not as fortunate, however, and must add Schema manually, preferably using custom fields. JSON-LD is Google’s preferred format for Schema, and creating it is not difficult at all. Schema gives search engines additional information about videos, such as the video’s title, description, length, upload date, etc. In most cases, Schema is the only way for search engines to get information about video contents.

    <script type="application/ld+json">
    {
    "@context": "http://schema.org",
    "@type": "VideoObject",
    "name": "Contact Form 7 Goal Conversion Tracking Google Tag Manager",
    "description": "Follow this 10 minute guide to set up Google Analytics goal conversion tracking for Contact Form 7 submissions using Google Tag Manager.
    If you have a WordPress website and you use the Contact Form 7 plugin, you can use Google Tag Manager to create events and set up Goal Conversions in Google Analytics. Then you can attribute form submissions to different marketing channels and campaigns that you're running.
    This guide not only shows you how to track submissions, but also ensures that you're only tracking successful submissions where mail is actually sent. It also allows you to specify which forms you want to track, based on the form ID built into the Contact Form 7 shortcode. 
    ***Links***
    Written how-to guide: https://chrisberkley.com/blog/contact-form-7-event-tracking-google-tag-manager/
    Javascript code for Tag #1: 
    https://chrisberkley.com/wp-content/uploads/2017/11/wpcf7mailsent-javascript.txt
    Troubleshooting your setup:
    https://chrisberkley.com/blog/troubleshooting-contact-form-tracking-with-gtm/",
    "thumbnailUrl": "https://i.ytimg.com/vi/oTZG7A3RjT8/maxresdefault.jpg",
    "uploadDate": "2017-11-19",
    "duration": "PT10M1S",
    "embedUrl": "https://www.youtube.com/embed/oTZG7A3RjT8"
    }
    </script>

    Transcripts

    Transcripts can be really critical. Not only do they give impaired users a full transcript of the video’s content, but they can be keyword-rich and help a page rank if the video is especially relevant to the target keywords.

    I don’t always include transcripts, but often recommend including them in an accordion drop-down, so as not to disrupt the flow of existing text on the page. If the page doesn’t have much additional text, transcripts can easily be adapted into blog posts.

    Meta Data

    Meta data is still really important for SEO. Both Yoast’s plugin and All In One SEO make it very easy to add a title tag and meta description, even warning you if you approach character limits.

    Title Tags

    Using your chosen SEO plugin, write and add an optimized title tag. Shoot for 45-60 characters. Excessively long titles will be truncated in search results.

    I prefer to include the target keyword at the beginning and then include branding at the end. Title tags should grab the searcher. I’m a fan of using question-based title tags if they’re relevant. Here’s the title tag for this post:

    How To Optimize WordPress Posts & Pages For SEO | Chris Berkley

    Meta Descriptions

    Meta descriptions should be up to 230 characters and describe the page’s contents – be as descriptive as possible. Meta descriptions are the key to encouraging searchers to click through from search results and can have a big impact on click through rates.

    Tell searchers what value the page will provide and what they’ll find. Include branding if possible. End with a CTA telling them what to do once they land on the page.

    Here’s the meta description for this page:

    Optimizing WordPress posts and pages is critical for SEO. Follow this comprehensive guide to make sure your content is FULLY optimized, using all of WordPress’ advanced functionality.

    Social Markup

    SEO plugins make it easy to add Open Graph and Twitter Card markup to the page. These meta tags are specifically for social media and add rich snippets when URLs are included in social posts.

    Even without social markup, most social networks will pull the page title, description and image to create a rich snippet. However, these aren’t always optimal – they frequently pull the wrong or completely irrelevant images. Optimizing this markup allows you to customize titles, descriptions and images for use on social media.

    Open Graph Markup

    Open Graph is a standard markup most notably used by Facebook, LinkedIn and Pinterest. The two SEO plugins I mentioned before automate the creation of Open Graph markup using the title tag, meta description and featured image that you’ve added to the page. However, they also allow you to customize these fields specifically for social media – this is especially easy using the Yoast plugin.
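    Rendered in the page’s <head>, Open Graph markup looks like this (the values are illustrative):

    <meta property="og:title" content="How To Optimize WordPress Posts & Pages For SEO" />
    <meta property="og:description" content="Follow this comprehensive guide to make sure your content is fully optimized." />
    <meta property="og:type" content="article" />
    <meta property="og:url" content="https://chrisberkley.com/blog/example-post/" />
    <meta property="og:image" content="https://chrisberkley.com/example-featured-image.jpg" />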

    Say you wanted to add a catchier title/description/image for use on social media. You can do that without changing the title tag & meta description that Google uses, so your SEO efforts aren’t impacted.

    Twitter Cards

    Rather than use Open Graph markup like most other networks, Twitter elected to create its own (very similar) markup called Twitter Cards. You can customize these too just like Open Graph markup.
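    The equivalent Twitter Card tags use name attributes instead of property (again, illustrative values):

    <meta name="twitter:card" content="summary_large_image" />
    <meta name="twitter:title" content="How To Optimize WordPress Posts & Pages For SEO" />
    <meta name="twitter:description" content="Follow this comprehensive guide to make sure your content is fully optimized." />
    <meta name="twitter:image" content="https://chrisberkley.com/example-featured-image.jpg" />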

    Linking

    Internally linking pages is really important for both visitors (they can find related content) and search engines (can crawl the site more easily). Internal links can be added in a number of ways, but some are more valuable than others.

    Links In The Body Copy

    Body copy links are arguably the most valuable, assuming they’re done naturally and in moderation. No one likes a page where every other sentence is a link – it’s incredibly distracting and results in a poor user experience. Use links where they fit naturally.

    Internal Links

    Internal links (links from one page on your site to another page on your site) are valuable for helping visitors find related content and improving the ability for search engines to crawl the site. I recommend setting these to open in the current tab.

    External Links

    Linking out to other sites is fine too. If there’s a page on another site that would provide value to your visitors, link out to it. I recommend opening these in new tabs, to encourage visitors to stay longer on your site.

    Anchor Text

    Anchor text is the phrase that gets hyperlinked to another page. You should aim to use keyword-optimized anchor text, especially for internal links (keyword-rich anchor text is not as necessary for external links).

    There are several links within this article that link out to other related topics, using anchor text keywords relevant to those topics.

    Categories & Tags

    Use Categories and Tags methodically. Keyword stuffing them has no benefit for SEO purposes. Instead, they should be used to help visitors browse the site to discover related content. Additionally, Categories & Tags have a ton of value for search engines as they make it easy to crawl the site and find additional pages.

    Categories & Tags are the first line of defense against island pages and semi-automate internal linking. However, blog category pages typically contain dynamic content (unless set up otherwise) and generally don’t present much value for ranking purposes.

    Build out some pre-determined Categories & Tags and stick to them, adding new ones as you go. Avoid using the same categories as tags and vice versa. Think of Tags as sub-categories. Below is a sample diagram of a Category-Tag structure.

    [Diagram: sample Category-Tag structure]

    Authors

    Adding author details can lend credibility to a post through an appeal to authority. WordPress editors have the option of changing the author at the bottom of the post. Don’t ever leave the post author as “Admin.”

    Authors should have photos & biographies describing who they are. There’s no inherent SEO value here (not anymore), but it shows readers who actually wrote the content. I always include links to my Twitter page for people to ask questions about my content.

    Technical

    Schema

    Schema (Structured Data) helps search engines crawl and index web pages by explicitly identifying specific pieces of content. There are many types (I won’t describe them all) which can be found at Schema.org.

    A few common types are:

    • Video
    • Product
    • Person
    • Location

    I recommend following Torquemag’s guide to setting up custom WordPress fields for Schema. You can also read more about Schema and Structured Data with Google’s developer documentation.

    URL Structure

    When you save your post or page as a draft, you’ll see that WordPress automatically takes the post title (H1) and also uses it for the URL. In this case, you may choose to edit the URL, but make sure it’s still keyword-rich. You want your most valuable keywords in the URL.

    If you’re creating a page (not a post) you’ll see that you have the option of selecting a parent page. Should you add one? It comes down to site structure and strategy. If the page you’re creating falls naturally as a child page to another page, then take advantage of it.

    Adding a child/parent page isn’t a silver bullet for SEO. It’s part of a bigger SEO strategy centered around how content is structured on your site. If you have a careful hierarchy built out, adding URLs that reflect the site structure is icing on the cake.

    The Difference

    Following these steps can be the difference between content that ranks and content that doesn’t. Content has become increasingly important, especially as backlinks have become less influential as a ranking factor.

    Checklist

    If the number of steps seems intimidating, download this checklist and integrate these steps into your content publishing process.

    Download The Checklist

    How To Use IMPORTXML & Google Sheets to Scrape Sites

    By SEO, Technical SEO 6 Comments

    IMPORTXML is a very helpful function that can be used in Google Sheets to effectively crawl and scrape website data in small quantities (especially useful for grabbing titles and meta descriptions, etc.). It can be faster and more convenient than using Screaming Frog or other tools, especially if you only need to pull data for a handful of URLs. This post will show you how to use IMPORTXML with XPath to crawl website data including metadata, Open Graph markup, Twitter Cards, canonicals and more.

    Skip Ahead: Get the free template.

    Setting Up The IMPORTXML Formula

    This is the IMPORTXML formula:

    =IMPORTXML(url,xpath_query)

    You can see there are two parts and they’re both quite simple:

    The first half of the formula just indicates what URL is going to be crawled. This can be an actual URL – but it’s much easier to reference a cell in the spreadsheet and paste the URL there.

    The second half of the formula is going to use XPath to tell the formula what data is going to be scraped. XPath is essentially a language that is used to identify specific parts of a document (like a webpage). Subsequent paragraphs will provide different XPath formulas for different pieces of information you might want to scrape.
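    For example, with a URL pasted into cell A2, this formula would scrape that page’s title tag:

    =IMPORTXML(A2,"//title/text()")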

    Crawling Metadata with IMPORTXML

    The following XPath formulas will scrape some of the most commonly desired SEO data like metadata, canonical tags, and H headings. Note that you can scrape any level of H heading by replacing the “h1” with whichever heading you want to scrape (h2, h3, etc.)

    Title Tags: //title/text()
    Meta Descriptions: //meta[@name='description']/@content
    Canonical Tags: //link[@rel='canonical']/@href
    H1 Heading(s): //h1/text()
    H2 Heading(s): //h2/text()
    
    

    Social Markup

    While social markup has no immediate SEO benefit, it is very important for sites that have active audiences on social media, and implementation of social markup often falls under the umbrella of SEO because of its technical nature. The following XPath formulas will allow you to scrape Open Graph and Twitter Card markup.

    Open Graph Markup

    Open Graph is used by Facebook, LinkedIn and Pinterest, so all the more reason to make sure it’s implemented correctly.

    OG Title: //meta[@property='og:title']/@content
    OG Description: //meta[@property='og:description']/@content
    OG Type: //meta[@property='og:type']/@content
    OG URL: //meta[@property='og:url']/@content
    OG Image: //meta[@property='og:image']/@content
    OG Site Name: //meta[@property='og:site_name']/@content
    OG Locale: //meta[@property='og:locale']/@content
    
    

    Twitter Card Data

    Twitter Card markup is only for…Twitter. Still important though!

    Twitter Title: //meta[@name='twitter:title']/@content
    Twitter Description: //meta[@name='twitter:description']/@content
    Twitter Image: //meta[@name='twitter:image']/@content
    Twitter Card Type: //meta[@name='twitter:card']/@content
    Twitter Site: //meta[@name='twitter:site']/@content
    Twitter Creator: //meta[@name='twitter:creator']/@content
    
    

    Limitations

    Unfortunately, IMPORTXML & Sheets cannot be used to scrape large quantities of data at scale; past a certain volume, the function will simply stop returning results. For more than a handful of URLs, it’s recommended to use a more robust program like Screaming Frog (which does not have a URL limit when used in list mode).

    IMPORTXML Google Sheets Template

    You can see how this works firsthand by making a copy of this Sheets Scraper Template and entering the URL of your choice in cell B6. To add additional URLs, copy & paste row 6, then enter a different URL.

    Questions? Contact me here or reach out on Twitter!

    WWW vs. non-WWW For SEO

    By SEO, Technical SEO No Comments

    There is no SEO benefit to WWW URLs vs non-WWW URLs. Best practice is to pick one as the preferred version and use server-side redirects to ensure all visitors (human and search engine) end up on one single preferred version of the URL.

    What Is WWW?

    First let’s start with URL structure. Take https://www.chrisberkley.com as an example. In that URL, there are three parts:

    • Protocol
    • Subdomain
    • Domain name

    Protocol is a topic for another time, but WWW is technically a subdomain. Websites often use multiple subdomains for different purposes: one for email, one for intranet access, etc. The www subdomain has traditionally been used as the designated subdomain for public-facing websites.

    Which Is Better For SEO?

    As noted, there is no benefit for SEO purposes. You don’t actually need the www subdomain. It’s perfectly fine not to use it, and there is zero functional difference. However, you DO need to pick one version and use it consistently.

    Server-Side Redirects

    Once a preferred version has been chosen, the other version needs to be 301-redirected at the server level. If it isn’t, it might result in:

    1. Non-preferred URLs returning 404 errors.
    2. The website rendering pages in both variations.

    Configuring the server to redirect non-preferred versions to preferred versions ensures that ALL URLs will be redirected automatically.
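    As a sketch, on an Apache server (an assumption; the exact syntax depends on your server software and preferred protocol), a .htaccess rule redirecting www URLs to the non-www versions might look like this:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
    RewriteRule ^(.*)$ https://%1/$1 [R=301,L]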

    Configuring Google Search Console

    Additionally, it’s recommended to configure Search Console to indicate the preferred version as well. In the top right corner, click the gear icon and select Site Settings. There you’ll see the option to set a preferred version of the URL:

    What Are XML Sitemaps? How To Use Them for SEO

    By SEO, Technical SEO

    XML Sitemaps are critical to help search engines crawl websites, but I frequently see clients with critical errors in their XML sitemaps. That’s a problem because search engines may ignore sitemaps if they repeatedly encounter URL errors when crawling them.

    What Is An XML Sitemap?

    An XML Sitemap is an XML file that contains a structured list of URLs that helps search engines crawl websites. It’s designed explicitly for search engines – not humans – and acts as a supplement. Whereas web crawlers like Googlebot will crawl sites and follow links to find pages, the XML sitemap can act as a safety net to help Googlebot find pages that aren’t easily accessed by crawling a site (typically called island pages, if there are no links built to them).

    Where Do XML Sitemaps Live?

    The XML sitemap lives in the root folder, immediately after the domain, and often follows a naming convention such as domain.com/sitemap.xml. A Sitemap declaration should also be placed in the robots.txt file so that Google can easily discover it when it crawls the robots.txt file.
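    The robots.txt declaration is a single line (domain.com as a stand-in):

    Sitemap: https://domain.com/sitemap.xml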

    What URLs Should Be Included In An XML Sitemap?

    URLs included in the XML sitemap should be URLs that are intended to be crawled, indexed and ranked in search results. URLs should meet the following specific criteria in order to be included:

    • Only 200 OK URLs: no 404s, 301s, etc.
    • Pages do not contain a noindex tag
    • Pages are not canonicalized elsewhere
    • Pages are not blocked by robots.txt
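    For reference, a minimal sitemap containing one qualifying URL looks like this (domain.com as a stand-in):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://domain.com/example-page/</loc>
      </url>
    </urlset>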

    HTTP Status Codes

    Sitemap URLs should return clean 200 status codes. That means no 301 or 302 redirects, 404 errors, 410 errors or otherwise. Google won’t index pages that return 404 errors, and if Googlebot does encounter a 301 redirect, it will typically follow it and find the destination URL, then index that.

    If you have 404 errors, first ask why: was the page’s URL changed? If so, locate the new URL, redirect the old URL to it, and make sure the new URL is included in the sitemap.

    If there are 301s or 302s, follow them to the destination URL (which should be a 200) and replace the redirected URL in the sitemap.

    Noindexed & Disallowed Pages

    If a page has a noindex tag, then it’s clearly not intended to be indexed, so it’s a moot point to include it in the XML sitemap. Similarly, if a page is blocked from being crawled with robots.txt, those URLs should not be included either.

    If you DO have noindexed or disallowed pages in your XML sitemap, re-evaluate whether they should be blocked. It may be that you have a rogue robots.txt rule or noindex tags that should be removed.

    Non-Canonical URLs

    If a page in the sitemap has a canonical tag that points to another page, then remove that URL and replace it with the canonicalized one.

    Does Every Clean 200 Status URL Need To Be Included?

    In short, no. Especially on very large sites, it may make sense to prioritize the most important pages and include those in the XML Sitemap. Lower priority, less important pages may be omitted. Just because a page is not included in the XML sitemap does not mean it won’t get crawled and indexed.

    Sitemap Limits & Index Files

    An XML sitemap can contain no more than 50,000 URLs and must not exceed 10MB in file size. Sitemaps that exceed these limits may get partially crawled or ignored completely. If a site has more than 50,000 URLs, you’ll need to create multiple sitemaps.

    These additional sitemaps can be located using a sitemap index file: essentially a sitemap that links to other sitemaps. Instead of including multiple sitemaps in the robots.txt file, only the index file needs to be included.

    If there ARE too many URLs to fit into one sitemap, URLs should be carefully and methodically structured in hierarchical sitemaps. In other words, group site sections or subfolders in the same sitemap so that Google can get a better understanding of how URLs interrelate. Is this required? No, but it makes sense to be strategic.
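    A sitemap index file follows the same protocol, with each child sitemap grouped by site section (illustrative URLs):

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://domain.com/sitemap-products.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://domain.com/sitemap-blog.xml</loc>
      </sitemap>
    </sitemapindex>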

    Types of XML Sitemaps

    In addition to creating sitemaps for pages, sitemaps can (and should) be created for other media types including images, videos, etc.

    Dynamic vs. Static

    Depending on the CMS and how it’s configured, the sitemap may be dynamic, meaning it will automatically update to include new URLs. If it’s configured correctly, it will exclude all the aforementioned URLs that shouldn’t be included. Unfortunately, dynamic sitemaps do not always operate that way.

    The alternative is a static sitemap, which can easily be created using the Screaming Frog SEO spider. Static sitemaps offer greater control over what URLs are included, but do not automatically update to include new URLs. In some cases I’ve recommended clients utilize static sitemaps if a dynamic sitemap cannot be configured to meet sitemap criteria. When that happens, I set a reminder to provide an updated sitemap, typically on a quarterly basis, or more often if new pages are frequently added to the site.

    Submission to Webmaster Tools

    Once an XML sitemap has been created and uploaded, it should always be submitted to Google Search Console and Bing Webmaster Tools to ensure crawlers can access it (in addition to the robots.txt declaration).

    In Google Search Console

    Navigate to Crawl > Sitemaps and at the top right you’ll see an option to Add/Test Sitemap. Click that and you can submit your sitemap’s URL to be crawled.

    In Bing Webmaster Tools

    From the main dashboard, navigate down to the sitemaps section and click “Submit a Sitemap” at the bottom right. There you can enter your sitemap’s URL.

    Finding Pages With Embedded Wistia Videos

    By Technical SEO, Video No Comments

    Wistia is a great platform for hosting videos on your site with tons of functionality including the ability to embed videos on pages and optimize them using built-in calls-to-action and pop-ups.

    Recently I encountered a scenario where I wanted to find every website page that had a Wistia video on it. Going into Wistia’s back end revealed that the client had ~200 videos, but I had no idea where they were actually placed on the site, and I wanted to ensure they were being used to full capacity.

    With YouTube, you can simply run a Screaming Frog crawl and do a custom extraction to pull out all the embed URLs. From there you can determine which video is embedded based on that URL. However, the way Wistia embeds videos is not conducive to identifying which video is where, based on an embed URL. I couldn’t find any distinguishing characteristics that would help me identify which video was which.

    How can such an advanced video platform be so incredibly difficult?

    That’s mostly because Wistia relies heavily on Javascript. As Mike King notes in his article The Technical SEO Renaissance, right clicking a page and selecting “view page source” won’t work because you’re not looking at a computed Document Object Model. In layman’s terms, you’re looking at the page before it’s processed by the browser and content rendered via Javascript won’t show up.

    Using Inspect Element is the only way to really see what Wistia content is on the page. Doing that will show you much more information, including the fact that Wistia automatically adds and embeds video Schema when you embed a video. This is awesome and saves a ton of work over manually adding Schema like you have to do with YouTube videos.

    The video Schema contains critical fields like the video’s name and description. These are unique identifying factors that we can use to determine which video is placed where, but how can it be done at scale when we don’t even know which pages have videos and which don’t?

    Finding Wistia Schema With Screaming Frog

    Screaming Frog is one answer. Screaming Frog doesn’t crawl Javascript by default, but as of July 2016, it DOES have the capability to do so if you configure it (you’ll need the paid version of the tool).

    Go into Configuration > Spider > Rendering and select Javascript instead of Old AJAX Crawling Scheme. You can also uncheck the box that says Enable Rendered Page Screenshots, as this will create a TON of image files and take unnecessarily long to complete.

    Setting Up a Custom Extraction

    Next you will need to set up a Custom Extraction, which can be done by going to Configuration > Custom > Extraction. I’ve named mine Wistia Schema (not required) and set the extraction type to regex, then added the following regular expression:

    <script type="application\/ld\+json">\{"@context":"http:\/\/schema\.org\/","@id":"https:\/\/fast\.wistia\.net\/embed.*"\}<\/script>

    This will ensure you grab the entire block of Schema, which can be manipulated in Excel later to separate different fields into individual columns, etc.

    Then set Screaming Frog to list mode (Mode > List) and test the crawl with a page that you know has a Wistia video on it. By going into the Custom Extraction report, you should see your Schema appear in the Extraction column. If not, go back and make sure you’ve configured Screaming Frog correctly.

    Screaming Frog Memory and Crawl Limits

    The only flaw in this plan is that Screaming Frog needs a TON of memory to crawl pages with Javascript. Close any additional programs that you don’t need open so that you can reduce the overall memory your computer uses and dedicate more of it to Screaming Frog. With large sites, you may run out of memory and Screaming Frog may crash.

    Takeaways

    • Wistia uses Javascript liberally.
    • Schema is embedded automatically, using Javascript.
    • Schema can be crawled and extracted with Screaming Frog, but it’s a memory hog so larger sites might be a no-go.

    Questions? Tweet at me: @BerkleyBikes or comment here!

    Google My Business Posts

    By Local SEO, SEO 2 Comments

    A few weeks ago Google rolled out a post feature for its My Business listings. Now you can create Facebook-like posts in the back end of the Google My Business interface that will display an image, description and website link in a box below your Google My Business listing’s knowledge graph. First I’ll show you how to create & optimize these, then I’ll discuss where I foresee them being most useful.

    Creating Google My Business Posts

    First log into your Google My Business platform and select the location you want to create a post for (if you have more than one). So far posts have to be manually created for each location, so it’s not easy to roll them out to hundreds of listings. The post you create will only show up for the listing you create it for.

    Once you’ve selected your location, click on the “Posts” option in the left nav and you’ll see a box in which you can write a post. You’ll also see previous posts located underneath (this particular post is expired; I’m not sure how long they stay there).

    Once you click into the post editor, it’ll look like this. The interface is admittedly clunky.

    If you click on that big gray box, it’ll let you upload a photo and prompt you to crop it into a rectangular shape. (You would think the Photo Guidelines linked at the bottom would provide criteria for sizing, aspect ratio, etc. It does not.) Ideally your image should be engaging and grab attention. You may opt to include text in the image – this reminds me a lot of a Google AdWords Display ad, which may hint at the future of this functionality.

    Then you can add a description – you have between 100 and 300 words.

    There are really two types of posts – events and non-events. Non-event posts last a week, while event posts will prompt you to enter start/end dates and will stay up for the entire duration of the event.

    You can also add one of several preset call-to-action buttons for people to click on (I’ve chosen ‘Learn More’) and add a URL. I highly recommend tagging this URL, just like you should tag the landing page URLs in your GMB listings. Otherwise, clicks will come through as organic, and you won’t know whether they came from a normal SERP or from the post itself.

    You can use Google’s URL builder – be sure to tag the medium as organic (these URLs should only be accessible from an organic search). The source is up to you, but I’ve been using g-local-post as my source (to differentiate from g-local as my source in the listing URLs themselves).
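    A tagged post URL might look like this (example.com as a stand-in, using the source and medium values described above):

    https://www.example.com/events/open-house/?utm_source=g-local-post&utm_medium=organic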

    Then you can preview your post and if it looks good, publish it.

    Now you’ll see your post as a small box at the bottom of your branded knowledge graph. Despite the fact that I’ve done everything Google requested, the image is cut off and the description cut short. Hopefully this product evolves a bit and remedies some of those issues.

    You might think “I wonder if they look better on mobile?” The answer is no. If there’s more than one post, you do see a carousel (whereas desktop only displays one post at a time). On mobile, Google does allow you to click on a tab and see the posts by themselves, but who’s realistically going to do that?

    Takeaways

    The GMB post format and interface are clunky. The images almost never show up as intended, making them ineffective. Their usefulness is also limited by where they appear. The only time these posts will show up is in a knowledge graph, which typically indicates a branded search took place.

    The chance they’d show up for a non-branded search is very limited, so they’re not much use to drive new organic traffic. If anything, they may steal traffic away from the GMB listings themselves, so be aware of that.

    While my examples used blog posts, this is probably poor usage. These types of posts would be much better suited to location-specific events that someone searching for a particular location would want to know about.

    It’s sort of like free display ads – I wouldn’t be surprised if Google eventually monetizes this with advertising, the way they added and monetized the local map pack with ads.

    Questions? Comments? Tweet at me (@BerkleyBikes) or drop a comment here!