Tool Tip Tuesday – Your Browser, Your Website and Search Engine Indexes – Part 3

What we saw, and how we provide direction to correct it

So, we have identified a few opportunities using just our browsers and advanced queries against the Google and Bing indexes, combined with some simple Webmaster Tools account reviews.  These are tools that should be available to every website (NOTE: if you do not have Bing Webmaster Tools set up, get it done already; you're missing out on a free data resource for your site) and can be set up on any platform relatively easily.  What you can discover with your browser and the index queries themselves takes no technical effort.

Rather than expound on what was written in Part 1 and Part 2 of this series, I'll simply let you read those if you missed them.  So, to move forward, we are going to look at the best way to provide guidance on fixing the various opportunities.

XML sitemap issues, especially when it comes to pre-existing functionality on common blogging platforms, can be a total pain.  That's because so many things are handled through plug-ins and, let's be honest, these plug-ins don't always provide the level of flexibility we would want.  In this case, making sure that pages tagged as "noindex" do not show up in your sitemap files is not easily done, as the two functions are most likely controlled by different plug-ins with no visibility into each other's settings.  To get around this, you may need to use an independent sitemap generator and do some comparisons against a detailed crawl that identifies pages carrying the "noindex" directive.  I'm actually a huge fan of using a crawl and hand-constructing my XML sitemap.  I personally use Excel to generate the individual XML record for each URL, copy/paste the results into a Notepad document between the opening and closing XML tags, and save the file in UTF-8 format.  I find this gives me more individual control to customize my favorite XML markup signals, <priority> and <changefreq>.  Now, this process does have the drawback that you have to keep updating the file as URLs are added, but if you build the right Excel template, it can be managed in a fairly streamlined manner, especially with strategic use of the CONCATENATE function as seen below:

[Screenshot: Excel set-up showing crawled URLs broken into two columns, with a CONCATENATE formula assembling each XML sitemap record]
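If you'd rather script that last step than live in the spreadsheet, here's a minimal Python sketch of the same idea.  It assumes a hypothetical crawl_export.csv with "domain" and "path" columns (rename these to match whatever your crawler actually outputs) and applies a single <changefreq> and <priority> value to every URL, which you would obviously tune to your own preferences:

```python
# Minimal sketch: build sitemap.xml from a two-column crawl export.
# Assumptions: crawl_export.csv has "domain" and "path" columns (adjust to
# your crawler's export) and one changefreq/priority value fits every URL.
import csv
from xml.sax.saxutils import escape

CHANGEFREQ = "weekly"  # my favorite signals to customize
PRIORITY = "0.8"

records = []
with open("crawl_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        url = row["domain"].rstrip("/") + "/" + row["path"].lstrip("/")
        records.append(
            "  <url>\n"
            f"    <loc>{escape(url)}</loc>\n"
            f"    <changefreq>{CHANGEFREQ}</changefreq>\n"
            f"    <priority>{PRIORITY}</priority>\n"
            "  </url>"
        )

with open("sitemap.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    out.write("\n".join(records) + "\n</urlset>\n")
```

The output is the same hand-built sitemap described above, just regenerated on demand as new URLs show up in the crawl.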

Most crawlers are going to give you the URL structure in the two-part set-up you see here.  I like to use VisualSEO, with ScreamingFrog as a secondary, for these types of crawls; both break out the URLs as shown in the Excel sample above and flag any pages currently set to "noindex" (via a robots meta tag) or blocked by robots.txt, for easy filtering.
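To close the loop on the "noindex" issue, a quick comparison of that crawl export against your sitemap will surface the offenders.  Here's a minimal sketch, assuming a hypothetical crawl.csv with "url" and "noindex" columns (column names will vary by crawler) and a local copy of sitemap.xml:

```python
# Minimal sketch: list sitemap URLs that the crawl flagged as "noindex".
# Assumptions: crawl.csv has "url" and "noindex" columns, and sitemap.xml
# sits in the same folder; adjust names to match your own exports.
import csv
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

sitemap_urls = {
    loc.text.strip()
    for loc in ET.parse("sitemap.xml").getroot().findall("sm:url/sm:loc", NS)
}

noindex_urls = set()
with open("crawl.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row["noindex"].strip().lower() in ("true", "yes", "1"):
            noindex_urls.add(row["url"].strip())

for url in sorted(sitemap_urls & noindex_urls):
    print("Remove from sitemap:", url)
```

Anything this prints is a URL you are asking the engines to index in one file while telling them not to index it on the page itself.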

As for the overly redundant blog categorization (and yes, I noticed the previous posts were tagged to the 'blog' category), we need to clean this up, and by clean it up, I mean remove it from existence and transfer any of the pages associated with it to the blog's homepage.  This should be done with a 301 redirect.  The simplest approach would be a single rule that looks for the URL pattern '/blog/category/blog/*' and triggers a 301 redirect back to just '/blog/'.  This immediately removes this major area of potential duplication, and any value it has built up over time will be consolidated to the true canonical source for all blog posts, the main blog homepage.  This can be done via most redirect rule manager plug-ins or directly through the website's .htaccess file.
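For anyone taking the .htaccess route, here is a minimal sketch of what that single rule might look like (this assumes an Apache server with mod_alias available and the blog living at /blog/ on the same domain; a redirect-manager plug-in accomplishes the same thing through its own interface):

```apacheconf
# Sketch only: send anything under /blog/category/blog/ back to the blog
# homepage with a permanent (301) redirect. Assumes Apache with mod_alias.
RedirectMatch 301 ^/blog/category/blog(/.*)?$ /blog/
```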

The final step in all this cleanup is to audit the website once these changes are live and identify any lingering links to the now-301'd category, then remove or correct those links as appropriate.  This provides a cleaner page footprint and internal linking structure for the site versus linking to actively 301-redirected URLs, which the search engines hate.  In particular, we have observed several instances of Google keeping a URL in its index despite a 301 redirect when it sees the site still actively linking to the old URL.  While this may seem like "just another hoop to jump through", I believe it is the right call.  If you have decided to actively 301 redirect a URL, the onus should be on the site to also clean up any and all link references to that old URL and repoint them to the new resolving URL.  Again, the crawlers mentioned above can help identify this.
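If your crawler can export the internal link graph, a quick pass over that export is all it takes.  A minimal sketch, assuming a hypothetical links.csv with "source" and "target" columns (again, names will vary by tool) and the retired category path from above:

```python
# Minimal sketch: find internal links that still point at the retired
# /blog/category/blog/ URLs after the 301 is in place.
# Assumption: links.csv has "source" and "target" columns from your crawler.
import csv

RETIRED_PREFIX = "/blog/category/blog/"

with open("links.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if RETIRED_PREFIX in row["target"]:
            print(f'{row["source"]} still links to {row["target"]}')
```

Each line it prints is a page where the link should be repointed straight at '/blog/' (or the correct post) rather than left to bounce through the redirect.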

As an "old school" white hat SEO, I find that my adoption of tools always lags behind that of many of my peers, probably because I came up through the ranks at a time when this was all still evolving; if you wanted to discover things about your site, you had to build the tools yourself or work with IT to develop them.  We have come a very long way from home-built log file analyzers, and many of the tools available now are great, as long as they do not replace the overall analysis that goes into SEO.  SEO really comes down to several core factors: context (internal linking, page-to-phrase mapping, etc.), content (that is good for a user but builds in targeted SEO phrases), indexability (not wasting a bot's time and providing smart signals) and, finally, simply building a site where SEO and usability work hand-in-hand.

My final "tip" is that anything that ignores those core SEO principles yet produces overly accelerated rankings growth will probably turn out to be spam (i.e., buying links, keyword stuffing, etc.) and should be critically analyzed.  No one can promise a first-place ranking without being willing to "go to the dark side", and often an inexpensive SEO service will leverage these and other dirty tricks to save time; but when the site is penalized or, worse, delisted entirely, the money you "saved" going with the cheaper SEO vendor will be nothing compared to the heartache of trying to get your business back on track.  So build it right, focus on the true foundation of your site, and the real long-term SEO benefit will come your way.