Below is an example of the methodical process of combing through a large-scale website and assessing its SEO quality by researching, creating reports, and implementing changes that can help the site grow.
OVERVIEW
The aim of a site optimization is to tackle both fundamental and advanced SEO issues on websites. Optimizations should occur at least 2-3 months before a site’s peak season. As SEO responsibility and knowledge have expanded through the organization, roles and responsibilities have changed, along with the fine-tuning of processes and tools. This process incorporates knowledge, learnings and tasks from a combination of past projects aimed at improving organic ranking, content relevancy, performance, engagement and search engine indexation.
Goals
- Improve site indexation and crawl-ability – eradicate duplicate content issues.
- Ensure accessibility, speed and performance for top pages.
- Refresh content for search engines (and humans): KWP re-targeting to increase relevancy and freshness, and addressing stale and shallow content to improve quality and, in turn, engagement.
- Distribute responsibility of the technical SEO and content improvement processes.
Objectives
This project ties to the Marketing Team’s objectives to increase traffic to the core sites, to audit and refresh the top content and to perform deep technical SEO. Actions in this project also impact the Development Team’s objectives to use cleaner code, follow front-end UI guidelines and standards, and improve performance.
Flow Chart
This chart shows a high-level view of the site optimization. See the definitions list for an explanation of the steps and sections.
The schedule of deliverables and meetings will be available on (Google Calendar) and discussed in a kickoff meeting.
Assessments
Here’s a detailed breakdown of the tasks associated with each assessment section, along with the resources needed to complete them. Outputs of your assessment should be comprehensive and thoughtful and, most importantly, actionable. Consider that implementation requirements need to be written FROM your assessment: it should read as it would if you were filing a ticket for someone else to perform the implementation.
Important Documents for the Assessments:
- Site Checklists
- Calendar/Schedule
For Developers:
- Site’s Technical Assessment Report (in this report, you generally only need to fill in areas in “grey” highlights) — To find the link to the current assessment reports, visit the Appendix.
- Site’s Checklist
- Site’s Crawl Error Report
Marketing/Product Team:
- Site’s Advanced Site Content Report
- Other content assessment reports (on Links, etc.)
Accessibility and Technical
All technical assessments will be reported on and documented in the Technical Assessment Report. Find them all in the Appendix.
Sitemap Assessment
In this assessment, we will examine the health of the current XML sitemap(s) on site and we will report on our learnings in the Technical Assessment Report.
Tasks Include:
- Auditing the sitemaps we have submitted to GWT (Google Webmaster Tools) and BWT (Bing Webmaster Tools)
- Check our indexation (report URLs submitted and URLs indexed)
- Check which sitemap(s) are in robots.txt file
- Test the URL list(s) for non-200 status URLs (on tab 2 of the report)
– To do this, you should create a txt file with just the URLs (take out the priorities) and crawl that file with Integrity Link Checker in plain text mode. For more instructions, see the Appendix.
- Crawl the site to identify pages missing from the sitemap(s). You may need to crawl one section at a time, or exclude certain sections, if dealing with multiple sitemaps.
Note: The site crawl should exclude images, but include PDFs and PPTs.
- Provide a guess as to why some files in the crawl are not in the sitemap, or suggest that they be added.
Note: In the future, if we move away from using sitemap-gen, we will need to also validate our XML sitemaps.
Time Estimate: 1 hour
Resource: Front End
Output: Fill in information on the sitemap tabs of Technical Assessment Report.
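The URL-list preparation and the crawl comparison described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each urllist line starts with the URL followed by space-separated attributes such as priority=0.5. The function names and the list of skipped image extensions are illustrative and are not part of any existing tool.

```python
def urls_only(urllist_lines):
    """Keep just the URL (first token) of each urllist line, dropping priorities."""
    urls = []
    for line in urllist_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        urls.append(line.split()[0])
    return urls


def missing_from_sitemap(crawled_urls, sitemap_urls,
                         skip_ext=(".png", ".jpg", ".gif")):
    """Return crawled URLs that are not listed in the sitemap URL list.

    Trailing slashes are ignored when comparing, and image files are
    excluded, while PDFs and PPTs are kept.
    """
    listed = {u.rstrip("/") for u in sitemap_urls}
    return sorted(
        u for u in set(crawled_urls)
        if u.rstrip("/") not in listed and not u.lower().endswith(skip_ext)
    )
```

The plain list returned by urls_only can be written to a txt file for the link checker, and the output of missing_from_sitemap is a starting point for the "why is this not in the sitemap" notes.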
LINK & IMAGE ATTRIBUTE ASSESSMENT
This assessment lets us review our images and links so that we can pinpoint which images need ALT attributes, which links need TITLE attributes, which external links are not opening in a new window, and which links are not in the proper format. The last item will hopefully find links where the final "/" was forgotten.
Any images with semantic meaning should have ALT text. Scan the site to verify. Note that the grep search excludes images such as bullets or icons that do not carry any semantic meaning.
Similarly, all hyperlink anchors should have a TITLE attribute, which functions similarly to ALT text, but for links. Do not be surprised if you find a lot of items. Our use of TITLE and ALT attributes in the past has been sporadic at best.
When scanning links, remember that this is only applicable to links that get sent to the end user. There is no need to examine the paths of server-side reference to PHP include files, etc.
Tasks Include: Using the terminal, grep the site to get output for the following:
- Images without ALT attributes
grep -rns '<img ' . | grep -v '<img [^>]*alt=' | grep -v '\.svn'
- Link hrefs without TITLE attributes
grep -rns '<a ' . | grep -v '<a [^>]*title=' | grep -v '\.svn'
- External link hrefs without target=_blank
Run this command in the site’s local directory on your machine and replace “sitename” with the domain name without the .com. This command creates a list of links, filters to those where the href does not go to the site, then removes those with a target attribute:
grep -rns '<a [^>]*href="http' . | grep -v '<a [^>]*sitename' | grep -v '<a [^>]*target=' | grep -v '\.svn'
- Relative link hrefs (as opposed to absolute)
grep -rns '<a [^>]*href="' . | grep -Ev '<a [^>]*href="(http|javascript|#|<\?|mailto)' | grep -v '\.svn'
and images:
grep -rns '<img [^>]*src="' . | grep -Ev '<img [^>]*src="(http|<\?)' | grep -v '\.svn'
- Hrefs not ending in /, .php, .png, .pdf, .gif, .ppt or .jpg (and that are not for analytics/hasoffers tracking or to specify mailto)
Use Integrity Link Checker to run a site crawl and review pages that did not render, or that had to redirect because it was missing a trailing slash, or included index.php etc.
Time Estimate: 1 hour
Resource: Front End
Output: Report the links and code with these issues in the 3rd tab of the Technical Assessment Report
Example: ./public/content/directory/index.php:55 <img src="http://media.sitename.com/images/image-content.png" width="800">
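Because grep works line by line, it can misreport tags that span several lines. As a cross-check, the ALT and TITLE scans can be sketched with Python's standard html.parser module. This is a minimal sketch that assumes it is fed rendered HTML (for example a saved page) rather than raw PHP source; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class AttributeAudit(HTMLParser):
    """Collect <img> tags without alt and <a href> tags without title."""

    def __init__(self):
        super().__init__()
        self.issues = []  # list of (line number, description)

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        line, _ = self.getpos()  # line on which the tag starts
        if tag == "img" and "alt" not in names:
            self.issues.append((line, "img missing alt"))
        elif tag == "a" and "href" in names and "title" not in names:
            self.issues.append((line, "a missing title"))


def audit(html):
    """Return the list of attribute issues found in one HTML document."""
    parser = AttributeAudit()
    parser.feed(html)
    parser.close()
    return parser.issues
```

The results still need a human pass: decorative images such as bullets or icons may legitimately carry an empty alt value.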
PAGE SPEED ASSESSMENT
Since low page speed performance can negatively impact ranking power and user experience, we review the site to verify that certain performance-related practices have been followed.
Tasks Include:
1. Verify use of CDN for Static Assets
- Most static assets (images, PDFs, PowerPoint presentations, JavaScript files) should be served from the company CDN. This typically involves replacing references to “www.sitename.com” with “media.sitename.com”. Grepping the code is one way to find these; another is to surf the site and examine Firebug’s “Net” tab to see what domain assets are being loaded from. For example:
grep -rns '<img [^>]*src="' . | grep -Ev '<img [^>]*src="(https?:)?//media\.' | grep -v '\.svn'
- Note that this will probably produce a *lot* of results. Only some of the matches will need to be served from the CDN. Please only include in your output the ones that will need to be updated to serve from the CDN.
- Note that in some instances, certain key assets should *not* be served from the CDN. These include hidden tracking pixels, and static assets used by SEO as “linkbait”.
- Note: the above command only catches image tags, but these issues also occur in the stylesheets. This command can catch those instances:
grep -rns 'url(' . | grep -Ev "url\(['\"]?(https?:)?//media" | grep -v '\.svn'
2. Search for and Report Unnecessary DNS Lookups
- Firebug’s “NET” tab can also help with identifying any assets that a page requests from other domains. Since DNS lookups can have some impact on page speed, we should review whether all the assets retrieved from other sites are strictly necessary.
3. Review Top 10 Pages
- Go into Google Analytics and identify the top 10 pages in terms of number of visits for the past year. Download the list and add to Report, along with their current page speed. Examine those pages looking for any page-specific optimizations which could be accomplished. Record any notes on optimizations which you think might be useful to those pages – whether in or out of scope of the site op.
Time Estimate: 2 hours
Resource: Front End
Output: List of assets not served on the CDN and unnecessary DNS look-ups in the 4th tab of the Technical Assessment Report. Both of these outputs should not be raw results, but those you recommend actually changing – or at least discussing in a review meeting.
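As a complement to the Firebug review, the domains a page pulls assets from can be tallied from its HTML. The following is a minimal sketch, assuming a saved copy of the rendered page. The tag-to-attribute mapping and the function names are illustrative, and assets referenced only from CSS or loaded by scripts will not appear.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AssetHosts(HTMLParser):
    """Count the hostnames that a page's asset tags point to."""

    # Which attribute holds the asset URL for each tag (illustrative subset).
    ASSET_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.hosts = Counter()

    def handle_starttag(self, tag, attrs):
        wanted = self.ASSET_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                host = urlsplit(value).netloc.lower()
                if host:  # relative URLs have no host and need no extra lookup
                    self.hosts[host] += 1


def external_hosts(html, own_hosts):
    """Return {host: asset count} for hosts outside the site's own domains."""
    parser = AssetHosts()
    parser.feed(html)
    parser.close()
    return {h: n for h, n in parser.hosts.items() if h not in own_hosts}
```

Each host in the result costs at least one DNS lookup, so the list is a shortlist for the "is this strictly necessary" discussion rather than a set of confirmed problems.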
HTACCESS ASSESSMENT
A site’s .htaccess file is read and scanned upon every single request to the server. That is not just once per page, but perhaps dozens of times per page, depending on how many images, JavaScript files, CSS files, and other requests a page spawns. Because every line in an .htaccess file could evaluate a very slow regular expression match, the size and composition of the .htaccess file can have a significant impact on overall site performance.
In addition, search engines do not like when redirects go to missing pages (404 errors), pages that are broken (500 errors), or URLs that redirect more than once (301s and 302s). So keeping the .htaccess file clean and optimized is important.
Tasks Include:
1. Verify Canonical Rules Exist
Do the following rules exist and work properly?
- Redirect domain.com to www.domain.com
- Redirect www.domain.com/directory/index.php to www.domain.com/directory/
- Redirect www.domain.com/directory -to- www.domain.com/directory/
Examples:
#Redirect domain.com to www.domain.com
RewriteCond %{HTTP_HOST} ^sitename\.com [NC]
RewriteRule (.*) http://www.sitename.com/$1 [R=301,L]
#Redirect urls with index.php to directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http%{ENV:askapache}://%{HTTP_HOST}/$1 [R=301,L]
#Directory w/o slash add slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/*(.+/)?([^.]*[^/])$ http://%{HTTP_HOST}/$1$2/ [L,R=301]
2. Check need for .php/ redirect
- Go into Google Analytics and run a report to see how many page views for the domain are for pages containing the string “.php/” for the past year. If it is a nontrivial number, add the rule to redirect .php/ to .php. Provide data on URLs you found getting traffic to .php/ and how many visits.
3. Search for “Redirects” in old format
- All Redirect directives will need to be replaced with RewriteRule directives. In general, this takes the form of changing something like:
Redirect 301 /directory-name/ http://www.domain.com/pagename.php
to something like:
RewriteRule ^directory-name/$ http://www.domain.com/pagename.php [R=301,L]
Identify how many rewrites exist in the outdated “Redirect” format that need to be updated. The update itself will take place in the implementation phase.
4. Identify Outdated Rules & Check If Still Indexed
- Look for and flag rules which might have been added to the file before (year), or by people whose names you don’t recognize. By this time, the search engines will have replaced their entries with references to the pages that these redirects point to. Add the URLs of the flagged rules to the htaccess tab on the Assessment Report. Then search for them in Google’s index by performing a site: search (for example, site:www.domain.com/directory-name/).
5. Test Redirect Rules
- Test all the simple redirection rules and confirm that they point to pages which generate 200 OK responses, rather than 404s, 500s, or additional redirects. If a rule redirects A to B, every A should return status 301 and every B should return status 200.
- The easiest way to do this is with a tool like the htmunger.php script, which uses curl to request the header of a redirected URL and then reports the HTTP response code of the first level of redirection. Given an .htaccess file, it will produce a list of redirects whose target results in a non-200 result. This is usually a good way to detect redirects that are dead (404s), broken (500s), or which redirect more than once (301s and 302s).
- Your output of this assessment should provide the SEO Team with enough data to identify redirect loops and broken redirects – and discuss them in a follow-up Review Meeting.
- Additionally, there are dynamic or regex redirects. Please test that they are working as expected.
Time Estimate: Depends on size, age and general care of .htaccess file.
Resources: Front End
Output: Fill in responses to items on the 5th tab “htaccess tests” of the Technical Assessment Report.
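The A-to-B check described in the redirect tests can be sketched in Python for cases where the htmunger.php script is not at hand. This is a minimal sketch and not a replacement for that script: the function names are illustrative, only the first level of redirection is inspected, and the fetch function is passed in as a parameter so the logic can be exercised without network access.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_redirect(source_url, fetch=head):
    """Classify one redirect: the source should be a 301, the target a 200."""
    status, location = fetch(source_url)
    if status != 301 or not location:
        return f"source returned {status}, expected 301"
    target_status, _ = fetch(urljoin(source_url, location))
    if target_status == 200:
        return "ok"
    if target_status in (301, 302):
        return f"chained redirect ({target_status})"
    return f"broken target ({target_status})"
```

Running check_redirect over the source URLs of the simple rules gives a list in which anything other than "ok" is a candidate for the Review Meeting.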
Crawl Errors / Create Reports
In this section, you will run a number of crawlers against the site and make their output available for you to review. Some of the crawlers might include:
- SEOmoz
- Google Webmaster Tools (GWT)
- Bing Webmaster Tools (BWT)
- AWStats
At first, the SEO will run those reports for you or with you and will instruct you on how to generate them for yourself. These reports should then be saved in a public space for others to view. Combining them into one report would be preferred.
Generate Statistics
Create summary statistics to give an overall report on health of domain. This typically involves simple counts of various error types, straight from the reports.
Classify Errors and Identify Patterns
Don’t try to dive into the analysis of every line in each error report. In general, there is a whole lot of chaff in these reports which can be safely ignored. However, there are a couple of ways to look at the reports that will help you clean up the site.
First, certain files or directories might produce a large number of errors. For example, if there is a problem with the “X” pages or “Y” pages, they will show up many times in the report. Try to classify the errors and then investigate their possible origination and report on what you found in the notes.
Secondly, look at URLs that produce the highest number of errors. For example, if one URL is generating hundreds of 404s, it is probably a good candidate for a redirect. Look into why these errors are occurring and report on them in the notes. Narrowing the problems down to the high-value items helps the SEO and Business Teams make decisions about what needs to be fixed.
If you are hitting a brick wall and it is clear that a particular type of error needs much more investigatory time, alert the SEO & Business Teams and create a ticket with all the information you have found so far. File it in Next Sprint Candidates for Enterprise Marketing – Component SEO.
Another technique would be to look at what is common across reports. If a particular directory appears in one tool, its appearance in another tool would validate that observation and help raise the importance of that issue.
When the error involves images that don’t seem to exist, sometimes we have found them in the CSS files which are under sites/library/global/less -or- sites/library/global/css.
Time Estimate: 2-3 hours
Resources: Front End
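The summary statistics and the two ways of slicing the error reports can be sketched as follows. This is a minimal sketch, assuming the combined report has been reduced to (URL, status code) pairs. That input shape and the function names are assumptions, since each crawler exports its own format.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_of(url):
    """Return the top-level directory of a URL, or "/" for root-level files."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    if len(segments) > 1 or (segments and path.endswith("/")):
        return "/" + segments[0] + "/"
    return "/"


def summarize(errors, top_n=5):
    """Count errors by status code, by site section, and by individual URL."""
    by_status, by_section, by_url = Counter(), Counter(), Counter()
    for url, status in errors:
        by_status[status] += 1
        by_section[section_of(url)] += 1
        by_url[url] += 1
    return {
        "by_status": dict(by_status),
        "by_section": dict(by_section),
        "top_urls": by_url.most_common(top_n),
    }
```

Sections with high counts point at pattern-level problems, while the top URLs are the likely redirect candidates.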
CONTENT OPTIMIZATION ASSESSMENT (On Page)
- Pull Traffic Performance Data
- Identify Top and Low Performing pages on site
- Assess CTR in the SERPs
- TITLE and META Description Analysis
- Compare to potential pages report
- Assess relevance
- Assess readability
- Run InSpyder report — record # of spelling errors
- Assess site reading level and record for Top and Low Performing Pages
- “X NAME” pages assessment, if applicable
- “Y NAME” pages assessment, if applicable
Time Estimate: 3-4 hours
Resource: SEO (or Product)
Output: All this data, assessments and notes will be combined into the Advanced Site Content Report.
KW TARGETING ASSESSMENTS
- Outline and examine site content architecture
- Identify top 3 entrance KWs per page and assess whether the KWs match the content
- Identify top 30 KWs targeted on site, and their search volume
- Assess other KW opportunities by inventory
Time Estimate: 2 hours
Resource: SEO
Output: New link report. TBD
LINK AUTHORITY
- Assess links to linking domains ratio
- Assess anchor text breakdown
- Assess network linking
- Assess pages with over 100 on page links
- Assess pages with low amount of internal links
Time Estimate: 3 hours
Resource: SEO
Output: New link report. TBD.
REPORTING OUT FOR REVIEW MEETINGS
Documentation is a pivotal part of this project’s success and efficiency. As a developer, you will own the Technical Assessment Report for the site optimization. You will update each tab accordingly and fill in your time and completions in the Summary tab. All of the work will end up in a shared Google Doc, except the Crawl Errors analysis, which will be completed in a single MS Excel doc. We try to limit the number of documents associated with each site. As a Product Rep you will be more involved in the implementation than the assessment, but you will need to keep careful track of your progress. The SEO Team is responsible for most of the content-related assessment documentation, which also lives primarily in one Excel spreadsheet.
When done with the assessments, we will have a Review Meeting with representatives of the Business and SEO teams to discuss the findings. The purpose of this meeting is to review your recommendations and decide on a set of actions to perform. Some of these actions will be code changes, some may involve further research by other teams.
At the meeting, you should be prepared to:
- Give an assessment about the site’s overall health with respect to the items you looked at.
- Talk about the analysis you did and what specific things you looked at.
- Examine the list of the actions suggested per assessment area, with the ability to dive into a discussion about how you came to those conclusions and why those changes would result in an improvement to the site.
Implementation Tasks
The implementation tasks will be set in the Review Meeting. However, there are still some actions which we can predict and lay out here.
Technical Implementation
Sitemaps
From the sitemap assessment, there may be URLs to add/remove to the list. These will be decided in the Review Meeting. After a sitemap optimization, the sitemap needs to be re-submitted to Google and Bing – tested, verified and accepted without issues or warnings.
SEO TEAM ACTION: Review and update priorities in final, cleaned up urllist files. There is a step in the Sitemap implementation where the SEO team may need to step in and adjust priorities. This can only be done once the urllists are created and tested for errors and should occur before any new urllists are launched.
If and when it is determined that there are opportunities to add additional sitemaps to the site you are optimizing to aid in indexation, here are the steps to follow.
- Create new url list file (named: sitename.com_section_urllist.txt)
- Create associated config file (named: sitename.com_section_config.xml)
- Use a crawler, generate accurate url list of this directory (including .php, .pdf and .ppt)
- Ensure all links return server header of 200 (302 is OK – 301, 404 are not) and no links or directories on urllist are disallowed in the robots.txt file.
- Add default priorities to url list lines (directory = 0.5, files = 0.3)
- If applicable, remove links on new urllist from prior urllist they may have been in.
- Add new /section/sitemap.xml to list of sitemaps in robots.txt file.
- Submit new sitemaps to GWT and BWT — wait 4 days for them to verify and investigate any issues or errors that are found.
Time Estimate: 1.5 hours FE / 1 hour SEO
Resources: Front End and SEO
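The status check, the robots.txt check and the default priorities from the steps above can be sketched as follows. This is a minimal sketch, assuming urllist lines take the form "URL priority=N" and that a URL whose path ends in a slash is a directory. The function names are illustrative, and the status codes are expected to come from a separate header check.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_listable(url, status, robots_lines):
    """True if the URL may go on the urllist: 200 or 302, and not disallowed."""
    if status not in (200, 302):
        return False
    robots = RobotFileParser()
    robots.parse(robots_lines)  # lines of the site's robots.txt
    return robots.can_fetch("*", url)


def with_default_priority(url):
    """Append the default priority: 0.5 for directories, 0.3 for files."""
    path = urlsplit(url).path
    is_directory = path == "" or path.endswith("/")
    return f"{url} priority={'0.5' if is_directory else '0.3'}"
```

These defaults are only a starting point; the SEO team still reviews and adjusts the priorities before any new urllist is launched.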
PAGE SPEED
This task list has been derived from the UI Team’s Coding Standard and Code Cleanup checklist. The subset included here is primarily focused on those practices which have direct SEO impact or provide the biggest impact for the lowest cost.
Compress Images
Run the ImageOptim application against all images on the site and commit any which show savings in file size. Since discovery for this task basically includes optimizing the files, these might as well just be committed now, with results reported out to the team.
Reduce Unnecessary DNS Lookups
If assets retrieved from other sites are deemed to be unnecessary in a Review Meeting, you may go ahead and change or move them in this stage.
Optimize Top 10 Pages
Perform any performance-related optimizations for the top 10 pages as agreed upon in the Review Meeting, based on the Front End Dev suggestions from the assessment.
Time Estimate: TBD
Resource(s): Front End
LINK & IMAGES ATTRIBUTES
Add ALT and TITLE attributes and target=_blank
Add ALT text to images, TITLE attributes to links and target=_blank to external links (yes, this includes our network of sites). If it is not clear what to use for those attributes, use alt="" or title="" and ping the group that the rest need to be updated.
Fix Relative Links
Change all identified relative links to absolute for hrefs and images.
Time Estimate: 1 hour
Resource(s): Front End, (Product)
HTACCESS CLEANUP
- Replace Redirect rules with RewriteRules
- Add canonical rules, if necessary
- Add rule to force .php, if necessary
RewriteRule ^(.*\.php)/$ http://www.SITENAME.com/$1 [QSA,R=301,L]
- Remove outdated rules and fix broken rules, as decided in the Review Meeting
Time Estimate: 5-7 hours
Resource(s): Product & SEO
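Converting old-format Redirect lines by hand is error-prone, so the rewrite can be sketched as a small script. This is a minimal sketch that follows the exact-match form of the example in the assessment section; the function names are illustrative. One caveat to keep in mind: mod_alias Redirect matches any request that begins with the given path, while the generated RewriteRule matches only that exact path, so rules that rely on prefix matching need manual review.

```python
import re

# Matches "Redirect [status] /path target"; RedirectMatch is deliberately not matched.
_REDIRECT = re.compile(
    r"^\s*Redirect\s+(?:(301|302|permanent|temp)\s+)?(/\S*)\s+(\S+)\s*$",
    re.IGNORECASE,
)
_STATUS_WORDS = {"permanent": "301", "temp": "302"}
_REGEX_SPECIAL = set(".^$*+?()[]{}|\\")


def escape_path(path):
    """Backslash-escape regex metacharacters so the path matches literally."""
    return "".join("\\" + ch if ch in _REGEX_SPECIAL else ch for ch in path)


def convert_redirect(line):
    """Turn one Redirect line into a RewriteRule, or return None if not a Redirect."""
    match = _REDIRECT.match(line)
    if match is None:
        return None
    status, path, target = match.groups()
    if status is None:
        code = "302"  # Redirect without a status defaults to a temporary redirect
    else:
        code = _STATUS_WORDS.get(status.lower(), status)
    pattern = "^" + escape_path(path.lstrip("/")) + "$"
    return f"RewriteRule {pattern} {target} [R={code},L]"
```

Lines for which convert_redirect returns None are left as they are, which also gives a quick count of how many old-format rules a file contains.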
Content Optimization Implementation
Suggestions from the content assessment will be reviewed in a separate meeting and implementation requirements set. This will generally include: fixing spelling errors, updating title tags and description, increasing KW relevancy on selected pages, reworking content on selected pages, adding or removing internal and external links, and more.
Time Estimate: 5-7 hours
Resource(s): Product and SEO
TOOLS
Tools that will be helpful to accomplish these tasks.
- Batch HTTP Server Header Checker: http://www.seoconsultants.com/tools/headers-batch
- Google Webmaster Tools (GWT): www.google.com/webmaster/tools/
- Bing Webmaster Tools (BWT): www.bing.com/toolbox/webmaster
- Readability Assessment Tool: http://www.read-able.com/
- Load Time: http://tools.pingdom.com/fpt/ (or can use average page load time in GA)
- Integrity Link Checker: http://peacockmedia.co.uk/integrity/ (download)
DEFINITIONS (Glossary)
Define terms associated with parts of the assessment and implementation. You will find these terms highlighted throughout the document.
Site Op Calendar: This is a shared Google Calendar which will contain deadlines, deliverables and meeting dates for the project. It will also be on the MKT Team White Board. To add to your Google Calendar, you can add the following address into the Other Calendars [LINK]
Kickoff Meeting: Meeting at the beginning of the project where the 3 teams will review the task plan, estimates, tracking systems, deliverables, and due dates.
Assessments Phase: First phase of the project where technical and content related issues (and opportunities) are examined and reported upon. This phase does NOT include changing any code. The content and technical assessment may or may not happen simultaneously, depending on resources.
Content Optimization: An assessment and implementation phase in the project where on-page SEO elements, readability, and relevancy are examined and then improved.
Technical Assessments: These assessments focus specifically on sitemaps, link and images attributes, page speed, htaccess health and crawl errors. Coding is NOT part of the assessment – only investigation and reporting.
Link Authority: An assessment and implementation phase where we look at the type of internal, external and interlinking we do on this site and within our network. Then we make recommendations on where we would like to be and implement changes.
KW Targeting: An assessment and implementation phase where we analyze the site architecture, identify top KWs and assess KW opportunities.
Assessment Report: Each part of the assessments will be reported somewhere. Please see the specific sections to see where the report will live.
Strategy Phase: In the strategy phase, we compare the assessments to our ideal metrics or goals and determine what can be done to get from A (present) to B (ideal).
Review Meetings: In the review meetings, the Technical Assessments are reviewed and implementation requirements are set (Rx Phase). This will only work as expected if the assessments have been thorough and the output is easy to digest. There will also be a Content Assessment review meeting where next steps will be discussed.
Rx Phase: This phase occurs IN the Review Meeting and sets implementation requirements and plans.
Implementation: In this phase, the strategy for optimizing the site based on the assessments is put forth into action and content and code is worked on.
Content Overhaul: Content issues that expand way beyond the scope of the optimization will be pushed in a Content Overhaul bucket for future planning.
Other Initiatives or Tickets: Other issues that require more discovery time than we have or that were uncovered accidentally during the optimization can be set aside as part of a separate initiative or ticket. These must be discussed at the Review Meeting.
Technical Assessment Report: Each site that undergoes an optimization project should have a Report doc. If one does not exist already, please go to the template doc, duplicate it and rename accordingly (ex: ProjectName Site Op Technical Assessment Report). In this report, you only need to fill in areas that are boxed with a grey highlight.
Advanced Site Content Report: This is an advanced report that combines data from at least 5 different sources and allows the SEO Team to assess health of a site and strategy for the rest of the content optimization.
Appendix
Current and previous technical assessment worksheets are all in Google Docs and shared with everyone within the COMPANY.
Setting up Integrity Link Checker to scan a txt file: drag file into bar as shown, “Plain Text Mode” should be set by default and set “Ignore Trailing Slashes”.
CHECKLISTS
This is a detailed checklist to track progress on tasks for the various parts of the project. Print these out and keep them available, as we will use them during all stand-up and status meetings.
Technical Assessment Checklist
TASK | RESOURCE | TIME (min) | COMPLETED | REPORT |
Kickoff Meeting | Dev | | | Meeting Notes |
Audit Sitemaps | Dev | | | TAR Sitemap TEST |
Check Indexation | Dev | | | TAR Sitemap TEST |
Robots.txt | Dev | | | TAR Sitemap TEST |
TEST urllist(s) | Dev | | | TAR Sitemap TEST |
Site Crawl – Compare | Dev | | | TAR Sitemap CRAWL |
Suggest Add’l URLs | Dev | | | TAR Sitemap CRAWL |
Image alt attr. | Dev | | | TAR Link & Img |
Link title attr. | Dev | | | TAR Link & Img |
External Links | Dev | | | TAR Link & Img |
Relative links/img | Dev | | | TAR Link & Img |
Verify Use of CDN | Dev | | | TAR Page Speed |
DNS Lookups | Dev | | | TAR Page Speed |
Top 10 Optim. | Dev | | | TAR Page Speed |
Verify Canonical | Dev | | | TAR htaccess tests |
Need for .php/ R | Dev | | | TAR htaccess tests |
Old ‘Redirects’ | Dev | | | TAR htaccess tests |
Outdated Rules | Dev | | | TAR htaccess tests |
Test Redirects | Dev | | | TAR htaccess tests |
Generate Error Reports | Dev | | | Crawl Error Report |
Classify & Investigate Errors | Dev | | | Crawl Error Report |
Documentation (all) | Dev | | | All Reports for TA |
Content Assessment Checklist
TASK | RESOURCE | TIME (min) | COMPLETED | REPORT |
Traffic & Performance Metrics | SEO | | | Content Report |
Top & Low Performing Pages | SEO | | | Content Report |
Assess CTR in SERPs | SEO | | | Content Report |
Title & Meta Analysis | SEO | | | Content Report |
Assess Relevance | SEO | | | KW Report |
X Name Pages | SEO | | | KW Report |
Y Name Pages | SEO | | | Product Report |
Assess Readability | Product | | | Product Report |
Assess Reading Level | Product | | | Product Report |
Run Spell Check | Product | | | Content Report |
Site Content Architecture | SEO | | | Content Report |
Top 3 Entrance KWs | SEO | | | KW Report |
Top 30 KW on Site | SEO | | | KW Report |
Assess KW Inventory | SEO | | | Link Report |
Links to Linking Domains | SEO | | | Link Report |
Anchor Text Breakdown | SEO | | | Link Report |
Network Linking | SEO | | | Link Report |
Link Limiting | SEO | | | Link Report |
Link Juice Distribution | SEO | | | Link Report |
Documentation (all) | SEO | | | All Reports |
Implementation Tasks
An itemized in-depth report would be written up in more detail here based on data taken from the research component of this project.