How to Do a Technical SEO Audit: A Step-by-Step Guide
What a Technical SEO Audit Actually Is
A technical SEO audit is not a keyword review. It is not a content strategy. It is the systematic examination of a website’s technical infrastructure to identify every factor that is preventing search engines from finding, reading, and ranking the site correctly.
A site can have excellent content, strong backlinks, and a well-structured keyword strategy, and still rank poorly because a misconfigured robots.txt file is blocking Googlebot from crawling the most important pages, because JavaScript is rendering content that Google cannot see on the first crawl pass, or because fifteen years of redirects have accumulated into chains that bleed authority at every hop.
Technical SEO is the foundation. Content and authority are the structure built on top of it. A weak foundation limits the ceiling of everything else, regardless of how much effort goes into the upper layers.
This guide is written for practitioners who want to run a thorough, methodical audit. It covers ten audit areas in sequence, each with the specific questions to ask, the tools to use, the code to run, and the criteria for a pass or fail judgment.
The master checklist at the end consolidates every item into a single reference document.
Step 1: Crawlability and Access
The first question in any audit is the most fundamental: can Google get in? A site that blocks crawlers entirely, or blocks them from significant sections, is invisible to search regardless of how excellent the content is. This step checks everything that determines whether Googlebot can access the site.

1.1 robots.txt
The robots.txt file lives at yoursite.com/robots.txt. It is the first thing Googlebot reads when it visits a site. It can allow or disallow access to specific paths. A single misconfigured line can silently block crawling of the entire site.
Check the robots.txt manually by visiting the URL. Then validate it in Google Search Console under Settings > robots.txt.
Common errors: blocking /wp-content/ (prevents image and asset crawling), blocking /api/ on headless sites (prevents content delivery path crawling), and leaving staging-era Disallow: / in place after a migration.
Audit action: Fetch yoursite.com/robots.txt. Check every Disallow line. Verify that no important content paths, image directories, or API routes are blocked. Test specific URLs with GSC's URL Inspection tool, which reports whether robots.txt blocks the page (Google retired the standalone robots.txt tester in 2023).
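The Disallow check can also be scripted as a first pass. The sketch below uses Python's standard-library robots.txt parser; the robots.txt content and URL list are hypothetical. Note that this parser applies rules in file order, whereas Google uses the most specific matching rule, so treat the output as a screening aid rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and URL list, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/
Disallow: /staging/
"""

IMPORTANT_URLS = [
    "https://yoursite.com/services/",
    "https://yoursite.com/wp-content/uploads/hero.jpg",
    "https://yoursite.com/blog/",
]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(blocked_urls(ROBOTS_TXT, IMPORTANT_URLS))
```

Any URL this prints is worth re-checking in GSC before concluding that it is blocked.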
1.2 XML Sitemap
The XML sitemap tells Google which pages exist and should be indexed. A poorly maintained sitemap is one of the most common and most impactful technical issues on established sites.

# Verify sitemap is accessible
curl -I https://yoursite.com/sitemap.xml
# Expected response: HTTP/2 200
# Red flags: 404 (sitemap missing), 301 (redirect chain), 403 (blocked)
# Check sitemap index (common on large sites)
curl https://yoursite.com/sitemap_index.xml
# Validate sitemap structure
# A valid XML sitemap contains <urlset>, <url>, <loc>, and optional
# <lastmod>, <changefreq>, <priority> tags
# Screaming Frog: Mode > Spider > Crawl > Sitemaps
# Lists all URLs in sitemap and flags issues:
# - URLs in sitemap that return 4xx errors
# - URLs in sitemap that are noindexed
# - Sitemap URLs not found in crawl
# - Non-canonical URLs in sitemap
The most common sitemap errors include noindexed pages, redirected URLs listed instead of their final destinations, paginated archive pages that should be canonicalised to the root, and staging URLs that were accidentally pushed to production.
Audit action: Submit the sitemap in GSC and review the Coverage report. A sitemap should contain only canonical, indexable, 200-status URLs. Every page in the sitemap should be a page you want indexed. Every page you want indexed should be in the sitemap.
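The "canonical, indexable, 200-status" rule can be expressed as a small check. This is a minimal sketch: the sample sitemap and the `crawl` dictionary shape (status, noindex flag, canonical URL per page) are assumptions standing in for whatever your crawler exports.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap and crawl results, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc></url>
  <url><loc>https://yoursite.com/old-services/</loc></url>
  <url><loc>https://yoursite.com/blog/page/2/</loc></url>
</urlset>"""

SAMPLE_CRAWL = {
    "https://yoursite.com/": {
        "status": 200, "noindex": False, "canonical": "https://yoursite.com/"},
    "https://yoursite.com/old-services/": {
        "status": 301, "noindex": False, "canonical": ""},
    "https://yoursite.com/blog/page/2/": {
        "status": 200, "noindex": True,
        "canonical": "https://yoursite.com/blog/page/2/"},
}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def sitemap_problems(urls, crawl):
    """Map each problem sitemap URL to the first rule it breaks."""
    problems = {}
    for url in urls:
        page = crawl.get(url)
        if page is None:
            problems[url] = "not found in crawl"
        elif page["status"] != 200:
            problems[url] = f"status {page['status']}"
        elif page["noindex"]:
            problems[url] = "noindexed"
        elif page["canonical"] != url:
            problems[url] = "non-canonical"
    return problems


print(sitemap_problems(sitemap_urls(SAMPLE_SITEMAP), SAMPLE_CRAWL))
```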
1.3 Crawl budget
Crawl budget is the number of pages Googlebot crawls on your site within a given timeframe. On small sites (under 1,000 pages), crawl budget is rarely a constraint. On large sites, it can mean that important new content is not being indexed promptly because Googlebot is spending budget crawling low-value pages.
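Filter parameters are one common source of the low-value pages that absorb crawl budget. As a rough illustration, assuming each filter is either absent or set to exactly one value, and ignoring parameter order, the number of URL variants for a single listing page can be estimated as follows. The filter names and option counts are hypothetical.

```python
from math import prod


def faceted_url_count(filter_options):
    """Count URL variants when each filter is unset or set to one value."""
    return prod(len(values) + 1 for values in filter_options.values())


# Hypothetical filters on one product listing page.
filters = {
    "sort": ["price", "newest", "rating"],
    "color": ["blue", "red", "green", "black", "white", "grey", "pink", "tan"],
    "size": ["XS", "S", "M", "L", "XL", "XXL"],
}

# (3 + 1) * (8 + 1) * (6 + 1) variants, including the unfiltered page
print(faceted_url_count(filters))
```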
# Google Search Console: Settings > Crawl Stats
# Review: Total crawl requests, response breakdown, file type breakdown
# High crawl budget waste indicators:
# - Large number of 404 crawl requests (fix: 301 redirect or remove internal links)
# - Large number of redirect crawl requests (fix: update internal links to final URL)
# - Many /page/2/, /page/3/ archive pagination pages being crawled
# - Faceted navigation URLs with parameters being crawled
# Check for parameter-based URL proliferation:
curl 'https://yoursite.com/products/?sort=price&color=blue&size=M'
# If this returns 200 and is indexable, multiply by all filter combinations
# = potentially thousands of indexable near-duplicate URLs
# Fix: noindex pagination and faceted nav, or disallow the parameters in robots.txt
# (GSC's URL Parameters tool was retired in 2022)
1.4 Site access check
Before every audit, run a basic server response check on the most important pages. This confirms the site is returning the correct HTTP status codes and that there are no authentication layers blocking crawlers.
# Check HTTP response codes for key pages
curl -I -L https://yoursite.com/
curl -I -L https://yoursite.com/services/
curl -I -L https://yoursite.com/blog/
# The -L flag follows redirects. Check the final status code.
# Expected: HTTP/2 200
# Red flag: HTTP/1.1 401 (authentication), 403 (forbidden), 503 (server error)
# Simulate Googlebot to check for cloaking:
curl -A 'Googlebot/2.1 (+http://www.google.com/bot.html)' https://yoursite.com/
# Compare HTML returned to what a regular browser sees
# If different: cloaking, a severe manual penalty risk
Step 2: Indexation
Crawlability determines whether Google can access the site. Indexation determines whether Google chooses to include the pages it finds in the search index. These are separate decisions, and both can fail independently.
2.1 Coverage report analysis
Google Search Console’s Pages report (formerly Coverage) is the primary indexation diagnostic. It classifies every URL Google has encountered into four categories.

# Google Search Console: Indexing > Pages
# Four categories to review:
# 1. Error (red): URLs with a blocking issue
# - Server error (5xx): hosting or server problem
# - Redirect error: broken redirect chain
# - Not found (404): page deleted with no redirect
# 2. Valid with warning (amber):
# - Indexed, though blocked by robots.txt (crawl/index conflict)
# - These pages ARE indexed despite robots.txt blocking
# 3. Valid (green):
# - Indexed and in search results
# 4. Not indexed (grey — the diagnostic goldmine):
# - Crawled, currently not indexed: Google visited but chose not to index
# - Discovered, currently not indexed: Google found but hasn't visited yet
# - Excluded by 'noindex' tag: intentional exclusion
# - Duplicate without user-selected canonical: URL duplication problem
# - Alternate page with proper canonical tag: correct canonicalisation
# Key audit action: click 'Crawled, currently not indexed'
# These pages passed crawl but failed Google's quality threshold
# Common causes: thin content, duplicate content, slow load, unclear purpose
2.2 noindex audit
The noindex meta tag is the most powerful indexation control on a page. It should be present on pages you explicitly want excluded from the index. It should never appear on pages you want to rank.
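For a quick spot check outside a crawler, the sketch below looks for a noindex directive in either place it can appear: the robots meta tag or the X-Robots-Tag response header. It uses only the standard library; the function name and the plain-dict headers argument are illustrative choices, not part of any tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def is_noindexed(html, headers=None):
    """True if the meta robots tag or X-Robots-Tag header contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            header_value = value.lower()
    in_meta = any("noindex" in directive for directive in parser.directives)
    return in_meta or "noindex" in header_value


print(is_noindexed('<meta name="robots" content="noindex, follow">'))
```

Run it against the pages you most want to rank; any True result there deserves immediate attention.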
<!-- Correct: noindex on a utility page -->
<meta name="robots" content="noindex, follow">
<!-- Also accepted via X-Robots-Tag HTTP header (for non-HTML files) -->
# Server response header:
X-Robots-Tag: noindex
# Screaming Frog: Configuration > Spider > Extraction
# Add custom extraction for: //meta[@name='robots']/@content
# This lists every page's robots meta tag in a column
# Sort to find unexpected noindex on important pages
# Common mistakes:
# - WordPress: plugin accidentally noindexes entire site during development
# - Paginated pages: /blog/page/2/ through /blog/page/50/ noindexed correctly
# but so are /services/seo/ and /about/ by accident
# - Staging URL regex accidentally includes production patterns
2.3 Canonical tags
Canonical tags tell Google which URL is the authoritative version of a page when multiple URLs serve the same or similar content. A self-referencing canonical on every page is defensive SEO best practice. Missing or incorrect canonicals are a primary source of duplicate content dilution.
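A per-page canonical check can be sketched with the standard library. The function below is illustrative only: it compares the declared canonical against the page's own URL and reports the first issue it finds. A canonical that points elsewhere is not always an error, so that case is reported rather than judged.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels:
            self.canonicals.append(attrs.get("href") or "")


def canonical_issue(page_url, html):
    """Return a short issue description, or None for a self-referencing canonical."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing canonical"
    if len(parser.canonicals) > 1:
        return "multiple canonicals"
    canonical = parser.canonicals[0]
    if urlsplit(page_url).scheme == "https" and urlsplit(canonical).scheme == "http":
        return "HTTP canonical on HTTPS page"
    if canonical != page_url:
        return f"canonicalised to {canonical}"
    return None


print(canonical_issue(
    "https://yoursite.com/services/",
    '<link rel="canonical" href="http://yoursite.com/services/" />',
))
```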
<!-- Self-referencing canonical: correct implementation -->
<link rel="canonical" href="https://yoursite.com/services/seo-lagos/" />
<!-- Common canonical errors: -->
<!-- 1. Canonical pointing to a 404 -->
<link rel="canonical" href="https://yoursite.com/old-services/" />
<!-- Fix: update to point to the correct live URL -->
<!-- 2. Canonical pointing to a redirect -->
<link rel="canonical" href="https://yoursite.com/services/seo/" />
<!-- If /seo/ redirects to /seo-lagos/, canonical should point to /seo-lagos/ -->
<!-- 3. HTTP canonical on an HTTPS page -->
<link rel="canonical" href="http://yoursite.com/services/" />
<!-- Fix: canonical must exactly match the preferred HTTPS URL -->
<!-- 4. Paginated pages all canonicalising to page 1 -->
<!-- /blog/page/3/ canonical = /blog/ -->
<!-- This collapses all paginated content into one page for Google -->
<!-- Better: self-referencing canonicals + rel=prev/next (optional) -->
# Screaming Frog: Directives > Canonical
# Shows canonical URL for every page and flags mismatches
Step 3: Site Architecture and URL Structure

Site architecture is the set of decisions that determines how pages relate to each other and how authority flows through the site. Poor architecture is invisible in content audits and invisible in basic technical checks, but its effect on ranking performance is cumulative and compounding.
3.1 Click depth
Click depth is the number of clicks required to reach any page from the homepage. Pages more than three to four clicks deep receive lower crawl priority and accumulate authority slowly. A well-structured site keeps all commercially important pages within three clicks.
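Click depth is a shortest-path measure, so it can be computed with a breadth-first search over the internal link graph. The sketch below assumes the graph is available as a dictionary of page to outgoing links; the paths are hypothetical.

```python
from collections import deque


def click_depths(links, home="/"):
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/services/", "/blog/"],
    "/services/": ["/services/seo/"],
    "/blog/": ["/blog/post-1/"],
    "/blog/post-1/": ["/services/seo/audit/"],
    "/services/seo/audit/": ["/services/seo/audit/pricing/"],
}

depths = click_depths(LINKS)
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
print(too_deep)
```

Pages that never appear in the result are unreachable from the homepage through internal links.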
# Screaming Frog: Crawl Data > URL > Sort by 'Crawl Depth'
# Target: key service/product pages within depth 2-3
# Red flag: important pages at depth 5+
# Python: check crawl depth distribution from Screaming Frog export
import pandas as pd
df = pd.read_csv('screaming_frog_export.csv')
depth_counts = df['Crawl Depth'].value_counts().sort_index()
# Flag pages you care about that are too deep:
important_pages = df[df['Address'].str.contains('/services/|/products/')]
deep_important = important_pages[important_pages['Crawl Depth'] > 3]
print(deep_important[['Address', 'Crawl Depth', 'Inlinks']].to_string())
3.2 URL structure audit
URLs should be descriptive, consistent, and clean. The technical audit checks for URL patterns that fragment authority or create indexation problems.
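Several of these URL patterns can be flagged automatically from a crawl export. The sketch below is one possible set of heuristics; the session parameter names and the "two or more parameters" threshold are assumptions to tune for your own site.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed parameter names that usually carry session identifiers.
SESSION_KEYS = {"session", "sessionid", "sid", "phpsessid"}


def url_flags(url):
    """Return a list of structural warnings for one URL."""
    parts = urlsplit(url)
    query_keys = {key.lower() for key in parse_qs(parts.query)}
    flags = []
    if query_keys & SESSION_KEYS:
        flags.append("session id")
    if len(query_keys) >= 2:
        flags.append("parameter proliferation")
    if parts.path != parts.path.lower():
        flags.append("uppercase in path")
    if re.search(r"/\d{4}/\d{2}/", parts.path):
        flags.append("date in path")
    return flags


for candidate in [
    "https://yoursite.com/product/?session=abc123xyz",
    "https://yoursite.com/shoes/?colour=red&size=42&sort=price-asc",
    "https://yoursite.com/Services/SEO/",
    "https://yoursite.com/blog/2021/03/15/seo-guide/",
]:
    print(candidate, url_flags(candidate))
```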
# URL anti-patterns to identify in Screaming Frog export:
# 1. Session IDs in URLs (create duplicate content at scale)
# https://yoursite.com/product/?session=abc123xyz
# 2. Parameter proliferation (faceted navigation)
# https://yoursite.com/shoes/?colour=red&size=42&sort=price-asc
# 3. Inconsistent trailing slash usage
# Both /services/ and /services returning 200 = duplicate content
# 4. Uppercase characters in URLs
# /Services/SEO/ and /services/seo/ may index as different URLs
# 5. Dates in content URLs (implies content ageing)
# /blog/2021/03/15/seo-guide/ vs /blog/seo-guide/
# Check for trailing slash consistency:
curl -I https://yoursite.com/services
curl -I https://yoursite.com/services/
# One should return 200, the other should 301 to the preferred version
# Check for www/non-www canonicalisation:
curl -I http://www.yoursite.com/
curl -I http://yoursite.com/
# Both should ultimately 301 to https://yoursite.com/ (or www version)
3.3 Redirect audit
Redirects accumulate over time. Every site that has undergone a migration, a URL restructure, or a CMS change carries redirect debt. The audit identifies chains, loops, and broken redirects.
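Given a redirect map (source URL to target URL, for example from a server config or crawl export), chains and loops can be detected by following each source until it stops redirecting. This is a minimal sketch with a hypothetical map; it does not make any HTTP requests.

```python
def trace_redirect(start, redirects, max_hops=10):
    """Follow start through the redirect map.

    Returns (path, status) where status is 'ok' (0 or 1 hop),
    'chain' (2+ hops), 'loop', or 'too long'.
    """
    path = [start]
    seen = {start}
    while path[-1] in redirects:
        target = redirects[path[-1]]
        if target in seen:
            return path + [target], "loop"
        path.append(target)
        seen.add(target)
        if len(path) - 1 > max_hops:
            return path, "too long"
    hops = len(path) - 1
    return path, ("chain" if hops > 1 else "ok")


# Hypothetical redirect map: source -> target.
REDIRECTS = {
    "/old-page/": "/interim-page/",
    "/interim-page/": "/new-page/",
    "/moved/": "/new-page/",
    "/a/": "/b/",
    "/b/": "/a/",
}

for source in REDIRECTS:
    print(source, trace_redirect(source, REDIRECTS))
```

For every source reported as a chain, the fix is to point it directly at the last URL in its path.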
# Screaming Frog: Response Codes > Redirection 3xx
# Then: Reports > Redirects > Redirect Chains
# Redirect chain example (each hop adds latency and crawl cost; the often-quoted
# ~5-10% link equity loss per hop is a rule of thumb, not a published figure):
# /old-page/ -> 301 -> /interim-page/ -> 301 -> /new-page/
# Fix: update source to point directly to /new-page/
# Check redirect chain length from command line:
curl -sIL -o /dev/null -w '%{num_redirects} redirect(s), final URL: %{url_effective}\n' https://yoursite.com/old-page/
# Full redirect chain trace (-L is required, otherwise curl stops at the first response):
curl -sv -L --max-redirs 10 -o /dev/null https://yoursite.com/old-page/ 2>&1 | grep -iE '^< (HTTP|location:)'
# Expected output for a clean redirect:
# < HTTP/1.1 301 Moved Permanently
# Location: https://yoursite.com/new-page/
# < HTTP/2 200
# Red flag output (chain):
# < HTTP/1.1 301
# Location: /interim/
# < HTTP/1.1 302
# Location: /new-page/
# < HTTP/2 200
# Also: 302 (temporary) used where 301 (permanent) is correct
3.4 Internal link audit
Internal links are the mechanism through which authority flows between pages. This section checks for orphaned pages (no inbound internal links), broken internal links (pointing to 404 or redirected URLs), and anchor text patterns.
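The orphan check reduces to counting inbound links per page. A minimal sketch, assuming you have the full list of known pages (for example from the sitemap) and a list of (source, target) internal link pairs from a crawl; the sample data is hypothetical.

```python
def inlink_counts(pages, links):
    """Count inbound internal links per page, ignoring self-links."""
    counts = {page: 0 for page in pages}
    for source, target in links:
        if target in counts and source != target:
            counts[target] += 1
    return counts


def orphaned(pages, links, home="/"):
    """Pages with no inbound internal links (the homepage is exempt)."""
    counts = inlink_counts(pages, links)
    return sorted(page for page, n in counts.items() if n == 0 and page != home)


# Hypothetical page list and link pairs.
PAGES = {"/", "/services/", "/services/seo/", "/landing/old-campaign/"}
LINK_PAIRS = [
    ("/", "/services/"),
    ("/services/", "/services/seo/"),
    ("/services/seo/", "/"),
    ("/services/", "/services/"),
]

print(orphaned(PAGES, LINK_PAIRS))
```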
# Screaming Frog: Internal > Filter by 'Inlinks' column
# Sort ascending: pages with 0 inbound internal links = orphaned
# Export internal links for anchor text analysis:
# Reports > Inlinks > export CSV
# Check broken internal links:
# Response Codes > Client Error (4xx)
# Then: check 'Inlinks' tab on any 4xx URL to find the source pages
# Python: find orphaned pages by type
import pandas as pd
df = pd.read_csv('screaming_frog_export.csv')
# Filter to important page types
service_pages = df[
df['Address'].str.contains('/services/|/products/') &
(df['Status Code'] == 200)
]
# Flag those with no or very few inlinks
underlinked = service_pages[service_pages['Inlinks'] < 2]
print(f'Underlinked service pages: {len(underlinked)}')
print(underlinked[['Address', 'Inlinks', 'Title 1']].to_string())
Step 4: Page Speed and Core Web Vitals
Core Web Vitals are a confirmed Google ranking signal. They have influenced rankings since the page experience update in 2021, and the current set is LCP, CLS, and INP. The audit measures field data (real user experience) separately from lab data (simulated performance), and treats them differently.
4.1 Field data vs lab data
PageSpeed Insights runs both a Lighthouse lab test and pulls CrUX (Chrome User Experience Report) field data. The field data is what Google uses for rankings. The lab data is useful for diagnosis but does not directly affect rankings.
# Check CrUX field data programmatically via the CrUX API
# Requires: Google API key (free)
curl -X POST \
'https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data '{
"origin": "https://yoursite.com",
"formFactor": "PHONE",
"metrics": ["largest_contentful_paint", "cumulative_layout_shift", "interaction_to_next_paint"]
}'
# Response includes p75 (75th percentile) values — what Google measures
# Good thresholds:
# LCP : <= 2500ms
# CLS : <= 0.1
# INP : <= 200ms
# If field data shows Poor: this is an active ranking penalty
# If field data shows Needs Improvement: partial penalty
# If Lighthouse lab passes but field data fails: look at real user conditions
# (slow devices, slower connections) not just your fast laptop
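The p75 values can be rated automatically once the API response is saved. The sketch below uses the "good" thresholds listed above plus Google's published "poor" boundaries (4000ms LCP, 0.25 CLS, 500ms INP). The response shape is my understanding of the CrUX API (record > metrics > percentiles > p75, with CLS returned as a string); treat it as an assumption and check it against a real response.

```python
# (good_max, needs_improvement_max) per metric; above the second value is poor.
THRESHOLDS = {
    "largest_contentful_paint": (2500, 4000),   # milliseconds
    "cumulative_layout_shift": (0.1, 0.25),     # unitless score
    "interaction_to_next_paint": (200, 500),    # milliseconds
}


def rate(metric, p75):
    """Classify a p75 value as good, needs improvement, or poor."""
    good_max, ni_max = THRESHOLDS[metric]
    if p75 <= good_max:
        return "good"
    if p75 <= ni_max:
        return "needs improvement"
    return "poor"


def summarise(response):
    """Rate every known metric in a CrUX-style response dictionary."""
    metrics = response["record"]["metrics"]
    return {
        name: rate(name, float(data["percentiles"]["p75"]))
        for name, data in metrics.items()
        if name in THRESHOLDS
    }


# Hypothetical response, trimmed to the fields used here.
SAMPLE_RESPONSE = {"record": {"metrics": {
    "largest_contentful_paint": {"percentiles": {"p75": 3100}},
    "cumulative_layout_shift": {"percentiles": {"p75": "0.05"}},
    "interaction_to_next_paint": {"percentiles": {"p75": 620}},
}}}

print(summarise(SAMPLE_RESPONSE))
```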
4.2 LCP diagnosis
Largest Contentful Paint measures the loading of the largest visible element. Diagnosing a poor LCP requires identifying which element is the LCP element and which resource on the critical path is delaying it.
// DevTools: Performance tab > Record page load
// Look for the 'LCP' marker in the timeline
// The element identified is what you need to optimise
// JavaScript: measure LCP in browser console
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log('LCP element:', entry.element);
    console.log('LCP time (ms):', entry.startTime);
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });
// Common LCP causes and fixes:
// 1. Hero image without fetchpriority
// Fix: <img src="hero.jpg" fetchpriority="high" loading="eager">
// 2. Render-blocking CSS or fonts
// Fix: <link rel='preload'> critical CSS; font-display: swap
// 3. Slow server TTFB delaying everything
// Fix: CDN, server-side caching, hosting upgrade
// Check TTFB specifically:
curl -w '@curl-format.txt' -o /dev/null -s https://yoursite.com/
# curl-format.txt: time_starttransfer: %{time_starttransfer}\n
# Target TTFB: under 800ms; ideally under 200ms from CDN edge
4.3 CLS diagnosis
Cumulative Layout Shift measures visual stability. Identifying CLS sources requires watching the page load on a slow connection with the Layout Shift Regions tool in DevTools.
// DevTools: Rendering > Layout Shift Regions (check box)
// Blue flash = layout shift occurring at that element
// JavaScript: measure and log CLS in console
let clsValue = 0;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) {
      clsValue += entry.value;
      console.log('CLS entry:', entry.value, 'Sources:', entry.sources);
    }
  }
  console.log('Cumulative CLS:', clsValue);
}).observe({ type: 'layout-shift', buffered: true });
// Most common CLS causes:
// 1. Images without width and height attributes
// Fix: <img src="photo.jpg" width="800" height="600" alt="...">
// 2. Web fonts causing text reflow on swap
// Fix: font-display: optional (no swap) or size-adjust on fallback
// 3. Dynamically injected banners or ads above content
// Fix: reserve exact height before injection with min-height CSS
// 4. Late-loading iframes (cookie banners, chat widgets)
// Fix: reserve space or load below viewport
4.4 INP diagnosis
Interaction to Next Paint replaced First Input Delay in March 2024. It measures the worst interaction responsiveness across the full page session. It is harder to pass than FID and requires more investigation.
// DevTools: Performance > record user interactions
// Look for long tasks (red bars) in the main thread
// These block interaction processing
// Identify long tasks programmatically:
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log('Long task:', entry.duration + 'ms at', entry.startTime);
  }
}).observe({ type: 'longtask', buffered: true });
// Target: no single long task over 50ms on interaction path
// Common INP causes:
// 1. Main thread animations (width/height vs transform/opacity)
// Bad: .element { transition: width 300ms; }
// Good: .element { transition: transform 300ms; }
// 2. Heavy third-party scripts (chat widgets, analytics)
// Fix: load third parties with defer or after user interaction
// 3. Synchronous event handlers on click/tap
// Fix: use scheduler.postTask() or requestAnimationFrame to defer
// non-critical work out of the interaction
Step 5: Mobile and Usability
Google uses mobile-first indexing for all sites. The mobile version of the site is what Google primarily uses for indexing and ranking. A site that delivers a good desktop experience but a poor mobile experience is ranked on the poor experience.
5.1 Mobile usability check
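# Google Search Console: Experience > Mobile Usability
# (Google retired this report in December 2023; where it is no longer available,
# run Lighthouse in Chrome DevTools for comparable checks)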
# Common errors flagged:
# - Clickable elements too close together (tap targets under 48px)
# - Content wider than screen (horizontal scroll required)
# - Text too small to read (under 12px effectively)
# - Viewport not configured
# Check viewport meta tag in page source:
curl -s https://yoursite.com/ | grep -i viewport
# Expected: <meta name='viewport' content='width=device-width, initial-scale=1'>
# Missing viewport = mobile usability fail
# DevTools: Toggle Device Toolbar (Cmd+Shift+M / Ctrl+Shift+M)
# Test on Moto G4 (mid-range Android) not just iPhone Pro
# Nigerian mobile market is predominantly mid-range Android
# What looks fine on a high-end device may fail on budget hardware
# Check tap target sizes:
# DevTools > Lighthouse > Accessibility > Tap targets
# All interactive elements should be at least 44x44px (the WCAG target size);
# Google's own guidance recommends 48x48px
# Buttons with padding: { padding: 12px; min-height: 44px; min-width: 44px; }
5.2 Mobile vs desktop parity
With mobile-first indexing, any content that is present on desktop but absent on mobile will not be indexed. Check that critical content, navigation, and structured data are present on the mobile render.
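The parity check comes down to comparing the visible text of the two renders. A minimal sketch using only the standard library, assuming you already have the desktop and mobile HTML as strings (the sample markup is hypothetical):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible words, skipping script, style, and noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def word_count(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def parity_gap(desktop_html, mobile_html):
    """Share of desktop words missing from the mobile render (0.0 to 1.0)."""
    desktop, mobile = word_count(desktop_html), word_count(mobile_html)
    if desktop == 0:
        return 0.0
    return max(0.0, (desktop - mobile) / desktop)


DESKTOP = "<h1>SEO Services</h1><p>We audit sites in Lagos</p><script>var a = 1;</script>"
MOBILE = "<h1>SEO Services</h1>"
print(f"{parity_gap(DESKTOP, MOBILE):.0%} of desktop text is missing on mobile")
```

A word count comparison is crude: it shows that content is missing, not which content, so follow up large gaps with a manual diff.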
# Compare mobile and desktop source:
# Screaming Frog: Configuration > User Agent > Googlebot Smartphone
# Run a second crawl in smartphone mode
# Export both crawls and compare word counts on key pages
# Significant word count difference (mobile < desktop) = content hidden on mobile
# GSC URL Inspection: test any URL
# 'View tested page' > 'Screenshot' tab: shows mobile render
# 'View tested page' > 'More info' > 'HTML' tab: shows rendered HTML
# Check that structured data, headings, and key content appear in HTML
# Check that structured data is not conditionally rendered:
# Verify JSON-LD block is in <head> or <body> on both renders
curl -A 'Googlebot-Mobile' https://yoursite.com/ | grep 'ld+json'
# Should return the full structured data block
Step 6: On-Page Technical Signals
On-page technical signals are the metadata and structural elements that tell search engines what each page is about. This step audits every page-level technical signal across the site for completeness, uniqueness, and correctness.
6.1 Title tags
# Screaming Frog export: Page Titles column
# Check for:
# 1. Missing title tags
# Screaming Frog: Page Titles > Filter 'Missing'
# 2. Duplicate title tags (dilutes topical specificity signal)
# Screaming Frog: Page Titles > Filter 'Duplicate'
# 3. Title tags too long (truncated in SERP at ~580px / ~60 chars)
# Screaming Frog: Page Titles > Filter 'Over 60 Characters'
# 4. Title tags too short (under 30 chars = opportunity wasted)
# Screaming Frog: Page Titles > Filter 'Below 30 Characters'
# 5. Title tags not containing primary keyword for the page
# Manual review: export titles with URLs, check against keyword strategy
# Ideal title tag formula for a service page:
# [Primary Keyword] | [Secondary Keyword] | [Brand] — [Location if local]
# 'Technical SEO Audit Services | SEO Agency Lagos | Semola Digital'
# Target length: 55-60 characters (the example above is 64, so it would need trimming)
6.2 Meta descriptions
# Meta descriptions are not a ranking factor but affect CTR significantly
# Screaming Frog: Meta Description column
# Check for: missing, duplicate, too long (>155 chars), too short (<70 chars)
# Write meta descriptions that:
# - Summarise the page in the user's language
# - Include the primary keyword naturally
# - Contain a benefit or call to action
# - Are unique per page
# Screaming Frog: bulk export meta descriptions to spreadsheet
# Sort by length to quickly identify missing and oversized descriptions
# Flag pages with impressions > 1000 in GSC but CTR < 2%
# These are candidates for meta description rewriting
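The missing, duplicate, and length checks for titles and meta descriptions follow the same pattern, so one helper can cover both. This sketch uses the length limits quoted above (30 to 60 characters for titles, 70 to 155 for descriptions); the input is a hypothetical URL-to-text mapping built from a crawl export.

```python
from collections import Counter


def audit_field(values, min_len, max_len):
    """Return {url: [issues]} for missing, duplicate, too long, or too short text."""
    seen = Counter(
        text.strip().lower() for text in values.values() if text and text.strip()
    )
    report = {}
    for url, text in values.items():
        text = (text or "").strip()
        issues = []
        if not text:
            issues.append("missing")
        else:
            if seen[text.lower()] > 1:
                issues.append("duplicate")
            if len(text) > max_len:
                issues.append("too long")
            elif len(text) < min_len:
                issues.append("too short")
        if issues:
            report[url] = issues
    return report


# Hypothetical titles keyed by URL path.
TITLES = {
    "/": "Technical SEO Audit Services in Lagos | Semola Digital",
    "/about/": "About",
    "/contact/": "",
    "/a/": "Same title used on two different pages",
    "/b/": "Same title used on two different pages",
}

print(audit_field(TITLES, min_len=30, max_len=60))
```

For meta descriptions, call the same function with `min_len=70, max_len=155`.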
6.3 Header tag structure
# Screaming Frog: H1 column
# Check for:
# 1. Missing H1 (no primary topical signal on the page)
# 2. Multiple H1 tags (ambiguous topical signal)
# 3. H1 does not match title tag intent (missed keyword opportunity)
# 4. H2/H3 hierarchy skipped (H1 then H4 directly)
# Check H1 programmatically:
curl -s https://yoursite.com/services/ | grep -i '<h1'
# Expected: one <h1> tag containing the primary keyword
# Example: <h1>Technical SEO Services in Lagos</h1>
# Screaming Frog: Crawl Analysis > H1 Tags report
# Export to CSV, filter for pages with 0 H1s or 2+ H1s
# Heading hierarchy rule:
# H1: primary topic of the page (one only)
# H2: major sections within the page
# H3: sub-sections within H2 sections
# Never skip levels (H1 > H3 without H2 in between)
6.4 Image optimisation
# Screaming Frog: Images tab
# Check for:
# 1. Missing alt text
# Images > Filter 'Missing Alt Text'
# All images that convey content need descriptive alt text
# Decorative images: alt='' (empty, not missing)
# 2. Images too large (slow LCP, unnecessary bandwidth)
# Images > Filter 'Over X kB' — set threshold at 150KB
# Hero/LCP images: target <150KB at 1200px wide in WebP format
# 3. Images missing width/height attributes (causes CLS)
# Check in Screaming Frog Image report or in page source
# 4. Images not in modern format (JPEG/PNG instead of WebP/AVIF)
# Screaming Frog: Images > Content Type column
# Quick check: count images missing alt text on a key page
curl -s https://yoursite.com/ | grep -o '<img[^>]*>' | grep -v 'alt=' | wc -l
# Output: number of images with no alt attribute at all
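The grep count cannot tell an intentionally empty alt (decorative) from a missing one, and it breaks when a tag spans several lines. A slightly more careful sketch with the standard-library HTML parser, run here on hypothetical markup:

```python
from html.parser import HTMLParser


class ImgAltParser(HTMLParser):
    """Separate images with no alt attribute from those with an empty alt."""

    def __init__(self):
        super().__init__()
        self.missing = []      # no alt attribute at all
        self.decorative = []   # alt="" (acceptable for decorative images)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        if "alt" not in attrs:
            self.missing.append(src)
        elif not (attrs["alt"] or "").strip():
            self.decorative.append(src)


def audit_images(html):
    parser = ImgAltParser()
    parser.feed(html)
    return parser.missing, parser.decorative


SAMPLE_HTML = (
    '<img src="hero.jpg" alt="Team at work">'
    '<img src="divider.png" alt="">'
    '<img src="chart.png">'
)
missing, decorative = audit_images(SAMPLE_HTML)
print("Missing alt:", missing)
print("Empty alt (check these are decorative):", decorative)
```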
Step 7: Structured Data
Structured data audit covers two questions: is the correct schema present on every page that should have it, and is the schema that is present valid and error-free?
7.1 Schema coverage audit
# Screaming Frog: Configuration > Custom Extraction
# Add: XPath //script[@type='application/ld+json']
# This extracts the full JSON-LD content from every page
# Export and review: which pages have schema? Which should but don't?
# Minimum schema requirements by page type:
# Homepage: Organization or LocalBusiness + WebSite
# Service pages: Service or ProfessionalService
# Blog posts: Article or BlogPosting
# All pages: BreadcrumbList
# FAQ content: FAQPage
# How-to content: HowTo
# Check for schema on a specific page:
curl -s https://yoursite.com/services/seo/ | python3 -c "
import sys, json, re
html = sys.stdin.read()
blocks = re.findall(r'<script[^>]*type=[\"\']application/ld\+json[\"\'][^>]*>(.*?)</script>', html, re.DOTALL)
for i, block in enumerate(blocks):
    print(f'Schema block {i+1}:', json.loads(block.strip()).get('@type', 'unknown'))
"
# Output lists each schema block and its @type
7.2 Schema validation
# Validate structured data against Google's requirements:
# Tool: search.google.com/test/rich-results
# Enter URL or paste JSON-LD directly
# General schema.org syntax validation (not Google-specific):
# Tool: validator.schema.org
# Check for common validation errors:
# 1. Missing required properties
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "SEO Guide"
// Missing: author, datePublished, image, publisher
// Result: valid JSON but NOT rich-result eligible
}
# 2. datePublished in wrong format (must be ISO 8601)
# Wrong: "datePublished": "14th July 2025"
# Correct: "datePublished": "2025-07-14"
# 3. Nested entity missing @type
"author": {
"name": "Oladoyin Falana" // Missing @type: Person
}
# 4. FAQPage answers not matching visible page content
# Google rejects FAQPage if acceptedAnswer text is not on the page
# Validate: search for the answer text in the page source
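The first three error types can be caught before a page ever reaches the Rich Results Test. The sketch below is a rough pre-check, not a replacement for Google's validator: the property list for Article is taken from the example above, and only a single top-level JSON-LD object is handled.

```python
import json
from datetime import date, datetime

# Properties to expect per @type, following the Article example above.
EXPECTED = {"Article": ["headline", "author", "datePublished", "image", "publisher"]}


def is_iso8601(value):
    """True if value parses as an ISO 8601 date or datetime string."""
    if not isinstance(value, str):
        return False
    for parse in (date.fromisoformat, datetime.fromisoformat):
        try:
            parse(value)
            return True
        except ValueError:
            continue
    return False


def schema_issues(json_ld_text):
    """List problems found in one JSON-LD object."""
    data = json.loads(json_ld_text)
    issues = [
        f"missing {prop}"
        for prop in EXPECTED.get(data.get("@type"), [])
        if prop not in data
    ]
    published = data.get("datePublished")
    if published is not None and not is_iso8601(published):
        issues.append("datePublished is not ISO 8601")
    author = data.get("author")
    if isinstance(author, dict) and "@type" not in author:
        issues.append("author missing @type")
    return issues


SAMPLE_BLOCK = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "SEO Guide",
    "datePublished": "14th July 2025",
    "author": {"name": "Oladoyin Falana"},
})
print(schema_issues(SAMPLE_BLOCK))
```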
Step 8: HTTPS, Security, and Server Configuration
HTTPS has been a ranking signal since 2014. The audit checks for HTTPS implementation completeness, SSL certificate validity, and mixed content issues that can undermine a technically correct HTTPS migration.
8.1 HTTPS implementation
# Check all four root variants resolve to one canonical HTTPS URL:
# (-i makes grep case-insensitive: HTTP/2 responses send 'location' in lowercase)
curl -sI http://yoursite.com/ | grep -iE '^(HTTP|location)'
curl -sI http://www.yoursite.com/ | grep -iE '^(HTTP|location)'
curl -sI https://www.yoursite.com/ | grep -iE '^(HTTP|location)'
curl -sI https://yoursite.com/ | grep -iE '^(HTTP|location)'
# Expected: all non-preferred variants return 301 to the canonical
# e.g.: HTTP/1.1 301 -> Location: https://yoursite.com/
# Check SSL certificate validity and expiry:
echo | openssl s_client -connect yoursite.com:443 2>/dev/null | openssl x509 -noout -dates
# Output: notBefore and notAfter (expiry date)
# Expired or expiring within 30 days = fix immediately
# Check for mixed content (HTTP resources on HTTPS pages):
# Chrome DevTools: Console tab
# Filter by 'Mixed Content' or 'Blocked'
# Look for: 'Mixed Content: The page was loaded over HTTPS but...'
# Screaming Frog: Configuration > Check HTTP Links (for HTTPS sites)
# Identifies all internal links using HTTP instead of HTTPS
# These add an avoidable redirect hop and can trigger browser security warnings
8.2 Server response headers
# Check security headers (best practice; not direct ranking factors
# but affect user trust signals and browser security):
curl -sI https://yoursite.com/ | grep -iE '(strict-transport|x-frame|content-security|x-content)'
# Recommended security headers:
# Strict-Transport-Security: max-age=31536000; includeSubDomains
# X-Frame-Options: SAMEORIGIN
# X-Content-Type-Options: nosniff
# Check HTTP/2 or HTTP/3 is enabled (required for performance):
curl -sI --http2 https://yoursite.com/ | head -1
# Expected: HTTP/2 200
# HTTP/1.1 on a modern site is a performance liability
# Check compression is enabled (reduces payload size significantly):
curl -sI --compressed https://yoursite.com/ | grep -i 'content-encoding'
# Expected: content-encoding: gzip or content-encoding: br (Brotli)
# Missing: HTML, CSS, JS are being sent uncompressed
Step 9: JavaScript Rendering
JavaScript rendering issues are the most commonly missed category in standard SEO audits. They are invisible in a normal browser view and only visible when you look at what Google actually sees on the first crawl pass.
9.1 Rendered vs unrendered HTML check
# The fastest check: fetch raw HTML without JavaScript execution
curl -sA 'Googlebot/2.1' https://yoursite.com/ > raw_html.txt
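# Then open the site in Chrome and copy the rendered DOM
# (DevTools > Elements > right-click <html> > Copy > Copy outerHTML);
# View Source shows only the raw HTML, which is what curl already returned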
# Compare word count of meaningful text between the two
# Python: compare content between raw and rendered
from bs4 import BeautifulSoup
with open('raw_html.txt') as f:
    raw = BeautifulSoup(f.read(), 'html.parser')
raw_text = raw.get_text(separator=' ', strip=True)
raw_words = len(raw_text.split())
print(f'Raw HTML word count: {raw_words}')
# If raw_words < 50 on a content page: severe JS rendering problem
# Page content is invisible to Googlebot on first crawl pass
# GSC URL Inspection: the most definitive check
# Enter any URL > Test live URL > View tested page > HTML tab
# This shows exactly what Google’s renderer sees
# Compare to browser view. Gap = invisible content
9.2 Internal links in JavaScript
# Internal links rendered by JavaScript are not reliable for crawl discovery
# Check if navigation links are in raw HTML or JS-rendered:
curl -sA 'Googlebot/2.1' https://yoursite.com/ | grep -o '<a [^>]*href=[^>]*>' | head -20
# If this returns no links (or very few), the navigation is JS-rendered
# Googlebot may not follow these links reliably on first crawl
# Fix: ensure navigation, breadcrumbs, and in-content links
# are present in the server-side rendered HTML
# Next.js: links rendered with <Link> component are SSR-ready
# by default if using App Router or getStaticProps
# React CSR: links inside components that render client-side
# will NOT appear in initial HTML: high risk for crawl discovery
9.3 Structured data in rendered output
# Structured data must be in server-rendered HTML
# Not injected by client-side JavaScript after page load
# Check: is JSON-LD in raw HTML or JS-rendered?
curl -sA 'Googlebot/2.1' https://yoursite.com/blog/post-slug/ | grep 'ld+json'
# If this returns the JSON-LD block: schema is server-rendered (good)
# If this returns nothing: schema is JS-injected (may not be seen on first crawl)
# Common mistake in React/Next.js:
// BAD: JSON-LD injected via useEffect (client-side only)
useEffect(() => {
  const script = document.createElement('script');
  script.type = 'application/ld+json';
  script.text = JSON.stringify(schemaData);
  document.head.appendChild(script);
}, []);
// GOOD: JSON-LD in <Head> (server-rendered)
import Head from 'next/head';
// In your page component:
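<Head>
  {/* dangerouslySetInnerHTML stops React from HTML-escaping the JSON quotes */}
  <script
    type='application/ld+json'
    dangerouslySetInnerHTML={{ __html: JSON.stringify(schemaData) }}
  />
</Head>
Step 10: Search Console Signals and Quick Wins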
Google Search Console contains data that no external tool can replicate: the actual queries driving impressions, the CTR by position, and Google’s own assessment of indexation and performance. The audit mines this data for immediate wins.
10.1 The CTR opportunity report
Pages that rank in positions 4–10 but have a below-average CTR are the fastest SEO wins available. The ranking exists. The traffic gap is a title tag and meta description problem. Fix the copy, and the traffic follows within days.
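Once the performance data is exported, the shortlist can be built with a simple filter. This is a minimal sketch: the row shape (page, impressions, clicks, position) and the default thresholds are assumptions chosen to match the position 4 to 10 range described above.

```python
def ctr_opportunities(rows, min_impressions=500, min_position=4,
                      max_position=10, max_ctr=0.03):
    """Pages ranking in the target range whose CTR is below max_ctr.

    Results are sorted by impressions, largest first, so the pages with
    the most traffic to gain appear at the top.
    """
    hits = []
    for row in rows:
        if row["impressions"] < max(min_impressions, 1):
            continue
        if not min_position <= row["position"] <= max_position:
            continue
        if row["clicks"] / row["impressions"] < max_ctr:
            hits.append(row)
    return sorted(hits, key=lambda row: row["impressions"], reverse=True)


# Hypothetical export rows.
ROWS = [
    {"page": "/services/seo/", "impressions": 4000, "clicks": 60, "position": 5.2},
    {"page": "/blog/audit-guide/", "impressions": 900, "clicks": 63, "position": 4.8},
    {"page": "/", "impressions": 12000, "clicks": 3000, "position": 1.4},
    {"page": "/blog/redirects/", "impressions": 700, "clicks": 7, "position": 8.0},
    {"page": "/tiny/", "impressions": 100, "clicks": 0, "position": 6.0},
]

for row in ctr_opportunities(ROWS):
    print(row["page"], f"{row['clicks'] / row['impressions']:.1%}")
```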
# Google Search Console: Performance > Search Results
# Enable: Impressions, Clicks, CTR, Position
# Filter: Position > 3 (exclude top 3 where CTR is expected to be high)
# Sort: CTR ascending
# Expected CTR benchmarks by position (approximate, varies by query type):
# Position 1: ~28-39%
# Position 2: ~15-20%
# Position 3: ~10-13%
# Position 4: ~7-9%
# Position 5: ~5-7%
# Position 6-10: 3-5%
# Flag: any page at position 4-6 with CTR under 3%
# These pages are ranking but not converting impressions to clicks
# Action: rewrite title tag and meta description
# New title should: answer the query intent, include a benefit, create curiosity
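The flagging step above can be automated once the Performance report is exported. This is a minimal sketch, not the guide's exact rule: instead of a flat 3% cut-off it compares each page against the lower bound of the approximate benchmark for its position. The row keys (`page`, `clicks`, `impressions`, `position`) and the function name are assumptions about how the export has been loaded.

```python
# Lower bound of the approximate CTR benchmark for each position, taken from
# the list above. These are rough figures, not data published by Google.
BENCHMARK_FLOOR = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}
DEFAULT_FLOOR = 0.03  # positions 6-10


def ctr_opportunities(rows, min_position=3.0, max_position=10.0):
    """Return rows ranking below the top 3 whose CTR is under the benchmark floor."""
    flagged = []
    for row in rows:
        position = row["position"]
        if not (min_position < position <= max_position):
            continue
        if row["impressions"] == 0:
            continue  # no impressions means CTR is undefined
        ctr = row["clicks"] / row["impressions"]
        floor = BENCHMARK_FLOOR.get(round(position), DEFAULT_FLOOR)
        if ctr < floor:
            flagged.append({**row, "ctr": ctr, "benchmark_floor": floor})
    # Lowest CTR first, matching the 'Sort: CTR ascending' step
    return sorted(flagged, key=lambda r: r["ctr"])
```

Each flagged row carries the computed `ctr` and the `benchmark_floor` it was compared against, which makes the rewriting list easier to prioritise.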
# Bulk CTR export from GSC via API:
# Use Google Search Console API or connect to Looker Studio
# Filter: impressions > 500 AND position < 20 AND CTR < 3%
# Export for prioritised rewriting list
10.2 Search appearance and rich results
# GSC: Performance > Search results > add filter > Search appearance
# Check which pages are appearing in rich results (FAQ, Article, etc.)
# Compare to pages that SHOULD have rich results based on schema audit
# If a page has valid FAQPage schema but no FAQ rich result in GSC:
# - Check: did the rich result appear and then disappear (spam signal)?
# - Check: does the FAQ content on the page match what’s in schema?
# - Check: are there fewer than 2 or more than 10 Q&A pairs?
# GSC: Experience > Core Web Vitals
# Review: URLs by status (Good/Needs Improvement/Poor)
# Poor URLs: Core Web Vitals are a ranking signal, so prioritise these fixes
# Filter by device: mobile issues are more critical (mobile-first indexing)
10.3 Index coverage audit by page type
# Advanced: compare expected vs actual index coverage
# From Screaming Frog: export all 200-status URLs with type annotation
# From GSC Sitemap report: total submitted vs total indexed
# If submitted = 60 URLs but indexed = 38:
# 22 URLs are not indexed — where are they?
# GSC: Indexing > Pages > Not Indexed > 'Crawled, currently not indexed'
# These pages passed crawl but Google chose not to index them
# Common causes:
# - Thin content (under ~300 words with no unique value)
# - Near-duplicate of another page (similar content, different URL)
# - Slow page speed triggering quality downgrade
# - Soft 404 (page returns 200 but content implies page is empty)
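The submitted-versus-indexed comparison above is a set difference once both URL lists are exported. A minimal sketch follows, assuming the sitemap URLs and the indexed URLs are available as plain lists; the function name is hypothetical, and ignoring trailing slashes when matching is an assumption that may not suit every site.

```python
def coverage_gap(submitted_urls, indexed_urls):
    """Return the submitted URLs that do not appear in the indexed list."""

    def normalise(url):
        # Assumption: exports differ only in whitespace and trailing slashes
        return url.strip().rstrip("/")

    submitted = {normalise(u): u.strip() for u in submitted_urls}
    indexed = {normalise(u) for u in indexed_urls}
    missing = sorted(
        original for key, original in submitted.items() if key not in indexed
    )
    return {
        "submitted": len(submitted),
        "indexed": len(submitted) - len(missing),
        "missing": missing,
    }
```

The `missing` list is the starting point for the "where are they?" question: each URL in it can be checked against the Not Indexed reasons in GSC.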
# Check for soft 404 pattern: request a URL that cannot exist and print the status code
curl -s -o /dev/null -w '%{http_code}\n' https://yoursite.com/nonexistent-page/
# If this prints 200 for a page that does not exist:
# The site is returning 200 for missing pages with templated content
# Fix: return 404 for genuinely missing pages; add unique content for thin pages
Step 11: Advanced Technical Checks
The following checks apply to specific site configurations: multilingual sites, large e-commerce catalogues, and sites using headless or API-driven architectures. Apply those relevant to the site being audited.
11.1 hreflang for multilingual/multiregional sites
<!-- hreflang implementation: tells Google which page to serve in which language/region -->
<link rel="alternate" hreflang="en-ng" href="https://yoursite.com/en/" />
<link rel="alternate" hreflang="en-gb" href="https://yoursite.com/uk/" />
<link rel="alternate" hreflang="x-default" href="https://yoursite.com/" />
<!-- Rules for correct hreflang: -->
<!-- 1. Every page in the set must reference ALL other pages in the set -->
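<!-- 2. hreflang values must be an ISO 639-1 language code, optionally followed by an ISO 3166-1 alpha-2 region code -->
<!-- 3. Every page must include a self-referencing hreflang entry -->
<!-- 4. x-default is optional but recommended for the fallback page shown to users who match no listed language/region -->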
# Common hreflang errors:
# - Non-canonicalized pages in hreflang set
# - hreflang annotations not reciprocated (A points to B but B doesn't point to A)
# - Wrong language or region codes ('en_US' instead of 'en-us', or 'en-uk' instead of 'en-gb')
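The rules and common errors above lend themselves to an automated check. The sketch below is one possible approach, assuming the hreflang annotations have already been extracted into a dictionary of page URL to `{hreflang value: target URL}`; the function name is hypothetical. The pattern only checks the shape of each value (two-letter language, optional two-letter region, or `x-default`), so it will not catch a well-formed but non-existent code such as 'en-uk', and it does not handle script subtags.

```python
import re

# Shape check only: 2-letter language, optional 2-letter region, or x-default
HREFLANG_PATTERN = re.compile(r"^(x-default|[a-z]{2}(-[a-z]{2})?)$", re.IGNORECASE)


def audit_hreflang(annotations):
    """Return (page, issue) tuples for an extracted hreflang map."""
    issues = []
    for page, alternates in annotations.items():
        # Flag malformed values such as 'en_US'
        for code in alternates:
            if not HREFLANG_PATTERN.match(code):
                issues.append((page, f"invalid hreflang value '{code}'"))
        # Every page should list itself
        if page not in alternates.values():
            issues.append((page, "missing self-referencing entry"))
        # Every target should link back to this page
        for target in alternates.values():
            if target == page:
                continue
            if page not in annotations.get(target, {}).values():
                issues.append((page, f"no return link from {target}"))
    return issues
```

An empty list means every page in the set is well-formed, self-referencing, and reciprocated.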
# Validate hreflang with Screaming Frog:
# Reports > hreflang > hreflang Languages
# Shows all hreflang sets and flags incomplete reciprocation
11.2 Pagination and infinite scroll
# Pagination best practices (Google deprecated rel=prev/next in 2019):
# Google recommends: strong internal links between paginated pages
# Each paginated page should have unique content value
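# For paginated pages that should NOT be indexed:
# Option 1: noindex on /page/2/ onwards (simplest; note that Google may
#           eventually stop following links on long-term noindexed pages)
# Option 2: canonical pointing /page/2/ to the root page. Use with caution:
#           Google's pagination guidance advises giving each paginated page
#           its own canonical URL, not a canonical to page 1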
# For infinite scroll: Google cannot scroll to load content
# Ensure infinite scroll content has a paginated fallback:
# /products/?page=1, /products/?page=2 etc.
# These paginated URLs should be crawlable even if UX uses infinite scroll
# Check if paginated URLs are being indexed (they usually shouldn't be):
# GSC: Performance > Search results > add a Page filter (URLs containing '/page/')
# If paginated archive URLs are ranking: add noindex and update internal links
11.3 Duplicate content at scale
# Large sites commonly generate duplicate content through:
# - Faceted navigation (filters creating parameter URLs)
# - Sorted views (?sort=price-asc, ?sort=latest)
# - Printer-friendly versions (/print/article-slug/)
# - Mobile subdomain duplicating www content (m.yoursite.com)
# - Tag and category archive pages with same posts
# Find parameter-based duplicates:
# Screaming Frog: URL > filter for '?' in URL column
# Count parameter URLs vs canonical URLs
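The count of parameter URLs against clean URLs can be produced from any crawl export. A minimal sketch, assuming the crawled URLs are available as a list of strings; the function name is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def group_parameter_urls(urls):
    """Map each clean URL to the parameter variants found for it."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        if not parts.query:
            continue  # already a clean URL
        # Rebuild the URL without the query string or fragment
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        groups[clean].append(url)
    return dict(groups)
```

Clean URLs with many variants are the first candidates for a canonical tag or a noindex rule.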
# Python: find near-duplicate content pages
import difflib

def check_similarity(content1, content2, threshold=0.85):
    ratio = difflib.SequenceMatcher(None, content1, content2).ratio()
    return ratio > threshold
# Use with Screaming Frog’s custom extraction to compare body text
# across pages in the same category or using similar templates
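To apply the same comparison across a whole template or category, the pairwise check can be wrapped in a loop. This is a sketch under the assumption that the extracted body text has been loaded into a dictionary of URL to text; the function name is hypothetical. It compares every pair, so the cost grows quadratically and it is best run on one page type at a time.

```python
import difflib
from itertools import combinations


def find_near_duplicates(pages, threshold=0.85):
    """Return (url_a, url_b, ratio) for each pair above the similarity threshold."""
    matches = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        ratio = difflib.SequenceMatcher(None, text_a, text_b).ratio()
        if ratio > threshold:
            matches.append((url_a, url_b, round(ratio, 2)))
    return matches
```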
# Fix options for parameter-based duplicates:
# 1. Canonical tag pointing parameter URL to clean URL
# <link rel="canonical" href="/products/shoes/" />
# 2. noindex + follow on parameter URLs
# 3. Parameter handling in Google Search Console (no longer an option: Google retired the URL Parameters tool in 2022)
# 4. Disallow parameter URLs in robots.txt (stops crawl but allows index if linked)
Step 12: Writing the Audit Report
An audit is only as useful as the decisions it produces. The findings from the previous eleven steps need to be translated into a prioritised action list that a development team can execute against.
12.1 Prioritisation framework
Not every audit finding is equally urgent. Prioritise issues using two axes: severity (how much is this costing in rankings or traffic right now?) and effort (how long will this take to fix?).
# Priority matrix:
# P1: High severity, low effort — fix this week
# Examples: incorrect robots.txt, missing sitemap, canonical pointing to 404,
# entire site noindexed, HTTPS not enforced, CWV in 'Poor' range
# P2: High severity, medium effort — fix within 30 days
# Examples: JS rendering blocking content, redirect chains on key pages,
# structured data errors on all pages, orphaned service pages,
# duplicate title tags across high-value pages
# P3: Medium severity, high effort — schedule for next sprint
# Examples: image alt text gaps, URL structure cleanup, schema
# implementation on new page types, hreflang errors
# P4: Low severity — address opportunistically
# Examples: meta description improvements, H2 hierarchy fixes,
# minor page speed improvements, sitemap refresh
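The matrix above can be encoded as a small lookup so that every finding in the log gets a consistent code. This is a sketch only: the matrix defines four combinations, so any other pairing (for example high severity with high effort) is returned as "TRIAGE" for manual review. That fallback label and the function name are assumptions, not part of the framework.

```python
# The three severity/effort pairs the matrix defines explicitly
PRIORITY_MATRIX = {
    ("high", "low"): "P1",
    ("high", "medium"): "P2",
    ("medium", "high"): "P3",
}


def assign_priority(severity, effort):
    """Map a severity/effort pair to a priority code from the matrix."""
    severity, effort = severity.lower(), effort.lower()
    if severity == "low":
        return "P4"  # low severity is P4 regardless of effort
    # Combinations the matrix does not cover need a human decision
    return PRIORITY_MATRIX.get((severity, effort), "TRIAGE")
```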
12.2 Report structure
A professional audit report contains five sections: an executive summary with the three most critical findings, a technical findings log with severity and effort ratings for each issue, a prioritised action plan, a baseline metrics snapshot (GSC impressions, CWV status, indexed URLs) for progress tracking, and an appendix with all raw data exports. The findings log should be specific enough for a developer to execute without needing a follow-up call.
# Audit report: suggested file structure
/audit-report/
  01-executive-summary.docx
  02-technical-findings.xlsx
  03-prioritised-actions.csv
  04-baseline-metrics.pdf (GSC screenshot + CWV report)
  05-appendix/
    screaming-frog-export.csv
    pagespeed-results.json
    redirect-chains.csv
    structured-data-validation.txt
# Each finding in 02-technical-findings.xlsx should include:
# URL(s) affected | Issue description | Evidence | Priority | Fix action | Owner | Done
# Executive summary format:
# - Sites audited: [URL], [date]
# - Critical findings: [count] P1, [count] P2, [count] P3
# - Current index coverage: [X] / [Y] pages indexed
# - Core Web Vitals status: [X]% URLs Good (mobile)
# - Most urgent action: [one sentence]
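A findings log with the columns listed above can be generated rather than typed by hand. The sketch below writes CSV instead of the .xlsx named in the file structure, purely to stay within the Python standard library; the function name is hypothetical, and each finding is assumed to be a dictionary keyed by the column names.

```python
import csv
import io

# Column order taken from the findings log description above
FINDING_COLUMNS = [
    "URL(s) affected", "Issue description", "Evidence",
    "Priority", "Fix action", "Owner", "Done",
]


def write_findings(findings, stream):
    """Write findings to a CSV stream, most urgent priority first."""
    writer = csv.DictWriter(stream, fieldnames=FINDING_COLUMNS)
    writer.writeheader()
    # 'P1' sorts before 'P2', so a plain string sort orders by urgency
    for finding in sorted(findings, key=lambda f: f["Priority"]):
        writer.writerow(finding)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_findings(
        [{"URL(s) affected": "/robots.txt", "Issue description": "Disallow: /",
          "Evidence": "curl output", "Priority": "P1",
          "Fix action": "Remove rule", "Owner": "Dev", "Done": "No"}],
        buffer,
    )
    print(buffer.getvalue())
```

Passing an open file handle instead of the `StringIO` buffer writes the log straight to disk.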
The Complete Technical SEO Audit Checklist
The following checklist consolidates every check from this guide into a single reference. Priority codes: P1 = fix this week, P2 = fix within 30 days, P3 = schedule for next sprint.
| Check | Tool | Priority | Status |
|---|---|---|---|
| 1. Crawlability and Access | |||
| robots.txt is accessible (200 response) and contains no unintended Disallow rules | curl yoursite.com/robots.txt | P1 | ☐ |
| No critical content paths, image directories, or API routes are blocked | GSC > Settings > robots.txt report | P1 | ☐ |
| Sitemap is accessible, returns 200, and contains no redirect or error URLs | curl + Screaming Frog | P1 | ☐ |
| All sitemap URLs are canonical, indexable 200-status pages | Screaming Frog | P1 | ☐ |
| Crawl budget is not wasted on parameter URLs or pagination | GSC > Crawl Stats | P2 | ☐ |
| Key pages return 200; no authentication or server errors blocking crawl | curl -I | P1 | ☐ |
| No cloaking: same HTML served to Googlebot and regular users | curl -A Googlebot | P1 | ☐ |
| 2. Indexation | |||
| GSC Page indexing report reviewed; all errors triaged | GSC > Indexing > Pages | P1 | ☐ |
| No important pages carry a noindex tag | Screaming Frog | P1 | ☐ |
| All pages have a self-referencing canonical tag | Screaming Frog | P1 | ☐ |
| Canonical tags point to live 200-status URLs (not 301 or 404) | Screaming Frog | P1 | ☐ |
| No duplicate or conflicting canonical signals (HTTP vs HTTPS, www vs non-www) | Screaming Frog | P2 | ☐ |
| 'Crawled, currently not indexed' pages investigated for quality/thin content issues | GSC | P2 | ☐ |
| 3. Site Architecture and URLs | |||
| All key pages reachable within 3 clicks from homepage | Screaming Frog > Crawl Depth | P2 | ☐ |
| Consistent trailing slash convention enforced with 301 on non-preferred variant | curl -I | P1 | ☐ |
| www and non-www redirect to single canonical root | curl -I | P1 | ☐ |
| HTTP redirects to HTTPS on all variants | curl -I | P1 | ☐ |
| No redirect chains (A -> B -> C); all chains collapsed to direct 301s | Screaming Frog | P2 | ☐ |
| No redirect loops | Screaming Frog | P1 | ☐ |
| No broken internal links (links pointing to 4xx or 5xx) | Screaming Frog | P2 | ☐ |
| No orphaned pages (key pages with 0 inbound internal links) | Screaming Frog | P2 | ☐ |
| Internal link anchor text is descriptive, not generic | Manual + Screaming Frog | P3 | ☐ |
| 4. Performance and Core Web Vitals | |||
| LCP field data in 'Good' range (<2.5s) on mobile | GSC > CWV + CrUX API | P1 | ☐ |
| CLS field data in 'Good' range (<0.1) | GSC > CWV + DevTools | P1 | ☐ |
| INP field data in 'Good' range (<200ms) | GSC > CWV + DevTools | P1 | ☐ |
| TTFB under 800ms on key pages | curl timing | P2 | ☐ |
| LCP element identified and served with fetchpriority='high' | DevTools Performance | P2 | ☐ |
| All images have explicit width and height attributes (CLS prevention) | Screaming Frog | P2 | ☐ |
| No render-blocking resources on critical path | PageSpeed Insights | P2 | ☐ |
| No main-thread animations using layout-triggering CSS properties | DevTools Performance | P3 | ☐ |
| Third-party scripts loaded with defer or after interaction | DevTools > Network | P2 | ☐ |
| 5. Mobile and Usability | |||
| viewport meta tag present on all pages | Screaming Frog | P1 | ☐ |
| No mobile usability issues on key templates (GSC's Mobile Usability report was retired in December 2023) | Lighthouse + DevTools device emulation | P1 | ☐ |
| Tap targets at least 44x44px on mobile | Lighthouse > Accessibility | P2 | ☐ |
| No content wider than screen (no horizontal scroll) | DevTools device emulation | P2 | ☐ |
| Content parity between mobile and desktop renders | Screaming Frog (two crawls) | P2 | ☐ |
| Structured data present in mobile rendered HTML | GSC URL Inspection | P2 | ☐ |
| 6. On-Page Technical Signals | |||
| Every page has exactly one unique H1 tag | Screaming Frog | P1 | ☐ |
| Every key page has a unique, keyword-informed title tag under 60 chars | Screaming Frog | P1 | ☐ |
| No duplicate title tags across key pages | Screaming Frog | P1 | ☐ |
| Every page has a unique meta description under 155 chars | Screaming Frog | P2 | ☐ |
| All images have descriptive alt text (decorative images have alt='') | Screaming Frog | P2 | ☐ |
| No images over 150KB on content pages | Screaming Frog | P2 | ☐ |
| All images in WebP or AVIF format (not JPEG/PNG for photos) | Screaming Frog | P3 | ☐ |
| 7. Structured Data | |||
| Organisation or LocalBusiness schema on homepage | Rich Results Test | P1 | ☐ |
| Article or BlogPosting schema on all blog posts | Screaming Frog extraction | P2 | ☐ |
| BreadcrumbList schema on all interior pages | Rich Results Test | P2 | ☐ |
| FAQPage schema on applicable pages with Q&A content | Rich Results Test | P2 | ☐ |
| All schema blocks validated (no syntax errors, required properties present) | Rich Results Test | P1 | ☐ |
| Schema present in server-rendered HTML (not JS-injected) | curl \| grep ld+json | P1 | ☐ |
| dateModified property up to date on refreshed articles | Manual review | P3 | ☐ |
| 8. HTTPS and Security | |||
| SSL certificate valid and not expiring within 30 days | openssl s_client | P1 | ☐ |
| All four root variants (http/https x www/non-www) redirect to canonical | curl -I | P1 | ☐ |
| No mixed content warnings (HTTP resources on HTTPS pages) | Chrome DevTools Console | P1 | ☐ |
| HTTP/2 enabled (not HTTP/1.1) | curl --http2 | P2 | ☐ |
| GZIP or Brotli compression enabled | curl --compressed | P2 | ☐ |
| HSTS header present | curl -I \| grep Strict | P3 | ☐ |
| 9. JavaScript Rendering | |||
| Key page content present in raw HTML (not JS-rendered only) | curl -A Googlebot + wc | P1 | ☐ |
| Navigation links present in raw HTML | curl \| grep '<a href' | P1 | ☐ |
| Structured data in server-rendered HTML, not JS-injected | curl \| grep ld+json | P1 | ☐ |
| GSC URL Inspection rendered HTML matches browser view | GSC URL Inspection | P2 | ☐ |
| No client-side redirects (window.location) on indexed pages | Manual review | P2 | ☐ |
| 10. Search Console Signals | |||
| CTR opportunity audit: positions 4-10 with CTR below benchmark | GSC Performance | P2 | ☐ |
| Rich result eligibility reviewed against schema coverage | GSC Performance > Search appearance filter | P2 | ☐ |
| No manual actions reported in GSC | GSC > Security & Manual Actions > Manual actions | P1 | ☐ |
| Core Web Vitals report reviewed by URL group | GSC > Experience | P1 | ☐ |
| Top 20 keywords reviewed for intent alignment with landing pages | GSC Performance | P2 | ☐ |
Running the Audit Efficiently
A full technical SEO audit of a small to medium site (50–300 pages) takes three to five hours with the tools described in this guide. Larger sites with complex architectures take proportionally longer. The time investment concentrates in three areas: running and interpreting the Screaming Frog crawl, mining GSC for coverage and performance signals, and validating JavaScript rendering on key pages.
The order of the steps in this guide reflects the order a practitioner should work through them. Crawlability and indexation issues at Steps 1 and 2 are the highest-leverage findings: a site with crawl or indexation problems is one where all subsequent optimisation effort is partially wasted. Fix the access issues first. Then fix the architecture. Then fix the page-level signals. Performance and structured data compound on top of a clean foundation.
The checklist at the end of this guide is designed to be used on every audit. Work through every item. Flag, don’t skip. An item that passes takes thirty seconds to confirm. An item that reveals a critical issue saves months of unexplained ranking plateau.
Need a technical SEO audit done for you? hello@semoladigita.com
Semola Digital conducts full technical SEO audits as a standalone engagement and as the Month 1 deliverable in all retainer engagements. The audit deliverable includes: full Screaming Frog export, prioritised findings report, GSC baseline snapshot, and a 60-minute walkthrough call.

Founder, Technical Analyst
Oladoyin Falana is a certified digital growth strategist and full-stack web professional with over five years of hands-on experience at the intersection of SEO, web design, and development. His journey into the digital world began as a content writer, a foundation that gave him a deep, instinctive understanding of how keywords, content, and intent drive organic visibility. While honing his craft in content, he taught himself the building blocks of the modern web (HTML, CSS, and React.js), a pursuit that eventually grew into full-stack web development and technical SEO analysis.