Technical SEO • 8 min read

How to Do a Technical SEO Audit: A Step-by-Step Guide

Oladoyin Falana

May 30, 2026

Reviewed by Semola Digital Content Team

What a Technical SEO Audit Actually Is

A technical SEO audit is not a keyword review. It is not a content strategy. It is the systematic examination of a website’s technical infrastructure to identify every factor that is preventing search engines from finding, reading, and ranking the site correctly.

A site can have excellent content, strong backlinks, and a well-structured keyword strategy and still rank poorly. The cause may be a misconfigured robots.txt file blocking Googlebot from the most important pages, JavaScript rendering content that Google cannot see on the first crawl pass, or fifteen years of redirects that have accumulated into chains that bleed authority at every hop.

Technical SEO is the foundation. Content and authority are the structure built on top of it. A weak foundation limits the ceiling of everything else, regardless of how much effort goes into the upper layers.

This guide is written for practitioners who want to run a thorough, methodical audit. It covers nine audit areas in sequence, each with the specific questions to ask, the tools to use, the code to run, and the criteria for a pass or fail judgment.

The master checklist at the end consolidates every item into a single reference document.

Step 1: Crawlability and Access

The first question in any audit is the most fundamental: can Google get in? A site that blocks crawlers entirely, or blocks them from significant sections, is invisible to search regardless of how excellent the content is. This step checks everything that determines whether Googlebot can access the site.

Crawling a site with the Screaming Frog SEO Spider

1.1 robots.txt

The robots.txt file lives at yoursite.com/robots.txt. It is the first thing Googlebot reads when it visits a site. It can allow or disallow access to specific paths. A single misconfigured line can silently block crawling of the entire site.

Check the robots.txt manually by visiting the URL. Then validate it in Google Search Console under Settings > robots.txt.

Common errors: blocking /wp-content/ (prevents image and asset crawling), blocking /api/ on headless sites (prevents content delivery path crawling), and leaving staging-era Disallow: / in place after a migration.

Audit action: Fetch yoursite.com/robots.txt. Check every Disallow line. Verify that no important content paths, image directories, or API routes are blocked. Test specific URLs with GSC's URL Inspection tool, which reports whether robots.txt blocks the page; the standalone robots.txt Tester was retired in 2023.
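The path checks in this audit action can be scripted as a first pass. The sketch below uses Python's standard-library `urllib.robotparser`; the rules and URLs are hypothetical sample data, and this parser does not reproduce every Googlebot matching rule (wildcards and longest-match precedence, for example), so confirm anything it flags in Search Console.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, paste the live file here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /api/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs that should stay crawlable.
important = [
    "https://yoursite.com/services/",
    "https://yoursite.com/api/posts",
    "https://yoursite.com/blog/",
]

for url in blocked_urls(ROBOTS_TXT, important):
    print("BLOCKED:", url)
```

Any URL printed here is a page you expected to be crawlable that a Disallow rule catches.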

1.2 XML Sitemap

The XML sitemap tells Google which pages exist and should be indexed. A poorly maintained sitemap is one of the most common and most impactful technical issues on established sites.

Verify sitemap is accessible

# Verify sitemap is accessible

curl -I https://yoursite.com/sitemap.xml

# Expected response: HTTP/2 200
# Red flags: 404 (sitemap missing), 301 (redirect chain), 403 (blocked)
# Check sitemap index (common on large sites)

curl https://yoursite.com/sitemap_index.xml

# Validate sitemap structure
# A valid XML sitemap contains <urlset>, <url>, <loc>, and optional
# <lastmod>, <changefreq>, <priority> tags

# Screaming Frog: Mode > Spider > Crawl > Sitemaps
# Lists all URLs in sitemap and flags issues:
# - URLs in sitemap that return 4xx errors
# - URLs in sitemap that are noindexed
# - Sitemap URLs not found in crawl
# - Non-canonical URLs in sitemap

text

The most common sitemap errors are entries that should not be listed: noindexed pages, redirected URLs instead of their final destinations, paginated archive pages that should be canonicalised to the root, and staging URLs that were accidentally pushed to production.

Audit action: Submit the sitemap in GSC and review the Pages report (formerly Coverage). A sitemap should contain only canonical, indexable, 200-status URLs. Every page in the sitemap should be a page you want indexed. Every page you want indexed should be in the sitemap.
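Some sitemap hygiene checks need no crawl at all. The sketch below parses a sitemap with the standard library and flags non-HTTPS entries, unexpected hosts (such as staging), and duplicates. The sample XML and the `sitemap_issues` helper are illustrative assumptions; status-code and noindex checks still need a crawler.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content for illustration.
SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc></url>
  <url><loc>http://yoursite.com/services/</loc></url>
  <url><loc>https://staging.yoursite.com/blog/</loc></url>
  <url><loc>https://yoursite.com/</loc></url>
</urlset>
"""


def sitemap_issues(xml_text, expected_host):
    """Return (url, issue) pairs for entries that look wrong on their face."""
    root = ET.fromstring(xml_text)
    issues = []
    seen = set()
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        if parts.scheme != "https":
            issues.append((url, "not https"))
        if parts.netloc != expected_host:
            issues.append((url, "unexpected host"))
        if url in seen:
            issues.append((url, "duplicate entry"))
        seen.add(url)
    return issues


for url, issue in sitemap_issues(SAMPLE, "yoursite.com"):
    print(f"{issue}: {url}")
```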

1.3 Crawl budget

Crawl budget is the number of pages Googlebot crawls on your site within a given timeframe. On small sites (under 1,000 pages), crawl budget is rarely a constraint. On large sites, it can mean that important new content is not being indexed promptly because Googlebot is spending budget crawling low-value pages.

Crawl Budget in GSC

# Google Search Console: Settings > Crawl Stats
# Review: Total crawl requests, response breakdown, file type breakdown
# High crawl budget waste indicators:
# - Large number of 404 crawl requests (fix: 301 redirect or remove internal links)
# - Large number of redirect crawl requests (fix: update internal links to final URL)
# - Many /page/2/, /page/3/ archive pagination pages being crawled
# - Faceted navigation URLs with parameters being crawled
# Check for parameter-based URL proliferation:

curl 'https://yoursite.com/products/?sort=price&color=blue&size=M'

# If this returns 200 and is indexable, multiply by all filter combinations
# = potentially thousands of indexable near-duplicate URLs

# Fix: use robots.txt Disallow rules to stop faceted variations being crawled,
# or use noindex tags on filter combinations to keep them out of the index.
# Pick one per URL pattern: Googlebot cannot see a noindex tag on a URL
# that robots.txt blocks it from fetching.

text
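The "multiply by all filter combinations" step is simple arithmetic. A minimal sketch, assuming each facet is either unset or set to exactly one value; the facet names and option counts are hypothetical.

```python
from math import prod


def facet_url_count(options_per_facet):
    """Count filtered URL variants when each facet is unset or set to one value."""
    # Each facet contributes (values + 1) states; subtract the one unfiltered URL.
    return prod(n + 1 for n in options_per_facet.values()) - 1


# Hypothetical facets on a single category page.
facets = {"sort": 4, "color": 12, "size": 8}
print(facet_url_count(facets), "filtered variants of one category URL")
```

Three modest filters already produce several hundred crawlable variants per category, before counting parameter-order permutations.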

1.4 Site access check

Before every audit, run a basic server response check on the most important pages. This confirms the site is returning the correct HTTP status codes and that there are no authentication layers blocking crawlers.

Check HTTP response codes for key pages

# Check HTTP response codes for key pages

curl -I -L https://yoursite.com/
curl -I -L https://yoursite.com/services/
curl -I -L https://yoursite.com/blog/

# The -L flag follows redirects. Check the final status code.
# Expected: HTTP/2 200
# Red flag: HTTP/1.1 401 (authentication), 403 (forbidden), 503 (server error)
# Simulate Googlebot to check for cloaking:

curl -A 'Googlebot/2.1 (+http://www.google.com/bot.html)' https://yoursite.com/

# Compare HTML returned to what a regular browser sees
# If different: cloaking — a severe manual penalty risk

text

Step 2: Indexation

Crawlability determines whether Google can access the site. Indexation determines whether Google chooses to include the pages it finds in the search index. These are separate decisions, and both can fail independently.

2.1 Coverage report analysis

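Google Search Console’s Pages report (formerly Coverage) is the primary indexation diagnostic. The current report splits URLs into Indexed and Not indexed and gives a reason for each exclusion; the four groupings below follow the older Coverage labels, which remain a practical way to triage those reasons.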

Pages (Coverage) report in Google Search Console

GSC Coverage Analysis

# Google Search Console: Indexing > Pages
# Four categories to review:
# 1. Error (red): URLs with a blocking issue
#    - Server error (5xx): hosting or server problem
#    - Redirect error: broken redirect chain
#    - Not found (404): page deleted with no redirect
# 2. Valid with warning (amber):
#    - Indexed, though blocked by robots.txt (crawl/index conflict)
#    - These pages ARE indexed despite robots.txt blocking
# 3. Valid (green):
#    - Indexed and in search results
# 4. Not indexed (grey — the diagnostic goldmine):
#    - Crawled, currently not indexed: Google visited but chose not to index
#    - Discovered, currently not indexed: Google found but hasn't visited yet
#    - Excluded by 'noindex' tag: intentional exclusion
#    - Duplicate without user-selected canonical: URL duplication problem
#    - Alternate page with proper canonical tag: correct canonicalisation
# Key audit action: click 'Crawled, currently not indexed'
# These pages passed crawl but failed Google's quality threshold
# Common causes: thin content, duplicate content, slow load, unclear purpose

text

2.2 noindex audit

The noindex meta tag is the most powerful indexation control on a page. It should be present on pages you explicitly want excluded from the index. It should never appear on pages you want to rank.

Correct noindex on a utility page

<!-- Correct: noindex on a utility page -->
<meta name="robots" content="noindex, follow">

<!-- Also accepted via X-Robots-Tag HTTP header (for non-HTML files) -->
# Server response header:
# X-Robots-Tag: noindex

# Screaming Frog: Configuration > Custom > Extraction (labelled Custom Extraction in newer versions)
# Add custom extraction for: //meta[@name='robots']/@content
# This lists every page's robots meta tag in a column
# Sort to find unexpected noindex on important pages
# Common mistakes:
# - WordPress: plugin accidentally noindexes entire site during development
# - Paginated pages: /blog/page/2/ through /blog/page/50/ noindexed correctly
#   but so are /services/seo/ and /about/ by accident
# - Staging URL regex accidentally includes production patterns

text
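The two noindex mechanisms above (meta tag and X-Robots-Tag header) can be checked together for a single page. This is a sketch using only the standard library; `is_noindexed` is a hypothetical helper, and it expects HTML and headers you have already fetched.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.extend(
                d.strip().lower() for d in attr.get("content", "").split(",")
            )


def is_noindexed(html, headers=None):
    """True if the meta robots tag or the X-Robots-Tag header excludes the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()
    # "none" is shorthand for "noindex, nofollow".
    in_meta = any(d in ("noindex", "none") for d in parser.directives)
    in_header = "noindex" in header_value or "none" in header_value
    return in_meta or in_header


print(is_noindexed('<meta name="robots" content="noindex, follow">'))
```

Run it over the pages you expect to rank; any `True` is an accidental exclusion worth investigating.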

2.3 Canonical tags

Canonical tags tell Google which URL is the authoritative version of a page when multiple URLs serve the same or similar content. A self-referencing canonical on every page is defensive SEO best practice. Missing or incorrect canonicals are a primary source of duplicate content dilution.

Canonical tags audit and fixes

<!-- Self-referencing canonical: correct implementation -->
<link rel="canonical" href="https://yoursite.com/services/seo-lagos/" />

<!-- Common canonical errors: -->

<!-- 1. Canonical pointing to a 404 -->
<link rel="canonical" href="https://yoursite.com/old-services/" />
<!-- Fix: update to point to the correct live URL -->

<!-- 2. Canonical pointing to a redirect -->
<link rel="canonical" href="https://yoursite.com/services/seo/" />
<!-- If /seo/ redirects to /seo-lagos/, canonical should point to /seo-lagos/ -->

<!-- 3. HTTP canonical on an HTTPS page -->
<link rel="canonical" href="http://yoursite.com/services/" />
<!-- Fix: canonical must exactly match the preferred HTTPS URL -->

<!-- 4. Paginated pages all canonicalising to page 1 -->
<!-- /blog/page/3/ canonical = /blog/ -->
<!-- This collapses all paginated content into one page for Google -->
<!-- Better: use a self-referencing canonical on each paginated page
     (e.g. /blog/page/2/ canonicalises to /blog/page/2/) so Google can
     discover deep links, or use a noindex, follow tag if you only want
     the root page indexed. -->

# Screaming Frog: Canonicals tab
# Shows the canonical URL for every page and flags mismatches

text
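Several of the canonical errors listed above can be detected from the page HTML alone. The sketch below classifies one page's canonical tag; `canonical_status` and its return labels are hypothetical, and it does not check whether the canonical target returns 200 or redirects, which still needs a request to that URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Collect href values from <link rel="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def canonical_status(page_url, html):
    """Classify the canonical tag found in html relative to page_url."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.hrefs:
        return "missing"
    if len(parser.hrefs) > 1:
        return "multiple"
    canonical = urljoin(page_url, parser.hrefs[0])  # resolve relative hrefs
    if urlsplit(page_url).scheme == "https" and urlsplit(canonical).scheme == "http":
        return "http canonical on https page"
    if canonical == page_url:
        return "self-referencing"
    return "points elsewhere: " + canonical


page = "https://yoursite.com/services/seo-lagos/"
html = '<link rel="canonical" href="http://yoursite.com/services/seo-lagos/" />'
print(canonical_status(page, html))
```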

Step 3: Site Architecture and URL Structure

Site architecture is the set of decisions that determines how pages relate to each other and how authority flows through the site. Poor architecture is invisible in content audits and invisible in basic technical checks, but its effect on ranking performance is cumulative and compounding.

3.1 Click depth

Click depth is the number of clicks required to reach any page from the homepage. Pages more than three to four clicks deep receive lower crawl priority and accumulate authority slowly. A well-structured site keeps all commercially important pages within three clicks.

Click depth check

# Screaming Frog: Crawl Data > URL > Sort by 'Crawl Depth'
# Target: key service/product pages within depth 2-3
# Red flag: important pages at depth 5+

# Python: check crawl depth distribution from Screaming Frog export
import pandas as pd

df = pd.read_csv('screaming_frog_export.csv')

# Number of pages at each depth, shallowest first
depth_counts = df['Crawl Depth'].value_counts().sort_index()
print(depth_counts.to_string())

# Flag pages you care about that are too deep
# (na=False treats rows with no address as non-matching)
is_important = df['Address'].str.contains('/services/|/products/', na=False)
important_pages = df[is_important]
deep_important = important_pages[important_pages['Crawl Depth'] > 3]
print(deep_important[['Address', 'Crawl Depth', 'Inlinks']].to_string())

python

3.2 URL structure audit

URLs should be descriptive, consistent, and clean. The technical audit checks for URL patterns that fragment authority or create indexation problems.

URL structure audit

# URL anti-patterns to identify in Screaming Frog export:
# 1. Session IDs in URLs (create duplicate content at scale)
# https://yoursite.com/product/?session=abc123xyz
# 2. Parameter proliferation (faceted navigation)
# https://yoursite.com/shoes/?colour=red&size=42&sort=price-asc
# 3. Inconsistent trailing slash usage
# Both /services/ and /services returning 200 = duplicate content
# 4. Uppercase characters in URLs
# /Services/SEO/ and /services/seo/ may index as different URLs
# 5. Dates in content URLs (implies content ageing)
# /blog/2021/03/15/seo-guide/ vs /blog/seo-guide/
# Check for trailing slash consistency:

curl -I https://yoursite.com/services
curl -I https://yoursite.com/services/

# One should return 200, the other should 301 to the preferred version
# Check for www/non-www canonicalisation:

curl -I http://www.yoursite.com/
curl -I http://yoursite.com/

# Both should ultimately 301 to https://yoursite.com/ (or www version)

text
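The anti-patterns above lend themselves to a quick scripted pass over an exported URL list. A minimal sketch; the `SESSION_PARAMS` names and the two-parameter threshold are assumptions to adapt to your own site, and trailing-slash consistency still needs the curl check because it depends on server responses.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical list of parameter names that usually carry session IDs.
SESSION_PARAMS = {"session", "sessionid", "sid", "phpsessid"}


def url_flags(url):
    """Return the structural anti-patterns found in a single URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    flags = []
    if any(name.lower() in SESSION_PARAMS for name in params):
        flags.append("session id")
    if len(params) >= 2:
        flags.append("multiple parameters")
    if parts.path != parts.path.lower():
        flags.append("uppercase path")
    if re.search(r"/\d{4}/\d{2}/", parts.path):
        flags.append("date in path")
    return flags


for url in [
    "https://yoursite.com/product/?session=abc123xyz",
    "https://yoursite.com/shoes/?colour=red&size=42&sort=price-asc",
    "https://yoursite.com/Services/SEO/",
    "https://yoursite.com/blog/2021/03/15/seo-guide/",
]:
    print(url, "->", url_flags(url))
```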

3.3 Redirect audit

Redirects accumulate over time. Every site that has undergone a migration, a URL restructure, or a CMS change carries redirect debt. The audit identifies chains, loops, and broken redirects.

Redirect audit

# Screaming Frog: Response Codes > Redirection 3xx
# Then: Reports > Redirects > Redirect Chains

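# Redirect chain example (each hop adds latency and crawl overhead; the
# often-quoted 5-10% link equity loss per hop is an estimate, not a figure Google publishes):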
# /old-page/ -> 301 -> /interim-page/ -> 301 -> /new-page/
# Fix: update source to point directly to /new-page/

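# Count redirect hops and show the final URL from the command line:
curl -sIL -o /dev/null -w '%{num_redirects} redirect(s) -> %{url_effective}\n' https://yoursite.com/old-page/

# Full redirect chain trace (-L follows redirects; grep -i because HTTP/2 header names are lowercase):
curl -svL --max-redirs 10 -o /dev/null https://yoursite.com/old-page/ 2>&1 | grep -iE '^< (HTTP|location:)'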

# Expected output for a clean redirect:
# < HTTP/1.1 301 Moved Permanently
# Location: https://yoursite.com/new-page/
# < HTTP/2 200

# Red flag output (chain):
# < HTTP/1.1 301
# Location: /interim/
# < HTTP/1.1 302
# Location: /new-page/
# < HTTP/2 200
# Also: 302 (temporary) used where 301 (permanent) is correct

text
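Once the hops are captured, the pass or fail judgment can be automated. The sketch below takes a trace as an ordered list of (status, URL) pairs, such as you might transcribe from the curl output, and flags chains, temporary redirects, loops, and bad endpoints. `analyse_chain` is a hypothetical helper, not part of any tool named here.

```python
def analyse_chain(hops):
    """hops: ordered (status_code, url) pairs ending with the final response."""
    redirects = [(status, url) for status, url in hops if 300 <= status < 400]
    urls = [url for _, url in hops]
    issues = []
    if len(redirects) > 1:
        issues.append(f"chain of {len(redirects)} redirects")
    if any(status in (302, 307) for status, _ in redirects):
        issues.append("temporary redirect in chain")
    if len(set(urls)) < len(urls):
        issues.append("loop")
    if hops and hops[-1][0] != 200:
        issues.append(f"ends on {hops[-1][0]}")
    return issues


clean = [(301, "/old-page/"), (200, "/new-page/")]
chained = [(301, "/old-page/"), (302, "/interim/"), (200, "/new-page/")]

print("clean:", analyse_chain(clean))
print("chained:", analyse_chain(chained))
```

An empty list means a single permanent redirect that lands on a 200, which is the target state for every legacy URL.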

3.4 Internal link audit

Internal links are the mechanism through which authority flows between pages. This section checks for orphaned pages (no inbound internal links), broken internal links (pointing to 404 or redirected URLs), and anchor text patterns.

Internal links auditing

# Screaming Frog: Internal > Filter by 'Inlinks' column
# Sort ascending: pages with 0 inbound internal links = orphaned

# Export internal links for anchor text analysis:
# Reports > Inlinks > export CSV

# Check broken internal links:
# Response Codes > Client Error (4xx)
# Then: check 'Inlinks' tab on any 4xx URL to find the source pages

# Python: find underlinked pages by type
import pandas as pd

df = pd.read_csv('screaming_frog_export.csv')

# Filter to important page types that return 200
# (na=False treats rows with no address as non-matching)
service_pages = df[
    df['Address'].str.contains('/services/|/products/', na=False) &
    (df['Status Code'] == 200)
]

# Flag those with no or very few inlinks
underlinked = service_pages[service_pages['Inlinks'] < 2]
print(f'Underlinked service pages: {len(underlinked)}')
print(underlinked[['Address', 'Inlinks', 'Title 1']].to_string())

python

Step 4: Page Speed and Core Web Vitals

Core Web Vitals are a confirmed Google ranking signal. LCP and CLS have counted towards rankings since the 2021 page experience update, and INP joined them in March 2024 when it replaced First Input Delay. The audit measures field data (real user experience) separately from lab data (simulated performance), and treats them differently.

4.1 Field data vs lab data

PageSpeed Insights runs both a Lighthouse lab test and pulls CrUX (Chrome User Experience Report) field data. The field data is what Google uses for rankings. The lab data is useful for diagnosis but does not directly affect rankings.

Page speed insights: Field data vs lab data

# Check CrUX field data programmatically via the CrUX API
# Requires: Google API key (free)

curl -X POST \
  'https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "origin": "https://yoursite.com",
    "formFactor": "PHONE",
    "metrics": ["largest_contentful_paint", "cumulative_layout_shift", "interaction_to_next_paint"]
  }'

# Response includes p75 (75th percentile) values — what Google measures
# Good thresholds:
# LCP  : <= 2500ms
# CLS  : <= 0.1
# INP  : <= 200ms

# If field data shows Poor: the page fails the Core Web Vitals assessment; treat it as a priority fix
# If field data shows Needs Improvement: the page still fails the assessment at p75
# If Lighthouse lab passes but field data fails: look at real user conditions
# (slow devices, slower connections) not just your fast laptop

text
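The thresholds can be applied to the p75 values in the API response with a few lines of Python. The "good" cut-offs are the ones listed above; the upper bounds (4000 ms, 0.25, 500 ms) are Google's published "poor" boundaries. The sample values are hypothetical.

```python
# (good upper bound, needs-improvement upper bound) per metric
THRESHOLDS = {
    "largest_contentful_paint": (2500, 4000),   # milliseconds
    "cumulative_layout_shift": (0.1, 0.25),     # unitless score
    "interaction_to_next_paint": (200, 500),    # milliseconds
}


def rate(metric, p75):
    """Classify a p75 field value as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    value = float(p75)  # the API may return some percentiles as strings
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


# Hypothetical p75 values copied from a CrUX response.
sample = {
    "largest_contentful_paint": 3100,
    "cumulative_layout_shift": "0.08",
    "interaction_to_next_paint": 620,
}
for metric, p75 in sample.items():
    print(f"{metric}: {p75} -> {rate(metric, p75)}")
```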

4.2 LCP diagnosis

Largest Contentful Paint measures the loading of the largest visible element. Diagnosing a poor LCP requires identifying which element is the LCP element and which resource on the critical path is delaying it.

LCP Diagnosis and Fixes

// DevTools: Performance tab > Record page load
// Look for the 'LCP' marker in the timeline
// The element identified is what you need to optimise

// JavaScript: measure LCP in browser console
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log('LCP element:', entry.element);
    console.log('LCP time (ms):', entry.startTime);
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });

// Common LCP causes and fixes:
// 1. Hero image without fetchpriority
//    Fix: <img src="hero.jpg" fetchpriority="high" loading="eager">

// 2. Render-blocking CSS or fonts
//    Fix: <link rel='preload'> critical CSS; font-display: swap

// 3. Slow server TTFB delaying everything
//    Fix: CDN, server-side caching, hosting upgrade

# Check TTFB specifically (run in a shell, not the browser console):
curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s\n' https://yoursite.com/
# Target TTFB: under 800ms; ideally under 200ms from CDN edge

javascript

4.3 CLS diagnosis

Cumulative Layout Shift measures visual stability. Identifying CLS sources requires watching the page load on a slow connection with the Layout Shift Regions tool in DevTools.

CLS Diagnosis and Fixes

// DevTools: Rendering > Layout Shift Regions (check box)
// Blue flash = layout shift occurring at that element

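// JavaScript: measure and log CLS in console
// Note: this sums every shift; the reported CLS metric takes the worst
// session window, so treat the total here as an upper bound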
let clsValue = 0;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) {
      clsValue += entry.value;
      console.log('CLS entry:', entry.value, 'Sources:', entry.sources);
    }
  }
  console.log('Cumulative CLS:', clsValue);
}).observe({ type: 'layout-shift', buffered: true });

// Most common CLS causes:
// 1. Images without width and height attributes
//    Fix: <img src="photo.jpg" width="800" height="600" alt="...">

// 2. Web fonts causing text reflow on swap
//    Fix: font-display: optional (no swap) or size-adjust on fallback

// 3. Dynamically injected banners or ads above content
//    Fix: reserve exact height before injection with min-height CSS

// 4. Late-loading iframes (cookie banners, chat widgets)
//    Fix: reserve space or load below viewport

javascript

4.4 INP diagnosis

Interaction to Next Paint replaced First Input Delay in March 2024. It measures the worst interaction responsiveness across the full page session. It is harder to pass than FID and requires more investigation.

Interaction to next paint

// DevTools: Performance > record user interactions
// Look for long tasks (red bars) in the main thread
// These block interaction processing

// Identify long tasks programmatically:
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log('Long task:', entry.duration + 'ms at', entry.startTime);
  }
}).observe({ type: 'longtask', buffered: true });

// Target: no single long task over 50ms on interaction path

// Common INP causes:
// 1. Main thread animations (width/height vs transform/opacity)
//    Bad:  .element { transition: width 300ms; }
//    Good: .element { transition: transform 300ms; }

// 2. Heavy third-party scripts (chat widgets, analytics)
//    Fix: load third parties with defer or after user interaction

// 3. Synchronous event handlers on click/tap
//    Fix: use scheduler.postTask() or requestAnimationFrame to defer
//    non-critical work out of the interaction

javascript

Step 5: Mobile and Usability

Google uses mobile-first indexing for all sites. The mobile version of the site is what Google primarily uses for indexing and ranking. A site that delivers a good desktop experience but a poor mobile experience is ranked on the poor experience.

5.1 Mobile usability check

Mobile usability

# Note: Google retired the Search Console Mobile Usability report in December 2023.
# Run the equivalent checks with Lighthouse in Chrome DevTools.
# Common errors to check for:
# - Clickable elements too close together (tap targets under 48px)
# - Content wider than screen (horizontal scroll required)
# - Text too small to read (under 12px effectively)
# - Viewport not configured

# Check viewport meta tag in page source:
curl -s https://yoursite.com/ | grep -i viewport
# Expected: <meta name='viewport' content='width=device-width, initial-scale=1'>
# Missing viewport = mobile usability fail

# DevTools: Toggle Device Toolbar (Cmd+Shift+M / Ctrl+Shift+M)
# Test on Moto G4 (mid-range Android) not just iPhone Pro
# Nigerian mobile market is predominantly mid-range Android
# What looks fine on a high-end device may fail on budget hardware

# Check tap target sizes:
# DevTools > Lighthouse > Accessibility > Tap targets
# Ensure all interactive mobile elements are at least 48x48 pixels,
# with at least 8 pixels of space between them, to pass Google's
# accessibility and mobile usability standards.
# Buttons with padding: { padding: 12px; min-height: 48px; min-width: 48px; }

text

5.2 Mobile vs desktop parity

With mobile-first indexing, any content that is present on desktop but absent on mobile will not be indexed. Check that critical content, navigation, and structured data are present on the mobile render.

Desktop vs mobile parity

# Compare mobile and desktop source:
# Screaming Frog: Configuration > User Agent > Googlebot Smartphone
# Run a second crawl in smartphone mode
# Export both crawls and compare word counts on key pages
# Significant word count difference (mobile < desktop) = content hidden on mobile
# GSC URL Inspection: test any URL
# 'View tested page' > 'Screenshot' tab: shows mobile render
# 'View tested page' > 'More info' > 'HTML' tab: shows rendered HTML
# Check that structured data, headings, and key content appear in HTML
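# Check that structured data is not conditionally rendered:
# Verify JSON-LD block is in <head> or <body> on both renders
# (Googlebot Smartphone user agent; replace W.X.Y.Z with a current Chrome version)
curl -s -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' https://yoursite.com/ | grep 'ld+json'
# Should print the line containing the JSON-LD script tag; no output means it is missing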

text

Step 6: On-Page Technical Signals

On-page technical signals are the metadata and structural elements that tell search engines what each page is about. This step audits every page-level technical signal across the site for completeness, uniqueness, and correctness.

6.1 Title tags

Auditing title tags

# Screaming Frog export: Page Titles column
# Check for:

# 1. Missing title tags
# Screaming Frog: Page Titles > Filter 'Missing'

# 2. Duplicate title tags (dilutes topical specificity signal)
# Screaming Frog: Page Titles > Filter 'Duplicate'

# 3. Title tags too long: truncated in desktop SERPs at about 600 pixels
#    (roughly 55–60 characters). Keep critical keywords at the front of
#    the title so essential context is not clipped.
# Screaming Frog: Page Titles > Filter 'Over 60 Characters'

# 4. Title tags too short (under 30 chars = opportunity wasted)
# Screaming Frog: Page Titles > Filter 'Below 30 Characters'

# 5. Title tags not containing primary keyword for the page
# Manual review: export titles with URLs, check against keyword strategy

# Ideal title tag formula for a service page:
# [Primary Keyword] | [Secondary Keyword] | [Brand] — [Location if local]
# 'Technical SEO Audit Services | SEO Agency Lagos | Semola Digital'
# Length: aim for 55-60 characters (the example above runs to 64, so shorten one segment)

text
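The length and duplicate checks can also be run over any URL-to-title mapping exported from a crawl. A sketch using the 30 and 60 character limits from the checklist above; character counts only approximate Google's pixel-based truncation, and the `/about/`, `/contact/`, and `/blog/` titles are hypothetical.

```python
from collections import Counter


def audit_titles(titles_by_url, min_len=30, max_len=60):
    """Return {url: [issues]} for missing, short, long, or duplicate titles."""
    counts = Counter(
        title.strip().lower() for title in titles_by_url.values() if title.strip()
    )
    report = {}
    for url, title in titles_by_url.items():
        title = title.strip()
        issues = []
        if not title:
            issues.append("missing")
        else:
            if len(title) > max_len:
                issues.append(f"too long ({len(title)})")
            if len(title) < min_len:
                issues.append(f"too short ({len(title)})")
            if counts[title.lower()] > 1:
                issues.append("duplicate")
        if issues:
            report[url] = issues
    return report


pages = {
    "/services/seo/": "Technical SEO Audit Services | SEO Agency Lagos | Semola Digital",
    "/about/": "About",
    "/contact/": "",
    "/blog/": "SEO Blog and Guides for Nigerian Businesses",
}
for url, issues in audit_titles(pages).items():
    print(url, issues)
```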

6.2 Meta descriptions

Auditing meta descriptions

# Meta descriptions are not a ranking factor but affect CTR significantly
# Screaming Frog: Meta Description column

# Check for: missing, duplicate, too long (>155 chars), too short (<70 chars)

# Write meta descriptions that:
# - Summarise the page in the user's language
# - Include the primary keyword naturally
# - Contain a benefit or call to action
# - Are unique per page

# Screaming Frog: bulk export meta descriptions to spreadsheet
# Sort by length to quickly identify missing and oversized descriptions
# Flag pages with impressions > 1000 in GSC but CTR < 2%
# These are candidates for meta description rewriting

text

6.3 Header tag structure

Header tags analysis

# Screaming Frog: H1 column
# Check for:

# 1. Missing H1 (no primary topical signal on the page)
# 2. Multiple H1 tags (ambiguous topical signal)
# 3. H1 does not match title tag intent (missed keyword opportunity)
# 4. H2/H3 hierarchy skipped (H1 then H4 directly)

# Check H1 programmatically:
curl -s https://yoursite.com/services/ | grep -i '<h1'

# Expected: one <h1> tag containing the primary keyword
# Example: <h1>Technical SEO Services in Lagos</h1>

# Screaming Frog: Crawl Analysis > H1 Tags report
# Export to CSV, filter for pages with 0 H1s or 2+ H1s

# Heading hierarchy rule:
# H1: primary topic of the page (one only)
# H2: major sections within the page
# H3: sub-sections within H2 sections
# Never skip levels (H1 > H3 without H2 in between)

text

6.4 Image optimisation

Image optimization

# Screaming Frog: Images tab
# Check for:

# 1. Missing alt text
# Images > Filter 'Missing Alt Text'
# All images that convey content need descriptive alt text
# Decorative images: alt='' (empty, not missing)

# 2. Images too large (slow LCP, unnecessary bandwidth)
# Images > Filter 'Over X kB' — set threshold at 150KB
# Hero/LCP images: target <150KB at 1200px wide in WebP format

# 3. Images missing width/height attributes (causes CLS)
# Check in Screaming Frog Image report or in page source

# 4. Images not in modern format (JPEG/PNG instead of WebP/AVIF)
# Screaming Frog: Images > Content Type column

# Quick check: count images missing alt text on a key page
curl -s https://yoursite.com/ | grep -o '<img[^>]*>' | grep -v 'alt=' | wc -l
# Output: number of images with no alt attribute at all

text
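The grep one-liner misses `<img>` tags that span several lines. As an alternative sketch, Python's standard-library HTML parser handles those and can check width and height in the same pass; the sample markup is hypothetical.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Record <img> tags that lack an alt attribute or explicit dimensions."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing_alt = []
        self.missing_dimensions = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or "(no src)"
        self.total += 1
        # alt="" is valid for decorative images; only a missing attribute is flagged.
        if "alt" not in attr:
            self.missing_alt.append(src)
        if "width" not in attr or "height" not in attr:
            self.missing_dimensions.append(src)


html = """
<img src="hero.jpg" alt="Team" width="1200" height="600">
<img src="divider.png" alt="">
<img
  src="chart.png" width="800" height="400">
"""
audit = ImageAudit()
audit.feed(html)
print("images:", audit.total)
print("missing alt:", audit.missing_alt)
print("missing width/height:", audit.missing_dimensions)
```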

Step 7: Structured Data

Structured data audit covers two questions: is the correct schema present on every page that should have it, and is the schema that is present valid and error-free?

7.1 Schema coverage audit

Schema coverage audit

# Screaming Frog: Configuration > Custom Extraction
# Add: XPath //script[@type='application/ld+json']
# This extracts the full JSON-LD content from every page

# Export and review: which pages have schema? Which should but don't?

# Minimum schema requirements by page type:
# Homepage:       Organization or LocalBusiness + WebSite
# Service pages:  Service or ProfessionalService
# Blog posts:     Article or BlogPosting
# All pages:      BreadcrumbList
# FAQ content:    FAQPage
# How-to content: HowTo

# Note on FAQ schema: Google has restricted FAQ rich results to well-known,
# authoritative government and health sites, and How-to rich results have been
# withdrawn from Google Search. Rather than investing in general FAQPage markup,
# prioritise Product, Article, or LocalBusiness schema, which continue to
# yield rich results.

# Check for schema on a specific page:
curl -s https://yoursite.com/services/seo/ | python3 -c "
import sys, json, re
html = sys.stdin.read()
blocks = re.findall(r'<script[^>]*type=[\"\']application/ld\+json[\"\'][^>]*>(.*?)</script>', html, re.DOTALL)
for i, block in enumerate(blocks):
    print(f'Schema block {i+1}:', json.loads(block.strip()).get('@type', 'unknown'))
"
# Output lists each schema block and its @type

python
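JSON-LD in the wild is often a top-level array or wrapped in `@graph`, and a single `.get('@type')` call does not cover those shapes. The sketch below is one way to handle them and to surface blocks that fail to parse; `schema_types` is a hypothetical helper that works on HTML you have already fetched.

```python
import json
import re

LD_JSON = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def schema_types(html):
    """List every @type found in the page's JSON-LD blocks, in document order."""
    types = []
    for block in LD_JSON.findall(html):
        try:
            data = json.loads(block.strip())
        except json.JSONDecodeError:
            types.append("INVALID JSON")
            continue
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            # @graph bundles several entities into one block.
            for item in node.get("@graph", [node]):
                if isinstance(item, dict):
                    found = item.get("@type", "unknown")
                    types.extend(found if isinstance(found, list) else [found])
    return types


html = """
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article", "headline": "SEO Guide"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@graph": [{"@type": "Organization"}, {"@type": "WebSite"}]}</script>
<script type="application/ld+json">{not json}</script>
"""
print(schema_types(html))
```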

7.2 Schema validation

Schema validation

# Validate structured data against Google's requirements:
# Tool: search.google.com/test/rich-results
# Enter URL or paste JSON-LD directly

# Vocabulary-level validation (checks schema.org syntax, not Google's rich result rules):
# Tool: validator.schema.org

# Check for common validation errors:

# 1. Missing recommended properties
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "SEO Guide"
}
// Note: this is valid and eligible for rich results, but add 'author', 'image', 'datePublished', and 'publisher' to maximise visibility and trust signals.

# 2. datePublished in wrong format (must be ISO 8601)
# Wrong:   "datePublished": "14th July 2025"
# Correct: "datePublished": "2025-07-14"

# 3. Nested entity missing @type
// Wrong: Missing the internal @type object
"author": {
  "name": "Oladoyin Falana"
}

// Correct: Explicitly nested Person entity
"author": {
  "@type": "Person",
  "name": "Oladoyin Falana"
}
# 4. FAQPage answers not matching visible page content
# Validate: search for the answer text in the page source

json
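Errors 2 and 3 above can be caught before the markup ships. A minimal sketch on an already-parsed JSON-LD node; `schema_problems` is a hypothetical helper, and `datetime.fromisoformat` is stricter than full ISO 8601 on older Python versions, so treat a failure as a prompt to look rather than a verdict.

```python
from datetime import datetime


def schema_problems(node):
    """Flag a non-ISO datePublished and nested entities that lack @type."""
    problems = []
    date = node.get("datePublished")
    if date is not None:
        try:
            datetime.fromisoformat(date)
        except (ValueError, TypeError):
            problems.append(f"datePublished not ISO 8601: {date!r}")
    for key, value in node.items():
        if isinstance(value, dict) and "@type" not in value:
            problems.append(f"{key} is missing @type")
    return problems


broken = {
    "@type": "Article",
    "headline": "SEO Guide",
    "datePublished": "14th July 2025",
    "author": {"name": "Oladoyin Falana"},
}
fixed = {
    "@type": "Article",
    "headline": "SEO Guide",
    "datePublished": "2025-07-14",
    "author": {"@type": "Person", "name": "Oladoyin Falana"},
}
print(schema_problems(broken))
print(schema_problems(fixed))
```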

Step 8: HTTPS, Security, and Server Configuration

HTTPS has been a ranking signal since 2014. The audit checks for HTTPS implementation completeness, SSL certificate validity, and mixed content issues that can undermine a technically correct HTTPS migration.

8.1 HTTPS implementation

HTTPS implementation

# Check all four root variants resolve to one canonical HTTPS URL
# (grep -i because HTTP/2 servers send lowercase header names):
curl -sI http://yoursite.com/ | grep -iE '^(HTTP/|location:)'
curl -sI http://www.yoursite.com/ | grep -iE '^(HTTP/|location:)'
curl -sI https://www.yoursite.com/ | grep -iE '^(HTTP/|location:)'
curl -sI https://yoursite.com/ | grep -iE '^(HTTP/|location:)'

# Expected: all non-preferred variants return 301 to the canonical
# e.g.: HTTP/1.1 301 -> Location: https://yoursite.com/

# Check SSL certificate validity and expiry:
echo | openssl s_client -connect yoursite.com:443 2>/dev/null | openssl x509 -noout -dates
# Output: notBefore and notAfter (expiry date)
# Expired or expiring within 30 days = fix immediately

# Check for mixed content (HTTP resources on HTTPS pages):
# Chrome DevTools: Console tab
# Filter by 'Mixed Content' or 'Blocked'
# Look for: 'Mixed Content: The page was loaded over HTTPS but...'

# Screaming Frog: Configuration > Check HTTP Links (for HTTPS sites)
# Identifies all internal links using HTTP instead of HTTPS
# These add an extra redirect hop and, when they load resources, can trigger mixed content warnings

text
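The 30-day rule is easy to automate once you have the `notAfter` line from the openssl command. A sketch using the standard library's `ssl.cert_time_to_seconds`; the certificate date and the fixed "now" are hypothetical so the example is reproducible.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after_line, now=None):
    """not_after_line: the 'notAfter=...' line from openssl x509 -noout -dates."""
    value = not_after_line.split("=", 1)[-1].strip()
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


# Hypothetical certificate line and audit date.
line = "notAfter=Jul 14 12:00:00 2025 GMT"
audit_date = datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc)

remaining = days_until_expiry(line, now=audit_date)
print(f"{remaining} days left:", "renew now" if remaining < 30 else "ok")
```

Omit the `now` argument in real use so the check runs against the current time.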

8.2 Server response headers

Server response headers

# Check security headers (best practice; not direct ranking factors
# but affect user trust signals and browser security):

curl -sI https://yoursite.com/ | grep -iE '^(strict-transport|x-frame|content-security|x-content)'

# Recommended security headers:
# Strict-Transport-Security: max-age=31536000; includeSubDomains
# X-Frame-Options: SAMEORIGIN
# X-Content-Type-Options: nosniff

# Check HTTP/2 or HTTP/3 is enabled (required for performance):
curl -sI --http2 https://yoursite.com/ | head -1
# Expected: HTTP/2 200
# HTTP/1.1 on a modern site is a performance liability

# Check compression is enabled (reduces payload size significantly):
curl -sI --compressed https://yoursite.com/ | grep -i 'content-encoding'
# Expected: content-encoding: gzip or content-encoding: br (Brotli)
# Missing: HTML, CSS, JS are being sent uncompressed

text
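To check the recommended headers across many pages, the raw output of `curl -sI` can be parsed and compared against the list above. A sketch with a hypothetical response; header names are compared case-insensitively because HTTP/2 servers send them in lowercase.

```python
RECOMMENDED = (
    "strict-transport-security",
    "x-frame-options",
    "x-content-type-options",
)


def parse_headers(raw):
    """Turn raw 'curl -sI' output into a dict with lowercase header names."""
    headers = {}
    for line in raw.splitlines()[1:]:  # skip the status line
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers


def missing_security_headers(raw):
    """Return the recommended security headers absent from the response."""
    headers = parse_headers(raw)
    return [name for name in RECOMMENDED if name not in headers]


# Hypothetical response headers.
raw = """HTTP/2 200
content-type: text/html; charset=utf-8
strict-transport-security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
"""
print("Missing:", missing_security_headers(raw))
```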

Step 9: JavaScript Rendering

JavaScript rendering issues are the most commonly missed category in standard SEO audits. They are invisible in a normal browser view and only visible when you look at what Google actually sees on the first crawl pass.

9.1 Rendered vs unrendered HTML check

Rendered vs unrendered HTML check

# The fastest check: fetch raw HTML without JavaScript execution
curl -sA 'Googlebot/2.1' https://yoursite.com/ > raw_html.txt

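# Then open the site in Chrome and copy the rendered DOM
# (DevTools > Elements > right-click <html> > Copy > Copy outerHTML).
# View Source shows the same unrendered HTML that curl fetched.
# Compare word count of meaningful text between the two

# Python: count visible words in the raw (unrendered) HTML
from bs4 import BeautifulSoup

with open('raw_html.txt', encoding='utf-8') as f:
    raw = BeautifulSoup(f.read(), 'html.parser')

# Drop script/style/noscript so inline JS and CSS are not counted as content
for tag in raw(['script', 'style', 'noscript']):
    tag.decompose()

raw_text = raw.get_text(separator=' ', strip=True)
raw_words = len(raw_text.split())

print(f'Raw HTML word count: {raw_words}')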
# If raw_words < 50 on a content page: severe JS rendering problem
# Page content is invisible to Googlebot on first crawl pass

# GSC URL Inspection: the most definitive check
# Enter any URL > Test live URL > View tested page > HTML tab
# This shows exactly what Google’s renderer sees
# Compare to browser view. Gap = invisible content

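
To put a number on the raw-versus-rendered comparison, both HTML snapshots can be reduced to visible word counts. The following is a minimal sketch using only the standard library; `visible_word_count` and `rendering_gap` are hypothetical helper names, and treating everything outside `<script>` and `<style>` as visible text is a simplifying assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def rendering_gap(raw_html: str, rendered_html: str) -> float:
    """Share of rendered words that are missing from the raw HTML (0.0 to 1.0)."""
    rendered = visible_word_count(rendered_html)
    if rendered == 0:
        return 0.0
    return max(0.0, 1 - visible_word_count(raw_html) / rendered)


raw = "<html><body><div id='root'></div><script>var a = 'hello world';</script></body></html>"
rendered = "<html><body><div id='root'><p>one two three four</p></div></body></html>"
print(rendering_gap(raw, rendered))
```

A gap close to 1.0 means nearly all of the page's text only exists after JavaScript runs, which is the pattern this step is designed to catch.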

9.2 Internal links in JavaScript


# Internal links rendered by JavaScript are not reliable for crawl discovery
# Check if navigation links are in raw HTML or JS-rendered:

curl -sA 'Googlebot/2.1' https://yoursite.com/ | grep -o '<a [^>]*href=[^>]*>' | head -20

# If this returns no links (or very few), the navigation is JS-rendered
# Googlebot may not follow these links reliably on first crawl

# Fix: ensure navigation, breadcrumbs, and in-content links
# are present in the server-side rendered HTML

# Next.js: links rendered with <Link> component are SSR-ready
# by default if using App Router or getStaticProps

# React CSR: links inside components that render client-side
# will NOT appear in initial HTML — high risk for crawl discovery

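
The grep check above shows whether any anchors exist in the raw HTML; a slightly more structured version lists which internal URLs they point to. This is a sketch under assumptions: `internal_links` is a hypothetical helper, "internal" is taken to mean "same host as the page URL", and the raw HTML is supplied by an earlier curl fetch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in the order they appear."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(raw_html: str, page_url: str) -> list:
    """Return de-duplicated absolute same-host links found in the raw HTML."""
    parser = _LinkExtractor()
    parser.feed(raw_html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


html = (
    '<nav><a href="/about/">About</a>'
    '<a href="https://other.example/">Partner</a>'
    '<a href="/about/">About again</a><a>No href</a></nav>'
)
print(internal_links(html, "https://yoursite.com/"))
```

An empty list for a page that clearly has navigation in the browser suggests the links are injected client-side.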

9.3 Structured data in rendered output


# Structured data must be in server-rendered HTML
# Not injected by client-side JavaScript after page load

# Check: is JSON-LD in raw HTML or JS-rendered?
curl -sA 'Googlebot/2.1' https://yoursite.com/blog/post-slug/ | grep 'ld+json'

# If this returns the JSON-LD block: schema is server-rendered (good)
# If this returns nothing: schema is JS-injected (may not be seen on first crawl)

# Common mistake in React/Next.js:
// BAD: JSON-LD injected via useEffect (client-side only)
useEffect(() => {
  const script = document.createElement('script');
  script.type = 'application/ld+json';
  script.text = JSON.stringify(schemaData);
  document.head.appendChild(script);
}, []);

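// GOOD: JSON-LD in <Head> (server-rendered)
import Head from 'next/head';
// In your page component.
// dangerouslySetInnerHTML stops React from HTML-escaping the JSON string:
<Head>
  <script
    type='application/ld+json'
    dangerouslySetInnerHTML={{ __html: JSON.stringify(schemaData) }}
  />
</Head>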

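
Beyond confirming that a JSON-LD block exists in the raw HTML, it helps to confirm that the block parses and declares the expected type. The sketch below is illustrative only: `jsonld_types` is a hypothetical helper, it reads top-level `@type` values and does not walk `@graph` structures, and the sample markup is invented.

```python
import json
from html.parser import HTMLParser


class _JsonLdExtractor(HTMLParser):
    """Capture the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_types(raw_html: str) -> list:
    """Return the top-level @type of each JSON-LD block, or INVALID_JSON."""
    parser = _JsonLdExtractor()
    parser.feed(raw_html)
    types = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            types.append("INVALID_JSON")
            continue
        items = data if isinstance(data, list) else [data]
        types.extend(str(item.get("@type", "UNKNOWN")) for item in items if isinstance(item, dict))
    return types


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Article", "headline": "x"}'
    "</script></head><body></body></html>"
)
print(jsonld_types(page))
```

An empty result on raw HTML, combined with schema that appears in the browser, points to client-side injection.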

Step 10: Search Console Signals and Quick Wins

Google Search Console contains data that no external tool can replicate: the actual queries driving impressions, the CTR by position, and Google’s own assessment of indexation and performance. The audit mines this data for immediate wins.

10.1 The CTR opportunity report

Pages that rank in positions 4–10 but have a below-average CTR are often the fastest SEO wins available. The ranking exists. The traffic gap is usually a title tag and meta description problem. Fix the copy, and the click-through rate can improve as soon as Google recrawls the page and refreshes the snippet.


# Google Search Console: Performance > Search Results
# Enable: Impressions, Clicks, CTR, Position
# Filter: Position > 3 (exclude top 3 where CTR is expected to be high)
# Sort: CTR ascending

# Expected CTR benchmarks by position (approximate, varies by query type):
# Position 1:  ~28-39%
# Position 2:  ~15-20%
# Position 3:  ~10-13%
# Position 4:  ~7-9%
# Position 5:  ~5-7%
# Position 6-10: 3-5%

# Flag: any page at position 4-6 with CTR under 3%
# These pages are ranking but not converting impressions to clicks
# Action: rewrite title tag and meta description
# New title should: answer the query intent, include a benefit, create curiosity

# Bulk CTR export from GSC via API:
# Use Google Search Console API or connect to Looker Studio
# Filter: impressions > 500 AND position < 20 AND CTR < 3%
# Export for prioritised rewriting list

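
Once the Performance data is exported, the flagging rule above can be applied in bulk. This is a rough sketch, not a GSC API client: `ctr_opportunities` is a hypothetical helper, the row keys (`page`, `impressions`, `clicks`, `position`) are assumed column names, and the floors are the lower bounds of the approximate benchmark ranges listed above.

```python
# Lower bounds (percent) of the approximate benchmark ranges quoted above.
CTR_FLOOR_BY_POSITION = {1: 28.0, 2: 15.0, 3: 10.0, 4: 7.0, 5: 5.0}
DEFAULT_FLOOR = 3.0  # positions 6-10


def ctr_opportunities(rows, min_impressions=500):
    """Return pages ranking 4-10 whose CTR sits below the benchmark floor.

    Results are sorted by the size of the gap, largest first.
    """
    flagged = []
    for row in rows:
        position = round(row["position"])
        impressions = row["impressions"]
        if not 4 <= position <= 10 or impressions <= 0 or impressions < min_impressions:
            continue
        ctr = 100 * row["clicks"] / impressions
        floor = CTR_FLOOR_BY_POSITION.get(position, DEFAULT_FLOOR)
        if ctr < floor:
            flagged.append({"page": row["page"], "ctr": round(ctr, 2), "gap": round(floor - ctr, 2)})
    return sorted(flagged, key=lambda item: item["gap"], reverse=True)


rows = [
    {"page": "/a/", "impressions": 1000, "clicks": 20, "position": 4.2},
    {"page": "/b/", "impressions": 1000, "clicks": 90, "position": 4.0},
    {"page": "/c/", "impressions": 100, "clicks": 0, "position": 5.0},
    {"page": "/d/", "impressions": 2000, "clicks": 20, "position": 8.3},
]
for item in ctr_opportunities(rows):
    print(item)
```

Sorting by gap puts the pages with the most headroom at the top of the rewriting list.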

10.2 Search appearance and rich results


# GSC: Performance > Search results > add a 'Search appearance' filter
# Check which pages are appearing in rich results (FAQ, Article, etc.)
# Compare to pages that SHOULD have rich results based on schema audit

# If a page has valid FAQPage schema but no FAQ rich result in GSC:
# - Note: since August 2023 Google shows FAQ rich results mainly for
#   well-known, authoritative government and health sites
# - Check: did the rich result appear and then disappear (spam signal)?
# - Check: does the FAQ content on the page match what’s in schema?
# - Check: are there fewer than 2 or more than 10 Q&A pairs?

# GSC: Experience > Core Web Vitals
# Review: URLs by status (Good/Needs Improvement/Poor)
# Poor URLs: a competitive disadvantage in rankings; prioritise these fixes
# Filter by device: mobile issues are more critical (mobile-first indexing)


10.3 Index coverage audit by page type


# Advanced: compare expected vs actual index coverage

# From Screaming Frog: export all 200-status URLs with type annotation
# From GSC Sitemap report: total submitted vs total indexed

# If submitted = 60 URLs but indexed = 38:
# 22 URLs are not indexed — where are they?

# GSC: Indexing > Pages > Not Indexed > 'Crawled, currently not indexed'
# These pages passed crawl but Google chose not to index them
# Common causes:
# - Thin content (under ~300 words with no unique value)
# - Near-duplicate of another page (similar content, different URL)
# - Slow page speed triggering quality downgrade
# - Soft 404 (page returns 200 but content implies page is empty)

# Check for soft 404 pattern: request a URL that should not exist
curl -s -o /dev/null -w '%{http_code}\n' https://yoursite.com/nonexistent-page/
# Expected: 404 (or 410)
# If this returns 200: the site is serving missing pages as 200 with
# templated content, which Google may classify as soft 404s

# Crucial rendering rule: following Google's December 2025 rendering
# pipeline clarification, pages returning non-200 codes (like 404s) may be
# skipped by the JavaScript rendering queue. Handle errors and 404 pages
# server-side; any helpful "You might also like" navigation injected via
# client-side JS on a 404 page may be invisible to Googlebot.

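
The submitted-versus-indexed comparison becomes more useful when it is broken down by page type. The sketch below assumes two URL lists are already available (sitemap URLs and indexed URLs exported from GSC). `page_type` and `coverage_by_type` are hypothetical helpers, and using the first path segment as the page type is an assumption that will not fit every URL scheme.

```python
from collections import Counter
from urllib.parse import urlparse


def page_type(url: str) -> str:
    """Assumption: the first path segment identifies the page type."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    return segments[0] if segments else "home"


def coverage_by_type(submitted, indexed):
    """Return {page_type: (indexed_count, submitted_count)}."""
    indexed_set = set(indexed)
    totals = Counter(page_type(url) for url in submitted)
    hits = Counter(page_type(url) for url in submitted if url in indexed_set)
    return {kind: (hits[kind], totals[kind]) for kind in sorted(totals)}


submitted = [
    "https://yoursite.com/",
    "https://yoursite.com/blog/a/",
    "https://yoursite.com/blog/b/",
    "https://yoursite.com/services/seo/",
]
indexed = ["https://yoursite.com/", "https://yoursite.com/blog/a/"]
for kind, (found, total) in coverage_by_type(submitted, indexed).items():
    print(f"{kind}: {found}/{total} indexed")
```

A page type with a much lower indexed share than the rest is usually the place to start investigating template-level problems.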

Step 11: Advanced Technical Checks

The following checks apply to specific site configurations: multilingual sites, large e-commerce catalogues, and sites using headless or API-driven architectures. Apply those relevant to the site being audited.

11.1 hreflang for multilingual/multiregional sites


<!-- hreflang implementation: tells Google which page to serve in which language/region -->
<link rel="alternate" hreflang="en-ng" href="https://yoursite.com/en/" />
<link rel="alternate" hreflang="en-gb" href="https://yoursite.com/uk/" />
<link rel="alternate" hreflang="x-default" href="https://yoursite.com/" />

<!-- Rules for correct hreflang: -->
<!-- 1. Every page in the set must reference ALL other pages in the set -->
<!-- 2. hreflang values must be a valid language code (ISO 639-1), optionally followed by a region code (ISO 3166-1 alpha-2) -->
<!-- 3. Every page must include a self-referencing hreflang entry -->
<!-- 4. x-default is recommended (not required) as the fallback for unmatched languages/regions -->

# Common hreflang errors:
# - Non-canonicalized pages in hreflang set
# - hreflang annotations not reciprocated (A points to B but B doesn't point to A)
# - Wrong codes ('en_US' instead of 'en-us', or 'en-uk' instead of 'en-gb')

# Validate hreflang with Screaming Frog:
# Reports > hreflang > hreflang Languages
# Shows all hreflang sets and flags incomplete reciprocation

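
Reciprocity and self-reference, the two rules that fail most often, can be checked mechanically once hreflang annotations have been extracted from a crawl. This is a minimal sketch: `missing_return_links` and `missing_self_reference` are hypothetical helpers, and the input shape (page URL mapped to a `{hreflang_code: target_url}` dictionary) is an assumed export format.

```python
def missing_return_links(hreflang_map):
    """Return sorted (source, target) pairs where target does not link back.

    Targets that are absent from the map (for example, not crawled) are
    reported too, because their return link cannot be confirmed.
    """
    problems = []
    for source, annotations in hreflang_map.items():
        for target in annotations.values():
            if target == source:
                continue  # self-reference, nothing to reciprocate
            if source not in hreflang_map.get(target, {}).values():
                problems.append((source, target))
    return sorted(problems)


def missing_self_reference(hreflang_map):
    """Return pages whose annotations do not include the page itself."""
    return sorted(url for url, annotations in hreflang_map.items() if url not in annotations.values())


ng = "https://yoursite.com/en/"
gb = "https://yoursite.com/uk/"
crawl = {
    ng: {"en-ng": ng, "en-gb": gb},
    gb: {"en-gb": gb},  # does not point back to the en-ng page
}
print(missing_return_links(crawl))
print(missing_self_reference(crawl))
```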

11.2 Pagination and infinite scroll


# Pagination best practices (Google deprecated rel=prev/next in 2019):
# Google recommends: strong internal links between paginated pages
# Each paginated page should have unique content value

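# For paginated pages that should NOT be indexed:
# Option 1: noindex on /page/2/ onwards (simplest)
# Option 2: canonical pointing /page/2/ to the root page. Use with caution:
#   Google's pagination guidance advises giving each page in a sequence its
#   own canonical rather than pointing them all at page 1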

# For infinite scroll: Google cannot scroll to load content
# Ensure infinite scroll content has a paginated fallback:
# /products/?page=1, /products/?page=2 etc.
# These paginated URLs should be crawlable even if UX uses infinite scroll

# Check if paginated URLs are being indexed (they usually shouldn't be):
# GSC: Performance > Search results > Page filter > URLs containing '/page/'
# If paginated archive URLs are ranking: add noindex and update internal links

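
To see how many paginated URLs are in a crawl or GSC export, the two URL patterns mentioned above can be matched with a short script. This is a sketch with assumed patterns (`/page/N/` and `?page=N`); `page_number` and `paginated_urls` are hypothetical helpers, and sites using other pagination schemes would need their own patterns.

```python
import re

# Assumed pagination patterns: /page/3/ and ?page=3 (or &page=3).
PAGINATION_PATTERNS = (
    re.compile(r"/page/(\d+)/?$"),
    re.compile(r"[?&]page=(\d+)(?:&|$)"),
)


def page_number(url: str):
    """Return the page number encoded in the URL, or None if not paginated."""
    for pattern in PAGINATION_PATTERNS:
        match = pattern.search(url)
        if match:
            return int(match.group(1))
    return None


def paginated_urls(urls):
    """Return URLs that represent page 2 or deeper of a paginated series."""
    return [url for url in urls if (page_number(url) or 0) > 1]


urls = [
    "https://yoursite.com/blog/",
    "https://yoursite.com/blog/page/3/",
    "https://yoursite.com/products/?page=1",
    "https://yoursite.com/products/?page=2",
]
print(paginated_urls(urls))
```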

11.3 Duplicate content at scale


# Large sites commonly generate duplicate content through:
# - Faceted navigation (filters creating parameter URLs)
# - Sorted views (?sort=price-asc, ?sort=latest)
# - Printer-friendly versions (/print/article-slug/)
# - Mobile subdomain duplicating www content (m.yoursite.com)
# - Tag and category archive pages with same posts

# Find parameter-based duplicates:
# Screaming Frog: URL > filter for '?' in URL column
# Count parameter URLs vs canonical URLs

# Python: find near-duplicate content pages
import difflib

def check_similarity(content1, content2, threshold=0.85):
    ratio = difflib.SequenceMatcher(None, content1, content2).ratio()
    return ratio > threshold

# Use with Screaming Frog’s custom extraction to compare body text
# across pages in the same category or using similar templates

# Fix options for parameter-based duplicates:
# 1. Canonical tag pointing parameter URL to clean URL
# <link rel="canonical" href="/products/shoes/" />
# 2. noindex + follow on parameter URLs
# 3. Parameter handling in Google Search Console is no longer an option:
#    the URL Parameters tool was retired in 2022
# 4. Disallow parameter URLs in robots.txt (stops crawl but allows index if linked)

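
Before comparing body text, parameter-based duplicates can often be grouped by URL alone. The sketch below strips parameters that are assumed to change presentation rather than content and groups the variants under the resulting clean URL. `clean_url`, `duplicate_groups`, and the `NON_CONTENT_PARAMS` set are hypothetical and should be adjusted to the site being audited.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters change sorting or tracking, not page content.
NON_CONTENT_PARAMS = {"sort", "order", "view", "utm_source", "utm_medium", "utm_campaign"}


def clean_url(url: str) -> str:
    """Drop non-content parameters and the fragment; keep other parameters in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Return {clean_url: [variants]} for clean URLs with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[clean_url(url)].append(url)
    return {clean: variants for clean, variants in groups.items() if len(variants) > 1}


crawl = [
    "https://yoursite.com/products/shoes/",
    "https://yoursite.com/products/shoes/?sort=price-asc",
    "https://yoursite.com/products/hats/",
]
print(duplicate_groups(crawl))
```

Each group is a candidate for a canonical tag pointing the variants at the clean URL.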

Step 12: Writing the Audit Report

An audit is only as useful as the decisions it produces. The findings from the previous eleven steps need to be translated into a prioritised action list that a development team can execute against.

12.1 Prioritisation framework

Not every audit finding is equally urgent. Prioritise issues using two axes: severity (how much is this costing in rankings or traffic right now?) and effort (how long will this take to fix?).


# Priority matrix:

# P1: High severity, low effort — fix this week
#   Examples: incorrect robots.txt, missing sitemap, canonical pointing to 404,
#             entire site noindexed, HTTPS not enforced, CWV in 'Poor' range

# P2: High severity, medium effort — fix within 30 days
#   Examples: JS rendering blocking content, redirect chains on key pages,
#             structured data errors on all pages, orphaned service pages,
#             duplicate title tags across high-value pages

# P3: Medium severity, high effort — schedule for next sprint
#   Examples: image alt text gaps, URL structure cleanup, schema
#             implementation on new page types, hreflang errors

# P4: Low severity — address opportunistically
#   Examples: meta description improvements, H2 hierarchy fixes,
#             minor page speed improvements, sitemap refresh

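
When the findings log is kept in a spreadsheet or script, the matrix can be encoded as a small function so that priorities are assigned consistently. This is only a sketch: `assign_priority` is a hypothetical helper, and the matrix above names just one severity and effort combination per level, so the handling of the remaining combinations is an assumption flagged in the comments.

```python
LEVELS = {"low", "medium", "high"}


def assign_priority(severity: str, effort: str) -> str:
    """Map a finding's severity and effort ('low', 'medium', 'high') to P1-P4."""
    severity, effort = severity.lower(), effort.lower()
    if severity not in LEVELS or effort not in LEVELS:
        raise ValueError("severity and effort must be low, medium, or high")
    if severity == "low":
        return "P4"
    if severity == "high":
        # Assumption: high-severity work with high effort still lands in P2.
        return "P1" if effort == "low" else "P2"
    # Medium severity. Assumption: low and medium effort items also go to P3.
    return "P3"


print(assign_priority("high", "low"))
print(assign_priority("medium", "high"))
```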

12.2 Report structure

A professional audit report contains five sections: an executive summary with the three most critical findings, a technical findings log with severity and effort ratings for each issue, a prioritised action plan, a baseline metrics snapshot (GSC impressions, CWV status, indexed URLs) for progress tracking, and an appendix with all raw data exports. The findings log should be specific enough for a developer to execute without needing a follow-up call.

Audit report, suggested file structure:
/audit-report/
  01-executive-summary.docx
  02-technical-findings.xlsx
  03-prioritised-actions.csv
  04-baseline-metrics.pdf (GSC screenshot + CWV report)
  05-appendix/
    screaming-frog-export.csv
    pagespeed-results.json
    redirect-chains.csv
    structured-data-validation.txt

# Each finding in 02-technical-findings.xlsx should include:
# URL(s) affected | Issue description | Evidence | Priority | Fix action | Owner | Done

# Executive summary format:
# - Sites audited: [URL], [date]
# - Critical findings: [count] P1, [count] P2, [count] P3
# - Current index coverage: [X] / [Y] pages indexed
# - Core Web Vitals status: [X]% URLs Good (mobile)
# - Most urgent action: [one sentence]
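
The findings log columns listed above translate directly into a small export script. The sketch below writes CSV as a stand-in for the spreadsheet; `findings_to_csv` is a hypothetical helper and the two sample findings are invented for illustration.

```python
import csv
import io

FINDING_COLUMNS = [
    "URL(s) affected", "Issue description", "Evidence",
    "Priority", "Fix action", "Owner", "Done",
]


def findings_to_csv(findings) -> str:
    """Serialise findings (dicts keyed by FINDING_COLUMNS), sorted by priority."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FINDING_COLUMNS, restval="")
    writer.writeheader()
    for finding in sorted(findings, key=lambda item: item["Priority"]):
        writer.writerow(finding)
    return buffer.getvalue()


sample_findings = [
    {"URL(s) affected": "/blog/", "Issue description": "Redirect chain",
     "Evidence": "3 hops", "Priority": "P2", "Fix action": "Collapse to one 301",
     "Owner": "Dev", "Done": "No"},
    {"URL(s) affected": "/robots.txt", "Issue description": "Site-wide Disallow",
     "Evidence": "Disallow: /", "Priority": "P1", "Fix action": "Remove rule",
     "Owner": "Dev", "Done": "No"},
]
print(findings_to_csv(sample_findings))
```

Sorting by priority keeps the P1 items at the top of the file, which matches the order a development team should work through them.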

The Complete Technical SEO Audit Checklist

The following checklist consolidates every check from this guide into a single reference. Priority codes: P1 = fix this week, P2 = fix within 30 days, P3 = schedule for next sprint.

Check	Tool	Priority	Status
1. Crawlability and Access
robots.txt is accessible (200 response) and contains no unintended Disallow rules	curl yoursite.com/robots.txt	P1	☐
No critical content paths, image directories, or API routes are blocked	GSC > Settings > robots.txt report	P1	☐
Sitemap is accessible, returns 200, and contains no redirect or error URLs	curl + Screaming Frog	P1	☐
All sitemap URLs are canonical, indexable 200-status pages	Screaming Frog	P1	☐
Crawl budget is not wasted on parameter URLs or pagination	GSC > Crawl Stats	P2	☐
Key pages return 200; no authentication or server errors blocking crawl	curl -I	P1	☐
No cloaking: same HTML served to Googlebot and regular users	curl -A Googlebot	P1	☐
2. Indexation
GSC Coverage report reviewed; all errors triaged	GSC > Indexing > Pages	P1	☐
No important pages carry a noindex tag	Screaming Frog	P1	☐
All pages have a self-referencing canonical tag	Screaming Frog	P1	☐
Canonical tags point to live 200-status URLs (not 301 or 404)	Screaming Frog	P1	☐
No duplicate or conflicting canonical signals (HTTP vs HTTPS, www vs non-www)	Screaming Frog	P2	☐
'Crawled, currently not indexed' pages investigated for quality/thin content issues	GSC	P2	☐
3. Site Architecture and URLs
All key pages reachable within 3 clicks from homepage	Screaming Frog > Crawl Depth	P2	☐
Consistent trailing slash convention enforced with 301 on non-preferred variant	curl -I	P1	☐
www and non-www redirect to single canonical root	curl -I	P1	☐
HTTP redirects to HTTPS on all variants	curl -I	P1	☐
No redirect chains (A -> B -> C); all chains collapsed to direct 301s	Screaming Frog	P2	☐
No redirect loops	Screaming Frog	P1	☐
No broken internal links (links pointing to 4xx or 5xx)	Screaming Frog	P2	☐
No orphaned pages (key pages with 0 inbound internal links)	Screaming Frog	P2	☐
Internal link anchor text is descriptive, not generic	Manual + Screaming Frog	P3	☐
4. Performance and Core Web Vitals
LCP field data in 'Good' range (<2.5s) on mobile	GSC > CWV + CrUX API	P1	☐
CLS field data in 'Good' range (<0.1)	GSC > CWV + DevTools	P1	☐
INP field data in 'Good' range (<200ms)	GSC > CWV + DevTools	P1	☐
TTFB under 800ms on key pages	curl timing	P2	☐
LCP element identified and served with fetchpriority='high'	DevTools Performance	P2	☐
All images have explicit width and height attributes (CLS prevention)	Screaming Frog	P2	☐
No render-blocking resources on critical path	PageSpeed Insights	P2	☐
No main-thread animations using layout-triggering CSS properties	DevTools Performance	P3	☐
Third-party scripts loaded with defer or after interaction	DevTools > Network	P2	☐
5. Mobile and Usability
viewport meta tag present on all pages	Screaming Frog	P1	☐
No mobile usability errors (the GSC Mobile Usability report was retired in December 2023)	Lighthouse (mobile) + DevTools device emulation	P1	☐
Tap targets at least 44x44px on mobile	Lighthouse > Accessibility	P2	☐
No content wider than screen (no horizontal scroll)	DevTools device emulation	P2	☐
Content parity between mobile and desktop renders	Screaming Frog (two crawls)	P2	☐
Structured data present in mobile rendered HTML	GSC URL Inspection	P2	☐
6. On-Page Technical Signals
Every page has exactly one unique H1 tag	Screaming Frog	P1	☐
Every key page has a unique, keyword-informed title tag under 60 chars	Screaming Frog	P1	☐
No duplicate title tags across key pages	Screaming Frog	P1	☐
Every page has a unique meta description under 155 chars	Screaming Frog	P2	☐
All images have descriptive alt text (decorative images have alt='')	Screaming Frog	P2	☐
No images over 150KB on content pages	Screaming Frog	P2	☐
All images in WebP or AVIF format (not JPEG/PNG for photos)	Screaming Frog	P3	☐
7. Structured Data
Organisation or LocalBusiness schema on homepage	Rich Results Test	P1	☐
Article or BlogPosting schema on all blog posts	Screaming Frog extraction	P2	☐
BreadcrumbList schema on all interior pages	Rich Results Test	P2	☐
FAQPage schema on applicable pages with Q&A content	Rich Results Test	P2	☐
All schema blocks validated (no syntax errors, required properties present)	Rich Results Test	P1	☐
Schema present in server-rendered HTML (not JS-injected)	curl | grep ld+json	P1	☐
dateModified property up to date on refreshed articles	Manual review	P3	☐
8. HTTPS and Security
SSL certificate valid and not expiring within 30 days	openssl s_client	P1	☐
All four root variants (http/https x www/non-www) redirect to canonical	curl -I	P1	☐
No mixed content warnings (HTTP resources on HTTPS pages)	Chrome DevTools Console	P1	☐
HTTP/2 enabled (not HTTP/1.1)	curl --http2	P2	☐
GZIP or Brotli compression enabled	curl --compressed	P2	☐
HSTS header present	curl -sI | grep -i strict	P3	☐
9. JavaScript Rendering
Key page content present in raw HTML (not JS-rendered only)	curl -A Googlebot + wc	P1	☐
Navigation links present in raw HTML	curl | grep '<a href'	P1	☐
Structured data in server-rendered HTML, not JS-injected	curl | grep ld+json	P1	☐
GSC URL Inspection rendered HTML matches browser view	GSC URL Inspection	P2	☐
No client-side redirects (window.location) on indexed pages	Manual review	P2	☐
10. Search Console Signals
CTR opportunity audit: positions 4-10 with CTR below benchmark	GSC Performance	P2	☐
Rich result eligibility reviewed against schema coverage	GSC > Search Appearance	P2	☐
No manual actions reported in GSC	GSC > Security & Manual Actions	P1	☐
Core Web Vitals report reviewed by URL group	GSC > Experience	P1	☐
Top 20 keywords reviewed for intent alignment with landing pages	GSC Performance	P2	☐

Running the Audit Efficiently

A full technical SEO audit of a small to medium site (50–300 pages) takes three to five hours with the tools described in this guide. Larger sites with complex architectures take proportionally longer. The time investment concentrates in three areas: running and interpreting the Screaming Frog crawl, mining GSC for coverage and performance signals, and validating JavaScript rendering on key pages.

The order of the steps in this guide reflects the order a practitioner should work through them. Crawlability and indexation issues at Steps 1 and 2 are the highest-leverage findings: a site with crawl or indexation problems is one where all subsequent optimisation effort is partially wasted. Fix the access issues first. Then fix the architecture. Then fix the page-level signals. Performance and structured data compound on top of a clean foundation.

The checklist above is designed to be used on every audit. Work through every item. Flag, don’t skip. An item that passes takes thirty seconds to confirm. An item that reveals a critical issue saves months of unexplained ranking plateau.

Need a technical SEO audit done for you? hello@semoladigita.com

Semola Digital conducts full technical SEO audits as a standalone engagement and as the Month 1 deliverable in all retainer engagements. The audit deliverable includes: full Screaming Frog export, prioritised findings report, GSC baseline snapshot, and a 60-minute walkthrough call.

Frequently Asked Questions

Questions readers ask about this topic


What is a technical SEO audit?

A technical SEO audit is a systematic examination of your website's infrastructure to identify issues that prevent search engines and AI systems from crawling, indexing, and understanding your content correctly. It covers six primary areas: crawlability (can search engines access your pages?), indexation (are the right pages in Google's index?), site speed and Core Web Vitals, structured data and schema markup, internal linking architecture, and mobile usability. Unlike content or backlink audits, a technical audit focuses on the underlying systems that determine whether your content is visible to search engines at all.

How often should you do a technical SEO audit?

Conduct a full technical SEO audit quarterly — every 90 days — and a lightweight monthly monitoring check between full audits. Quarterly full audits catch regressions introduced by CMS updates, plugin changes, or new content deployments before they compound into ranking losses. Monthly monitoring should cover: Search Console coverage errors, Core Web Vitals trend, crawl stats, and any new manual action warnings. Additionally, run an audit immediately after any significant site change — a redesign, a platform migration, or a major plugin update — regardless of the scheduled cadence.

What are Core Web Vitals and why do they matter for SEO?

Core Web Vitals are three specific page experience metrics Google uses as ranking signals. As of 2026: LCP (Largest Contentful Paint) measures how quickly the main visible element loads — target under 2.5 seconds on mobile. INP (Interaction to Next Paint) measures how quickly the page responds after a user interaction — target under 200 milliseconds. CLS (Cumulative Layout Shift) measures visual stability as the page loads — target under 0.1. Pages that pass all three thresholds receive a 'Good' Core Web Vitals status in Search Console, which Google uses as a tiebreaker when ranking pages with otherwise similar quality signals. Pages with 'Poor' CWV scores are at a measurable competitive disadvantage.
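
The thresholds translate into a simple classifier that can be applied to exported field data. In this sketch the 'Good' limits are the targets quoted above; the 'Poor' boundaries (4.0 s, 500 ms, 0.25) come from Google's published Core Web Vitals thresholds rather than from this article, and `classify_cwv` with its metric keys is a hypothetical naming choice.

```python
# (good_upper_bound, poor_lower_bound) per metric; units are in the key names.
CWV_THRESHOLDS = {
    "lcp_s": (2.5, 4.0),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
}


def classify_cwv(metric: str, value: float) -> str:
    """Return 'Good', 'Needs Improvement', or 'Poor' for a field-data value."""
    good, poor = CWV_THRESHOLDS[metric]
    if value <= good:
        return "Good"
    if value <= poor:
        return "Needs Improvement"
    return "Poor"


for metric, value in (("lcp_s", 2.1), ("inp_ms", 350), ("cls", 0.3)):
    print(metric, value, classify_cwv(metric, value))
```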

Should I block AI bots in my robots.txt during a technical audit?

Distinguish between AI training bots and AI retrieval bots; they require different robots.txt treatment. AI training bots (GPTBot, Google-Extended, CCBot, ClaudeBot) collect content that may be used to train AI models, and you may legitimately choose to block these using Disallow rules in robots.txt. AI retrieval bots (PerplexityBot, ChatGPT-User, OAI-SearchBot, Claude-User, Claude-SearchBot) fetch pages to answer user queries or to build AI search results, so blocking them removes you from citation consideration in those products. As of 2026, 71% of major publishers accidentally block retrieval bots. Check your robots.txt against each vendor's current crawler documentation, because user-agent names and purposes change: a blanket Disallow for a retrieval bot will make your content invisible to that AI citation system.
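
One way to see which of these user agents a given robots.txt blocks is Python's standard `urllib.robotparser`. The sketch below is an approximation: `blocked_agents` is a hypothetical helper, the sample robots.txt is invented, and each crawler's real rule matching may differ in detail from the standard library's implementation.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents, path: str = "/") -> list:
    """Return the user agents from `agents` that may not fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(sample_robots, ["GPTBot", "CCBot", "PerplexityBot", "ChatGPT-User"]))
```

Running this against the live file for both lists of agents shows at a glance whether a training-bot rule has been written broadly enough to catch retrieval bots as well.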

What is the difference between crawled and indexed in Google Search Console?

'Crawled' means Googlebot visited the page and downloaded its content. 'Indexed' means Google decided the page was valuable enough to include in its search index and make eligible for ranking. A page can be crawled without being indexed — this happens when Google evaluates the page as thin, duplicate, or low-quality after crawling. In Search Console's Pages report, 'Crawled — currently not indexed' is a specific status indicating Google read the page but chose not to include it, typically due to thin content, duplicate content, or a low-quality signal in the broader site evaluation.

How do I check if my website is being crawled by Google?

Three methods, in increasing order of precision. First: Google Search Console → Settings → Crawl Stats shows Googlebot's total crawl requests, download size, and response times over the last 90 days. A flat or declining crawl rate after publishing new content can signal a crawl budget problem. Second: Search Console → Pages report, where 'Crawled - currently not indexed' and 'Discovered - currently not indexed' entries reveal which pages Google has and has not visited. Third: Screaming Frog Log File Analyser (a separate tool from the SEO Spider), where you upload your server log file to see exactly which URLs Googlebot requested and when. Log file analysis is the most precise method but requires server access.
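
For a quick look at log data without a dedicated tool, Googlebot requests can be counted with a short script. This is a sketch under assumptions: the logs are taken to be in the common "combined" format, `googlebot_hits` is a hypothetical helper, the sample lines are invented, and matching on the user-agent string alone does not prove the request came from Google (that needs a reverse DNS check).

```python
import re
from collections import Counter

# Assumption: access logs use the "combined" log format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines):
    """Count requests per (path, status) where the user agent claims Googlebot."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [30/May/2026:10:00:00 +0000] "GET /blog/post/ HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [30/May/2026:10:05:00 +0000] "GET /blog/post/ HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [30/May/2026:10:06:00 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
for (path, status), count in googlebot_hits(sample_log).items():
    print(path, status, count)
```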

What is the December 2025 rendering update and how does it affect technical audits?

In December 2025, Google clarified that pages returning non-200 HTTP status codes may be excluded from its rendering pipeline entirely. Previously, Google would sometimes render JavaScript pages even if they returned unusual status codes. Now, a page should return a clean 200 OK status code to be eligible for the rendering queue, where its JavaScript is executed. The reverse problem affects Single Page Applications (SPAs): if your SPA shell returns 200 OK but loads a '404 not found' component via JavaScript, Google may index the error state or treat the URL as a soft 404. In your technical audit, verify that every page returns the correct HTTP status code before checking anything else.

How do I improve my website's LCP score?

The LCP (Largest Contentful Paint) element is almost always a hero image or an H1 heading, and a slow LCP usually traces back to how that element loads. Improve it with these five interventions, in order of impact:

1. Convert your hero/banner image to WebP format and compress it to under 80KB at its exact display dimensions.
2. Add the fetchpriority='high' attribute to the LCP image element in your HTML. This tells the browser to prioritise loading it above other resources.
3. Implement a CDN (for example, Cloudflare's free tier) to serve your site from a data centre closer to your users. This is critical for African and Asian market audiences accessing European or US-hosted servers.
4. Defer all JavaScript and CSS not required for above-the-fold rendering using your caching plugin (WP Rocket, LiteSpeed Cache).
5. Check your server TTFB (Time to First Byte) in PageSpeed Insights. A TTFB above 600ms calls for a hosting upgrade or CDN implementation before image optimisation will have full impact.


Oladoyin Falana

Founder, Technical Analyst

Oladoyin Falana is a certified digital growth strategist and full-stack web professional with over five years of hands-on experience at the intersection of SEO, web design, and development. His journey into the digital world began as a content writer, a foundation that gave him a deep, instinctive understanding of how keywords, content, and intent drive organic visibility. While honing his craft in content, he simultaneously taught himself the building blocks of the modern web: HTML, CSS, and React.js. That pursuit eventually evolved into a career in full-stack web development and technical SEO analysis.

