SiteScan Analyzer Documentation

Complete reference for all 27 analyzers and 160 checks that power SiteScan SEO analysis.

27 analyzers, 160 checks, 5 departments

Web Development

Server configuration, HTTP protocols, encoding, responsive design

001

HTTP Error Status Checker

4xx/5xx

Identifies pages returning 4xx (client errors) and 5xx (server errors) HTTP status codes, and maps which internal pages link to them.

Code Severity Description
E01 ERROR HTTP 5xx server error pages
W01 WARNING HTTP 4xx client error pages (404, 403, etc.)
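
The mapping in the table above can be sketched in a few lines. This is an illustrative sketch, not SiteScan's implementation: the function names and the `(source, target)` link-pair format are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def classify_status(status: int) -> Optional[str]:
    """Return the check code for a status, or None for non-error responses."""
    if 500 <= status <= 599:
        return "E01"  # server error
    if 400 <= status <= 499:
        return "W01"  # client error
    return None


def referrers_of_errors(
    statuses: Dict[str, int], links: List[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each error URL to the internal pages that link to it.

    `links` holds (source_url, target_url) pairs found during the crawl
    (hypothetical input format). Targets with no recorded status are
    treated as healthy.
    """
    referrers = defaultdict(list)
    for source, target in links:
        if classify_status(statuses.get(target, 200)) is not None:
            referrers[target].append(source)
    return dict(referrers)
```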
002

Soft 404 Detection

Soft404

Detects pages that return HTTP 200 but behave like error pages (soft 404s) using multi-signal scoring: reference matching, phrase detection, and statistical anomaly.

Code Severity Description
W01 WARNING Page detected as soft 404 (score above threshold)
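
A multi-signal score of this kind might look like the sketch below. The weights, the phrase list, the 0.5 threshold and the "shorter than 20% of the site median" anomaly rule are all assumptions chosen for illustration; the actual signals and threshold are not documented here.

```python
from difflib import SequenceMatcher
from statistics import median
from typing import Sequence

ERROR_PHRASES = ("page not found", "no longer available")  # hypothetical list
THRESHOLD = 0.5  # hypothetical cut-off


def soft404_score(body: str, reference_body: str, site_lengths: Sequence[int]) -> float:
    """Combine three signals into a score between 0 and 1.

    reference_body: body served for a URL known not to exist (reference matching).
    site_lengths:   body lengths of other crawled pages (statistical anomaly).
    """
    text = body.lower()
    similarity = SequenceMatcher(
        None, text, reference_body.lower(), autojunk=False
    ).ratio()
    phrase = 1.0 if any(p in text for p in ERROR_PHRASES) else 0.0
    typical = median(site_lengths) if site_lengths else 0
    anomaly = 1.0 if typical and len(body) < 0.2 * typical else 0.0
    # Hypothetical weights: reference match counts most, length anomaly least.
    return 0.5 * similarity + 0.3 * phrase + 0.2 * anomaly
```

A page would be reported as W01 when `soft404_score(...) > THRESHOLD`.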
003

Redirect Chain Analyzer

Redir

Analyzes HTTP redirect chains, detecting excessive hops, external redirects, and structural link rot from internal pages pointing to redirected URLs.

Code Severity Description
E01 ERROR Redirect chain exceeds 5 hops (Google recommended max)
W01 WARNING Redirect detected (1-5 hops)
W02 WARNING Problematic redirect: internal referrers point to different final URL
W03 WARNING External redirect: final URL on different domain
W04 WARNING Host canonicalization missing: www and non-www both serve content on different hosts (no 301 between them)
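
The hop-count and external-redirect rules can be sketched as follows. The chain is assumed to be a list of URLs from the requested URL to the final one; treating `www.` and bare hosts as the same site is a simplification made for this example.

```python
from typing import List
from urllib.parse import urlsplit

MAX_HOPS = 5  # limit quoted in the E01 row


def _site(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' removed (simplification)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_chain(chain: List[str]) -> List[str]:
    """Return check codes for a redirect chain [requested, ..., final]."""
    hops = len(chain) - 1
    if hops < 1:
        return []  # no redirect at all
    codes = ["E01" if hops > MAX_HOPS else "W01"]
    if _site(chain[0]) != _site(chain[-1]):
        codes.append("W03")  # final URL is on a different domain
    return codes
```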
009

Character Encoding Checker

Charset

Comprehensive character encoding analysis: missing charset declarations, mismatches between HTTP and HTML charset, mojibake patterns, BOM presence, and double-encoding.

Code Severity Description
E01 ERROR No charset declaration found (HTTP or HTML)
E02 ERROR HTTP Content-Type charset differs from HTML meta charset
E03 ERROR U+FFFD replacement characters detected in content
E04 ERROR Mojibake patterns detected (UTF-8 misread as Latin-1/CP1252)
E05 ERROR Encoding corruption in SEO fields (title, description, H1)
E06 ERROR Invalid or unrecognized charset name
W01 WARNING Non-UTF-8 charset declared
W02 WARNING UTF-8 BOM present
W03 WARNING Double-encoded UTF-8 detected
W04 WARNING NBSP mojibake pattern detected
W05 WARNING Charset declared only in HTML meta, not HTTP header
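
Two of these checks lend themselves to a short sketch. A common heuristic for the mojibake case (E04) is a round trip: if the text can be re-encoded as CP1252 and the resulting bytes decode as valid UTF-8 to something different, the text was probably UTF-8 misread as CP1252. Whether SiteScan uses this exact heuristic is an assumption, and the function names are hypothetical.

```python
import codecs


def looks_like_mojibake(text: str) -> bool:
    """True if `text` appears to be UTF-8 that was decoded as CP1252."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False  # the round trip is impossible, so this is not the pattern
    return repaired != text  # pure ASCII round-trips unchanged


def has_utf8_bom(raw: bytes) -> bool:
    """True if the raw response body starts with the UTF-8 byte order mark (W02)."""
    return raw.startswith(codecs.BOM_UTF8)
```

Correctly decoded accented text such as `café` fails the UTF-8 decode step and is therefore not flagged.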
012

Robots.txt Analyzer

Robots

Analyzes robots.txt format, directives, and effectiveness. Checks for missing or malformed files, overly restrictive rules, sitemap presence, and crawl budget impact.

Code Severity Description
E01 ERROR Invalid format: RTF, HTML, PDF, XML, or binary content
E02 ERROR UTF-8 BOM present in robots.txt
E03 ERROR Wrong Content-Type (not text/plain)
E04 ERROR HTTP error response (4xx/5xx)
E05 ERROR Blocks all crawlers (Disallow: / for *)
E06 ERROR No sitemap found anywhere
W01 WARNING Missing robots.txt file
W02 WARNING Empty robots.txt file
W03 WARNING File too large (>500 KiB, Google truncates)
W04 WARNING Blocks major crawlers (Googlebot, Bingbot, GPTBot, ClaudeBot)
W05 WARNING Sitemap exists but not declared in robots.txt
W06 WARNING Restrictive crawl budget (<0.5 pages/day)
W07 WARNING Blocks static resources (/images/, /css/, /js/)
W08 WARNING Orphaned rules (before any User-agent)
W09 WARNING Deprecated noindex directive
W10 WARNING Sitemap URL is relative, not absolute
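
A few of the directive checks can be approximated with the standard library. The sketch below covers only E05, W08, W09 and W10 and assumes the file has already been fetched as text; `check_robots` is a hypothetical name.

```python
from typing import List
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def check_robots(text: str) -> List[str]:
    """Return a subset of check codes for the robots.txt content in `text`."""
    findings: List[str] = []

    def add(code: str) -> None:
        if code not in findings:
            findings.append(code)

    parser = RobotFileParser()
    parser.parse(text.splitlines())
    if not parser.can_fetch("*", "/"):
        add("E05")  # Disallow: / applies to every crawler

    seen_user_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow") and not seen_user_agent:
            add("W08")  # rule appears before any User-agent group
        elif field == "noindex":
            add("W09")  # deprecated directive
        elif field == "sitemap" and not urlsplit(value).scheme:
            add("W10")  # sitemap URL has no scheme, so it is relative
    return findings
```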
013

Responsive/Viewport Checker

Mobile

Checks viewport meta tag presence and configuration for mobile-friendly design. Detects missing viewports, disabled zoom, fixed-width layouts, and inconsistent viewport coverage.

Code Severity Description
E01 ERROR Missing viewport meta tag
W01 WARNING Viewport issues (user-scalable=no, fixed width, missing initial-scale)
W02 WARNING Partial viewport coverage: inconsistent across pages
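
The per-page viewport checks might be sketched like this, assuming the `content` attribute of the viewport meta tag has already been extracted (or is `None` when the tag is absent). The function names and the issue labels are illustrative.

```python
from typing import Dict, List, Optional


def parse_viewport(content: str) -> Dict[str, str]:
    """Split 'width=device-width, initial-scale=1' into a lower-cased dict."""
    settings = {}
    for part in content.replace(";", ",").split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            settings[key.strip().lower()] = value.strip().lower()
    return settings


def viewport_issues(content: Optional[str]) -> List[str]:
    """Return E01 for a missing tag, or one W01 entry per configuration issue."""
    if content is None:
        return ["E01"]
    settings = parse_viewport(content)
    problems = []
    if settings.get("user-scalable") in ("no", "0"):
        problems.append("zoom disabled")
    if settings.get("width", "").isdigit():
        problems.append("fixed width")  # e.g. width=1024 instead of device-width
    if "initial-scale" not in settings:
        problems.append("missing initial-scale")
    return ["W01: " + p for p in problems]
```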
028

Meta Directives Analyzer

MetaDir

Detects meta robots directives (noindex, nofollow) and meta refresh redirects across all pages. Flags critical issues like noindex on homepage, sitemap conflicts, and HTML-level redirects.

Code Severity Description
E01 ERROR Homepage has noindex (entire site may be deindexed)
E02 ERROR Page in sitemap.xml has noindex (conflicting signals)
E03 ERROR Homepage uses meta refresh redirect (should use HTTP 301)
E04 ERROR Page has meta refresh with delay > 0 (accessibility + SEO issue)
E05 ERROR URL disallowed in robots.txt but has meta noindex (conflicting signals: Google cannot crawl the page to see the noindex and may index the URL without a snippet)
W01 WARNING Non-homepage page has noindex directive
W02 WARNING Page has nofollow-only directive
W05 WARNING Page has instant meta refresh (delay=0) — should use HTTP 301
W06 WARNING Meta refresh redirects to external domain
W07 WARNING Meta refresh target differs from canonical URL
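
The meta refresh rows depend on parsing the `content` attribute of `<meta http-equiv="refresh">`, which has the form `delay; url=target`. A minimal sketch covering E04, W05 and W06 follows; the function names are hypothetical, and the homepage, sitemap and canonical comparisons are left out.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlsplit


def parse_meta_refresh(content: str) -> Tuple[float, Optional[str]]:
    """Split a refresh value such as "5; url=/new" into (delay, target)."""
    delay_part, _, rest = content.partition(";")
    delay = float(delay_part.strip() or 0)
    target = re.sub(r"^\s*url\s*=\s*", "", rest, flags=re.IGNORECASE)
    target = target.strip().strip("'\"")
    return delay, (target or None)


def classify_refresh(page_url: str, content: str) -> List[str]:
    """Return check codes for one meta refresh found on `page_url`."""
    delay, target = parse_meta_refresh(content)
    codes = ["E04" if delay > 0 else "W05"]
    if target:
        final = urljoin(page_url, target)  # resolve relative targets
        if urlsplit(final).hostname != urlsplit(page_url).hostname:
            codes.append("W06")  # redirect leaves the current host
    return codes
```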
030

HTTP Response Headers Analyzer

Hdrs

Per-URL analysis of HTTP response headers captured by the crawler: cookie security flags, server fingerprint, X-Frame-Options / X-Content-Type-Options / HSTS coverage across all pages, ETag inode leak, TTFB statistics. Complements 014 (homepage-only score) with site-wide coverage.

Code Severity Description
E01 ERROR Cookie set on HTTPS page without Secure flag (MITM cookie theft)
W01 WARNING Cookie without HttpOnly flag (XSS cookie theft)
W02 WARNING Cookie without SameSite attribute (CSRF risk)
W03 WARNING Server banner exposes version (fingerprint)
W04 WARNING X-Powered-By header leaks backend stack
W05 WARNING Missing X-Frame-Options header (per-URL)
W06 WARNING Missing X-Content-Type-Options: nosniff (per-URL)
W07 WARNING Missing Strict-Transport-Security on HTTPS URL (per-URL)
W08 WARNING ETag exposes Apache inode number (fingerprint)
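
The three cookie checks (E01, W01, W02) reduce to looking for attribute names in each `Set-Cookie` header. A minimal sketch, assuming one raw header value per call; `cookie_findings` is a hypothetical name.

```python
from typing import List


def cookie_findings(set_cookie: str, page_is_https: bool) -> List[str]:
    """Return check codes for a single Set-Cookie header value."""
    # Everything after the first ';' is an attribute such as Secure or SameSite=Lax.
    attributes = {
        part.split("=", 1)[0].strip().lower()
        for part in set_cookie.split(";")[1:]
    }
    findings = []
    if page_is_https and "secure" not in attributes:
        findings.append("E01")
    if "httponly" not in attributes:
        findings.append("W01")
    if "samesite" not in attributes:
        findings.append("W02")
    return findings
```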

SEO & Visibility

Search engine optimization, structured data, metadata, linking

004

Heading Structure Checker

Hdg

Validates H1-H6 heading hierarchy, checking for missing H1, multiple H1 tags, and heading level skips that harm SEO and accessibility.

Code Severity Description
E01 ERROR Missing H1: page has zero H1 headings
W01 WARNING Multiple H1: page has more than one H1 heading
W02 WARNING Hierarchy skip: heading level skipped (e.g., H1 to H3)
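
The three heading rules operate on the sequence of heading levels in document order. The sketch below collects that sequence with the standard-library HTML parser and applies the rules; the class and function names are illustrative.

```python
from html.parser import HTMLParser
from typing import List


class HeadingCollector(HTMLParser):
    """Record the level (1-6) of every heading tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_findings(levels: List[int]) -> List[str]:
    """Apply the E01/W01/W02 rules to a list of heading levels."""
    findings = []
    h1_count = levels.count(1)
    if h1_count == 0:
        findings.append("E01")
    elif h1_count > 1:
        findings.append("W01")
    # A skip is any step down the hierarchy of more than one level (H1 to H3).
    if any(b - a > 1 for a, b in zip(levels, levels[1:])):
        findings.append("W02")
    return findings
```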
005

Duplicate H1/Title/Description Checker

Dupl

Finds pages sharing identical H1 headings, title tags or meta descriptions, which dilutes SEO value and confuses search engines.

Code Severity Description
W01 WARNING Duplicate H1: same H1 text appears on multiple pages
W02 WARNING Duplicate title: same title tag appears on multiple pages
W03 WARNING Duplicate meta description: same description text appears on multiple pages
008

Heading Length Checker

HdgLen

Checks heading text length for SEO optimization. Headings that are too short provide little SEO value, while overly long headings may be truncated.

Code Severity Description
W01 WARNING Heading too short: less than 10 characters
W02 WARNING Heading too long: more than 70 characters
011

Language, Canonical & Hreflang Checker

Lang

Validates HTML lang attributes, canonical URLs, and hreflang implementations for multilingual SEO. Checks for missing, invalid, or conflicting declarations.

Code Severity Description
E01 ERROR Missing lang attribute in HTML tag
E02 ERROR Missing or invalid canonical URL
E03 ERROR Missing reciprocal hreflang (A links to B but B does not link to A)
E04 ERROR Hreflang href is not absolute URL
E05 ERROR Canonical conflicts with hreflang
W01 WARNING Missing x-default hreflang entry
W02 WARNING Canonical URL issues (relative, non-self, trailing slash mismatch)
W03 WARNING HTTPS page but hreflang/canonical uses HTTP
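
The reciprocity rule behind E03 can be sketched as a lookup over the hreflang declarations of every crawled page. The input format (page URL mapped to the set of alternate URLs it declares) is an assumption for this example.

```python
from typing import Dict, List, Set, Tuple


def missing_reciprocals(alternates: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (page, alternate) pairs where the alternate does not link back."""
    missing = []
    for page, targets in alternates.items():
        for target in targets:
            if target == page:
                continue  # self-reference needs no return link
            if page not in alternates.get(target, set()):
                missing.append((page, target))
    return sorted(missing)
```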
015

Open Graph & Twitter Cards Checker

OG

Validates Open Graph meta tags and Twitter Cards for social sharing. Checks required tags, content quality, consistency with page metadata, and duplicate detection.

Code Severity Description
E01 ERROR No Open Graph tags found on page
E02 ERROR Missing required OG tags (og:title, og:description)
E03 ERROR OG image or URL is relative (not absolute)
W01 WARNING Missing og:image tag
W02 WARNING OG tag quality issues (too long, too short, HTTP image)
W03 WARNING Missing or invalid Twitter Card
W04 WARNING Duplicate OG tags across pages
W05 WARNING OG metadata inconsistent with page title/description/canonical
018

Image ALT Quality

ALT

Evaluates image ALT text coverage and quality. Detects missing ALT attributes, empty ALT text, untranslated ALT for multilingual sites, and overly long descriptions.

Code Severity Description
E01 ERROR Image with empty or missing ALT attribute
W01 WARNING Image ALT quality issue (untranslated, too long, very short)
021

Structured Data Validator

JSON-LD

Validates JSON-LD structured data blocks against Schema.org specifications. Checks syntax, required fields per type, URL format, and type recognition.

Code Severity Description
E01 ERROR JSON-LD syntax error (invalid JSON)
E02 ERROR Missing @type property
E03 ERROR Required fields missing for Schema.org type
W01 WARNING URL fields are not absolute URLs
W02 WARNING Recommended fields missing
W03 WARNING Unknown or unrecognized Schema.org @type
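
The syntax, `@type` and URL-format checks can be sketched with the standard `json` module. The list of URL-bearing fields is a hypothetical selection, and the per-type required and recommended fields (E03, W02) are omitted because they depend on rules not listed here.

```python
import json
from typing import List
from urllib.parse import urlsplit

URL_FIELDS = ("url", "logo", "image", "sameAs")  # hypothetical selection


def _is_absolute(value: str) -> bool:
    parts = urlsplit(value)
    return bool(parts.scheme and parts.netloc)


def validate_jsonld(block: str) -> List[str]:
    """Return check codes for the text of one JSON-LD script block."""
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        return ["E01"]
    items = data if isinstance(data, list) else [data]
    codes = []
    for item in items:
        if not isinstance(item, dict) or "@type" not in item:
            codes.append("E02")
            continue
        for field in URL_FIELDS:
            value = item.get(field)
            values = value if isinstance(value, list) else [value]
            if any(isinstance(v, str) and not _is_absolute(v) for v in values):
                codes.append("W01")
                break  # one W01 per item is enough
    return codes
```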
023

Internal Link Analysis

IntLnk

Analyzes internal linking structure using BFS from homepage. Detects orphan pages, deep pages (>4 clicks), generic anchor text, and self-linking patterns.

Code Severity Description
E01 ERROR Sitemap-orphan page: URL declared in sitemap.xml but no crawled page links to it (client forgot navigation path)
W01 WARNING Deep page: reachable only with more than 4 clicks from homepage
W02 WARNING Generic anchor text (click here, read more, etc.)
W03 WARNING Self-linking: page links to itself
W04 WARNING Dead-end page: no outgoing internal links (blocks PageRank flow, excludes whitelisted terminals like contact/privacy)
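
Click depth via BFS from the homepage can be sketched as below. The link graph is assumed to be a dict mapping each crawled page to the internal URLs it links to; the function names are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

MAX_CLICKS = 4  # depth limit quoted in the W01 row


def click_depths(links: Dict[str, Iterable[str]], homepage: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of clicks from the homepage."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_pages(depths: Dict[str, int]) -> List[str]:
    """Pages that need more than MAX_CLICKS clicks (W01)."""
    return sorted(url for url, depth in depths.items() if depth > MAX_CLICKS)


def sitemap_orphans(
    links: Dict[str, Iterable[str]], sitemap_urls: Set[str], homepage: str
) -> List[str]:
    """Sitemap URLs that no crawled page links to (E01)."""
    linked = {target for targets in links.values() for target in targets}
    return sorted(sitemap_urls - linked - {homepage})
```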
031

Sitemap Diff Analyzer

SitemapDiff

Compares the just-generated sitemap.xml with the previous scan's snapshot. Detects URL count spikes/drops (vs 7-day baseline) and suspicious new URLs (webshell paths, spam keywords, exotic charsets). Replaces the daily sitemap watchdog: the customer's sitemap.xml is SiteScan output, so meaningful diffs only happen between consecutive scans, not day-by-day.

Code Severity Description
E01 ERROR Suspicious URLs added (an_sm_03): webshell paths /wp-content/uploads/*.php, double-extension, spam keywords pharma/casino/replica, Cyrillic/CJK/Arabic chars
W01 WARNING URL count spike (an_sm_02): +30% AND +50 absolute URLs vs previous snapshot
W02 WARNING URL count drop (an_sm_04): -30% AND -50 absolute URLs vs previous snapshot (de-indexing or content removal)
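
The spike and drop rules require both a relative and an absolute change. A minimal sketch, assuming both thresholds are inclusive (the table does not say) and that an empty previous snapshot yields no finding:

```python
from typing import Optional

RELATIVE_LIMIT = 0.30  # 30% from the table
ABSOLUTE_LIMIT = 50    # 50 URLs from the table


def count_change_code(previous: int, current: int) -> Optional[str]:
    """Return W01 for a spike, W02 for a drop, or None."""
    if previous <= 0:
        return None  # no baseline to compare against (assumption)
    delta = current - previous
    ratio = delta / previous
    if ratio >= RELATIVE_LIMIT and delta >= ABSOLUTE_LIMIT:
        return "W01"
    if ratio <= -RELATIVE_LIMIT and delta <= -ABSOLUTE_LIMIT:
        return "W02"
    return None
```

Requiring both conditions keeps small sites from triggering on a handful of new URLs and large sites from triggering on routine growth.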
032

Structured Data Coherence

SD-Coherence

Verifies that the homepage's JSON-LD structured data actually describes the real business, by comparing it against the AI-derived site profile (context_llm.json). Parses the whole document (head+body), so it catches body-injected JSON-LD that 021 misses. Two layers: deterministic NAP match (name/phone/email/locality) + neutral AI semantic judgement (Gemini) that also flags moved/empty/parked sites, language drift, or a different business — without presuming the cause. Homepage-only.

Code Severity Description
E01 ERROR Structured data (JSON-LD) describes a different entity than the business: 2+ NAP contradictions (e.g. agency placeholder name/address)
E02 ERROR Homepage incoherent with the expected business per AI: site moved/parked/empty, different activity, or different main language
W01 WARNING Single NAP discrepancy between JSON-LD and the business profile
W02 WARNING Minor semantic discrepancy between live homepage and expected profile (AI)

Security

HTTP security headers, JS vulnerabilities, mixed content

010

External Links Checker

ExtLnk

Validates all external links: detects broken links (4xx/5xx), connection failures, slow responses, HTTP-only links, and suspicious URL shorteners.

Code Severity Description
E01 ERROR Critical error: 403, 404, 500, 502, 503, 504 or connection failure
W01 WARNING Manual verification needed: WAF-protected domain
W02 WARNING 401, 408, 429 status, slow response, HTTP not HTTPS, or suspicious shortener
014

HTTP Security Headers

Security

Evaluates 10 HTTP security headers using Mozilla Observatory methodology. Grades from A+ to F based on CSP, HSTS, cookie flags, CORS, X-Frame-Options, and more.

Code Severity Description
E01 ERROR Critical security header missing or misconfigured (score impact of -20 or worse)
W01 WARNING Security header improvement recommended (score impact -1 to -19)
016

JavaScript CVE Vulnerability Scanner

JSVuln

Scans JavaScript libraries for known CVE vulnerabilities using Retire.js database. Reports severity (Critical/High/Medium/Low), affected versions, and available fixes.

Code Severity Description
E01 ERROR Critical or High CVE vulnerability (CVSS >= 7.0)
W01 WARNING Medium or Low CVE vulnerability (CVSS < 7.0)
025

HTTPS Mixed Content Detector

HTTPS

Detects mixed content on HTTPS sites: HTTP scripts and stylesheets (blocked by browsers), HTTP images in OG tags, HTTP canonical/internal links, protocol-relative resource URLs, and unsafe cross-origin links.

Code Severity Description
E01 ERROR Script loaded via HTTP (active mixed content, blocked by browsers)
E02 ERROR Stylesheet loaded via HTTP (active mixed content, blocked by browsers)
W01 WARNING og:image uses HTTP on HTTPS site
W02 WARNING Canonical URL uses http:// on HTTPS site
W03 WARNING og:url uses http:// on HTTPS site
W04 WARNING Internal links with explicit http:// protocol
W05 WARNING Resource loaded via protocol-relative URL (//...) instead of explicit https://
W06 WARNING Cross-origin link with target="_blank" missing rel="noopener" (tab-nabbing risk)
W07 WARNING Render-blocking scripts: page has at least the configured threshold of scripts without async/defer
W08 WARNING External CDN script missing SRI integrity hash (MITM tampering surface)
W09 WARNING Script type=module without crossorigin attribute (CORS/typed errors)
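
The script and stylesheet rows (E01, E02, W05) can be sketched with the standard-library HTML parser. The sketch assumes the page itself was served over HTTPS and covers only `<script src>` and `<link rel="stylesheet" href>`; the class name is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MixedContentScanner(HTMLParser):
    """Collect (code, url) findings for scripts and stylesheets."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script":
            code, url = "E01", attributes.get("src")
        elif tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower().split():
            code, url = "E02", attributes.get("href")
        else:
            return
        if not url:
            return  # inline script or link without href
        if url.lower().startswith("http://"):
            self.findings.append((code, url))  # active mixed content
        elif url.startswith("//"):
            self.findings.append(("W05", url))  # protocol-relative URL
```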
029

Server Security Scanner

SecTest (authorization required)

Active pentest-style server security testing: TRACE/XST, host header injection, dangerous HTTP methods, sensitive file exposure, directory listing, TLS weakness, CRLF injection, verbose errors, path traversal, backup file discovery, security.txt compliance.

Code Severity Description
T01 WARNING TRACE method enabled (Cross-Site Tracing risk)
T02 WARNING Host header injection accepted
T03 ERROR Dangerous HTTP methods enabled (PUT, DELETE, MKCOL, PROPFIND)
T04 ERROR Sensitive file accessible (.env, .git/HEAD, .htpasswd, phpinfo.php)
T05 WARNING Directory listing enabled
T06 WARNING Weak TLS version supported (TLS 1.0/1.1)
T07 ERROR CRLF injection possible
T08 WARNING Verbose error pages expose server information
T09 ERROR Path traversal possible
T10 WARNING X-Forwarded-Host injection accepted
T11 WARNING Default virtual host exposure via direct IP access
T12 ERROR Backup file accessible (.bak, .old, .swp)
T13 INFO security.txt (RFC 9116) missing or non-compliant
T14 ERROR CORS reflects arbitrary Origin with credentials (cross-origin data theft)
T15 ERROR CORS allows null Origin (exploitable via sandboxed iframes)
T16 WARNING CORS validates Origin with prefix/substring match instead of exact match
T17 ERROR CONNECT method accepted — server acts as open proxy
T18 ERROR X-Original-URL header overrides request path (access control bypass)
T19 ERROR X-Rewrite-URL header overrides routing (access control bypass)
T20 WARNING HTTP method override headers accepted (method restriction bypass)
T21 ERROR Extended sensitive file accessible (.env.local, phpinfo variants, .npmrc)
T22 ERROR SSL certificate expired, expiring soon, self-signed, or hostname mismatch
T23 ERROR Weak cipher suite accepted (RC4, DES, 3DES, NULL, EXPORT)
T24 WARNING crossdomain.xml allows access from any domain (wildcard)
T25 INFO Dependency files exposed (composer.json, package.json, requirements.txt)
T26 ERROR Admin panel publicly accessible (phpMyAdmin, Adminer, wp-admin)
T27 WARNING Session cookie missing security attributes (HttpOnly, Secure, SameSite)
T28 WARNING Information disclosure via response headers (X-Debug-Token, X-Backend-Server)
T29 INFO Missing modern security headers (Permissions-Policy, COOP, COEP)
T30 WARNING Dynamic page (sets cookies) missing Cache-Control no-store/private

Content & Quality

Grammar, image ALT text, accessibility, content quality

007

Heading Accessibility Checker

A11y

Checks heading accessibility: empty headings, headings hidden via CSS, and image-only headings that screen readers cannot interpret.

Code Severity Description
E01 ERROR Empty heading: text content shorter than minimum length
E02 ERROR Hidden H1: H1 heading with hidden CSS class
W01 WARNING Hidden heading: H2-H6 heading with hidden CSS class
W02 WARNING Image-only heading: both raw_text and text are empty
017

Grammar & Linguistic Quality

Grammar

Analyzes grammar, spelling, and linguistic quality across all site languages using AI (Claude/Gemini). Reports errors, warnings, and suggestions with corrected text.

Code Severity Description
E01 ERROR Grammar or spelling error (misspelling, wrong agreement, wrong word)
W01 WARNING Grammar warning (spacing, capitalization, consistency)
I01 INFO Stylistic suggestion (readability, rephrasing)

Performance

Page load, static resources, browser diagnostics

026

Homepage Browser Scan

HP

Browser-based homepage analysis using headless Chrome. Reports JavaScript console errors, failed network requests, third-party domains, and page load performance metrics.

Code Severity Description
E01 ERROR JavaScript console error or failed network request
W01 WARNING JavaScript console warning
027

Static Resources Checker

Res

Validates static resources (CSS, JS, images, fonts): detects broken resources (4xx/5xx), oversized files (>5MB), redirected resources, and malformed URLs.

Code Severity Description
E01 ERROR Broken resource (4xx/5xx status) or oversized file (>5MB)
W01 WARNING Resource redirect, malformed URL, timeout, or server error

Auto-generated on 2026-06-10 from analyzer_registry.json