Complete reference for all 27 analyzers and 160 checks that power SiteScan SEO analysis.
Server configuration, HTTP protocols, encoding, responsive design
Identifies pages returning 4xx (client errors) and 5xx (server errors) HTTP status codes, and maps which internal pages link to them.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | HTTP 5xx server error pages |
| W01 | WARNING | HTTP 4xx client error pages (404, 403, etc.) |
Detects pages that return HTTP 200 but behave like error pages (soft 404s) using multi-signal scoring: reference matching, phrase detection, and statistical anomaly.
| Code | Severity | Description |
|---|---|---|
| W01 | WARNING | Page detected as soft 404 (score above threshold) |
Analyzes HTTP redirect chains, detecting excessive hops, external redirects, and structural link rot from internal pages pointing to redirected URLs.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Redirect chain exceeds 5 hops (Google recommended max) |
| W01 | WARNING | Redirect detected (1-5 hops) |
| W02 | WARNING | Problematic redirect: internal referrers point to different final URL |
| W03 | WARNING | External redirect: final URL on different domain |
| W04 | WARNING | Host canonicalization missing: www and non-www both serve content on different hosts (no 301 between them) |
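
The hop thresholds in the table above can be illustrated with a minimal sketch. It assumes the crawler has already recorded each chain as a list of URLs (start URL first, final URL last); the function name `classify_redirect_chain` is hypothetical, and "different domain" is simplified here to an exact hostname comparison.

```python
from urllib.parse import urlsplit

MAX_HOPS = 5  # threshold taken from the E01 description


def classify_redirect_chain(chain: list[str]) -> list[str]:
    """Return the codes a recorded redirect chain would raise (sketch only)."""
    hops = len(chain) - 1
    if hops <= 0:
        return []  # the URL answered directly, no redirect
    codes = ["E01"] if hops > MAX_HOPS else ["W01"]
    # W03: the final URL is served from a different host than the start URL.
    # Assumption: hostnames are compared exactly, so www/non-www also differ.
    if urlsplit(chain[0]).hostname != urlsplit(chain[-1]).hostname:
        codes.append("W03")
    return codes
```

W02 and W04 need site-wide data (internal referrers and both host variants), so they are left out of this per-chain sketch.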
Comprehensive character encoding analysis: missing charset declarations, mismatches between HTTP and HTML charset, mojibake patterns, BOM presence, and double-encoding.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | No charset declaration found (HTTP or HTML) |
| E02 | ERROR | HTTP Content-Type charset differs from HTML meta charset |
| E03 | ERROR | U+FFFD replacement characters detected in content |
| E04 | ERROR | Mojibake patterns detected (UTF-8 misread as Latin-1/CP1252) |
| E05 | ERROR | Encoding corruption in SEO fields (title, description, H1) |
| E06 | ERROR | Invalid or unrecognized charset name |
| W01 | WARNING | Non-UTF-8 charset declared |
| W02 | WARNING | UTF-8 BOM present |
| W03 | WARNING | Double-encoded UTF-8 detected |
| W04 | WARNING | NBSP mojibake pattern detected |
| W05 | WARNING | Charset declared only in HTML meta, not HTTP header |
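
One common way to spot the E04 pattern (UTF-8 misread as CP1252) is a round-trip test: if re-encoding the text as CP1252 yields bytes that decode cleanly as UTF-8 into a different string, the text was probably decoded with the wrong charset. The sketch below illustrates that idea together with a BOM check for W02; the function names are hypothetical and this is not necessarily how SiteScan detects these cases.

```python
import codecs


def has_utf8_bom(raw: bytes) -> bool:
    """W02-style check: the raw response body starts with the UTF-8 BOM."""
    return raw.startswith(codecs.BOM_UTF8)


def looks_like_mojibake(text: str) -> bool:
    """E04-style check: text round-trips CP1252 -> UTF-8 into something else."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside CP1252, or the bytes
        # are not valid UTF-8; in both cases it is not this mojibake pattern.
        return False
    return repaired != text
```

For example, `looks_like_mojibake("caf\u00c3\u00a9")` is true because the two characters after "caf" are the CP1252 view of the UTF-8 bytes for "é", while a correctly decoded "caf\u00e9" is left alone. Mojibake that involves bytes undefined in CP1252 is missed by this simple test.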
Analyzes robots.txt format, directives, and effectiveness. Checks for missing or malformed files, overly restrictive rules, sitemap presence, and crawl budget impact.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Invalid format: RTF, HTML, PDF, XML, or binary content |
| E02 | ERROR | UTF-8 BOM present in robots.txt |
| E03 | ERROR | Wrong Content-Type (not text/plain) |
| E04 | ERROR | HTTP error response (4xx/5xx) |
| E05 | ERROR | Blocks all crawlers (Disallow: / for *) |
| E06 | ERROR | No sitemap found anywhere |
| W01 | WARNING | Missing robots.txt file |
| W02 | WARNING | Empty robots.txt file |
| W03 | WARNING | File too large (>500 KiB, Google truncates) |
| W04 | WARNING | Blocks major crawlers (Googlebot, Bingbot, GPTBot, ClaudeBot) |
| W05 | WARNING | Sitemap exists but not declared in robots.txt |
| W06 | WARNING | Restrictive crawl budget (<0.5 pages/day) |
| W07 | WARNING | Blocks static resources (/images/, /css/, /js/) |
| W08 | WARNING | Orphaned rules (before any User-agent) |
| W09 | WARNING | Deprecated noindex directive |
| W10 | WARNING | Sitemap URL is relative, not absolute |
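
Three of the checks above (E05, W08, W10) depend only on the text of the file, so they can be sketched with a small line-based parser. This is a simplified illustration with a hypothetical `check_robots` function; it ignores `Allow` overrides and wildcard paths, which a real analyzer would need to handle.

```python
def check_robots(text: str) -> set[str]:
    """Return the subset of {E05, W08, W10} raised by a robots.txt body."""
    codes: set[str] = set()
    agents: list[str] = []  # user-agents of the group currently being read
    in_rules = False        # True once the current group has seen a rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if not agents:
                codes.add("W08")  # rule appears before any User-agent
            elif field == "disallow" and value == "/" and "*" in agents:
                codes.add("E05")  # everything blocked for all crawlers
        elif field == "sitemap":
            if not value.lower().startswith(("http://", "https://")):
                codes.add("W10")  # sitemap URL is not absolute
    return codes
```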
Checks viewport meta tag presence and configuration for mobile-friendly design. Detects missing viewports, disabled zoom, fixed-width layouts, and inconsistent viewport coverage.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Missing viewport meta tag |
| W01 | WARNING | Viewport issues (user-scalable=no, fixed width, missing initial-scale) |
| W02 | WARNING | Partial viewport coverage: inconsistent across pages |
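
The three W01 conditions can be read directly from the `content` attribute of the viewport meta tag. A minimal sketch, assuming the attribute value has already been extracted from the HTML; `viewport_issues` and its issue labels are hypothetical names.

```python
def viewport_issues(content: str) -> list[str]:
    """List W01-style problems found in a viewport meta content string."""
    settings: dict[str, str] = {}
    # Settings are comma-separated; some pages use semicolons instead.
    for item in content.replace(";", ",").split(","):
        key, _, value = item.partition("=")
        if key.strip():
            settings[key.strip().lower()] = value.strip().lower()

    issues = []
    if settings.get("user-scalable") in ("no", "0"):
        issues.append("zoom disabled")
    if settings.get("width", "device-width").isdigit():
        issues.append("fixed width")  # e.g. width=1024 instead of device-width
    if "initial-scale" not in settings:
        issues.append("missing initial-scale")
    return issues
```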
Detects meta robots directives (noindex, nofollow) and meta refresh redirects across all pages. Flags critical issues like noindex on homepage, sitemap conflicts, and HTML-level redirects.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Homepage has noindex (entire site may be deindexed) |
| E02 | ERROR | Page in sitemap.xml has noindex (conflicting signals) |
| E03 | ERROR | Homepage uses meta refresh redirect (should use HTTP 301) |
| E04 | ERROR | Page has meta refresh with delay > 0 (accessibility + SEO issue) |
| E05 | ERROR | URL disallowed in robots.txt but has meta noindex: conflicting signals, because Google cannot see the noindex and may index the URL without a snippet |
| W01 | WARNING | Non-homepage page has noindex directive |
| W02 | WARNING | Page has nofollow-only directive |
| W05 | WARNING | Page has instant meta refresh (delay=0); should use HTTP 301 |
| W06 | WARNING | Meta refresh redirects to external domain |
| W07 | WARNING | Meta refresh target differs from canonical URL |
Per-URL analysis of HTTP response headers captured by the crawler: cookie security flags, server fingerprint, X-Frame-Options / X-Content-Type-Options / HSTS coverage across all pages, ETag inode leak, TTFB statistics. Complements 014 (homepage-only score) with site-wide coverage.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Cookie set on HTTPS page without Secure flag (MITM cookie theft) |
| W01 | WARNING | Cookie without HttpOnly flag (XSS cookie theft) |
| W02 | WARNING | Cookie without SameSite attribute (CSRF risk) |
| W03 | WARNING | Server banner exposes version (fingerprint) |
| W04 | WARNING | X-Powered-By header leaks backend stack |
| W05 | WARNING | Missing X-Frame-Options header (per-URL) |
| W06 | WARNING | Missing X-Content-Type-Options: nosniff (per-URL) |
| W07 | WARNING | Missing Strict-Transport-Security on HTTPS URL (per-URL) |
| W08 | WARNING | ETag exposes Apache inode number (fingerprint) |
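
The cookie checks (E01, W01, W02) reduce to inspecting the attributes of each `Set-Cookie` header. The sketch below assumes one raw header value per call; `cookie_flag_codes` is a hypothetical name, not part of SiteScan.

```python
def cookie_flag_codes(set_cookie: str, page_is_https: bool) -> list[str]:
    """Return the cookie-related codes for a single Set-Cookie header value."""
    # Everything after the first "name=value" pair is an attribute.
    attrs = {
        part.split("=", 1)[0].strip().lower()
        for part in set_cookie.split(";")[1:]
    }
    codes = []
    if page_is_https and "secure" not in attrs:
        codes.append("E01")
    if "httponly" not in attrs:
        codes.append("W01")
    if "samesite" not in attrs:
        codes.append("W02")
    return codes
```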
Search engine optimization, structured data, metadata, linking
Validates H1-H6 heading hierarchy, checking for missing H1, multiple H1 tags, and heading level skips that harm SEO and accessibility.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Missing H1: page has zero H1 headings |
| W01 | WARNING | Multiple H1: page has more than one H1 heading |
| W02 | WARNING | Hierarchy skip: heading level skipped (e.g., H1 to H3) |
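
All three heading checks can be derived from the ordered list of heading levels on a page. A minimal sketch, assuming the levels have already been extracted in document order (for example `[1, 2, 3, 2]`); the function name is hypothetical.

```python
def heading_codes(levels: list[int]) -> list[str]:
    """Return heading-hierarchy codes for one page's heading levels."""
    codes = []
    h1_count = levels.count(1)
    if h1_count == 0:
        codes.append("E01")
    elif h1_count > 1:
        codes.append("W01")
    # A skip is a step deeper into the hierarchy by more than one level,
    # such as H1 followed directly by H3. Moving back up is always allowed.
    if any(cur - prev > 1 for prev, cur in zip(levels, levels[1:])):
        codes.append("W02")
    return codes
```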
Finds pages sharing identical H1 headings, title tags or meta descriptions, which dilutes SEO value and confuses search engines.
| Code | Severity | Description |
|---|---|---|
| W01 | WARNING | Duplicate H1: same H1 text appears on multiple pages |
| W02 | WARNING | Duplicate title: same title tag appears on multiple pages |
| W03 | WARNING | Duplicate meta description: same description text appears on multiple pages |
Checks heading text length for SEO optimization. Headings that are too short provide little SEO value, while overly long headings may be truncated.
| Code | Severity | Description |
|---|---|---|
| W01 | WARNING | Heading too short: less than 10 characters |
| W02 | WARNING | Heading too long: more than 70 characters |
Validates HTML lang attributes, canonical URLs, and hreflang implementations for multilingual SEO. Checks for missing, invalid, or conflicting declarations.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Missing lang attribute in HTML tag |
| E02 | ERROR | Missing or invalid canonical URL |
| E03 | ERROR | Missing reciprocal hreflang (A links to B but B does not link to A) |
| E04 | ERROR | Hreflang href is not absolute URL |
| E05 | ERROR | Canonical conflicts with hreflang |
| W01 | WARNING | Missing x-default hreflang entry |
| W02 | WARNING | Canonical URL issues (relative, non-self, trailing slash mismatch) |
| W03 | WARNING | HTTPS page but hreflang/canonical uses HTTP |
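
The reciprocity rule behind E03 (A links to B but B does not link back to A) is a simple graph check. The sketch below assumes the hreflang annotations have been collected into a mapping from each page URL to the set of alternate URLs it declares; the function name and data shape are illustrative assumptions.

```python
def missing_reciprocals(hreflang: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (page, alternate) pairs where the alternate does not link back."""
    missing = []
    for page, alternates in hreflang.items():
        for alt in alternates:
            # Self-references are fine; every other alternate must point back.
            if alt != page and page not in hreflang.get(alt, set()):
                missing.append((page, alt))
    return sorted(missing)
```

An alternate that was never crawled has no entry in the mapping and is therefore reported as non-reciprocal, which may or may not match how a real analyzer treats uncrawled URLs.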
Validates Open Graph meta tags and Twitter Cards for social sharing. Checks required tags, content quality, consistency with page metadata, and duplicate detection.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | No Open Graph tags found on page |
| E02 | ERROR | Missing required OG tags (og:title, og:description) |
| E03 | ERROR | OG image or URL is relative (not absolute) |
| W01 | WARNING | Missing og:image tag |
| W02 | WARNING | OG tag quality issues (too long, too short, HTTP image) |
| W03 | WARNING | Missing or invalid Twitter Card |
| W04 | WARNING | Duplicate OG tags across pages |
| W05 | WARNING | OG metadata inconsistent with page title/description/canonical |
Evaluates image ALT text coverage and quality. Detects missing ALT attributes, empty ALT text, untranslated ALT for multilingual sites, and overly long descriptions.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Image with empty or missing ALT attribute |
| W01 | WARNING | Image ALT quality issue (untranslated, too long, very short) |
Validates JSON-LD structured data blocks against Schema.org specifications. Checks syntax, required fields per type, URL format, and type recognition.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | JSON-LD syntax error (invalid JSON) |
| E02 | ERROR | Missing @type property |
| E03 | ERROR | Required fields missing for Schema.org type |
| W01 | WARNING | URL fields are not absolute URLs |
| W02 | WARNING | Recommended fields missing |
| W03 | WARNING | Unknown or unrecognized Schema.org @type |
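
The syntax, `@type`, and absolute-URL checks can be sketched with the standard library alone. This example assumes each `<script type="application/ld+json">` body is passed in as a string, inspects only the top-level `url` field, and does not walk `@graph` or nested objects; `jsonld_codes` is a hypothetical name. The per-type required and recommended fields (E03, W02) need a Schema.org rule table and are omitted.

```python
import json
from urllib.parse import urlsplit


def jsonld_codes(block: str) -> list[str]:
    """Return E01/E02/W01 codes for one JSON-LD block (simplified sketch)."""
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        return ["E01"]  # not valid JSON at all

    items = data if isinstance(data, list) else [data]
    codes = []
    for item in items:
        if not isinstance(item, dict) or "@type" not in item:
            codes.append("E02")
            continue
        url = item.get("url")
        if isinstance(url, str):
            parts = urlsplit(url)
            if not (parts.scheme and parts.netloc):
                codes.append("W01")  # relative or scheme-less URL
    return codes
```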
Analyzes internal linking structure using BFS from homepage. Detects orphan pages, deep pages (>4 clicks), generic anchor text, and self-linking patterns.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Sitemap-orphan page: URL declared in sitemap.xml but no crawled page links to it (the navigation path is likely missing) |
| W01 | WARNING | Deep page: reachable only with more than 4 clicks from homepage |
| W02 | WARNING | Generic anchor text (click here, read more, etc.) |
| W03 | WARNING | Self-linking: page links to itself |
| W04 | WARNING | Dead-end page: no outgoing internal links (blocks PageRank flow, excludes whitelisted terminals like contact/privacy) |
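
The click-depth and orphan checks follow from a breadth-first search over the internal link graph. The sketch below assumes the crawl result is available as a mapping from each page to the pages it links to; the function names, the data shape, and the choice to exempt the homepage from the orphan check are illustrative assumptions.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """BFS from the homepage: minimum number of clicks to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_and_orphan(links, home, sitemap_urls, max_clicks=4):
    """Return (deep pages for W01, sitemap-orphan pages for E01)."""
    depths = click_depths(links, home)
    deep = sorted(url for url, depth in depths.items() if depth > max_clicks)
    linked = {target for targets in links.values() for target in targets}
    orphans = sorted(u for u in sitemap_urls if u not in linked and u != home)
    return deep, orphans
```

Because BFS visits pages in order of increasing distance, the first depth recorded for a page is its minimum click depth.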
Compares the just-generated sitemap.xml with the previous scan's snapshot. Detects URL count spikes/drops (vs 7-day baseline) and suspicious new URLs (webshell paths, spam keywords, exotic charsets). Replaces the daily sitemap watchdog: the customer's sitemap.xml is SiteScan output, so meaningful diffs only happen between consecutive scans, not day-by-day.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Suspicious URLs added (an_sm_03): webshell paths /wp-content/uploads/*.php, double-extension, spam keywords pharma/casino/replica, cyrillic/CJK/arabic chars |
| W01 | WARNING | URL count spike (an_sm_02): +30% AND +50 absolute URLs vs previous snapshot |
| W02 | WARNING | URL count drop (an_sm_04): -30% AND -50 absolute URLs vs previous snapshot (de-indexing or content removal) |
Verifies that the homepage's JSON-LD structured data actually describes the real business by comparing it against the AI-derived site profile (context_llm.json). Parses the whole document (head and body), so it catches body-injected JSON-LD that 021 misses. Two layers: a deterministic NAP match (name/phone/email/locality) and a neutral AI semantic judgement (Gemini) that also flags moved, empty, or parked sites, language drift, or a different business, without presuming the cause. Homepage-only.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Structured data (JSON-LD) describes a different entity than the business: 2+ NAP contradictions (e.g. agency placeholder name/address) |
| E02 | ERROR | Homepage incoherent with the expected business per AI: site moved/parked/empty, different activity, or different main language |
| W01 | WARNING | Single NAP discrepancy between JSON-LD and the business profile |
| W02 | WARNING | Minor semantic discrepancy between live homepage and expected profile (AI) |
HTTP security headers, JS vulnerabilities, mixed content
Validates all external links: detects broken links (4xx/5xx), connection failures, slow responses, HTTP-only links, and suspicious URL shorteners.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Critical error: 403, 404, 500, 502, 503, 504 or connection failure |
| W01 | WARNING | Manual verification needed: WAF-protected domain |
| W02 | WARNING | Warning: 401, 408, 429 status, slow response, HTTP not HTTPS, or suspicious shortener |
Evaluates 10 HTTP security headers using Mozilla Observatory methodology. Grades from A+ to F based on CSP, HSTS, cookie flags, CORS, X-Frame-Options, and more.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Critical security header missing or misconfigured (score impact of -20 or worse) |
| W01 | WARNING | Security header improvement recommended (score impact -1 to -19) |
Scans JavaScript libraries for known CVE vulnerabilities using Retire.js database. Reports severity (Critical/High/Medium/Low), affected versions, and available fixes.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Critical or High CVE vulnerability (CVSS >= 7.0) |
| W01 | WARNING | Medium or Low CVE vulnerability (CVSS < 7.0) |
Detects mixed content on HTTPS sites: HTTP scripts and stylesheets (blocked by browsers), HTTP images in OG tags, HTTP canonical/internal links, protocol-relative resource URLs, and unsafe cross-origin links.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Script loaded via HTTP (active mixed content, blocked by browsers) |
| E02 | ERROR | Stylesheet loaded via HTTP (active mixed content, blocked by browsers) |
| W01 | WARNING | og:image uses HTTP on HTTPS site |
| W02 | WARNING | Canonical URL uses http:// on HTTPS site |
| W03 | WARNING | og:url uses http:// on HTTPS site |
| W04 | WARNING | Internal links with explicit http:// protocol |
| W05 | WARNING | Resource loaded via protocol-relative URL (//...) instead of explicit https:// |
| W06 | WARNING | Cross-origin link with target="_blank" missing rel="noopener" (tab-nabbing risk) |
| W07 | WARNING | Render-blocking scripts: number of scripts without async/defer reaches the configured threshold |
| W08 | WARNING | External CDN script missing SRI integrity hash (MITM tampering surface) |
| W09 | WARNING | Script type=module without crossorigin attribute (CORS/typed errors) |
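
The active mixed content checks (E01, E02) and the protocol-relative check (W05) can be illustrated with the standard library HTML parser. This is a sketch under the assumption that the page was fetched over HTTPS; the class and function names are hypothetical, and W05 is applied only to `<script>` tags to keep the example short.

```python
from html.parser import HTMLParser


class MixedContentScanner(HTMLParser):
    """Collect E01/E02/W05 codes while parsing an HTTPS page's HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.codes: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # valueless attributes map to None
        src = attr.get("src") or ""
        href = attr.get("href") or ""
        rel = (attr.get("rel") or "").lower().split()
        if tag == "script" and src.startswith("http://"):
            self.codes.append("E01")
        elif tag == "script" and src.startswith("//"):
            self.codes.append("W05")
        elif tag == "link" and "stylesheet" in rel and href.startswith("http://"):
            self.codes.append("E02")


def scan_mixed_content(html: str) -> list[str]:
    scanner = MixedContentScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.codes
```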
Active pentest-style server security testing: TRACE/XST, host header injection, dangerous HTTP methods, sensitive file exposure, directory listing, TLS weakness, CRLF injection, verbose errors, path traversal, backup file discovery, security.txt compliance.
| Code | Severity | Description |
|---|---|---|
| T01 | WARNING | TRACE method enabled (Cross-Site Tracing risk) |
| T02 | WARNING | Host header injection accepted |
| T03 | ERROR | Dangerous HTTP methods enabled (PUT, DELETE, MKCOL, PROPFIND) |
| T04 | ERROR | Sensitive file accessible (.env, .git/HEAD, .htpasswd, phpinfo.php) |
| T05 | WARNING | Directory listing enabled |
| T06 | WARNING | Weak TLS version supported (TLS 1.0/1.1) |
| T07 | ERROR | CRLF injection possible |
| T08 | WARNING | Verbose error pages expose server information |
| T09 | ERROR | Path traversal possible |
| T10 | WARNING | X-Forwarded-Host injection accepted |
| T11 | WARNING | Default virtual host exposure via direct IP access |
| T12 | ERROR | Backup file accessible (.bak, .old, .swp) |
| T13 | INFO | security.txt (RFC 9116) missing or non-compliant |
| T14 | ERROR | CORS reflects arbitrary Origin with credentials (cross-origin data theft) |
| T15 | ERROR | CORS allows null Origin (exploitable via sandboxed iframes) |
| T16 | WARNING | CORS validates Origin with prefix/substring match instead of exact match |
| T17 | ERROR | CONNECT method accepted (server acts as open proxy) |
| T18 | ERROR | X-Original-URL header overrides request path (access control bypass) |
| T19 | ERROR | X-Rewrite-URL header overrides routing (access control bypass) |
| T20 | WARNING | HTTP method override headers accepted (method restriction bypass) |
| T21 | ERROR | Extended sensitive file accessible (.env.local, phpinfo variants, .npmrc) |
| T22 | ERROR | SSL certificate expired, expiring soon, self-signed, or hostname mismatch |
| T23 | ERROR | Weak cipher suite accepted (RC4, DES, 3DES, NULL, EXPORT) |
| T24 | WARNING | crossdomain.xml allows access from any domain (wildcard) |
| T25 | INFO | Dependency files exposed (composer.json, package.json, requirements.txt) |
| T26 | ERROR | Admin panel publicly accessible (phpMyAdmin, Adminer, wp-admin) |
| T27 | WARNING | Session cookie missing security attributes (HttpOnly, Secure, SameSite) |
| T28 | WARNING | Information disclosure via response headers (X-Debug-Token, X-Backend-Server) |
| T29 | INFO | Missing modern security headers (Permissions-Policy, COOP, COEP) |
| T30 | WARNING | Dynamic page (sets cookies) missing Cache-Control no-store/private |
Grammar, image ALT text, accessibility, content quality
Checks heading accessibility: empty headings, headings hidden via CSS, and image-only headings that screen readers cannot interpret.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Empty heading: text content shorter than minimum length |
| E02 | ERROR | Hidden H1: H1 heading with hidden CSS class |
| W01 | WARNING | Hidden heading: H2-H6 heading with hidden CSS class |
| W02 | WARNING | Image-only heading: both raw_text and text are empty |
Analyzes grammar, spelling, and linguistic quality across all site languages using AI (Claude/Gemini). Reports errors, warnings, and suggestions with corrected text.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Grammar or spelling error (misspelling, wrong agreement, wrong word) |
| W01 | WARNING | Grammar warning (spacing, capitalization, consistency) |
| I01 | INFO | Stylistic suggestion (readability, rephrasing) |
Page load, static resources, browser diagnostics
Browser-based homepage analysis using headless Chrome. Reports JavaScript console errors, failed network requests, third-party domains, and page load performance metrics.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | JavaScript console error or failed network request |
| W01 | WARNING | JavaScript console warning |
Validates static resources (CSS, JS, images, fonts): detects broken resources (4xx/5xx), oversized files (>5MB), redirected resources, and malformed URLs.
| Code | Severity | Description |
|---|---|---|
| E01 | ERROR | Broken resource (4xx/5xx status) or oversized file (>5MB) |
| W01 | WARNING | Resource redirect, malformed URL, timeout, or server error |