Commit 8059305

author
Iain Collins
committed
Implement v3.0 roadmap features
In this release, Schema.org validation and Google Structured Data validation have been split so that markup can be validated independently for technical correctness and for vendor compatibility. This release contains breaking API changes; they are easy to accommodate and should provide a stable API foundation for future updates.

* Automatically detects Schema.org markup and tests it separately from vendor-specific validation.
* You can now use `--schemas` to specify schemas to check for, as well as to list them.
* You can optionally prefix schema names with 'jsonld:', 'microdata:' or 'rdfa:' to check for a schema specified in a particular way (e.g. `--schemas Article` or `--schemas jsonld:Article` or `--schemas Microdata:Article`).
* You can specify one or more schemas to check for, the same way the `--presets` option works (e.g. `--schemas "jsonld:Article,BreadcrumbList,microdata:WPFooter"`).
* Presets can now contain other presets (to any level), to allow for easier organisation of tests, e.g. use `--presets SocialMedia` to test social media markup, or `--presets Google` to apply rules based on those for the Google Structured Data Testing Tool.
* Presets can now have conditional tests that determine whether a preset is invoked. This is useful when grouping presets inside other presets, so that groups of tests are only applied when appropriate.
* Individual tests within presets can now have conditional tests that determine whether they should run.
* Added a new preset called 'Google', which checks commonly used schemas for structured data properties that Google expects to see.
* Added a new preset called 'SocialMedia', containing both the Twitter and Facebook presets.
* Removed the `--disable-presets` option as it is now redundant.
* All options are available via both the API and the Command Line Tool.
* Updated README and inline documentation.

* This update does not validate whether individual Schema.org schema properties exist, or check whether the contents of those properties are valid (this is possible with the API using a custom preset).
* The Google preset contains a subset of tests based on responses from the Google Structured Data Testing Tool and tests for all known Article schema types, but it does not validate all supported schemas or test the contents of properties using the same rules (this is possible with the API using a custom preset).
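
The prefix syntax described for `--schemas` can be pictured with a small parser. The sketch below is illustrative only and is not taken from the tool's source: `parseSchemaList`, the `format` and `schema` field names and the `'any'` placeholder are all hypothetical, and lower-casing the prefix is an assumption based on the `Microdata:Article` example in the commit message.

```javascript
// Illustrative sketch only: this is NOT the tool's actual parser.
// All names here (parseSchemaList, format, schema, 'any') are hypothetical.
const KNOWN_FORMATS = ['jsonld', 'microdata', 'rdfa']

function parseSchemaList(value) {
  return value
    .split(',')
    .map(item => item.trim())
    .filter(item => item.length > 0)
    .map(item => {
      const separator = item.indexOf(':')
      // No prefix: the schema may be declared in any supported format
      if (separator === -1) return { format: 'any', schema: item }

      // Assumption: prefixes are matched case-insensitively
      const prefix = item.slice(0, separator).toLowerCase()
      const schema = item.slice(separator + 1)
      if (!KNOWN_FORMATS.includes(prefix)) {
        throw new Error(`Unknown schema prefix: ${prefix}`)
      }
      return { format: prefix, schema }
    })
}

console.log(parseSchemaList('jsonld:Article,BreadcrumbList,microdata:WPFooter'))
```

Unprefixed names are kept as "any format" rather than expanded into three entries, so a caller can decide how strict the match should be.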
1 parent 2c68c6c commit 8059305

28 files changed: +481 -390 lines changed

README.md

+83-51
@@ -2,52 +2,52 @@
 
 Helps inspect and test web pages for Structured Data.
 
-Designed to allow automation and quick ad-hoc testing of structured data - especially in bulk or as part of a CD/CI pipeline.
+The structured data testing tool is designed to allow automation and quick ad-hoc testing of structured data - especially in bulk or as part of a CD/CI pipeline.
+
+This utility uses [web-auto-extractor](https://www.npmjs.com/package/web-auto-extractor) and [jmespath](https://www.npmjs.com/package/jmespath).
 
 ## Features
 
-* A Command Line Interface (`sdtt`) and an API for CD/CI integration.
+* Command Line Interface (`sdtt`) and an API for CD/CI integration.
 * Accepts any URL or a file to test (via string, buffer, stream…).
-* Tests pages for Schema.org markup in HTML (with microdata), JSON-LD and RDFa.
-* Tests `<meta>` tags for specific tags and values (e.g. for Twitter and Facebook sharing data, OpenGraph tags, App Store tags).
-* Tests if properties exist, should not exist and/or if they match a Regular Expression check.
-* Built-in 'presets' for testing common schema types (including all types of Article schemas).
+* Detects Schema.org markup in HTML (`microdata`), `JSON-LD` and `RDFa`.
+* Tests `<meta>` tags for specific tags and values (e.g. for social media / sharing).
+* Built-in presets for Twitter and Facebook tags.
+* Built-in presets for for testing and validating common structured data expected by Google.
 * API: Define your own re-useable, custom presets to write specific tests for your own site.
 * API: Use with a headless browser to test Structured Data injected by client side JavaScript (e.g. via Google Tag Manager).
 * CLI: Recognizes and displays info for all 1000+ schemas on Schema.org.
 
-This tool uses [web-auto-extractor](https://www.npmjs.com/package/web-auto-extractor) and [jmespath](https://www.npmjs.com/package/jmespath).
-
-Note: Schema.org does not define 'optional' and 'required' fields for schemas, it describes valid properties and what they may contain. Recommendations and tests in the built-in presets are based on practical errors and warnings returned by search engine providers.
-
 ## Install
 
     npm i structured-data-testing-tool -g
 
-## Features
-
 ## Usage
 
 ### Command Line Interface
 
-_Note: The API supports additional options not currently exposed in the CLI tool._
-
 ```
-Usage: sdtt --url <url> [--presets <presets>]
+Usage: sdtt --url <url> [--presets <presets>] [--schemas <schemas>]
 
 Options:
-  -u, --url              Inspect a URL
-  -f, --file             Inspect a file
-  -p, --presets          Test a URL for specific markup from a list of presets
-  -d, --disable-presets  Disable auto-detection of presets - will only evaluate explicitly specified presets
-  -s, --schemas          List valid schemas
-  -h, --help             Show help
-  -v, --version          Show version number
+  -u, --url      Inspect a URL
+  -f, --file     Inspect a file
+  -p, --presets  Test for specific markup from a list of presets
+  -s, --schemas  Test for a specific schema from a list of schemas
+  -h, --help     Show help
+  -v, --version  Show version number
 
 Examples:
-  sdtt --url "https://example.com/article"               Inspect a URL
-  sdtt --url <url> --presets "Article,Twitter,Facebook"  Test a URL for Article schema and social metatags
-  sdtt --presets                                         List supported presets
+  sdtt --url "https://example.com/article"        Inspect a URL
+  sdtt --url <url> --presets "Twitter,Facebook"   Test a URL for specific metatags
+  sdtt --url <url> --presets "SocialMedia"        Test a URL for social media metatags
+  sdtt --url <url> --presets "Google"             Test a URL for markup inspected by Google
+  sdtt --url <url> --schemas "Article"            Test a URL for the Article schema
+  sdtt --url <url> --schemas "jsonld:Article"     Test a URL for the Article schema in JSON-LD
+  sdtt --url <url> --schemas "microdata:Article"  Test a URL for the Article schema in microdata/HTML
+  sdtt --url <url> --schemas "rdfa:Article"       Test a URL for the Article schema in RDFa
+  sdtt --presets                                  List all built-in presets
+  sdtt --schemas                                  List all supported schemas
 ```
 
 Inspect a URL to see what markup is found:
@@ -60,47 +60,76 @@ Inspect a file to see what markup is found:
 
 Test a URL contains specific markup:
 
-    sdtt --url <url> --presets "Article,Twitter,Facebook"
+    sdtt --url <url> --presets "Twitter,Facebook"
+
+Test a URL contains specific schema:
+
+    sdtt --url <url> --schemas "Article"
+
+Test a URL contains specific schema in both JSON-LD and in microdata/HTML:
+
+    sdtt --url <url> --schemas "jsonld:Article,microdata:Article"
 
 #### Example output from CLI
 
 ```
-$ sdtt -u https://www.bbc.co.uk/news/world-us-canada-49060410
+$ sdtt --url https://www.bbc.co.uk/news/world-us-canada-49060410 --presets Google,SocialMedia
 Tests
 
-ReportageNewsArticle Passed 14 of 14 (100%)
+Schema.org > ReportageNewsArticle - 100% (1 passed, 1 total)
+  ✓ schema in jsonld
+
+Google > ReportageNewsArticle - 100% (12 passed, 12 total)
   ✓ ReportageNewsArticle
   ✓ ReportageNewsArticle[*]."@type"
-  ✓ ReportageNewsArticle[*].url
-  ✓ ReportageNewsArticle[*].mainEntityOfPage
-  ✓ ReportageNewsArticle[*].datePublished
-  ✓ ReportageNewsArticle[*].dateModified
   ✓ ReportageNewsArticle[*].author
-  ✓ ReportageNewsArticle[*].author.name
-  ✓ ReportageNewsArticle[*].image
+  ✓ ReportageNewsArticle[*].datePublished
   ✓ ReportageNewsArticle[*].headline
-  ✓ ReportageNewsArticle[*].publisher
+  ✓ ReportageNewsArticle[*].image
   ✓ ReportageNewsArticle[*].publisher."@type"
   ✓ ReportageNewsArticle[*].publisher.name
   ✓ ReportageNewsArticle[*].publisher.logo
+  ✓ ReportageNewsArticle[*].publisher.logo.url
+  ✓ ReportageNewsArticle[*].dateModified
+  ✓ ReportageNewsArticle[*].mainEntityOfPage
+
+SocialMedia > Facebook - 100% (8 passed, 8 total)
+  ✓ must have page title
+  ✓ must have page type
+  ✓ must have url
+  ✓ must have image url
+  ✓ must have image alt text
+  ✓ should have page description
+  ✓ should have account username
+  ✓ should have locale
+
+SocialMedia > Twitter - 100% (7 passed, 7 total)
+  ✓ must have card type
+  ✓ must have title
+  ✓ must have description
+  ✓ must have image url
+  ✓ must have image alt text
+  ✓ should have account username
+  ✓ should have username of content creator
 
 Statistics
 
 Number of Metatags: 38
 Schemas in JSON-LD: 1
 Schemas in HTML: 0
 Schema in RDFa: 0
-Schemas found: ReportageNewsArticle
-Test suites run: ReportageNewsArticle
-Total tests run: 14
+Schema.org schemas: ReportageNewsArticle
+Other schemas: 0
+Test groups run : 4
+Total tests run: 28
 
 Results
 
-Passed: 14 (100%)
+Passed: 28 (100%)
 Warnings: 0 (0%)
 Failed: 0 (0%)
 
-14 tests passed.
+28 tests passed with 0 warnings.
 ```
 
 ### API
@@ -116,22 +145,22 @@ const { ReportageNewsArticle, Twitter, Facebook } = require('./presets')
 const url = 'https://www.bbc.co.uk/news/world-us-canada-49060410'
 
 structuredDataTest(url, { presets: [ ReportageNewsArticle, Twitter, Facebook ] })
-.then(response => {
+.then(res => {
   // If you end up here, then there were no errors
   console.log("All tests passed.")
-  console.log('Passed:',response.passed.length)
-  console.log('Failed:',response.failed.length)
-  console.log('Warnings:',response.warnings.length)
+  console.log('Passed:',res.passed.length)
+  console.log('Failed:',res.failed.length)
+  console.log('Warnings:',res.warnings.length)
 })
 .catch(err => {
   // If any test fails, the promise is rejected
   if (err.type === 'VALIDATION_FAILED') {
     console.log("Some tests failed.")
-    console.log('Passed:',err.passed.length)
-    console.log('Failed:',err.failed.length)
-    console.log('Warnings:',err.warnings.length)
+    console.log('Passed:',err.res.passed.length)
+    console.log('Failed:',err.res.failed.length)
+    console.log('Warnings:',err.res.warnings.length)
     // Loop over validation errors
-    err.failed.forEach(test => {
+    err.res.failed.forEach(test => {
       console.error(test)
     })
   } else {
@@ -193,13 +222,16 @@ const url = 'https://www.bbc.co.uk/news/world-us-canada-49060410'
 const MyCustomPreset = {
   name: 'My Custom Preset', // Required
   description: 'Test NewsArticle JSON-LD data is defined and twitter metadata was found', // Required
-  tests: [ // Required
+  tests: [ // Required (unless 'presets' specified)
     { test: 'NewsArticle', type: 'jsonld', schema: 'NewsArticle' },
     { test: '"twitter:card"', type: 'metatag' },
     { test: '"twitter:domain"', expect: 'www.bbc.co.uk', type: 'metatag', }
   ],
-  group: 'A Group Name', // Optional: A group name can be used to group tests in a preset (defaults to preset name)
-  // schema: 'NewsArticle', // Optional: A default schema for tests (useful if tests in a preset are all for the same schema)
+  // Options:
+  // group: 'My Group Name', // A group name can be used to group tests in a preset (defaults to preset name)
+  // schema: 'NewsArticle', // A default schema for tests (useful if tests in a preset are all for the same schema)
+  // presets: [] // Any preset can also contain other presets
+  // conditional: {} // Both Presets and Tests can define a conditional `test`, which is evaluated to determine if they should run
 }
 
 const options = {

__tests__/index.js

+15-14
@@ -26,7 +26,7 @@ describe('Structured Data parsing', () => {
     // Ideally there would be multiple different fixtures that more robustly
     // test different scenarios, but this is is a practical approach that
     // improves coverage easily for now with minimal effort
-    await structuredDataTest(html)
+    await structuredDataTest(html, { presets: [ presets.Google ]})
       .then(response => {
         structuredDataTestResult = response
       })
@@ -50,15 +50,15 @@ describe('Structured Data parsing', () => {
   })
 
   test('should auto-detect when input is a string', async () => {
-    const result = await structuredDataTest(html.toString())
+    const result = await structuredDataTest(html.toString(), { presets: [ presets.Google ]})
     expect(result.passed.length).toBeGreaterThan(10)
     expect(result.failed.length).toEqual(0)
   })
 
   test('should auto-detect when input is a buffer', async () => {
     const result = await new Promise((resolve) => {
       fs.readFile(testFile, async (err, buffer) => {
-        return resolve(await structuredDataTest(buffer))
+        return resolve(await structuredDataTest(buffer, { presets: [ presets.Google ]}))
       })
     })
     expect(result.passed.length).toBeGreaterThan(10)
@@ -67,50 +67,51 @@ describe('Structured Data parsing', () => {
 
   test('should auto-detect when input is a readable stream', async () => {
     const buffer = fs.createReadStream(testFile)
-    const result = await structuredDataTest(buffer)
+    const result = await structuredDataTest(buffer, { presets: [ presets.Google ]})
     expect(result.passed.length).toBeGreaterThan(10)
     expect(result.failed.length).toEqual(0)
   })
 
   test('should auto-detect when input is an HTTP URL', async () => {
-    const result = await structuredDataTest('http://example.com')
+    const result = await structuredDataTest('http://example.com', { presets: [ presets.Google ]})
     expect(result.passed.length).toBeGreaterThan(10)
     expect(result.failed.length).toEqual(0)
   })
 
   test('should auto-detect when input is an HTTPS URL', async () => {
-    const result = await structuredDataTest('https://example.com')
+    const result = await structuredDataTest('https://example.com', { presets: [ presets.Google ]})
     expect(result.passed.length).toBeGreaterThan(10)
     expect(result.failed.length).toEqual(0)
   })
 
   test('should work when explicitly invoked with HTML', async () => {
-    const result = await structuredDataTestHtml(html)
+    const result = await structuredDataTestHtml(html, { presets: [ presets.Google ]})
     expect(result.passed.length).toBeGreaterThan(10)
     expect(result.failed.length).toEqual(0)
   })
 
   test('should work when explicitly invoked with a URL', async () => {
-    const result = await structuredDataTestUrl('https://example.com')
+    const result = await structuredDataTestUrl('https://example.com', { presets: [ presets.Google ]})
     expect(result.passed.length).toBeGreaterThan(10)
     expect(result.failed.length).toEqual(0)
   })
 
-  test('should validate all structured data schemas found as well as any presets specified', async () => {
+  // @FIXME This test covers too much at once, should split out error handling checks
+  test('should validate all structured data schemas found as well as any presets specified and handle errors correctly', async () => {
     // Should validate schemas found, but also find errors as Facebook schema should not
     // be present in the example, but is passed as a preset so the test should fail.
     let result = ''
-    await structuredDataTest(html, { presets: [ presets.Facebook ] })
+    await structuredDataTest(html, { presets: [ presets.Facebook, presets.Google ] })
       .then(response => {
         result = response
       })
       .catch(err => {
         result = err
       })
-    expect(result.schemas.length).toEqual(4)
-    expect(result.schemas.includes('Facebook')).toBeFalsy()
-    expect(result.passed.length).toBeGreaterThan(10)
-    expect(result.failed.length).toBeGreaterThan(0)
+    expect(result.res.schemas.length).toEqual(4)
+    expect(result.res.schemas.includes('Facebook')).toBeFalsy()
+    expect(result.res.passed.length).toBeGreaterThan(10)
+    expect(result.res.failed.length).toBeGreaterThan(0)
   })
 
   test('should run all tests passed as options and for any schemas found', async () => {

__tests__/lib/metatags.js

-75
This file was deleted.
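
The README example and the updated assertions in `__tests__/index.js` both read results from `err.res` when the promise is rejected, rather than from the error object itself. A caller that wants a single code path for both outcomes could unwrap the result roughly as sketched below. `unwrapResult` and `summarise` are hypothetical helpers, not part of the package, and the object shapes are assumed from the diff (`passed`, `failed` and `warnings` arrays, plus `type: 'VALIDATION_FAILED'` on rejection).

```javascript
// Hypothetical helpers: not part of structured-data-testing-tool.
// Object shapes are assumptions inferred from the diff in this commit.
function unwrapResult(outcome) {
  const isValidationError =
    outcome !== null &&
    typeof outcome === 'object' &&
    outcome.type === 'VALIDATION_FAILED' &&
    outcome.res !== undefined
  // Rejections carry the result under `res`; resolved values are the result itself
  return isValidationError ? outcome.res : outcome
}

function summarise(outcome) {
  const res = unwrapResult(outcome)
  return {
    passed: res.passed.length,
    failed: res.failed.length,
    warnings: res.warnings.length
  }
}

// Stand-in objects for illustration; real results hold more fields.
const resolved = { passed: ['a', 'b'], failed: [], warnings: [] }
const rejected = {
  type: 'VALIDATION_FAILED',
  res: { passed: ['a'], failed: ['b'], warnings: [] }
}

console.log(summarise(resolved))
console.log(summarise(rejected))
```

`unwrapResult` returns anything else unchanged, so errors of other types still need their own handling before `summarise` is called.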
