Command line tool for XML sync testing between languages: tags, revtag, PI, ws
André L F S Bacci committed Feb 14, 2025
1 parent 80d1314 commit ba3cfab
Showing 9 changed files with 616 additions and 71 deletions.
155 changes: 101 additions & 54 deletions scripts/translation/README.md
@@ -16,15 +16,19 @@ Because of the above, it's possible to silence each alert indempendly. These
scripts will output `--add-ignore` commands that, if executed, will omit the
specific alerts in future executions.

## First execution
## broken.php

The first execution of these scripts may generate an inordinate amount of
alerts. It's advised to initially run each command separately, and work the
alerts on a case by case basis. After all interesting cases are fixed,
it's possible to rerun the command and `grep` the output for `--add-ignore`
lines, run these commands, and by so, mass ignore the residual alerts.
`doc-base/scripts/broken.php` will test if individual XML files are
ill-formed. That is, if a file contains Unicode BOM, carriage returns (CR),
or if XML contents are not
[well-balanced](https://www.w3.org/TR/xml-fragment/#defn-well-balanced).

Unbalanced XML contents are invalid XML and will result in a broken build.
BOM and CR marks may not result in broken builds, but *will* cause several
tools below to misbehave, as `libxml` behaviour changes if XML text contains
these bytes.
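
The byte-level part of this check can be sketched as follows. This is only an illustration, assuming a hypothetical helper named `findByteIssues`; it is not the code of `broken.php`, and it does not cover the well-balanced test.

```php
<?php
// Hypothetical helper (not part of doc-base): reports a UTF-8 BOM and
// carriage returns found in the raw contents of a file.
function findByteIssues( string $contents ) : array
{
    $issues = [];

    // A UTF-8 byte order mark is the three bytes EF BB BF at offset zero.
    if ( str_starts_with( $contents , "\xEF\xBB\xBF" ) )
        $issues[] = "BOM";

    // Any carriage return byte, alone or as part of CRLF.
    if ( str_contains( $contents , "\r" ) )
        $issues[] = "CR";

    return $issues;
}

// Sample text with both problems, made up for this example.
print implode( "," , findByteIssues( "\xEF\xBB\xBF<para>text</para>\r\n" ) ) . "\n";
```

In a real run the contents would come from `file_get_contents()` on each XML file.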

## qaxml-attributes.php (structural)
## qaxml-attributes.php

`doc-base/scripts/translation/qaxml-attributes.php` checks if all translated
files have the same tag-attribute-value triplets. Tag's attributes are
@@ -35,7 +39,7 @@ This script accepts an `--urgent` option, to filter alerts related to `xml:id`
attributes. This will help translators on languages that are failing to build,
to focus on mismatches that are probably most related with build fails.

## qaxml-entities.php (structural)
## qaxml-entities.php

`doc-base/scripts/translation/qaxml-entities.php` checks if all translated
files contain the same XML Entities References as the original files.
@@ -55,15 +59,99 @@ entities when generating alerts. This is handy in languages that use some
`&zb;` and `&dh;` entities, and could run with `-zb -dh` to avoid generating
alerts for these entities' differences.

## Old tools (below)
## qaxml-pi.php

`doc-base/scripts/translation/qaxml-pi.php` checks if all translated files have
the same processing instructions (PI) as the original files. Unbalanced PIs may
cause compilation errors, as they are utilized in the manual build process.
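
The idea of the comparison can be sketched as below: count each processing instruction per side, then report the ones whose counts differ. This is a minimal sketch using PHP's DOM extension directly; the helper name `countProcessingInstructions` and the sample XML texts are assumptions made for the example, and the real script loads files through `libqa` instead.

```php
<?php
// Hypothetical helper: counts "target data" strings for every processing
// instruction found in a well-formed XML text.
function countProcessingInstructions( string $xml ) : array
{
    $doc = new DOMDocument();
    $doc->loadXML( $xml );
    $xpath = new DOMXPath( $doc );

    $counts = [];
    foreach ( $xpath->query( "//processing-instruction()" ) as $pi )
    {
        $key = "{$pi->target} {$pi->data}";
        $counts[ $key ] = ( $counts[ $key ] ?? 0 ) + 1;
    }
    return $counts;
}

// Sample texts, made up for this example: the translation lost its PI.
$source = "<chapter><?phpdoc print-version-for=\"foo\"?><para>Text</para></chapter>";
$target = "<chapter><para>Texto</para></chapter>";

$s = countProcessingInstructions( $source );
$t = countProcessingInstructions( $target );

// Report every PI whose count differs between source and translation.
foreach ( array_keys( $s + $t ) as $key )
{
    $sc = $s[ $key ] ?? 0;
    $tc = $t[ $key ] ?? 0;
    if ( $sc != $tc )
        print "$key: source $sc, target $tc\n";
}
```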

## qaxml-tags.php

`doc-base/scripts/translation/qaxml-tags.php` checks if all translated files
have the same tags as the original files. A different number of tags between
source texts and translations indicates mismatched translated texts, and may
cause compilation errors.

This script accepts a `--detail` option that prints the lines of each
mismatched tag, to facilitate the work on big files.

This script also accepts a `--content=` option, that will check the
*contents* of tags, to inspect tags where the contents are expected *not* to
be translated. Example below.
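
What a content check of this kind amounts to can be sketched as follows: collect the text of every occurrence of one tag on each side, then list the values that appear on one side only. The helper name `listTagContents` and the sample texts are assumptions made for the example, not the actual implementation of `--content=`.

```php
<?php
// Hypothetical helper: returns the trimmed text content of every element
// named $tag in a well-formed XML text, in document order.
function listTagContents( string $xml , string $tag ) : array
{
    $doc = new DOMDocument();
    $doc->loadXML( $xml );

    $ret = [];
    foreach ( $doc->getElementsByTagName( $tag ) as $node )
        $ret[] = trim( $node->textContent );
    return $ret;
}

// Sample texts, made up for this example: the function name was translated.
$source = "<para>Call <function>strlen</function> first.</para>";
$target = "<para>Chame <function>comprimento</function> primeiro.</para>";

$s = listTagContents( $source , "function" );
$t = listTagContents( $target , "function" );

// array_diff() ignores how many times a value repeats, which is enough
// for a sketch but would hide count-only differences.
foreach ( array_diff( $s , $t ) as $text )
    print "only in source: $text\n";
foreach ( array_diff( $t , $s ) as $text )
    print "only in target: $text\n";
```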

## qaxml-ws.php

`doc-base/scripts/translation/qaxml-ws.php` inspects whitespace usage inside
some known tags. Spurious whitespace may break manual linking or generate
visible artifacts.

## qaxml-revtag.php

`doc-base/scripts/translation/qaxml-revtag.php` checks if all translated
files have valid [revision tags](https://doc.php.net/guide/translating.md).
Files without revision tags in expected format will fail to generate pretty
diffs on [Translation status](https://doc.php.net/revcheck.php) website or
locally generated `revcheck.php` status pages.

## Suggested execution

The first execution of these scripts may generate an inordinate amount of
alerts. It's advised to initially run each command separately, and work the
alerts on a case by case basis. After all interesting cases are fixed,
it's possible to rerun the command and `grep` the output for `--add-ignore`
lines, run these commands, and by so, mass ignore the residual alerts.

Structural checks:

```
php doc-base/scripts/broken.php
php doc-base/scripts/translation/qaxml-revtag.php
php doc-base/scripts/translation/qaxml-attributes.php
php doc-base/scripts/translation/qaxml-entities.php
php doc-base/scripts/translation/qaxml-pi.php
php doc-base/scripts/translation/qaxml-tags.php --detail
php doc-base/scripts/translation/qaxml-ws.php
```

The tools on `doc-base/scripts/translation/` are slowly being rewritten. While
this effort is not complete, the previous tools, document below, could be used
to supply for features yet not completed.
Tags where no translations are expected:

```
php doc-base/scripts/translation/qaxml-tags.php --content=acronym
php doc-base/scripts/translation/qaxml-tags.php --content=classname
php doc-base/scripts/translation/qaxml-tags.php --content=constant
php doc-base/scripts/translation/qaxml-tags.php --content=envar
php doc-base/scripts/translation/qaxml-tags.php --content=function
php doc-base/scripts/translation/qaxml-tags.php --content=interfacename
php doc-base/scripts/translation/qaxml-tags.php --content=parameter
php doc-base/scripts/translation/qaxml-tags.php --content=type
php doc-base/scripts/translation/qaxml-tags.php --content=classsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=constructorsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=destructorsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=fieldsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=funcsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=methodsynopsis
```

Tags where few translations are expected:

```
php doc-base/scripts/translation/qaxml-tags.php --content=code
php doc-base/scripts/translation/qaxml-tags.php --content=computeroutput
php doc-base/scripts/translation/qaxml-tags.php --content=filename
php doc-base/scripts/translation/qaxml-tags.php --content=literal
php doc-base/scripts/translation/qaxml-tags.php --content=varname
```

---

Before using the old scripts, they need be configured:
## Old tools (below)

Documented below is the previous version of these tools. These tools are
deprecated, and scheduled for removal very soon.

These old tools need to be configured separately before use:
```
php doc-base/scripts/translation/configure.php $LANG_DIR
```
@@ -107,44 +195,3 @@ contents, as some tag contents are expected *not* be translated.

`--detail` will also print line definitions of each mismatched tag,
to facilitate bisecting.

## Suggested execution

Structural checks:

```
php doc-base/scripts/translation/configure.php $LANG_DIR
php doc-base/scripts/translation/qarvt.php
php doc-base/scripts/translation/qaxml.a.php
php doc-base/scripts/translation/qaxml.e.php
php doc-base/scripts/translation/qaxml.p.php
php doc-base/scripts/translation/qaxml.t.php
php doc-base/scripts/translation/qaxml.w.php
```
Tags where is expected no translations:
```
php doc-base/scripts/translation/qaxml.t.php acronym
php doc-base/scripts/translation/qaxml.t.php classname
php doc-base/scripts/translation/qaxml.t.php constant
php doc-base/scripts/translation/qaxml.t.php envar
php doc-base/scripts/translation/qaxml.t.php function
php doc-base/scripts/translation/qaxml.t.php interfacename
php doc-base/scripts/translation/qaxml.t.php parameter
php doc-base/scripts/translation/qaxml.t.php type
php doc-base/scripts/translation/qaxml.t.php classsynopsis
php doc-base/scripts/translation/qaxml.t.php constructorsynopsis
php doc-base/scripts/translation/qaxml.t.php destructorsynopsis
php doc-base/scripts/translation/qaxml.t.php fieldsynopsis
php doc-base/scripts/translation/qaxml.t.php funcsynopsis
php doc-base/scripts/translation/qaxml.t.php methodsynopsis
```
Tags where is expected few translations:
```
php doc-base/scripts/translation/qaxml.t.php code
php doc-base/scripts/translation/qaxml.t.php computeroutput
php doc-base/scripts/translation/qaxml.t.php filename
php doc-base/scripts/translation/qaxml.t.php literal
php doc-base/scripts/translation/qaxml.t.php varname
```
4 changes: 3 additions & 1 deletion scripts/translation/libqa/ArgvParser.php
@@ -26,7 +26,6 @@ class ArgvParser
public function __construct( array $argv )
{
$this->argv = array_values( array_filter( $argv ) );
$this->used = [];
$this->used = array_fill( 0 , count( $argv ) , false );
}

@@ -58,6 +57,9 @@ public function consume( string $equals = null , string $prefix = null , int $po
$this->argv[ $pos ] = null;
$this->used[ $pos ] = true;

if ( $foundByPrefix )
return substr( $arg , strlen( $prefix ) );

return $arg;
}
}
10 changes: 7 additions & 3 deletions scripts/translation/libqa/OutputBuffer.php
@@ -28,6 +28,8 @@ class OutputBuffer
private OutputIgnore $ignore;
private string $options;

public int $printCount = 0;

public function __construct( string $header , string $filename , OutputIgnore $ignore )
{
$filename = str_replace( "/./" , "/" , $filename );
@@ -81,7 +83,7 @@ public function contains( string $text ) : bool
return false;
}

public function print( bool $useAlternatePrinting = false )
public function print( bool $alternatePrinting = false )
{
if ( count( $this->matter ) == 0 && count( $this->footer ) == 0 )
return;
@@ -93,9 +95,11 @@ public function print( bool $useAlternatePrinting = false )
if ( $this->ignore->shouldIgnore( $this , $hashFile , $hashHead , $hashFull ) )
return;

$this->printCount++;

print $this->header;

if ( $useAlternatePrinting )
if ( $alternatePrinting )
$this->printMatterAlternate();
else
foreach( $this->matter as $text )
@@ -128,8 +132,8 @@ private function printMatterAlternate() : void

for ( $idx = 0 ; $idx < count( $this->matter ) ; $idx++ )
{
if ( isset( $add[ $idx ] ) ) print $add[ $idx ];
if ( isset( $del[ $idx ] ) ) print $del[ $idx ];
if ( isset( $add[ $idx ] ) ) print $add[ $idx ];
}

foreach( $rst as $text )
19 changes: 8 additions & 11 deletions scripts/translation/libqa/OutputIgnore.php
@@ -20,23 +20,21 @@

class OutputIgnore
{
private bool $appendIgnores = true;
private bool $showIgnore = true;
private string $filename = ".qaxml.ignores";
private string $argv0 = "";

public bool $appendIgnoreCommands = true;
public ArgvParser $argv;

public function __construct( ArgvParser $argv )
{
$this->argv = $argv;
$this->argv0 = escapeshellarg( $argv->consume( position: 0 ) );

$arg = $argv->consume( prefix: "--add-ignore=" );
$item = $argv->consume( prefix: "--add-ignore=" );

if ( $arg != null )
if ( $item != null )
{
$item = substr( $arg , 13 );
$list = $this->loadIgnores();
if ( ! in_array( $item , $list ) )
{
@@ -46,10 +44,9 @@ public function __construct( ArgvParser $argv )
exit;
}

$arg = $argv->consume( prefix: "--del-ignore=" );
if ( $arg != null )
$item = $argv->consume( prefix: "--del-ignore=" );
if ( $item != null )
{
$item = substr( $arg , 13 );
$list = $this->loadIgnores();
$dels = 0;
while ( in_array( $item , $list ) )
@@ -66,7 +63,7 @@ public function __construct( ArgvParser $argv )
}

if ( $argv->consume( "--disable-ignore" ) != null )
$this->showIgnore = false;
$this->appendIgnoreCommands = false;
}

private function loadIgnores()
@@ -96,12 +93,12 @@ public function shouldIgnore( OutputBuffer $output , string $hashFile , string $
if ( in_array( $active , $marks ) )
$ret = true;
else
if ( $this->showIgnore )
if ( $this->appendIgnoreCommands )
$output->addFooter( " php {$this->argv0} --add-ignore=$active\n" );

// --del-ignore command

if ( $this->showIgnore )
if ( $this->appendIgnoreCommands )
foreach ( $marks as $mark )
if ( str_starts_with( $mark , $prefix ) )
if ( $mark != $active )
4 changes: 2 additions & 2 deletions scripts/translation/libqa/XmlFrag.php
@@ -37,13 +37,13 @@ static function listNodesRecurse( DOMNode $node , int $type, array & $ret )
XmlFrag::listNodesRecurse( $child , $type, $ret );
}

static function loadXmlFragmentFile( string $filename )
static function loadXmlFragmentFile( string $filename , bool $fakeDtdForMissingEntity = true )
{
$contents = file_get_contents( $filename );

[ $doc , $ent , $err ] = XmlFrag::loadXmlFragmentText( $contents , "" );

if ( count( $err ) == 0 )
if ( count( $err ) == 0 || $fakeDtdForMissingEntity == false )
return [ $doc , $ent , $err ];

$dtd = "<?xml version='1.0' encoding='utf-8'?>\n<!DOCTYPE frag [\n";
71 changes: 71 additions & 0 deletions scripts/translation/qaxml-pi.php
@@ -0,0 +1,71 @@
<?php /*
+----------------------------------------------------------------------+
| Copyright (c) 1997-2025 The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 3.01 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available through the world-wide-web at the following url: |
| https://www.php.net/license/3_01.txt. |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| license@php.net, so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Authors: André L F S Bacci <ae php.net> |
+----------------------------------------------------------------------+
# Description
Compare processing instructions usage between two XML files. */

require_once __DIR__ . '/libqa/all.php';

$argv = new ArgvParser( $argv );
$ignore = new OutputIgnore( $argv ); // may exit.
$argv->complete();

$list = SyncFileList::load();

foreach ( $list as $file )
{
    $source = $file->sourceDir . '/' . $file->file;
    $target = $file->targetDir . '/' . $file->file;
    $output = new OutputBuffer( "# qaxml.p" , $target , $ignore );

    [ $s , $_ , $_ ] = XmlFrag::loadXmlFragmentFile( $source );
    [ $t , $_ , $_ ] = XmlFrag::loadXmlFragmentFile( $target );

    $s = XmlFrag::listNodes( $s , XML_PI_NODE );
    $t = XmlFrag::listNodes( $t , XML_PI_NODE );

    $s = extractPiData( $s );
    $t = extractPiData( $t );

    if ( implode( "\n" , $s ) == implode( "\n" , $t ) )
        continue;

    $sideCount = array();

    foreach( $s as $v )
        $sideCount[$v] = [ 0 , 0 ];
    foreach( $t as $v )
        $sideCount[$v] = [ 0 , 0 ];

    foreach( $s as $v )
        $sideCount[$v][0] += 1;
    foreach( $t as $v )
        $sideCount[$v][1] += 1;

    foreach( $sideCount as $k => $v )
        if ( $v[0] != $v[1] )
            $output->addDiff( $k , $v[0] , $v[1] );

    $output->print();
}

function extractPiData( array $list )
{
    $ret = array();
    foreach( $list as $elem )
        $ret[] = "{$elem->target} {$elem->data}";
    return $ret;
}