tDOM

Check-in [0da31f4fcb]
Login

Check-in [0da31f4fcb]

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Wip (not well working inbetween state).
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | wip
Files: files | file ages | folders
SHA3-256: 0da31f4fcb8d1805aa164d0fde6510595df4da541c49f904c03b44502e71f393
User & Date: rolf 2019-03-23 15:20:11.373
Context
2019-03-26
01:43
Clean up the mess, a bit. check-in: 7154504138 user: rolf tags: wip
2019-03-23
15:20
Wip (not well working inbetween state). check-in: 0da31f4fcb user: rolf tags: wip
2019-03-20
16:00
Still wip. check-in: 3bf5b1e10d user: rolf tags: wip
Changes
Unified Diff Ignore Whitespace Patch
Changes to doc/expat.xml.
28
29
30
31
32
33
34

35
36
37
38

39
40
41
42
43
44
45
to parse any kind of well-formed XML. The parsers are stream oriented XML
parser. This means that you register handler scripts with the parser prior to
starting the parse. These handler scripts are called when the parser discovers
the associated structures in the document being parsed.  A start tag is an
example of the kind of structures for which you may register a handler
script.</p>


<p>The parsers do not validate the XML document. They do parse the internal DTD
and, at request, external DTD and external entities, if you resolve the
identifier of the external entities with the -externalentitycommand script (see
there).</p>


<p>Additionly, the Tcl extension code that implements this command provides an
API for adding C level coded handlers. Up to now, there exists the parser
extension command "tdom". The handler set installed by this extension build an
in memory "tDOM" DOM tree, while the parser is parsing the input.</p>

<p>It is possible to register an arbitrary amount of different handler scripts







>
|
|
|
|
>







28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
to parse any kind of well-formed XML. The parsers are stream oriented XML
parser. This means that you register handler scripts with the parser prior to
starting the parse. These handler scripts are called when the parser discovers
the associated structures in the document being parsed.  A start tag is an
example of the kind of structures for which you may register a handler
script.</p>

<p>The parsers always check for XML well-formedness of the input (and
report error, if the input isn't well-formed). They parse the internal
DTD and, at request, external DTD and external entities, if you
resolve the identifier of the external entities with the
-externalentitycommand script (see there). If you use the -validateCmd
option (see there), the input is additionally validated.</p>

<p>Additionly, the Tcl extension code that implements this command provides an
API for adding C level coded handlers. Up to now, there exists the parser
extension command "tdom". The handler set installed by this extension build an
in memory "tDOM" DOM tree, while the parser is parsing the input.</p>

<p>It is possible to register an arbitrary amount of different handler scripts
77
78
79
80
81
82
83











84
85
86
87
88
89
90
until the next call to the parse method, thus delaying the reporting of some of
the data.</p>

<p>If this option is set to "1" then documents which are not well-formed upon
end of input will generate an error.</p></desc>
        </optdef>












        <optdef>
          <optname>-baseurl</optname>
          <optarg>url</optarg>

          <desc><p>Reports the base url of the document to the
parser.</p></desc>
        </optdef>







>
>
>
>
>
>
>
>
>
>
>







79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
until the next call to the parse method, thus delaying the reporting of some of
the data.</p>

<p>If this option is set to "1" then documents which are not well-formed upon
end of input will generate an error.</p></desc>
        </optdef>

        <optdef>
          <optname>-validateCmd</optname>
          <optarg>&lt;tdom schema cmd&gt;</optarg>

          <desc><p>This option expects the name of a tDOM schema
          command. If this option is given, then the input is also
          validated. If the schema command hasn't set a reportcmd then
          the first validation error will stop further parsing (as a
          well-formedness error).</p></desc>
        </optdef>

        <optdef>
          <optname>-baseurl</optname>
          <optarg>url</optarg>

          <desc><p>Reports the base url of the document to the
parser.</p></desc>
        </optdef>
Changes to generic/schema.c.
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
                            return 1;
                        }
                        popStack (sdata);
                        if (rc == -1) mayskip = 1;
                        break;

                    case SCHEMA_CTYPE_VIRTUAL:
                        if (evalVirtual (interp, sdata, icp)) break;
                        else return 0;
                        
                    }
                    if (!mayskip && mayMiss (candidate->quants[i]))
                        mayskip = 1;
                }
                break;








|
<







864
865
866
867
868
869
870
871

872
873
874
875
876
877
878
                            return 1;
                        }
                        popStack (sdata);
                        if (rc == -1) mayskip = 1;
                        break;

                    case SCHEMA_CTYPE_VIRTUAL:
                        Tcl_Panic ("Virtual constrain in MIXED or CHOICE");

                        
                    }
                    if (!mayskip && mayMiss (candidate->quants[i]))
                        mayskip = 1;
                }
                break;

912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940

941
942





943
944
945
946

947
948
949
950
951
952
953
            }
            ac++;
            hm = 0;
        }
        if (isName) {
            if (recover (interp, sdata, "UNEXPECTED_ELEMENT", 15)) {
                /* Skip the just opened element tag and the following
                 * content of it. */
                sdata->skipDeep = 1;
                return 1;
            }
            return 0;
        }
        return -1;

    case SCHEMA_CTYPE_VIRTUAL:
    case SCHEMA_CTYPE_CHOICE:
    case SCHEMA_CTYPE_TEXT:
    case SCHEMA_CTYPE_ANY:
        /* Never pushed onto stack */
        Tcl_Panic ("Invalid CTYPE onto the validation stack!");

    case SCHEMA_CTYPE_INTERLEAVE:
        mayskip = 1;
        for (i = 0; i < cp->nc; i++) {
            icp = cp->content[i];
            if (se->interleaveState[i]) {
                if (maxOne (cp->quants[i])) continue;
            }

            switch (icp->type) {
            case SCHEMA_CTYPE_TEXT:





                break;

            case SCHEMA_CTYPE_ANY:
                sdata->skipDeep = 1;

                se->hasMatched = 1;
                se->interleaveState[i] = 1;
                return 1;

            case SCHEMA_CTYPE_NAME:
                if (icp->name == name
                    && icp->namespace == namespace) {







|
|
















<



>


>
>
>
>
>




>







911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935

936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
            }
            ac++;
            hm = 0;
        }
        if (isName) {
            if (recover (interp, sdata, "UNEXPECTED_ELEMENT", 15)) {
                /* Skip the just opened element tag and the following
                 * content of the current. */
                sdata->skipDeep = 2;
                return 1;
            }
            return 0;
        }
        return -1;

    case SCHEMA_CTYPE_VIRTUAL:
    case SCHEMA_CTYPE_CHOICE:
    case SCHEMA_CTYPE_TEXT:
    case SCHEMA_CTYPE_ANY:
        /* Never pushed onto stack */
        Tcl_Panic ("Invalid CTYPE onto the validation stack!");

    case SCHEMA_CTYPE_INTERLEAVE:
        mayskip = 1;
        for (i = 0; i < cp->nc; i++) {

            if (se->interleaveState[i]) {
                if (maxOne (cp->quants[i])) continue;
            }
            icp = cp->content[i];
            switch (icp->type) {
            case SCHEMA_CTYPE_TEXT:
                if (icp->nc) {
                    if (!checkText (interp, icp, "")) {
                        mayskip = 0;
                    }
                }
                break;

            case SCHEMA_CTYPE_ANY:
                sdata->skipDeep = 1;
                if (mayskip && minOne (cp->quants[i])) mayskip = 0;
                se->hasMatched = 1;
                se->interleaveState[i] = 1;
                return 1;

            case SCHEMA_CTYPE_NAME:
                if (icp->name == name
                    && icp->namespace == namespace) {
967
968
969
970
971
972
973
974
975
976
977

978
979
980
981
982
983



984
985

986
987
988
989
990
991
992
993
                rc = matchElementStart (interp, sdata, name, namespace);
                if (rc == 1) {
                    se->hasMatched = 1;
                    se->interleaveState[i] = 1;
                    return 1;
                }
                popStack (sdata);
                if (!mayskip && rc == -1) mayskip = 1;
                break;

            case SCHEMA_CTYPE_VIRTUAL:

                break;
            }

        }
                
        break;



    }
    

    return 0;
}

int
probeElement (
    Tcl_Interp *interp,
    SchemaData *sdata,
    const char *name,







|



>





|
>
>
>
|
|
>
|







972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
                rc = matchElementStart (interp, sdata, name, namespace);
                if (rc == 1) {
                    se->hasMatched = 1;
                    se->interleaveState[i] = 1;
                    return 1;
                }
                popStack (sdata);
                if (mayskip && rc != -1) mayskip = 0;
                break;

            case SCHEMA_CTYPE_VIRTUAL:
                Tcl_Panic ("Virtual constraint child of INTERLEAVE");
                break;
            }

        }
                
        if (mayskip) break;
        if (recover (interp, sdata, S("UNCOMPLET_CP"))) {
            sdata->skipDeep = 2;
            return 1;
        }
    }
    
    return -1;
}

int
probeElement (
    Tcl_Interp *interp,
    SchemaData *sdata,
    const char *name,
1068
1069
1070
1071
1072
1073
1074




1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088





1089
1090
1091
1092
1093
1094
1095
    }

    if (sdata->stack) {
        SchemaValidationStack *se;
        se = sdata->stack;
        if (se->pattern->type == SCHEMA_CTYPE_NAME
            && se->activeChild >= se->pattern->nc) {




            SetResult ("Unexpected child element \"");
            if (namespacePtr) {
                Tcl_AppendResult (interp, namespacePtr, ":", NULL);
            }
            Tcl_AppendResult (interp, name, "\" for element \"", NULL);
            if (se->pattern->namespace) {
                Tcl_AppendResult (interp, namespace, ":", NULL);
            }
            Tcl_AppendResult (interp, name, "\"", NULL);
            return TCL_ERROR;
        }
    } else {
        if (!pattern) {
            SetResult ("Unknown element");





            return TCL_ERROR;
        }
        pushToStack (sdata, pattern);
        sdata->validationState = VALIDATION_STARTED;
        return TCL_OK;
    }








>
>
>
>














>
>
>
>
>







1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
    }

    if (sdata->stack) {
        SchemaValidationStack *se;
        se = sdata->stack;
        if (se->pattern->type == SCHEMA_CTYPE_NAME
            && se->activeChild >= se->pattern->nc) {
            if (recover (interp, sdata, S("UNEXPECTED_ELEMENT"))) {
                sdata->skipDeep = 1;
                return TCL_OK;
            }
            SetResult ("Unexpected child element \"");
            if (namespacePtr) {
                Tcl_AppendResult (interp, namespacePtr, ":", NULL);
            }
            Tcl_AppendResult (interp, name, "\" for element \"", NULL);
            if (se->pattern->namespace) {
                Tcl_AppendResult (interp, namespace, ":", NULL);
            }
            Tcl_AppendResult (interp, name, "\"", NULL);
            return TCL_ERROR;
        }
    } else {
        if (!pattern) {
            SetResult ("Unknown element");
            if (recover (interp, sdata, S("UNKNOWN_DOKUMENT_ELEMENT"))) {
                sdata->skipDeep = 1;
                return TCL_OK;
            }
            
            return TCL_ERROR;
        }
        pushToStack (sdata, pattern);
        sdata->validationState = VALIDATION_STARTED;
        return TCL_OK;
    }

1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
                        if (checkElementEnd (interp, sdata)) {
                            mayMiss = 1;
                        }
                        popStack (sdata);
                        break;
                        
                    case SCHEMA_CTYPE_VIRTUAL:
                        if (evalVirtual (interp, sdata, ic)) break;
                        else return 0;
                    }
                    if (mayMiss) break;
                }
                if (mayMiss) break;
                return 0;

            case SCHEMA_CTYPE_VIRTUAL:







|
|







1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
                        if (checkElementEnd (interp, sdata)) {
                            mayMiss = 1;
                        }
                        popStack (sdata);
                        break;
                        
                    case SCHEMA_CTYPE_VIRTUAL:
                        Tcl_Panic ("Virtual constrain in MIXED or CHOICE");
                        
                    }
                    if (mayMiss) break;
                }
                if (mayMiss) break;
                return 0;

            case SCHEMA_CTYPE_VIRTUAL:
1729
1730
1731
1732
1733
1734
1735

1736
1737
1738
1739
1740

1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
startElement(
    void         *userData,
    const char   *name,
    const char  **atts
)
{
    ValidateMethodData *vdata = (ValidateMethodData *) userData;

    char *namespace;
    const char *s;
    int i = 0;

    DBG(fprintf (stderr, "startElement: '%s'\n", name);)

    if (Tcl_DStringLength (vdata->cdata)
        || (vdata->sdata->stack
            && (vdata->sdata->stack->pattern->flags & CONSTRAINT_TEXT_CHILD))) {
        if (probeText (vdata->interp, vdata->sdata,
                       Tcl_DStringValue (vdata->cdata)) != TCL_OK) {
            vdata->sdata->validationState = VALIDATION_ERROR;
            XML_StopParser (vdata->parser, 0);
            Tcl_DStringSetLength (vdata->cdata, 0);
            vdata->onlyWhiteSpace = 1;
            return;
        }
        Tcl_DStringSetLength (vdata->cdata, 0);
        vdata->onlyWhiteSpace = 1;







>





>

<
|
|

|







1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762

1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
startElement(
    void         *userData,
    const char   *name,
    const char  **atts
)
{
    ValidateMethodData *vdata = (ValidateMethodData *) userData;
    SchemaData *sdata;
    char *namespace;
    const char *s;
    int i = 0;

    DBG(fprintf (stderr, "startElement: '%s'\n", name);)
    sdata = vdata->sdata;
    if (Tcl_DStringLength (vdata->cdata)

        || (sdata->stack->pattern->flags & CONSTRAINT_TEXT_CHILD)) {
        if (probeText (vdata->interp, sdata,
                       Tcl_DStringValue (vdata->cdata)) != TCL_OK) {
            sdata->validationState = VALIDATION_ERROR;
            XML_StopParser (vdata->parser, 0);
            Tcl_DStringSetLength (vdata->cdata, 0);
            vdata->onlyWhiteSpace = 1;
            return;
        }
        Tcl_DStringSetLength (vdata->cdata, 0);
        vdata->onlyWhiteSpace = 1;
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780

1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796

1797
1798

1799
1800
1801

1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
            vdata->uri[i] = '\0';
            namespace = vdata->uri;
        }
    } else {
        s = name;
    }

    if (probeElement (vdata->interp, vdata->sdata, s, namespace)
        != TCL_OK) {
        vdata->sdata->validationState = VALIDATION_ERROR;
        XML_StopParser (vdata->parser, 0);
        return;
    }

    if (atts[0] || vdata->sdata->stack->pattern->attrs) {
        if (probeAttributes (vdata->interp, vdata->sdata, atts)
            != TCL_OK) {
            vdata->sdata->validationState = VALIDATION_ERROR;
            XML_StopParser (vdata->parser, 0);
        }
    }
}

static void
endElement (
    void        *userData,
    const char  *name
)
{
    ValidateMethodData *vdata = (ValidateMethodData *) userData;


    DBG(fprintf (stderr, "endElement: '%s'\n", name);)

    if (vdata->sdata->validationState == VALIDATION_ERROR) {
        return;
    }

    if (Tcl_DStringLength (vdata->cdata)
        || vdata->sdata->stack->pattern->flags & CONSTRAINT_TEXT_CHILD) {
        if (probeText (vdata->interp, vdata->sdata,
                       Tcl_DStringValue (vdata->cdata)) != TCL_OK) {
            vdata->sdata->validationState = VALIDATION_ERROR;
            XML_StopParser (vdata->parser, 0);
            Tcl_DStringSetLength (vdata->cdata, 0);
            vdata->onlyWhiteSpace = 1;
            return;
        }
        Tcl_DStringSetLength (vdata->cdata, 0);
        vdata->onlyWhiteSpace = 1;
    }
    if (probeElementEnd (vdata->interp, vdata->sdata)
        != TCL_OK) {
        vdata->sdata->validationState = VALIDATION_ERROR;
        XML_StopParser (vdata->parser, 0);
    }
}

static void
characterDataHandler (
    void        *userData,







|

|



>
|
|

|












>
|

>
|


>
|
|
|

|








|

|







1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
            vdata->uri[i] = '\0';
            namespace = vdata->uri;
        }
    } else {
        s = name;
    }

    if (probeElement (vdata->interp, sdata, s, namespace)
        != TCL_OK) {
        sdata->validationState = VALIDATION_ERROR;
        XML_StopParser (vdata->parser, 0);
        return;
    }
    if (atts[0] || (sdata->stack
                    && sdata->stack->pattern->attrs)) {
        if (probeAttributes (vdata->interp, sdata, atts)
            != TCL_OK) {
            sdata->validationState = VALIDATION_ERROR;
            XML_StopParser (vdata->parser, 0);
        }
    }
}

static void
endElement (
    void        *userData,
    const char  *name
)
{
    ValidateMethodData *vdata = (ValidateMethodData *) userData;
    SchemaData *sdata;
    
    DBG(fprintf (stderr, "endElement: '%s'\n", name);)
    sdata = vdata->sdata;
    if (sdata->validationState == VALIDATION_ERROR) {
        return;
    }
    if (!sdata->skipDeep &&
        (Tcl_DStringLength (vdata->cdata)
         || sdata->stack->pattern->flags & CONSTRAINT_TEXT_CHILD)) {
        if (probeText (vdata->interp, sdata,
                       Tcl_DStringValue (vdata->cdata)) != TCL_OK) {
            sdata->validationState = VALIDATION_ERROR;
            XML_StopParser (vdata->parser, 0);
            Tcl_DStringSetLength (vdata->cdata, 0);
            vdata->onlyWhiteSpace = 1;
            return;
        }
        Tcl_DStringSetLength (vdata->cdata, 0);
        vdata->onlyWhiteSpace = 1;
    }
    if (probeElementEnd (vdata->interp, sdata)
        != TCL_OK) {
        sdata->validationState = VALIDATION_ERROR;
        XML_StopParser (vdata->parser, 0);
    }
}

static void
characterDataHandler (
    void        *userData,
Changes to tests/schema.test.
3795
3796
3797
3798
3799
3800
3801
3802












































3803





























3804

3805
3806
3807
3808
3809
            element a
            element b
            element c
        }
    }
    s reportcmd schema-18
    set result ""
    set rc [s validate {<doc><a/><c/></doc>} errMsg]












































    puts $errMsg





























    lappend result $rc

    s delete
    set result
} {s MISSING_CP 1}

}







|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
>





3795
3796
3797
3798
3799
3800
3801
3802
3803
3804
3805
3806
3807
3808
3809
3810
3811
3812
3813
3814
3815
3816
3817
3818
3819
3820
3821
3822
3823
3824
3825
3826
3827
3828
3829
3830
3831
3832
3833
3834
3835
3836
3837
3838
3839
3840
3841
3842
3843
3844
3845
3846
3847
3848
3849
3850
3851
3852
3853
3854
3855
3856
3857
3858
3859
3860
3861
3862
3863
3864
3865
3866
3867
3868
3869
3870
3871
3872
3873
3874
3875
3876
3877
3878
3879
3880
3881
3882
3883
            element a
            element b
            element c
        }
    }
    s reportcmd schema-18
    set result ""
    set rc [s validate {<doc><a/><c/></doc>}]
    lappend result $rc
    s delete
    set result
} {s MISSING_CP 1}

test schema-18.3 {reportcmd} {knownBug} {
    tdom::schema s
    s define {
        defelement doc {
            element a
        }
    }
    s reportcmd schema-18
    set result ""
    foreach xml {
        <foo/>
        <bar>baz</bar>
        <bar><a><b/></a></bar>
    } {
        set rc [s validate $xml]
        lappend result $rc
    }
    s delete
    set result
} {s MISSING_CP 1}

proc validatedSAX {g xml {keepEmpties 1}} {
    set args [list -validateCmd $g]
    if {!$keepEmpties} {
        lappend args -ignorewhitespace 1
    }
    xml::parser p {*}$args
    set rc [catch {p parse $xml} errMsg]
    p delete
    return $rc
}

proc validatedDOM {g xml {keepEmpties 0}} {
    set args [list -validateCmd $g]
    if {$keepEmpties} {
        lappend args -keepEmpties
    }
    set rc [catch {
        set doc [dom parse {*}$args $xml]
    } errMsg]
    if {$doc ne ""} {
        $doc delete
    }
    return $rc
}

proc postValidation {g xml} {
    set doc [dom parse $xml]
    set rc [$g domvalidate $doc errMsg]
    $doc delete
    return $rc
}


test schema-18.4 {reportcmd} {knownBug} {
    tdom::schema s
    s define {
        defelement doc {
            element a
        }
    }
    s reportcmd schema-18
    set result ""
    foreach xml {
        <foo/>
        <bar>baz</bar>
        <bar><a><b/></a></bar>
    } {
        set rc [s validate $xml]
        lappend result $rc
    }
    s delete
    set result
} {s MISSING_CP 1}

}