Next Generation TypoScript Parser¶
| Classification: | tsp |
|---|---|
| Version: | 1.0.1 |
| Language: | en |
| Description: | Next Generation TypoScript Parser (tsp) |
| Keywords: | typoscript, performance, parser |
| Copyright: | 2016 |
| Author: | Elmar Hinz |
| Email: | t3elmar@gmail.com |
| License: | This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml |
| Rendered: | June 07, 2016 |
The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.org.
Introduction¶
This extension ships a TypoScript parser that is suited to replace the original TypoScript parser for frontend rendering. In fact, a family of parsers has been introduced, each specialized for a different task:
- FE: TypoScriptConditionsProcessor
- FE: TypoScriptProductionParser
- BE: TypoScriptSyntaxParser
What it is not¶
No Boost in Performance¶
Parsing TypoScript takes just a few milliseconds. Hence the primary goal is not to speed up parsing but to improve the architecture. The algorithm itself is twice as fast as the original algorithm, but with the split into a conditions preprocessor and a parser the overall time is about the same again.
What it is¶
Standalone Usage¶
It’s possible to use the TypoScript parser standalone outside of the TYPO3 CMS if you like the TypoScript syntax and want to use it for configuration in other fields. This is possible with or without the conditions preprocessor.
Improving the error detection¶
The error detection covers everything the original parser detects and already tries to be a little better. The display of line numbers has also been improved. See the screenshots.
With this proof of concept showing that the original syntax highlighter can be replaced, further debugging features are planned:
- Syntax highlighting of conditions, instead of printing them in one color.
- Detecting the difference between objects and properties, because only objects are allowed to be copied by reference.
- (Related) Throwing verbose errors from TS objects, catching them and displaying them in the backend.
New Architecture¶
The reason for writing a new TypoScript parser is to give it a modern architecture:
- easy to understand
- easy to debug
- easy to extend
A modern parser makes it easier to get rid of flaws in TypoScript, to add new features such as if-else conditions that work the way you are used to from other languages, and to enhance error debugging.
Condition Preprocessor¶
Condition evaluation is done by a preprocessor. Separating the condition preprocessing makes it possible to use the TypoScript parser without bothering with conditions, so that the parser can focus on its own task.
On the other hand, isolating the conditions makes it easy to enhance the condition preprocessing. For example, it becomes easy to introduce an [ELSEIF] element.
As with the old parser, the condition matching is handled by a third object. Exchanging this object enables the development of conditions that address a completely different field than the TYPO3 CMS.
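The separation described above can be sketched roughly as follows. This is a minimal illustration in Python, not the extension's PHP code: the function name `preprocess_conditions` and the `matches` callback are hypothetical, and the sketch ignores multiline values and nested syntax. It only shows how a preprocessor can filter lines by condition while the matching itself is delegated to an exchangeable object.

```python
from typing import Callable, List


def preprocess_conditions(lines: List[str], matches: Callable[[str], bool]) -> List[str]:
    """Return only the lines whose enclosing condition evaluates to true.

    `matches` stands in for the exchangeable condition matcher (assumption).
    """
    kept = []
    active = True           # is the current section rendered?
    matched_before = False  # did the current condition already match?
    for line in lines:
        token = line.strip()
        if token.upper() in ("[GLOBAL]", "[END]"):
            # Back to the unconditional scope.
            active, matched_before = True, False
        elif token.upper() == "[ELSE]":
            # The else branch is rendered only if the condition did not match.
            active = not matched_before
            matched_before = True
        elif token.startswith("[") and token.endswith("]"):
            # Any other bracketed line is a condition, decided by the matcher.
            active = matches(token)
            matched_before = active
        elif active:
            kept.append(line)
    return kept
```

Because the matcher is just a callable, a different field of application only has to supply its own `matches` function. The parser that follows receives plain lines and never sees a condition.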
Public Presentation¶
This is a public presentation of the parser. Should it replace the old parser of the core? If so, it first needs to be tested in the wild until it is really stable. This extension is the means to do so.
Differences¶
- Backslash doesn’t escape anything.
- Escaping of dots in object keys is not supported.
- Backslash is an allowed character in the keys (for PHP namespaces).
Administration¶
Install the extension, clear the caches and check if your frontend is rendered as expected and if you get the advanced error feedback in the backend.
If anything goes wrong, uninstall and report the issue.
https://github.com/elmar-hinz/TX.tsp/issues
Technical Implementation¶
The original parser is not fully replaced but extended via XCLASS registration. The extended class serves as an adapter to the standalone classes.
Research¶
\Core\TypoScript\Parser\TypoScriptParser¶
Overview¶
The method parse() is a preprocessor that handles the including and excluding of template parts by conditions. It doesn’t parse the incoming lines to the end first, but delegates the parts immediately to parseSub() (a kind of depth-first parsing of the template tree).
The method doSyntaxHighlight() is responsible for generating a syntax-highlighted HTML string. It also calls the preprocessor parse() but sets a flag that disables the conditions, so that all parts are evaluated.
The latter is strange in two respects. It doesn’t make sense to send syntax highlighting through a condition preprocessor, and it doesn’t make sense to parse into an array tree when one actually wants an HTML string as the result.
Conditions¶
In the method parse() the template is branched into rendered and non-rendered parts based on conditions. The condition evaluation is delegated to a $matchObj that is injected as a parameter.
For each condition the method creates a hash and stores it in the $this->sections array. These hashes are used by the TemplateService to cache the rendered templates matching combinations of conditions that evaluate to true.
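The caching idea can be illustrated with a small Python sketch. The function names, the use of MD5 and the layout of the cache key are assumptions made for illustration. The text above only states that a hash per condition is stored and that the cache distinguishes combinations of matching conditions.

```python
import hashlib


def register_sections(conditions):
    """Map a hash of each condition string to the condition itself."""
    return {hashlib.md5(c.encode("utf-8")).hexdigest(): c for c in conditions}


def cache_key(sections, matches):
    """Build one key from the hashes of all conditions that evaluate to true.

    `matches` is a hypothetical stand-in for the condition matcher.
    """
    matching = sorted(h for h, c in sections.items() if matches(c))
    return hashlib.md5("|".join(matching).encode("utf-8")).hexdigest()
```

Two page requests that satisfy the same set of conditions get the same key and can share one cached template. A request that satisfies a different set gets a different key.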
Line numbering¶
There is a line number offset that sums up the line numbers of previously rendered templates. It is advanced at the end of parse().
The line numbers of the current template are tracked by $this->rawP in the main loop of parseSub() and, for the condition sections that evaluate to false, in the method nextDivider(). $this->rawP is reset to zero at the beginning of the rendering of the current template in the method parse().
Error handling¶
Errors are registered by the method error($errorString, $severity = 2).
This method appends an entry [a, b, c, d] to $this->errors[] with:
- a = error message
- b = severity
- c = line number
- d = template line number offset
Collected messages:
- ‘Script is short of XXX braces.’
- ‘An end brace is in excess.’
- ‘On return to [GLOBAL] scope, the script was short of XXX braces.’
- ‘A multiline value section is not ended with a parenthesis!’
- ‘Object Name String, contains invalid character XXX. Must be alphanumeric or one of: “_:-.”.’
- ‘Object Name String XXX was not followed by any operator, =<>({‘
- ‘### ERROR: XXX’ (Error extracted from an error comment created in previous parsing steps, such as during template includes.)
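To make the shape of the collected entries concrete, here is a minimal Python sketch of a brace check that records errors as [message, severity, line number, template line number offset]. The function `check_braces` is hypothetical. The severity of 2 simply reuses the default from the signature above, and the line numbers chosen here are a simplification, not the core's exact bookkeeping.

```python
def check_braces(lines, line_offset=0):
    """Collect brace errors as [message, severity, line, offset] entries."""
    errors = []
    depth = 0  # current nesting depth of open braces
    for line_no, line in enumerate(lines, start=1):
        token = line.strip()
        if token.endswith("{"):
            depth += 1
        elif token == "}":
            if depth == 0:
                errors.append(["An end brace is in excess.", 2, line_no, line_offset])
            else:
                depth -= 1
    if depth > 0:
        # Reported once, at the last line of the template.
        errors.append([f"Script is short of {depth} braces.", 2, len(lines), line_offset])
    return errors
```

Keeping the offset as a separate field lets the caller show either the line within the current template or the line within the concatenated templates.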
Syntax highlighting¶
Highlighted parsing is controlled by the method doSyntaxHighlight(). It sets the flag $this->syntaxHighLight to true and the template string is parsed. The flag activates the additional highlighting functionality during the process of parsing. Finally the method syntaxHighlight_print() is called to format the collected results including the error messages.
Registration of highlighted parts of lines is done during parsing by the method regHighLight() if the above flag is set. The parts are collected into $this->highLightData and $this->highLightData_bracelevel. Both arrays are indexed by line: the first holds the highlighted sections of the line, the second the depth of brace nesting.
Breakpoints¶
A breakpoint is a line number in $this->breakPointLN at which the execution of the rendering is stopped. The method parseSub() returns with a marker [_BREAK]. This marker stops the further execution of the main loop in parse().
TemplateService¶
TemplateService is a service that makes use of the parser. A main task of TemplateService is to cache the rendered template for different combinations of conditions of a page.
ExtendedTemplateService¶
The class ExtendedTemplateService contains methods for the TS module in the TYPO3 backend. It extends TemplateService.
Lessons Learned¶
Parsing the TypoScript of a website takes just a few milliseconds overall. It is not a critical part of the overall page rendering time. Nevertheless, the development of this extension also focused on performance.
Time to parse the templates vs. time to parse TypoScript¶
When measured with the TYPO3 core time tracker (admin panel), the template parsing takes a few hundred milliseconds. When all calls to the TypoScript parse function (TypoScriptParser::parse()) are measured and summed up, they take just a few milliseconds. The difference is most likely explained by I/O calls to read the templates.
Non-Recursive Parser¶
The Non-Recursive Parser is the approach taken by this parser. The whole rendering happens within one function by using simple loop structures. Calls to itself or other methods are avoided as far as reasonable. This turns out to be twice as fast as the recursive Original TypoScript Parser.
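A non-recursive parser of this kind can be sketched in a few lines. The following Python example is an illustration under simplifying assumptions, not the extension's implementation: it handles only `=` assignments, braces and comments, and the function name `parse_typoscript` is hypothetical. Brace nesting is tracked with a plain stack of path prefixes instead of recursive calls, and nested keys carry a trailing dot as in the TypoScript array convention.

```python
def parse_typoscript(text):
    """Parse a tiny TypoScript subset (assignments and braces) without recursion."""
    tree = {}
    prefixes = []  # stack of open object paths, one entry per "{"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "//")):
            continue  # skip blank lines and comments
        if line == "}":
            if prefixes:
                prefixes.pop()
            continue
        if line.endswith("{") and "=" not in line:
            prefixes.append(line[:-1].strip())
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # other operators are out of scope for this sketch
        # Full object path: open prefixes plus the key on this line.
        path = ".".join(prefixes + [key.strip()]).split(".")
        node = tree
        for segment in path[:-1]:
            node = node.setdefault(segment + ".", {})
        node[path[-1]] = value.strip()  # later assignments overwrite earlier ones
    return tree
```

The only state carried between lines is the prefix stack and the tree itself, which is what keeps the whole parse inside one loop.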
Original TypoScript Parser¶
The original parser of the TYPO3 core uses recursive calls to handle the nesting of the braces of the object name paths.
JSON Parser¶
The idea of the JSON Parser was to use the PHP function json_decode to create the large TypoScript tree, consisting of hundreds of PHP arrays, on the binary level. For this, TypoScript was rewritten into a valid JSON string as input.
Unfortunately json_decode does merging but not recursive merging. As overwriting is a feature of TypoScript, the overwriting has to be resolved in advance, before the JSON string is rendered. To solve this, an array was created that contains the full object path as key and the value as value. Although this creates no nested tree, it takes time.
Together with the conversion to a JSON string in the second step, there is no advantage in speed. Even when the non-recursive approach is used for these two steps, the result is about as fast as the Original TypoScript Parser.
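The two steps can be illustrated with Python's `json` module standing in for PHP's `json_decode`. This is a sketch under that assumption, and the helper names `flatten` and `to_json` are hypothetical. The first statement shows the problem: a repeated key is replaced as a whole rather than merged recursively. The helpers then resolve overwriting in a flat path map and write the JSON string in a single non-recursive pass.

```python
import json

# A repeated key is not merged recursively: the later object wins entirely.
naive = json.loads('{"page.": {"10": "TEXT"}, "page.": {"20": "TEXT"}}')


def flatten(assignments):
    """Step 1: resolve overwriting in a flat map of full object path -> value."""
    flat = {}
    for path, value in assignments:
        flat[path] = value  # a later assignment overwrites an earlier one
    return flat


def to_json(flat):
    """Step 2: write the nested JSON string from the sorted flat paths."""
    parts, open_path, need_comma = [], [], [False]
    for path in sorted(flat, key=lambda p: p.split(".")):
        segments = path.split(".")
        parents = segments[:-1]
        # Count how many currently open objects this path shares.
        common = 0
        while (common < min(len(open_path), len(parents))
               and open_path[common] == parents[common]):
            common += 1
        # Close the objects that are not shared.
        while len(open_path) > common:
            parts.append("}")
            open_path.pop()
            need_comma.pop()
        # Open the objects that are still missing.
        for segment in parents[common:]:
            if need_comma[-1]:
                parts.append(",")
            need_comma[-1] = True
            parts.append(json.dumps(segment + ".") + ":{")
            open_path.append(segment)
            need_comma.append(False)
        # Write the value itself.
        if need_comma[-1]:
            parts.append(",")
        need_comma[-1] = True
        parts.append(json.dumps(segments[-1]) + ":" + json.dumps(flat[path]))
    return "{" + "".join(parts) + "}" * (len(open_path) + 1)
```

Sorting the paths by their segments keeps all entries of one object adjacent, so each object is opened and closed exactly once. The sketch also makes the cost visible: both the flat map and the string have to be built before the decoder can do any work.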
Known Issues¶
No Exceptions are Thrown¶
The TypoScript production parser currently doesn’t throw exceptions. It expects valid TS as input. The syntax highlighting parser is designed to inspect TS for mistakes.
The original parser doesn’t throw exceptions either. Backend modules are not prepared to catch exceptions from the parser and would break if exceptions were thrown for insane TS.
Intolerant for Insane TS¶
The TypoScript production parser will silently break if fed with insane TS. It is optimized for speed and is less tolerant of insane TS than the original parser.
This means that, in rare cases, code that works with the original parser may break with the TypoScript production parser. Use the syntax highlighting parser to fix the TS code.