GeSHi News

Here's where you can find out all the latest news about GeSHi - new releases, bug fixes and general errata.

GeSHi 1.2.X update
15/12/2004

I've been working on GeSHi pretty hard recently, and have managed to get together a pretty stable context "pseudo parser" engine. It's really quite neat - defining a language is now less about the characteristics of the language, and more about the characteristics of the contexts within that language. For example, PHP has the single string context, which has a default style, a type (for backward compatibility, more on that later), a starter and an ender, and knows how to highlight itself (by use of a child class of the generic GeSHiContext class). And HTML has the "entity" context - starts with &, finishes with ;, no children, default style (...)... it's going to be an interesting exercise for everyone, designing the trees for various languages.

Anyway, the advantages of this method? Well, they are manifold:

1) No Regexps! This makes the engine up to 10 times faster to begin with - even reasonably sized sources get parsed in well under a second. I haven't even begun to optimise yet, and there is still much code that is just copied and pasted from one part to another, yet it is already parsing code much more quickly, and with less server load, than the 1.0.X series.

2) "Perfect" highlighting - in GeSHi 1.0.X a lot of the work of highlighting was done by regular expressions, with one regular expression in particular looking after the highlighting of keywords. This led to the possibility that languages that allow certain symbols next to keywords would not get their keywords highlighted - even PHP suffered from this: &new is perfectly valid but unparsable by GeSHi, because semi-colons next to keywords were banned for other languages. However, in GeSHi 1.2.X, things will be different - because keywords will already be in context, a str_replace will be all that is needed.
Furthermore, for contexts it is possible to define a child class to decide how the context should highlight itself - so for example, I have the PHP double string context highlighting interpolated variables (including class field variables, which are even highlighted in a different colour!), and I can specify exactly the escape characters that are to be used - in GeSHi 1.0.X it simply decided that anything after an escape character should be highlighted, which is silly for \q for example.

3) More control over the highlighting - in GeSHi 1.0.X you only have the ability to call, for example, set_strings_style(), and that sets the style of every string in the source. With GeSHi 1.2.X you will be able to specify an identifier - for example, html/entity or php/single_string - and the styles will apply only to that part of the source. This point leads on to the next one...

4) Languages within languages! Yes, this is now possible! There's no more need for strict mode, and no more sad looking HTML embedded amongst your PHP - the HTML can be highlighted too. And, of course, the HTML context also has the CSS and Javascript child contexts, for example (in fact, in my test HTML context file I've also embedded javascript highlighting inside html/double_strings that start with javascript:!)

Furthermore, the highlighting will be a lot "safer" - anyone who has seen the current parse_code method in GeSHi 1.0.X can see that it is quite unsafe: keywords are swapped for placeholders, which are replaced by the actual HTML only at the end, after methods, numbers and the like have been parsed. In GeSHi 1.2.X, a completely new result string is built from the code string instead, making it much less likely that some obscure string in the source code will make the parser bail out.

In short, things are looking up! But there is still quite some work to do before I release an alpha for everyone to play with...
1) I haven't written the second most important part yet - the in-context keyword/symbol/number/method parser. This is the part that will take the place of the parse_non_string_part() method in the current GeSHi. In GeSHi 1.2.X certain contexts will be able to overrule this code (for example, it makes sense for strings to overrule it, because they usually have no keywords in them and also have to deal with escape characters), but it will still be a valid, centralised base for parsing the actual "meat" of the source code - the code that isn't in a string, comment etc.

2) There's only experimental support for PHP at the moment - in fact, the PHP language has a root context of HTML (if this seems strange, remember that PHP code isn't parsed unless it is within <?php ... ?> blocks, and the rest is straight HTML), which has children PHP, CSS, Javascript, HTMLTag, Entity and the like. And as I change and develop things, these contexts will grow, have more added to them, or may even disappear. Basically, the code is too volatile to release.

3) Only a couple of methods from the original GeSHi API are implemented - parse_code and enable_classes. Both are pretty much done (parse_code is neat now - just ask the root context to parse itself ;)), but of course there aren't any of the other methods that people use so often - get_stylesheet for example.

So, lots of work for me to do still! But keep checking back for updates - I'll do another update as soon as I write the in-context parser :)
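
As a rough illustration of the context idea described in this update, here is a minimal sketch written in Python. It is not GeSHi's code and not its API: the Context class, its parse method, the render helper and the "php" and "html" identifiers are all hypothetical names chosen for the example (only html/entity and php/single_string come from the post). It assumes each context has a starter, an ender and optional children, matches them with plain string comparison rather than regular expressions, and builds a new result instead of editing the source in place.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Context:
    """Illustrative stand-in for a context; NOT the real GeSHiContext class."""
    name: str                      # identifier, e.g. "html/entity"
    starter: str = ""              # text that opens the context (empty for the root only)
    ender: str = ""                # text that closes the context (empty for the root only)
    children: list = field(default_factory=list)

    def parse(self, code, pos=0):
        """Return (pieces, next_pos); pieces is a list of (identifier, text) pairs."""
        pieces = []
        start = pos
        pos += len(self.starter)   # the starter text belongs to this context
        while pos < len(code):
            if self.ender and code.startswith(self.ender, pos):
                pos += len(self.ender)
                break
            # Plain string matching only: does any child context start here?
            child = next((c for c in self.children
                          if code.startswith(c.starter, pos)), None)
            if child is None:
                pos += 1
                continue
            if pos > start:        # flush the text this context owns so far
                pieces.append((self.name, code[start:pos]))
            child_pieces, pos = child.parse(code, pos)
            pieces.extend(child_pieces)
            start = pos
        if pos > start:
            pieces.append((self.name, code[start:pos]))
        return pieces, pos


def render(pieces, styles):
    """Build a brand new HTML string; only identifiers listed in styles get a span."""
    out = []
    for name, text in pieces:
        safe = escape(text, quote=False)
        style = styles.get(name)
        out.append(f'<span style="{style}">{safe}</span>' if style else safe)
    return "".join(out)


# Hypothetical context tree: an HTML root with entity and PHP children.
single_string = Context("php/single_string", "'", "'")
php = Context("php", "<?php", "?>", [single_string])
entity = Context("html/entity", "&", ";")
root = Context("html", children=[entity, php])

source = "a &amp; b <?php echo 'hi'; ?>"
pieces, _ = root.parse(source)
html_out = render(pieces, {"html/entity": "color: #c00"})
```

Because every piece carries its context identifier, a style can be attached to html/entity alone without touching strings or PHP code, and joining the piece texts gives back the original source unchanged. Escape characters, keywords and real stylesheet handling are left out of this sketch.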