Forums | MacLife
#1 2009-02-08 1:18 am
xhtml via php DOMDocument (brief howto)
I'm by no means a guru, but I've been playing with this the past couple of days, and decided to share what I learned.
The DOMDocument class in php is part of the php-xml module. It seems to be fairly bleeding edge, and not 100% implemented yet. php 4 had a different method which might be more mature. Anyway, I'm using php 5.2.5
DOMDocument puts together an xml document. Using it has several advantages IMHO over coding dynamically generated content the traditional way.
First, it completely eliminates the need to use echo and print - until you are completely ready to spit out your completed document. This allows you to wait until the document is completely built before you send the header, so a php error that spits to output doesn't end up with firefox telling you that you have malformed xml, forcing you to look at the source of the output to see where.
Secondly - you don't have to worry about escaping crap or ending php via ?> to do a big block of html that may result in several <?php echo($foo);?> and other stuff. The code looks a hell of a lot cleaner.
Third, and something I love - you don't have to code in order. You simply append stuff to various parent elements as you go, and then append the elements to the document when you are putting it all together. Really helpful, for example, in keeping generated hidden inputs together in a complex dynamic form. Simply append them to a specific div for hidden inputs, and append that div to the form when you won't have any more hidden inputs.
There are some drawbacks - notably that more lines of code are required, adding stuff (e.g. JavaScript) to a cdata block is more difficult, and it's easy to forget to append elements.
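To illustrate the out-of-order assembly from the third point, here is a minimal sketch. The ids, input names, values and the form target are made up for the example; only the DOMDocument calls themselves are real.

```php
<?php
// Sketch: collect hidden inputs in one container and attach it to the form last.
$doc  = new DOMDocument('1.0', 'UTF-8');
$form = $doc->createElement('form');
$form->setAttribute('method', 'post');
$form->setAttribute('action', 'submit.php');   // hypothetical target

// Container for hidden inputs; not attached to anything yet.
$hidden = $doc->createElement('div');
$hidden->setAttribute('id', 'hiddeninputs');

// Visible part of the form, built whenever convenient.
$visible = $doc->createElement('div');
$field = $doc->createElement('input');
$field->setAttribute('type', 'text');
$field->setAttribute('name', 'comment');
$visible->appendChild($field);
$form->appendChild($visible);

// Hidden inputs can be added at any point while the page is generated.
foreach (array('session' => 'abc123', 'step' => '2') as $name => $value) {
    $input = $doc->createElement('input');
    $input->setAttribute('type', 'hidden');
    $input->setAttribute('name', $name);
    $input->setAttribute('value', $value);
    $hidden->appendChild($input);
}

// No more hidden inputs are coming, so attach the container now.
$form->appendChild($hidden);
$doc->appendChild($form);
$xml = $doc->saveXML();
print $xml;
```

The hidden inputs end up together at the end of the form even though the code that created them could have been scattered anywhere before the final appendChild calls.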
The process is similar to JavaScript - e.g.
Code:
$myDiv = $somedocument->createElement('div');
$myDiv->setAttribute("id","someid");
$myDiv->setAttribute("class","someclass");
$myDiv->appendChild($somepreviouslycreatedelement);
$myBody->appendChild($myDiv);
When you are done, you can print the output via print $somedocument->saveXML();
However - there's a major problem with that. I couldn't find a way to make DOMDocument create a document with a specified DTD. So - here's what I came up with, I call it xhtml.inc and simply require it at the beginning of my php -
Code:
<?php
function sendxhtmlheader($usexml) {
    if ($usexml == 1) {
        header("Content-Type: application/xhtml+xml; charset=utf-8");
    } else {
        header("Content-type: text/html; charset=utf-8");
    }
}
function sendpage($page,$usexml) {
    $xhtmldtd="\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n";
    // put the doctype in place of the first newline, i.e. right after the xml declaration
    $bar=preg_replace('/\n/',$xhtmldtd,$page,1);
    sendxhtmlheader($usexml);
    print($bar);
}
if (! isset($usexml)) {
    $usexml=1;
}
if ($usexml == 1) {
    if (isset( $_SERVER['HTTP_ACCEPT'] )) {
        // strpos() returns 0 for a match at the very start, so test against false
        if (strpos( $_SERVER['HTTP_ACCEPT'], "application/xhtml+xml" ) === false) {
            $usexml=0;
        }
    } else {
        $usexml=0;
    }
}
$myxhtml = new DOMDocument("1.0","UTF-8");
$myxhtml->preserveWhiteSpace = false;
$myxhtml->formatOutput = true;
$xmlHtml = $myxhtml->createElement("html");
$xmlHtml->setAttribute("xmlns","http://www.w3.org/1999/xhtml");
$xmlHtml->setAttribute("xml:lang","en");
?>
That prevents the xhtml+xml header from being sent to Internet Exploder (and other browsers that don't report an ability to handle xhtml+xml) and adds the xhtml 1.1 DTD to the document.
In your document, instead of printing the output, assign it to a variable - e.g.
$foo=$myxhtml->saveXML();
Then send it to the sendpage function -
sendpage($foo,$usexml);
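The Accept check in xhtml.inc only looks for the media type anywhere in the header. As a rough sketch of a slightly stricter test, the function below also treats an explicit q=0 as "not accepted". The name acceptsXhtml is made up for this example and is not part of xhtml.inc.

```php
<?php
// Hypothetical helper: does this Accept header allow application/xhtml+xml?
function acceptsXhtml($accept) {
    foreach (explode(',', $accept) as $range) {
        // Split "type/subtype;q=0.9" into the media type and its parameters.
        $parts = array_map('trim', explode(';', $range));
        if (strtolower($parts[0]) !== 'application/xhtml+xml') {
            continue;
        }
        $q = 1.0;   // quality defaults to 1 when no q parameter is given
        foreach (array_slice($parts, 1) as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        return $q > 0;
    }
    return false;   // media type not listed at all
}

$usexml = acceptsXhtml('text/html,application/xhtml+xml,*/*;q=0.8') ? 1 : 0;
```

In a real page the argument would be $_SERVER['HTTP_ACCEPT'] (after an isset check), and the result would feed the same $usexml variable used above.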
-=-
Some gotchas - you can only append an element to one parent. If you try to append it to a second parent, it is moved there and is no longer a child of the first parent. If, after appending it to the first parent, you create a fresh element and assign it to the same variable (so that only the variable name is reused), then there's not a problem.
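A small sketch of that gotcha, with made-up element names. It also shows cloneNode(true), which is one way to get a copy when the same content really is wanted under two parents.

```php
<?php
$doc    = new DOMDocument('1.0', 'UTF-8');
$root   = $doc->createElement('root');
$first  = $doc->createElement('first');
$second = $doc->createElement('second');
$root->appendChild($first);
$root->appendChild($second);
$doc->appendChild($root);

$item = $doc->createElement('item');
$first->appendChild($item);
$second->appendChild($item);            // moves $item out of <first>

$inFirstAfterMove  = $first->childNodes->length;    // <first> is empty again
$inSecondAfterMove = $second->childNodes->length;

// To have it in both places, append a deep copy instead.
$first->appendChild($item->cloneNode(true));
$inFirstAfterClone = $first->childNodes->length;

print $doc->saveXML();
```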
For cdata stuff - you create a cdata object like this:
$xmlCdata = $myxhtml->createCdataSection($string);
You either need to put all your stuff (e.g. javascript) into a single string with newlines in the right places, or add more text to the cdata object afterwards with its appendData() method, but remember to include a newline after anything you want to end a line.
You append your cdata as a child to its parent - e.g.
$xmlScript->appendChild($xmlCdata);
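Put together, a script element with a cdata block might be built like this. It is only a sketch; the JavaScript is a placeholder, and appendData() is used to add a second chunk of text to the cdata node.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$script = $doc->createElement('script');
$script->setAttribute('type', 'text/javascript');

// Placeholder JavaScript; note the explicit newlines.
$js = "\nfunction hello() {\n  if (1 < 2) { alert('hi'); }\n}\n";
$cdata = $doc->createCDATASection($js);

// More text can be added to the same cdata node later on.
$cdata->appendData("hello();\n");

$script->appendChild($cdata);
$doc->appendChild($script);

// Serialize just the script element to see the result.
$out = $doc->saveXML($script);
print $out;
```

The < inside the JavaScript is left alone because it sits inside the cdata section rather than in an ordinary text node.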
-=-
You can still append stuff to an element after it has been appended to its parent (as long as you still have a variable pointing at it, or an id or some other way to find it), but it's probably easier not to. Once you know an element has all the children it is going to have, then append it to its parent.
Thus - the end of your code when you are ready to send will look something like this:
Code:
$xmlHtml->appendChild($xmlHead);
$xmlHtml->appendChild($xmlBody);
$myxhtml->appendChild($xmlHtml);
It's a really neat way to generate an xhtml document.
It's not hard to quit smoking. I do it 20 times a day.
#2 2009-02-08 5:33 am
Re: xhtml via php DOMDocument (brief howto)
There's an added bonus to doing it this way - some built-in XSS security.
A script consists of at least one element node with a child - so if someone tries to sneak one into an input and you don't clean the input, it still goes into the document as a single text node, and the DOMDocument class encodes the tags when it writes the document out, so they never become an actual script node.
So unless your webapp parses the input to create nodes and subnodes, an XSS injection of that kind won't work.
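A quick way to see this, as a sketch with a made-up hostile string. createTextNode() is used here rather than the second argument of createElement(), since the latter does not escape a bare & for you.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$userInput = '<script>alert(1)</script>';   // hypothetical hostile input

$p = $doc->createElement('p');
$p->appendChild($doc->createTextNode($userInput));
$doc->appendChild($p);

// The markup characters come out as entities, not as a script element.
$out = $doc->saveXML($p);
print $out;
```

This only covers markup smuggled into text and attribute values. Something like a javascript: URL placed in an href is still your job to filter.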
#3 2009-02-08 12:18 pm
Re: xhtml via php DOMDocument (brief howto)
Interesting. Messing with the DOM from PHP seems to ignore the boundary between content and presentation, but interesting nonetheless.
Basseq is me, John Whittet.
(Finishing the remainder of the thought expressed in the post has been left as an exercise for the reader.)
#4 2009-02-08 12:45 pm
- Booksley
- Zombie Genocidest
- From: Toronto, Ontario
- Registered: 2001-02-16
- Posts: 5031
Re: xhtml via php DOMDocument (brief howto)
Basseq wrote:
Interesting. Messing with the DOM from PHP seems to ignore the boundary between content and presentation, but interesting nonetheless.
Isn't that what PHP is all about? 
#5 2009-02-08 1:31 pm
Re: xhtml via php DOMDocument (brief howto)
It just seems like the wrong place in the chain to have hardcore DOM manipulation, that's all.
Basseq is me, John Whittet.
#6 2009-02-08 3:27 pm
Re: xhtml via php DOMDocument (brief howto)
It creates an xml document. That's all it does - it doesn't ignore boundaries between content and presentation.
In fact - I'm also using it to create GPX files (GPX is an xml file format for storing GPS waypoints, tracks, and routes) - there is no presentation with GPX.
While using DOMDocument cannot guarantee valid xhtml (or any other specific xml implementation), it does (assuming no bugs in the php class itself) guarantee well-formed XML - which is just a data storage and exchange method and doesn't have anything to do with presentation.
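For the GPX case, a minimal sketch might look like the following. The waypoint data and the creator string are invented, and the namespace and element names assume GPX 1.1; this is not the actual script mentioned above.

```php
<?php
$gpxNs = 'http://www.topografix.com/GPX/1/1';   // assumes GPX 1.1

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$gpx = $doc->createElementNS($gpxNs, 'gpx');
$gpx->setAttribute('version', '1.1');
$gpx->setAttribute('creator', 'example-script');   // made-up creator name
$doc->appendChild($gpx);

// Invented waypoints, just to have something to write out.
$waypoints = array(
    array('name' => 'Trailhead', 'lat' => '40.5000', 'lon' => '-122.3000'),
    array('name' => 'Summit',    'lat' => '40.5200', 'lon' => '-122.2800'),
);

foreach ($waypoints as $w) {
    $wpt = $doc->createElementNS($gpxNs, 'wpt');
    $wpt->setAttribute('lat', $w['lat']);
    $wpt->setAttribute('lon', $w['lon']);
    $name = $doc->createElementNS($gpxNs, 'name');
    $name->appendChild($doc->createTextNode($w['name']));
    $wpt->appendChild($name);
    $gpx->appendChild($wpt);
}

$out = $doc->saveXML();
print $out;
```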
#7 2009-02-08 5:57 pm
Re: xhtml via php DOMDocument (brief howto)
Fair 'nuff.
Basseq is me, John Whittet.
#8 2009-02-12 3:19 pm
Re: xhtml via php DOMDocument (brief howto)
There does seem to be a major drawback, though it is a drawback with xhtml rather than with DOMDocument. Ad servers want you to insert javascript, and their javascript almost always uses document.write, which doesn't work when the page is served as xhtml+xml, and wants to insert an iframe (which isn't allowed in xhtml strict or xhtml 1.1).
The hack is to use an object element whose content is standard html that allows the iframe - but that's too hackish for me (iframe inside an object inside the document) - so you either don't send the xhtml header (in which case your page technically is not valid) or don't use ad servers that can't properly cope with a standard that has been out for eons.
#9 2009-02-16 8:56 am
Re: xhtml via php DOMDocument (brief howto)
Here's an update of my xhtml.inc file - it translates the document to html 4.01 for browsers that do not support xhtml+xml
Code:
<?php
function HTMLify($buffer) {
    /* based on http://www.kilroyjames.co.uk/2008/09/xhtml-to-html-wordpress-plugin */
    // self-closed script tags need an explicit closing tag in html
    $xhtml[] = '/type=\"text\/javascript\"\/>/';
    $html[] = 'type="text/javascript"></script>';
    // every other self-closed tag just loses its slash
    $xhtml[] = '/\/>/';
    $html[] = '>';
    $xhtml[] = '/\/\s+>/';
    $html[] = '>';
    return preg_replace($xhtml, $html, $buffer);
}
function sendxhtmlheader($usexml) {
    if ($usexml == 1) {
        header("Content-Type: application/xhtml+xml; charset=utf-8");
    } else {
        header("Content-type: text/html; charset=utf-8");
    }
}
function sendcspheader($usejs=0) {
    if ($usejs == 1) {
        header('X-Content-Security-Policy: allow self');
    } else {
        header('X-Content-Security-Policy: allow self; script-src none');
    }
}
function sendpage($page,$usexml,$usejs=0) {
    $xhtmldtd="\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n";
    $htmldtd="<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">";
    if ($usexml == 0) {
        // html: swap the xml declaration for the html doctype, then fix the tags
        $bar=preg_replace('/<\?xml version=\"1.0\" encoding=\"UTF-8\"\?>/',$htmldtd,$page,1);
        $bar = HTMLify($bar);
    } else {
        // xhtml: put the doctype in place of the first newline
        $bar=preg_replace('/\n/',$xhtmldtd,$page,1);
    }
    sendxhtmlheader($usexml);
    sendcspheader($usejs);
    print($bar);
}
if (! isset($usexml)) {
    $usexml=1;
}
if ($usexml == 1) {
    if (isset( $_SERVER['HTTP_ACCEPT'] )) {
        // strpos() returns 0 for a match at the very start, so test against false
        if (strpos( $_SERVER['HTTP_ACCEPT'], "application/xhtml+xml" ) === false) {
            $usexml=0;
        }
    } else {
        $usexml=0;
    }
}
$myxhtml = new DOMDocument("1.0","UTF-8");
$myxhtml->preserveWhiteSpace = false;
$myxhtml->formatOutput = true;
$xmlHtml = $myxhtml->createElement("html");
if ($usexml == 1) {
    $xmlHtml->setAttribute("xmlns","http://www.w3.org/1999/xhtml");
    $xmlHtml->setAttribute("xml:lang","en");
}
?>
The sendcspheader function is not necessary - it's for a different issue
#10 2009-02-16 10:12 am
Re: xhtml via php DOMDocument (brief howto)
HTMLify is broken for self closing div tags as well.
I'll eventually get it right 
For now, I'm just not using self closing div tags.
#11 2009-02-16 9:39 pm
Re: xhtml via php DOMDocument (brief howto)
Here's the fixed xhtml->html filter -
Code:
function HTMLify($buffer) {
    /* based on http://www.kilroyjames.co.uk/2008/09/xhtml-to-html-wordpress-plugin */
    $xhtml[] = '/<script([^<]*)\/>/';
    $html[] = '<script\\1></script>';
    $xhtml[] = '/<div([^<]*)\/>/';
    $html[] = '<div\\1></div>';
    $xhtml[] = '/<a([^<]*)\/>/';
    $html[] = '<a\\1></a>';
    $xhtml[] = '/\/>/';
    $html[] = '>';
    // DOMDocument never produces white space between / and > on self closing tags
    // $xhtml[] = '/\/\s+>/';
    // $html[] = '>';
    return preg_replace($xhtml, $html, $buffer);
}
I don't know if there are any other self closing xhtml tags that have to be handled that way or not, but doing the same thing with them should work.
The a is there because <a id="foo"/> is legal xhtml, though last time I tried it - I think firefox 1.5 - firefox mis-rendered it, so when making an anchor in DOMDocument, use $document->createElement("a"," "); to force a text node in the anchor.
Anyway, this new filter seems to work perfectly.
Another issue with the filter, though, is object tags. You can't just turn object tags into iframes, for obvious reasons, and I don't know if object tags are legal in html 4.01 or not (and don't care right now, since I don't use them yet).
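On the question of which other tags need the same treatment: in html 4.01 strict only a handful of elements are defined as empty (area, base, br, col, hr, img, input, link, meta, param), so everything else needs an explicit closing tag. A rough sketch of a filter built on that idea is below. The function names are made up, and it has the same limits as any regex filter run over serialized markup.

```php
<?php
// Hypothetical callback: decide per tag whether it may stay empty in html 4.01.
function expandSelfClosed($m) {
    // Elements html 4.01 strict declares as EMPTY.
    $void = array('area', 'base', 'br', 'col', 'hr', 'img',
                  'input', 'link', 'meta', 'param');
    if (in_array(strtolower($m[1]), $void, true)) {
        return '<' . $m[1] . $m[2] . '>';                   // <br/>  ->  <br>
    }
    return '<' . $m[1] . $m[2] . '></' . $m[1] . '>';       // <div/> -> <div></div>
}

// Hypothetical generalisation of HTMLify().
function htmlifyGeneric($buffer) {
    // $m[1] is the tag name, $m[2] the attributes (if any).
    return preg_replace_callback(
        '/<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*\/>/',
        'expandSelfClosed',
        $buffer
    );
}

$result = htmlifyGeneric('<div id="x"/><br/><a id="foo"/>');
print $result;
```

A named callback is used instead of a closure so the sketch does not depend on php 5.3.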
#12 2009-04-13 2:08 am
Re: xhtml via php DOMDocument (brief howto)
OK - this is a much better way to do it -
Code:
<?php
$xhtmldtd="<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">";
$htmldtd="<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">";
function sendxhtmlheader($usexml) {
    if ($usexml == 1) {
        header("Content-Type: application/xhtml+xml; charset=utf-8");
    } else {
        header("Content-type: text/html; charset=utf-8");
    }
}
if (! isset($usexml)) {
    $usexml=1;
}
if ($usexml == 1) {
    if (isset( $_SERVER['HTTP_ACCEPT'] )) {
        // strpos() returns 0 for a match at the very start, so test against false
        if (strpos( $_SERVER['HTTP_ACCEPT'], "application/xhtml+xml" ) === false) {
            $usexml=0;
        }
    } else {
        $usexml=0;
    }
}
$myxhtml = new DOMDocument("1.0","UTF-8");
$myxhtml->preserveWhiteSpace = false;
$myxhtml->formatOutput = true;
// load a skeleton document that already carries the right doctype
if ($usexml == 0) {
    $xmlstring = $htmldtd . "<html></html>";
} else {
    $xmlstring = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" . $xhtmldtd . "<html></html>";
}
$myxhtml->loadXML($xmlstring);
$elements = $myxhtml->getElementsByTagName("html");
$xmlHtml = $elements->item(0);
if ($usexml == 1) {
    $xmlHtml->setAttribute("xmlns","http://www.w3.org/1999/xhtml");
    $xmlHtml->setAttributeNS('http://www.w3.org/XML/1998/namespace','xml:lang','en');
}
Then you just add your children to the $xmlHtml object and when all is said and done -
Code:
$xmlHtml->appendChild($xmlHead);
$xmlHtml->appendChild($xmlBody);
sendxhtmlheader($usexml);
if ($usexml == 0) {
    print $myxhtml->saveHTML();
} else {
    print $myxhtml->saveXML();
}
Valid html 4.01 or xhtml 1.1 depending upon the browser (assuming your DOM is valid and you don't use xhtml specific stuff like MathML)
No need for ugly preg_replace crap involving output saved to a buffer first.
Only issue is libxml2 appears to do improper things with some utf8 entities when it saves to html as opposed to xml.
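For comparison, the DOM extension also has a DOMImplementation class whose createDocumentType() and createDocument() methods build a document with a doctype directly, without loading a skeleton string. The following is only a sketch of that alternative; I have not checked how it behaves on php 5.2.5 specifically, so treat that as an assumption.

```php
<?php
$xhtmlNs = 'http://www.w3.org/1999/xhtml';

$impl = new DOMImplementation();
$doctype = $impl->createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.1//EN',
    'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'
);

// The root element is created together with the document.
$doc = $impl->createDocument($xhtmlNs, 'html', $doctype);
$doc->encoding = 'UTF-8';
$doc->formatOutput = true;

$html = $doc->documentElement;
$body = $doc->createElementNS($xhtmlNs, 'body');
$html->appendChild($body);

$out = $doc->saveXML();
print $out;
```

Children are created with createElementNS() here so they sit in the same namespace as the root element.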
#13 2009-04-13 5:17 am
Re: xhtml via php DOMDocument (brief howto)
Seems the libxml issue only exists with loadHTML() - if loadXML() is done on a utf8 document then saveHTML() does the right thing.
