Tips to Ensure Your XML RSS Feed is Valid
Last month, I wrote an article on creating an RSS feed for your site. Some people have reported problems with the process, but these all stem from malformed XML, not from the PHP code supplied in the article.
There are a few things you need to keep in mind while writing an XML document, or else the RSS reader will not be able to understand it. It’s better to think of writing XML as writing code rather than text, because there are strict syntax rules you need to adhere to.
Balance Opening and Closing Tags
One strict syntax rule of XML is that opening and closing tags must be balanced. For every opening tag, there must be a closing tag.
<item> <description> </item>
<item> <description></description> </item>
In this example, the first item element is invalid because description has no closing tag. The second is valid because description is closed. If description doesn’t contain a value, you could also use a self-closing tag.
<item> <description /> </item>
With a simple feed, it can be easy enough to keep tags balanced by hand. However, it’s a good idea to look into building the feed with an XML library such as SimpleXML rather than by concatenating strings. The library writes the opening and closing tags for you, so they stay balanced and the XML is well-formed.
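As a minimal sketch of the library approach, here is a tiny RSS 2.0 feed built with PHP’s XMLWriter extension, used here in place of SimpleXML because its writeElement() method escapes element text for you. The buildFeed() helper name and all channel and item values are placeholders. Every startElement() call is paired with an endElement() call, so the tags cannot end up unbalanced.

```php
<?php
// Build a small RSS 2.0 feed with XMLWriter.
// buildFeed() and the channel values are placeholders for illustration.
function buildFeed(array $items): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');

    $w->startElement('rss');                 // <rss>
    $w->writeAttribute('version', '2.0');
    $w->startElement('channel');             // <channel>
    $w->writeElement('title', 'Example Site');
    $w->writeElement('link', 'http://example.com/');
    $w->writeElement('description', 'Latest articles');

    foreach ($items as $item) {
        $w->startElement('item');            // <item>
        // writeElement() escapes its text, so & in a URL is written as &amp;.
        $w->writeElement('title', $item['title']);
        $w->writeElement('link', $item['link']);
        $w->endElement();                    // </item>
    }

    $w->endElement();                        // </channel>
    $w->endElement();                        // </rss>
    $w->endDocument();
    return $w->outputMemory();
}

echo buildFeed([
    ['title' => 'Tips & Tricks', 'link' => 'http://example.com/page.php?id=5&sort=date'],
]);
```

Because the library tracks which elements are open, forgetting a closing tag is no longer something you have to watch for by eye.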
Ampersands Are Not Allowed
The lovely ampersand (&) is not allowed to appear in an XML document unless it is escaped as &amp;.
If a browser or feed reader encounters an ampersand that isn’t escaped and isn’t part of a valid entity, it will simply stop reading the feed. This problem is easy to overlook if your URLs use query strings, because the ampersand is what separates the parameters (as in page.php?id=5&sort=date).
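The solution is to escape all of the URLs and data going into your feed. The reader will convert &amp; back into & so that your links work correctly. PHP offers a function to help you do that.
$escapedText = htmlspecialchars($unescapedText, ENT_QUOTES | ENT_XML1, 'UTF-8');
Pass all of the text that needs to be escaped (like your URLs) through htmlspecialchars, and your problems will be solved. Besides &, it also escapes < and > characters and quotes. Avoid htmlentities for this job: it also converts characters such as é and the non-breaking space into HTML named entities like &eacute; and &nbsp;, which an XML processor doesn’t recognize.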
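Here is a short sketch of the escaping step applied to a URL with a query string. The xmlEscape() wrapper name is hypothetical; it simply calls htmlspecialchars() with the ENT_XML1 flag (XML escaping rules) and ENT_QUOTES (escape both quote styles). The last line shows why htmlentities() is the wrong tool for XML.

```php
<?php
// Escape a value for use as XML text or attribute content.
// xmlEscape() is a hypothetical helper name.
function xmlEscape(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$url = 'http://example.com/page.php?id=5&sort=date';
echo xmlEscape($url), "\n";                        // http://example.com/page.php?id=5&amp;sort=date

// For comparison: htmlentities() emits HTML-only named entities.
echo htmlentities('café', ENT_QUOTES, 'UTF-8'), "\n"; // caf&eacute;
```

The escaped URL is what goes between the link tags; the feed reader turns &amp; back into & when it follows the link.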
No Non-Breaking Spaces!
The final tip deals with invalid entities, like &nbsp;. As mentioned before, unescaped ampersands are a strict no-no. They can be used to form entity references, but they cannot be used by themselves.
If you use an entity that the XML processor doesn’t recognize, it can’t expand it and treats the reference as an error. XML predefines only five entities: &amp;, &lt;, &gt;, &quot; and &apos;. Anything else, such as &bob;, has to be declared in a DTD; otherwise the processor will promptly stop processing your feed.
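To see which references an XML processor accepts, you can feed it a few small fragments. This sketch uses the loadXML() method of PHP’s DOM extension; the isWellFormed() helper name is hypothetical. It also includes the numeric character reference &#160;, which XML accepts without any declaration and which produces the same non-breaking space as &nbsp;.

```php
<?php
// Return true if $xml parses as well-formed XML.
// isWellFormed() is a hypothetical helper name.
function isWellFormed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the previous setting
    return $ok;
}

$samples = [
    '<title>Tom &amp; Jerry</title>', // predefined entity: accepted
    '<title>Tom&#160;Jerry</title>',  // numeric character reference: accepted
    '<title>Tom &bob; Jerry</title>', // undefined entity: rejected
    '<title>Tom&nbsp;Jerry</title>',  // HTML-only entity: rejected
    '<title>Tom & Jerry</title>',     // bare ampersand: rejected
];
foreach ($samples as $xml) {
    echo (isWellFormed($xml) ? 'OK     ' : 'BROKEN '), $xml, "\n";
}
```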
One common way for this to break your feed is the non-breaking space entity, &nbsp;. It often serves no purpose in the document, but you might find it littered around your source code if you use a WYSIWYG editor.
You could get rid of these by filtering your output through a function that uses str_replace to replace &nbsp; with a regular space (' ').
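A minimal sketch of such a filter follows; removeNbsp() is a hypothetical name. If you would rather keep the non-breaking space, replacing &nbsp; with the numeric character reference &#160; instead of a space is another option, since XML understands numeric references.

```php
<?php
// Replace the HTML-only &nbsp; entity with a plain space
// before the text goes into the feed. removeNbsp() is a hypothetical name.
function removeNbsp(string $text): string
{
    return str_replace('&nbsp;', ' ', $text);
}

echo removeNbsp('Hello&nbsp;world'), "\n"; // Hello world
```

Run this on the raw content before any escaping; after escaping, the entity would appear as &amp;nbsp; and no longer match.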
If the non-breaking space character serves a purpose in your document, you can preserve it for the browser but make the XML processor ignore it. This is where CDATA comes in.
<description><![CDATA[This is a description with content that <strong>shouldn't</strong> be interpreted by the XML processor.]]></description>
When you wrap text between <![CDATA[ and ]]>, you tell the processor not to parse it and to pass it through as is. This can help your non-breaking space make it to the user. You may also want to use this if your descriptions and content include HTML tags.
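One catch: a CDATA section ends at the first ]]> it contains, so content that itself includes ]]> would close the section early and break the feed. A minimal sketch of a wrapper (wrapCdata() is a hypothetical name) that splits any ]]> across two adjacent CDATA sections, which an XML processor joins back together:

```php
<?php
// Wrap text in a CDATA section. A CDATA section ends at the first "]]>",
// so any "]]>" inside the text is split across two adjacent sections.
// wrapCdata() is a hypothetical helper name.
function wrapCdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

$html = 'Read <strong>this</strong>&nbsp;now';
echo '<description>', wrapCdata($html), "</description>\n";
// <description><![CDATA[Read <strong>this</strong>&nbsp;now]]></description>
```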
Browse the Spec
These tips address some of the most common problems that break XML documents.
However, there are many other minor errors you can run into. The only way to learn about them and solve them is through research and trial and error.
You may want to take some time to browse the XML specification, as well as read forums for problems other people have run into. Then, test things out and find out what doesn’t work.
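When testing, it helps to see exactly where the parser gave up. A minimal sketch, assuming your feed is available as a string (the sample feed below is a placeholder and checkFeed() is a hypothetical name), that collects every error PHP’s libxml support reports along with its line and column. The exact message wording depends on the installed libxml version.

```php
<?php
// Parse $xml with DOMDocument and return libxml's error messages,
// each prefixed with the line and column where the parser complained.
function checkFeed(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $problems = [];
    foreach (libxml_get_errors() as $error) {
        $problems[] = sprintf('line %d, column %d: %s',
            $error->line, $error->column, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the previous setting
    return $problems;
}

// Placeholder feed with an unescaped ampersand on line 3.
$feed = "<rss version=\"2.0\">\n<channel>\n<title>Tips & Tricks</title>\n</channel>\n</rss>";
foreach (checkFeed($feed) as $problem) {
    echo $problem, "\n";
}
```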
Diana said this on March 15th, 2008 at 6:06 pm
Fortunately, for my WordPress blogs I use the Google Sitemaps plugin, which solves my problem without any headache. But some time ago I had to write a program in ASP for my B2B portal, which has a lot of pages, and it was very helpful for me to use an XML validator from Validome (I don’t give the link because Akismet didn’t like it, but you can find it with Google).