I am trying to generate an automatic feed of data to send to Google Base using UTF-8 encoding. However, whenever a hyphen is found I get an error telling me that there is an encoding error in the relevant attribute (title, description, product_type). I am currently using:

−

but I have also tried:

−

neither of which has worked. I am using the following declaration at the top of the document:

<?xml version="1.0" encoding="utf-8"?>

OK, to give further context: the data is being pulled from our site's product information, which is stored as UTF-8 encoded data in a MySQL database. The data is going into an RSS 2.0 feed, using some standard RSS attributes as well as some custom-defined Google attributes. The problem comes up whenever there is a hyphen in any field except the link field, so it appears in the title and description fields as well as the custom product_type field. Below is an example of a field that Google Base (Merchant Centre) throws an error over. It throws the same error with or without the other entities, and only stops objecting when the hyphens are removed.

    <description>&lt;p&gt;Your sports floor is designed primarily for sports use. Thou many facilities have to be used for other activities including things like; assemblies careers fairs drama parties and social events bring and buy sales exhibitions etc.&lt;/p&gt;

&lt;p&gt;Solid hardwood sports floors are designated as &quot;area elastic floors&quot; to provide the spring resilience and shock absorbing qualities needed for sports and dance use to minimise injury. If the floor is too hard the athlete and user will be exposed to early fatigue and aching joints through to injury such as sprains joint and shin bone damage.&lt;/p&gt;

&lt;p&gt;If too soft then ball bounce and running characteristics are compromised.
In the UK hardwood sports floors are governed by a number of recognised standards&lt;/p&gt;

&lt;p&gt;All sports floors must conform to BS7044 Part 4 - this is the minimum Sport England requirement with which your floor msut comply if it is part of a Sport England sponsored project.&lt;/p&gt;

&lt;p&gt;A higher more demanding standard for better quality sports and dance flooring is DIN 18032 Part 2&lt;/p&gt;

&lt;p&gt;The newest - and the best - standard is the European Standard CEN 217. This standard has brought together all the best eprformance criteria from a number of current standards in the EU including BS and DIN.&lt;/p&gt;

&lt;p&gt;All Junckers systems fully comply with one or more of these standards. They ALL comply with the minimum Sport England requirement of BS7044 Part 4 compliance.&lt;/p&gt;</description>
A: 

You talk about using hyphens, but the character you're trying to insert is the mathematical minus sign. Have you tried it with an actual hyphen? And not a HTML entity, either; just the character, -.
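
To check which character is really sitting in the data, it can help to print the code point and UTF-8 bytes of every dash-like character. Below is a minimal sketch in Python, chosen purely for illustration since the question doesn't say what generates the feed; `describe_chars` and the sample string are made-up names, not anything from the asker's code.

```python
import unicodedata


def describe_chars(text):
    """List every ASCII hyphen and every non-ASCII character in `text`.

    Each entry is (character, code point, Unicode name, UTF-8 bytes).
    Hypothetical helper for illustration only.
    """
    rows = []
    for ch in text:
        if ch == "-" or ord(ch) > 127:
            rows.append(
                (
                    ch,
                    f"U+{ord(ch):04X}",
                    unicodedata.name(ch, "UNKNOWN"),
                    ch.encode("utf-8"),
                )
            )
    return rows


# Assumed sample: an ASCII hyphen, a minus sign and an en dash.
sample = "best - standard \u2212 CEN \u2013 217"
for row in describe_chars(sample):
    print(row)
```

An ASCII hyphen shows up as U+002D (HYPHEN-MINUS, a single byte), whereas the minus sign is U+2212 and takes three bytes in UTF-8. If the feed is expected to contain only plain hyphens and anything else appears here, something upstream is changing the character.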

Alan Moore
Sorry, to clarify: I also tried a soft hyphen (`&shy;`) and an en dash (`&ndash;`). And yes, the original data just had the character and no HTML entity, and that's where the problem started. The data came from a UTF-8 source as well.
Yes, I would expect a soft-hyphen or an en-dash to cause problems, too, if a minus sign didn't work. But did you try replacing whatever's there with a plain old dash (i.e., the character to the right of the zero on most English-language keyboards)?
Alan Moore
Yes, that's what it was originally, but I also tried replacing the displayed character with a plain hyphen anyway, just to see, and I still got the same problem.
Well, I don't see how an ASCII hyphen could cause an encoding-related error. How exactly is it being used? Please edit your question and add a little more context.
Alan Moore
Updated with some more contextual information; hopefully that helps explain it a little better! Thanks
Okay, I'm stumped. That's pure ASCII; there's nothing that could cause a problem even if it were incorrectly encoded as, say, windows-1252. Could something be changing the hyphens to something else before GBase sees them? Have *you* tried replacing them with `&#45;` (the numerical entity for the ASCII hyphen)?
Alan Moore
Sorry for the very late response but that seems to have worked just with that...
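
For anyone landing here later, the workaround that ended this thread (emitting ASCII hyphens as `&#45;`) could be applied while escaping each text field. This is only a sketch under assumptions: Python is used for illustration, `escape_feed_text` is a hypothetical helper, and the thread never establishes why Google Base accepted the numeric entity but rejected the literal hyphen.

```python
from xml.sax.saxutils import escape


def escape_feed_text(text):
    """Escape raw text for an XML element body.

    `escape` handles &, < and > first; the extra mapping then turns
    double quotes into &quot; and ASCII hyphens into &#45;.
    Hypothetical helper: apply it once, to raw (unescaped) text only.
    """
    return escape(text, {'"': "&quot;", "-": "&#45;"})


raw = "<p>The newest - and the best - standard</p>"
print(escape_feed_text(raw))
```

The question reports that the link field was not affected, so this would only be applied to text fields such as title, description and product_type, leaving URLs alone.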