Below is an excerpt from an .svg file (which is XML):

   <text
       xml:space="preserve"
       style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
       x="109.38555"
       y="407.02847"
       id="libcode-00"
       sodipodi:linespacing="125%"
       inkscape:label="#text4638"><tspan
         sodipodi:role="line"
         id="tspan4640"
         x="109.38555"
         y="407.02847">12345678</tspan></text>

I'm learning Python and have no clue how I can find all text elements whose id attribute equals libcode-XX, where XX is a number.

I loaded this .svg file with minidom's parser and tried to find the elements using getElementById, but the result is None.

    from xml.dom import minidom

    svgTemplate = minidom.parse(svgFile)
    print(svgTemplate)
    print(svgTemplate.getElementById('libcode-00'))

Following another SO question, I tried calling setIdAttribute('id') on the svgTemplate object, with no luck.

Bottom line: please give me a hint for a smart way to extract all the text elements whose ids have the form libcode-XX. After that, it should be no problem to get the tspan text and substitute it with generated content.

A: 

Sorry, I don't know my way around minidom. Also, I had to find the namespace declarations from a sample svg document so that your excerpt could load.

I personally use lxml.etree, and I'd recommend XPath for addressing parts of your XML document. It's pretty powerful, and there are lots of answers here on SO about XPath and etree if you're struggling. I've written several.

from lxml import etree
data = """
 <svg
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns="http://www.w3.org/2000/svg"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
    xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
    width="50"
    height="25"
    id="svg2"
    sodipodi:version="0.32"
    inkscape:version="0.45.1"
    version="1.0"
    sodipodi:docbase="/home/tcooksey/Projects/qt-4.4/demos/embedded/embeddedsvgviewer/files"
    sodipodi:docname="v-slider-handle.svg"
    inkscape:output_extension="org.inkscape.output.svg.inkscape">
    <text
       xml:space="preserve"
       style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
       x="109.38555"
       y="407.02847"
       id="libcode-00"
       sodipodi:linespacing="125%"
       inkscape:label="#text4638"><tspan
         sodipodi:role="line"
         id="tspan4640"
         x="109.38555"
         y="407.02847">12345678</tspan></text>
    </svg>
"""

nsmap = {
    'sodipodi': 'http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd',
    'cc': 'http://web.resource.org/cc/',
    'svg': 'http://www.w3.org/2000/svg',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'xlink': 'http://www.w3.org/1999/xlink',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'inkscape': 'http://www.inkscape.org/namespaces/inkscape'
    }


data = etree.XML(data)

# All svg text elements
>>> data.xpath('//svg:text',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# All svg text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# TSPAN child elements of text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]/svg:tspan',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}tspan at b7cfc964>]
# All text elements with id starting with "libcode"
>>> data.xpath('//svg:text[starts-with(@id,"libcode")]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfcc34>]
# Iterate text elements, access tspan child
>>> for elem in data.xpath('//svg:text[starts-with(@id,"libcode")]',namespaces=nsmap):
...     tp = elem.xpath('./svg:tspan',namespaces=nsmap)[0]
...     tp.text = "new text"

# etree.tostring() returns bytes on Python 3, so write in binary mode
with open("newfile.svg", "wb") as f:
    f.write(etree.tostring(data))
MattH
How can I substitute the text in the tspan element (tp = "newtext") and then export the modified XML to a new SVG file using this method? Minidom has toxml().
Marcin Gil
You can modify the etree elements directly. You can export to a new file using `etree.tostring(xmldata)`, as seen in the updated example above.
MattH
Is it necessary to prepare nsmap? I see it is used in all expressions.
Marcin Gil
The files you are using use multiple namespaces, with tagged elements and attributes. You may end up with an invalid document if you do not respect these when making changes. You can make XPath expressions namespace-agnostic with the `local-name()` function, but then you take on all the complications of namespace-agnostic expressions, and I don't have much idea what your real files are going to look like.
MattH
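As a rough illustration of the `local-name()` approach mentioned in the comment above, here is a minimal sketch that needs no nsmap. The cut-down SVG sample, the extra `libcode-01` and `title` elements, and the variable names are assumptions made up for the example. Since `starts-with` would also match ids such as `libcode-foo`, a Python regular expression narrows the result to the libcode-XX form.

```python
import re
from lxml import etree

# Hypothetical cut-down sample; real Inkscape files carry more namespaces.
SVG = b"""<svg xmlns="http://www.w3.org/2000/svg">
  <text id="libcode-00"><tspan id="tspan4640">12345678</tspan></text>
  <text id="libcode-01"><tspan id="tspan4642">87654321</tspan></text>
  <text id="title"><tspan id="tspan4644">unrelated</tspan></text>
</svg>"""

root = etree.fromstring(SVG)

# local-name() compares the tag name without its namespace, so no nsmap is needed.
candidates = root.xpath('//*[local-name()="text"][starts-with(@id, "libcode-")]')

# Keep only ids of the form libcode-<digits>.
libcode = re.compile(r"^libcode-\d+$")
matched = [el for el in candidates if libcode.match(el.get("id"))]

for el in matched:
    tspan = el.xpath('./*[local-name()="tspan"]')[0]
    tspan.text = "new text"

ids = [el.get("id") for el in matched]
```

The trade-off is the one described in the comment: the expressions get longer, and an element named `text` from any other namespace would match as well.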
A: 

Does it work if you replace 'id' with 'xml:id'?

If minidom doesn't know SVG, it might treat the 'id' attribute as just any other attribute instead of one of type ID. A conforming SVG implementation would recognize the 'id' attribute in SVG content as being of type ID, and an XML implementation that loads external DTDs should also recognize it correctly if the file is tagged appropriately. Loading external DTDs is optional in XML, so the proper way of fixing this would be to make the parser SVG-aware.

Definition of 'id' in SVG 1.1 DTD: http://www.w3.org/TR/SVG11/svgdtd.html#DTD.1.4
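If the 'id' attribute is indeed treated as an ordinary attribute, one workaround that stays with minidom is to filter on the attribute value directly, and to call `setIdAttribute` on each element (not on the document) if `getElementById` is still wanted. The following is only a sketch under that assumption. The cut-down SVG sample and the names `libcode`, `matched` and `found` are made up for the example.

```python
import re
from xml.dom import minidom

# Hypothetical cut-down sample; real Inkscape files carry more attributes.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <text id="libcode-00"><tspan id="tspan4640">12345678</tspan></text>
  <text id="libcode-01"><tspan id="tspan4642">87654321</tspan></text>
  <text id="title"><tspan id="tspan4644">unrelated</tspan></text>
</svg>"""

doc = minidom.parseString(SVG)
libcode = re.compile(r"^libcode-\d+$")

# Read 'id' as a plain attribute and keep ids of the form libcode-<digits>.
matched = [el for el in doc.getElementsByTagName("text")
           if libcode.match(el.getAttribute("id"))]

for el in matched:
    # Mark 'id' as an ID-typed attribute on this particular element.
    el.setIdAttribute("id")
    # Replace the text node inside the first tspan child.
    tspan = el.getElementsByTagName("tspan")[0]
    tspan.firstChild.data = "new text"

# The lookup can now resolve the elements marked above.
found = doc.getElementById("libcode-00")
```

The modified document can then be serialized with `doc.toxml()`.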

Erik Dahlström