
Removing xml tag and text not working

Posted: Mon Nov 21, 2011 4:13 am
by Legend Graphics
Newbie here. I want to search for this exact text and replace it with a single word space...

<span style="xfa-spacerun:yes">  </span>

When I run the filter on the Clipboard text, it works. When I run it on the real file, it doesn't.

Do I need to escape something?

Re: Removing xml tag and text not working

Posted: Mon Nov 21, 2011 7:43 am
by Legend Graphics
I made a mistake. The filter works in the trial run. I haven't tried the clipboard.

Re: Removing xml tag and text not working

Posted: Mon Nov 21, 2011 10:12 am
by DataMystic Support
Ok - glad it is all sorted out!

Re: Removing xml tag and text not working

Posted: Tue Nov 22, 2011 1:21 pm
by Legend Graphics
Follow up....

<span style="xfa-spacerun:yes"> </span>

I realize experienced hands will not think this is an earth-shattering discovery. My filter was searching for the exact string above, but I couldn't get it to find the text in the file. Then I noticed that the space between > and < didn't have a space dot. When I cut and pasted the "space" from my text file into the blank space in the filter, the filter worked. I then discovered the hex dump feature. The space turned out to be hex 160, a blank character similar in width to a normal space.

So the search string that works is: <span style="xfa-spacerun:yes">\160</span>
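Why the typed space didn't match can be shown with a minimal Python sketch of an exact (literal) search. This is not TextPipe itself; the sample text is hypothetical and assumes the span holds a non-breaking space (Unicode U+00A0).

```python
# Minimal sketch of an exact (literal) search, not TextPipe itself.
NBSP = "\u00a0"  # non-breaking space: decimal 160, hex A0

# The same span typed two ways: with an ordinary space and with an NBSP.
with_space = '<span style="xfa-spacerun:yes"> </span>'
with_nbsp = '<span style="xfa-spacerun:yes">' + NBSP + "</span>"

# Hypothetical sample text; the span holds a non-breaking space.
xml = "end of line" + with_nbsp + "start of next"

print(with_space in xml)  # False: an ordinary space is hex 20, not hex A0
print(with_nbsp in xml)   # True

# A tiny "hex dump" of what sits between > and <.
inner = xml[xml.index(">") + 1 : xml.index("</span>")]
print([hex(ord(c)) for c in inner])  # ['0xa0']

# Replace every exact match with a single ordinary word space.
fixed = xml.replace(with_nbsp, " ")
print(fixed)  # end of line start of next
```

The two strings look identical on screen, which is why a hex view is the quickest way to tell them apart.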

Re: Removing xml tag and text not working

Posted: Tue Nov 22, 2011 3:01 pm
by DataMystic Support
A better solution would have been to use:

<span style="xfa-spacerun:yes">([^<]*)</span>
which matches any run of characters other than < between the tags (including an empty one). If you want to find out exactly what it matched, turn on Prompt on Replace, then select the second tab of the Prompt on Replace dialog, which shows you a hex dump of matched expressions and much more.
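Here is a rough Python equivalent of that pattern, using the standard `re` module rather than TextPipe's engine (for a pattern this simple, a Perl-style engine should behave the same way). The sample text is hypothetical. Printing each capture as hex code points imitates the Prompt on Replace hex view and shows that `[^<]*` also accepts two spaces or nothing at all.

```python
import re

# Literal opening tag, then a captured run of zero or more characters
# that are not '<', then the literal closing tag.
SPAN = re.compile(r'<span style="xfa-spacerun:yes">([^<]*)</span>')

# Hypothetical sample: one NBSP, two ordinary spaces, and an empty span.
sample = (
    'a<span style="xfa-spacerun:yes">\u00a0</span>'
    'b<span style="xfa-spacerun:yes">  </span>'
    'c<span style="xfa-spacerun:yes"></span>d'
)

# Show what each match captured as hex code points.
for m in SPAN.finditer(sample):
    print([hex(ord(ch)) for ch in m.group(1)])
# ['0xa0']
# ['0x20', '0x20']
# []
```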

Re: Removing xml tag and text not working

Posted: Wed Nov 23, 2011 2:16 am
by Legend Graphics
Simon,

I copied and pasted your suggested code ([^<]*) into my filter and it does nothing on my system. You can check this. The line below has a hex 160 character between the > <. (Hopefully, it will survive posting...probably will because this is not an exotic character.)

<span style="xfa-spacerun:yes"> </span>

I wanted to test your better code to see if it would remove all instances of "<span style="xfa-spacerun:yes"> </span>"...no matter how many spaces occur between > <. I only want to remove lines that have one space, and retain the ones with two or more spaces.

My "\160" does this...apparently with less code. Plus, I figured out this grunt solution in about 15 minutes once I realized the problem. Methinks it would have taken me much longer to fully comprehend TP pattern theory and come up with "([^<]*)". Plus, I can explain \160 in about ten seconds to my wife, who occasionally uses Alt+0160 to insert these fixed spaces when composing.

Some background. When pre-existing Acrobat documents are imported into Adobe LiveCycle Designer, all the text lines in a paragraph (for example) come in as separate lines, not as a united paragraph. To reduce the amount of xml code required, and to expedite future editing, we can merge these separate lines into a real paragraph. When LC does this, it always adds this span at the end of every text item. This causes the resulting text to be double-spaced in places where the text merged. We can have hundreds of these "spans" in one xml file. Visually reviewing text for double-spaces to correct is very tedious and prone to error. Removing spans that have only one space remedies this problem. Sometimes a composer may actually letter space out some text to position it better. Usually (not always) they will use more than one space. This creates a "span" in the xml. We want to leave those instances alone.

It's been about a week since I acquired TP. I will persevere and learn TP pattern theory for the issues that really do require more elegant patterns.

Re: Removing xml tag and text not working

Posted: Wed Nov 23, 2011 7:45 am
by DataMystic Support
I should have mentioned to set the search type to 'Perl pattern'. For one and only one character, this would be:

<span style="xfa-spacerun:yes">([^<])</span>
and for just \160 (which is hex A0)

<span style="xfa-spacerun:yes">\xA0</span>

As an EasyPattern search/replace, this becomes:

<span style="xfa-spacerun:yes">[ capture( 1 not '<') ]</span>
The capture is only necessary if you want to easily see what exotic character matched.
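The two Perl patterns can be checked against the requirement of removing only single-space spans while keeping wider ones. This is a sketch with Python's `re`, not TextPipe; the sample text is hypothetical, and replacing with an empty string (rather than a space) is an assumption based on the "remove" wording earlier in the thread.

```python
import re

# Opening tag as literal regex text (none of these characters are special).
PREFIX = r'<span style="xfa-spacerun:yes">'

# Exactly one character, anything except '<', between the tags.
ONE_CHAR = re.compile(PREFIX + r"([^<])</span>")
# Exactly one non-breaking space (decimal 160 = hex A0) between the tags.
ONE_NBSP = re.compile(PREFIX + r"\xA0</span>")

# Hypothetical LiveCycle-style text: one merge span, one deliberate gap.
text = (
    'first line <span style="xfa-spacerun:yes">\u00a0</span>second line; '
    'wide<span style="xfa-spacerun:yes">\u00a0\u00a0</span>gap'
)

# Removing single-character spans fixes the double space at the merge
# point and leaves the two-character span alone.
print(ONE_CHAR.sub("", text))
print(ONE_NBSP.sub("", text))
```

`ONE_CHAR` also removes a span holding one ordinary space or one letter; `ONE_NBSP` is stricter and touches only the non-breaking space.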

Re: Removing xml tag and text not working

Posted: Wed Nov 23, 2011 8:29 am
by Legend Graphics
Ah, DECIMAL 160, not Hex 160. Anyway... "\160", using Find Type "exact" does the job I want to do.

Re: Removing xml tag and text not working

Posted: Wed Nov 23, 2011 9:53 am
by DataMystic Support
Exact uses a different matching algorithm, where \160 matches decimal. This is from TextPipe ancient history....
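The difference matters because regex engines usually read a backslash followed by digits differently. The sketch below shows only Python's `re` behaviour, where a three-digit escape is octal; whether TextPipe's Perl pattern mode reads `\160` the same way is an assumption worth checking.

```python
import re

# In Python's re module, a three-digit escape such as \160 is an OCTAL
# character code: octal 160 = 1*64 + 6*8 + 0 = 112, which is 'p'.
print(re.fullmatch(r"\160", "p") is not None)       # True
print(re.fullmatch(r"\160", "\u00a0") is not None)  # False

# Decimal 160 is written as the hex escape \xA0 in a regex.
print(re.fullmatch(r"\xA0", "\u00a0") is not None)  # True
print(160 == 0xA0 == ord("\u00a0"))                 # True
```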