Removing xml tag and text not working

Post by **Legend Graphics** » Mon Nov 21, 2011 4:13 am

Newbie here. I want to search and replace this (exact) text with a single wordspace...

 

When I run the filter on the Clipboard text, it works. When I run it on the real file, it doesn't.

Do I need to escape something?

Post by **Legend Graphics** » Mon Nov 21, 2011 7:43 am

I made a mistake. The filter works in the trial run. I haven't tried the clipboard.

Post by **DataMystic Support** » Mon Nov 21, 2011 10:12 am

Ok - glad it is all sorted out!

Post by **Legend Graphics** » Tue Nov 22, 2011 1:21 pm

Follow up....

 

I realize experienced hands will not think this is an earth-shattering discovery. My filter was searching for the exact string above, but I couldn't get the filter to find the text in the file. Then I noticed the space between > < didn't have a space dot. When I cut and pasted the "space" from my text file into the blank space in the filter, the filter worked. I then discovered the hex dump feature. The space turned out to be hex 160, a blank character that is similar in width to a normal space.

So the search string that works is: \160
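For readers without the hex dump feature to hand, the same check can be sketched outside TextPipe. This is a minimal Python illustration, not part of TextPipe; the `sample` string is an assumption reconstructed from the span pattern quoted later in this thread, and `dump_chars` is a hypothetical helper name.

```python
import unicodedata


def dump_chars(text):
    """Return one (char, decimal, hex, Unicode name) tuple per character."""
    return [
        (ch, ord(ch), format(ord(ch), "02X"), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# Assumed sample: a span whose only content is a non-breaking space.
sample = '<span style="xfa-spacerun:yes">\u00a0</span>'

# Take the text between the first '>' and the last '<'.
inner = sample[sample.index(">") + 1 : sample.rindex("<")]

for ch, dec, hx, name in dump_chars(inner):
    print(repr(ch), dec, hx, name)
```

A normal space would show up as decimal 32 (hex 20), so the two "blank" characters are easy to tell apart this way.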

Post by **DataMystic Support** » Tue Nov 22, 2011 3:01 pm

A better solution would have been to use:

<span style="xfa-spacerun:yes">([^<]*)</span>

which will match any string. If you want to find out exactly what it matched, turn on prompt on replace, and then select the second tab of the Prompt on Replace dialog, which shows you a hex dump of matched expressions and much more.

Post by **Legend Graphics** » Wed Nov 23, 2011 2:16 am

Simon,

I copied and pasted your suggested code ([^<]*) into my filter and it does nothing on my system. You can check this. The line below has a hex 160 character between the > <. (Hopefully, it will survive posting...probably will because this is not an exotic character.)

 

I wanted to test your better code to see if it would remove all instances of " "...no matter how many spaces occur between > <. I only want to remove lines that have one space, and retain the ones with two or more spaces.

My "\160" does this...apparently with less code. Plus, I figured out this grunt solution in about 15 minutes once I realized the problem. Methinks it would have taken me much longer to fully comprehend TP pattern theory and come up with "([^<]*)". Plus, I can, in about ten seconds, explain \160 to my wife who occasionally, when composing, uses Alt 0160 to add in these fixed spaces.

Some background. When pre-existing Acrobat documents are imported into Adobe LiveCycle Designer, all the text lines in a paragraph (for example) come in as separate lines, not as a united paragraph. To reduce the amount of xml code required, and to expedite future editing, we can merge these separate lines into a real paragraph. When LC does this, it always adds this span at the end of every text item. This causes the resulting text to be double-spaced in places where the text merged. We can have hundreds of these "spans" in one xml file. Visually reviewing text for double-spaces to correct is very tedious and prone to error. Removing spans that have only one space remedies this problem. Sometimes a composer may actually letter space out some text to position it better. Usually (not always) they will use more than one space. This creates a "span" in the xml. We want to leave those instances alone.
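The rule described here (drop a span holding exactly one fixed space, leave spans with two or more alone) can be sketched in Python as follows. This is only an illustration of the rule, not a TextPipe filter; the function name `drop_single_space_spans` is hypothetical, the span markup is assumed from the pattern quoted elsewhere in this thread, and the default replacement of a single normal space follows the first post.

```python
import re

# Matches the span only when it contains exactly one non-breaking space
# (decimal 160, hex A0). Spans with two or more do not match.
SINGLE_NBSP_SPAN = re.compile(r'<span style="xfa-spacerun:yes">\xa0</span>')


def drop_single_space_spans(xml_text, replacement=" "):
    """Replace every single-space span with `replacement`."""
    return SINGLE_NBSP_SPAN.sub(replacement, xml_text)


one = 'end of line<span style="xfa-spacerun:yes">\u00a0</span>next line'
two = 'spaced<span style="xfa-spacerun:yes">\u00a0\u00a0</span>out'

print(drop_single_space_spans(one))          # span replaced by one space
print(drop_single_space_spans(two) == two)   # two-space span is untouched
```

Because the pattern spells out one `\xa0` with no quantifier, deliberately letter-spaced text survives unchanged.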

It's been about a week since I acquired TP. I will persevere, and learn TP pattern theory for use when I come up with issues that do require more coding elegance.

Post by **DataMystic Support** » Wed Nov 23, 2011 7:45 am

I should have mentioned to set the search type to 'Perl pattern'. For one and only one character, this would be:

<span style="xfa-spacerun:yes">([^<])</span>

and for just \160 (which is hex A0)

<span style="xfa-spacerun:yes">\xA0</span>

As an EasyPattern search/replace, this becomes:

<span style="xfa-spacerun:yes">[ capture( 1 not '<') ]</span>

The capture is only necessary if you want to easily see what exotic character matched.

Post by **Legend Graphics** » Wed Nov 23, 2011 8:29 am

Ah, DECIMAL 160, not Hex 160. Anyway... "\160", using Find Type "exact" does the job I want to do.

Post by **DataMystic Support** » Wed Nov 23, 2011 9:53 am

Exact uses a different matching algorithm, where \160 is read as decimal 160. This is from TextPipe ancient history....
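The decimal/hex mix-up is easy to reproduce outside TextPipe. The sketch below uses Python's `re` module purely as a stand-in for a Perl-style pattern engine (an assumption; it says nothing about TextPipe's own internals). In Python patterns, a backslash followed by three octal digits is an octal escape, so `\160` there means the letter "p", while `\xA0` is the character with decimal value 160. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern, ch):
    """True if the pattern matches the whole one-character string."""
    return re.fullmatch(pattern, ch) is not None


# Decimal 160 and hex A0 are the same code point: the non-breaking space.
print(160 == 0xA0, chr(160) == "\xa0")

# In Python's re, \160 is octal 160 = decimal 112 = "p", not a blank.
print(matches(r"\160", "p"))
print(matches(r"\160", chr(160)))

# The hex escape is what matches the non-breaking space.
print(matches(r"\xA0", chr(160)))
```

So the same three digits can mean different characters depending on which search type reads them, which is why the hex form is the safer way to write this character in a pattern.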
