Removing xml tag and text not working
Moderators: DataMystic Support, Moderators
-
- Posts: 5
- Joined: Mon Nov 21, 2011 3:39 am
Removing xml tag and text not working
Newbie here. I want to search and replace this (exact) text with a single wordspace...
<span style="xfa-spacerun:yes"> </span>
When I run the filter on the Clipboard text, it works. When I run it on the real file, it doesn't.
Do I need to escape something?
-
- Posts: 5
- Joined: Mon Nov 21, 2011 3:39 am
Re: Removing xml tag and text not working
I made a mistake. The filter works in the trial run. I haven't tried the clipboard.
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Removing xml tag and text not working
Ok - glad it is all sorted out!
-
- Posts: 5
- Joined: Mon Nov 21, 2011 3:39 am
Re: Removing xml tag and text not working
Follow up....
<span style="xfa-spacerun:yes"> </span>
I realize experienced hands will not think this is an earth-shattering discovery. My filter was searching for the exact string above, but I couldn't get the filter to find the text in the file. Then I noticed that the space between > < didn't have a space dot. When I cut and pasted the "space" from my text file into the blank space in the filter, the filter worked. I then discovered the hex dump feature. The space turned out to be hex 160, a blank character that is similar in width to a normal space.
So the search string that works is: <span style="xfa-spacerun:yes">\160</span>
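For readers without TextPipe at hand, the hex dump check described above can be approximated in a few lines of Python. This is only a sketch: `describe` is a hypothetical helper name, and it assumes the text has already been decoded into a Python string.

```python
import unicodedata

def describe(text):
    """Return (character, decimal code, hex code, Unicode name) for each character."""
    return [
        (ch, ord(ch), f"{ord(ch):02X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# Two characters that look alike on screen: a normal space and a no-break space.
normal = " "
fixed = "\u00a0"

for row in describe(normal + fixed):
    print(row)
```

The two rows differ in their code values (32 versus 160), which is why a search string typed with the space bar does not match the character in the file.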
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Removing xml tag and text not working
A better solution would have been to use:
Code: Select all
<span style="xfa-spacerun:yes">([^<]*)</span>
which will match any string. If you want to find out exactly what it matched, turn on prompt on replace, and then select the second tab of the Prompt on Replace dialog, which shows you a hex dump of matched expressions and much more.
-
- Posts: 5
- Joined: Mon Nov 21, 2011 3:39 am
Re: Removing xml tag and text not working
Simon,
I copied and pasted your suggested code ([^<]*) into my filter and it does nothing on my system. You can check this. The line below has a hex 160 character between the > <. (Hopefully, it will survive posting...probably will because this is not an exotic character.)
<span style="xfa-spacerun:yes"> </span>
I wanted to test your better code to see if it would remove all instances of "<span style="xfa-spacerun:yes"> </span>"...no matter how many spaces occur between > <. I only want to remove lines that have one space, and retain the ones with two or more spaces.
My "\160" does this...apparently with less code. Plus, I figured out this grunt solution in about 15 minutes once I realized the problem. Methinks it would have taken me much longer to fully comprehend TP pattern theory and come up with "([^<]*)". Plus, I can, in about ten seconds, explain \160 to my wife who occasionally, when composing, uses Alt 0160 to add in these fixed spaces.
Some background. When pre-existing Acrobat documents are imported into Adobe LiveCycle Designer, all the text lines in a paragraph (for example) come in as separate lines, not as a united paragraph. To reduce the amount of xml code required, and to expedite future editing, we can merge these separate lines into a real paragraph. When LC does this, it always adds this span at the end of every text item. This causes the resulting text to be double-spaced in places where the text merged. We can have hundreds of these "spans" in one xml file. Visually reviewing text for double-spaces to correct is very tedious and prone to error. Removing spans that have only one space remedies this problem. Sometimes a composer may actually letter space out some text to position it better. Usually (not always) they will use more than one space. This creates a "span" in the xml. We want to leave those instances alone.
It's been about a week since I acquired TP. I will persevere, and learn TP pattern theory for use when I come up with issues that do require more coding elegance.
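The cleanup rule described in this post (replace spans holding exactly one fixed space, leave spans with two or more alone) can be sketched in Python as follows. This is an illustration rather than a TextPipe filter: the function name `collapse_single_spacers` and the sample text are made up, the replacement defaults to a single normal space as in the first post, and the XML is assumed to be decoded into a string already.

```python
import re

# Matches the span only when it holds exactly one no-break space (decimal 160, hex A0).
SINGLE_NBSP_SPAN = re.compile(r'<span style="xfa-spacerun:yes">\xa0</span>')

def collapse_single_spacers(xml_text, replacement=" "):
    """Replace one-character spacer spans; spans with two or more characters are kept."""
    return SINGLE_NBSP_SPAN.sub(replacement, xml_text)

# Hypothetical sample: one merged-line spacer and one deliberate two-space run.
sample = (
    'merged<span style="xfa-spacerun:yes">\xa0</span>text and '
    'letter<span style="xfa-spacerun:yes">\xa0\xa0</span>spaced'
)
print(collapse_single_spacers(sample))
```

Because the pattern contains no quantifier, a span with two fixed spaces never matches and is left untouched.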
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Removing xml tag and text not working
I should have mentioned to set the search type to 'Perl pattern'. For one and only one character, this would be:
Code: Select all
<span style="xfa-spacerun:yes">([^<])</span>
and for just \160 (which is hex A0):
Code: Select all
<span style="xfa-spacerun:yes">\xA0</span>
As an EasyPattern search/replace, this becomes:
Code: Select all
<span style="xfa-spacerun:yes">[ capture( 1 not '<') ]</span>
The capture is only necessary if you want to easily see what exotic character matched.
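To see how the Perl-style patterns in this thread differ, here is a small comparison using Python's `re` module. It is a sketch under the assumption that Python's handling of `[^<]`, `*`, and `\xA0` matches TextPipe's Perl pattern mode for these simple constructs; the `span` helper and the test cases are invented for illustration.

```python
import re

any_run = re.compile(r'<span style="xfa-spacerun:yes">([^<]*)</span>')   # zero or more characters
one_char = re.compile(r'<span style="xfa-spacerun:yes">([^<])</span>')   # exactly one character
only_nbsp = re.compile(r'<span style="xfa-spacerun:yes">\xA0</span>')    # exactly one hex A0

def span(inner):
    """Build a spacer span around the given inner text (hypothetical helper)."""
    return f'<span style="xfa-spacerun:yes">{inner}</span>'

cases = {
    "one nbsp": span("\xa0"),
    "two nbsp": span("\xa0\xa0"),
    "one space": span(" "),
}

for label, text in cases.items():
    results = [bool(p.fullmatch(text)) for p in (any_run, one_char, only_nbsp)]
    print(label, results)
```

The `*` version matches every case, including the two-space span that should be kept. The single-character class also accepts an ordinary space, and only the `\xA0` version is limited to one fixed space.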
-
- Posts: 5
- Joined: Mon Nov 21, 2011 3:39 am
Re: Removing xml tag and text not working
Ah, DECIMAL 160, not Hex 160. Anyway... "\160", using Find Type "exact" does the job I want to do.
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Removing xml tag and text not working
Exact uses a different matching algorithm, where \160 matches decimal. This is from TextPipe ancient history....
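The decimal/hex mix-up is easy to check outside TextPipe. The Python sketch below is only an illustration and says nothing about TextPipe's internals beyond what is stated above; note that Python itself reads `\160` in a string literal as octal, which is a third interpretation again.

```python
import unicodedata

nbsp = chr(160)                     # decimal 160, the character found in the file
same_as_hex = "\xa0"                # hex A0 is the same character
read_as_hex = chr(int("160", 16))   # what "hex 160" would actually refer to
python_octal = "\160"               # Python treats \160 as octal, not decimal

print(nbsp == same_as_hex)
print(unicodedata.name(nbsp))
print(unicodedata.name(read_as_hex))
print(repr(python_octal))
```

So the same digits can mean a no-break space, an accented capital S, or a lowercase p depending on the escape convention, which is why it pays to check which convention a given search type uses.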