
Find a tag pair and enclose between more material...?

Posted: Tue Mar 17, 2009 5:54 am
by Philip Goddard
This is certainly not possible for me to do at the moment, because TextPipe doesn't have a filter selection / creation interface or relevant 'help' text that I can really understand (and yes, I do have a science degree, for whatever good that did me! :D ). Thus I simply can't tell whether I'm wanting the impossible from TextPipe, but I feel that the task that I want to give to it ought to be relatively straightforward, if only I knew how to set it up.

I want to find in a search of all web pages of mine the tag pair <div id="mainTextPanel".*>, ONLY where there is a <br> tag immediately before and after that pair, and then insert material between the tag pair and the <br> tags before and after. In practice I'd be putting a further container div (with some contents) around the found tag pair.

It appears that the available filters for finding tag pairs are concerned with restricting an operation to within or between the particular tags, rather than looking at what is immediately outside them and putting stuff there.

I don't see how I can do this task safely by a straightforward search/replace, because the contents of the sought for div can itself contain <div> tags, and so it would not be possible by that means to identify the correct </div> - i.e. by using <div id="mainTextPanel".*>.*</div> as a search string. Most of the contained </div> tags would be followed by <p> tags rather than <br>, but I can't be sure of that always being the case.

Another constraint, too, could be the amount of text in the mainTextPanel div (i.e. between the tag pair), for it could theoretically be 100K or more (though normally would be much less), and I'm not sure how that would go down with its inclusion in a (.*) to then appear as $n in replace text.

Any ideas as to whether this task is easily feasible with TextPipe, and any simple tip for setting it up? Many thanks!

Re: Find a tag pair and enclose between more material...?

Posted: Fri Mar 20, 2009 9:37 am
by DataMystic Support
The first issue here is that TextPipe doesn't count or match <div>s for you. If it could, this would be easy (this is a new feature on its way...)

Apart from this factor the perl pattern would be:
<br><div id="mainTextPanel".*>.*</div><br>

If you knew that there was just one div in between you might get away with this:
<br><div id="mainTextPanel".*>(.*<div.*/div>.*|.*)</div><br>
but it is pretty horrible and would do a horrendous amount of backtracking looking for possible matches.
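
For what it's worth, the tag counting that a single pattern can't do is easy to sketch outside TextPipe. The Python below is only an illustration, not a TextPipe feature: the function name wrap_panel and its before/after arguments are made up for this example, and it assumes the <br> tags touch the div with no whitespace in between and that no <div> text is hiding inside comments or scripts.

```python
import re

# Any opening or closing div tag; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)
# Opening tag of the target div, required to follow a <br> directly.
OPEN_PANEL = re.compile(r'<br>(<div id="mainTextPanel"[^>]*>)', re.IGNORECASE)


def wrap_panel(html, before, after):
    """Wrap each <br>-delimited mainTextPanel div in before/after text.

    Hypothetical helper: counts div depth to find the matching </div>.
    """
    out = []
    pos = 0
    while True:
        m = OPEN_PANEL.search(html, pos)
        if m is None:
            break
        start = m.start(1)  # where the panel's opening tag begins
        depth = 0
        end = None
        for tag in DIV_TAG.finditer(html, start):
            depth += -1 if tag.group(1) else 1
            if depth == 0:  # this </div> closes the panel itself
                end = tag.end()
                break
        if end is None or not html.startswith("<br>", end):
            # Unbalanced divs, or no <br> right after: leave the text alone.
            out.append(html[pos:m.end()])
            pos = m.end()
            continue
        out.append(html[pos:start])
        out.append(before + html[start:end] + after)
        pos = end
    out.append(html[pos:])
    return "".join(out)
```

Calling wrap_panel(page_text, '<div id="outerPanel">', '</div>') would put a container div around the panel and leave both <br> tags outside it (the id outerPanel is just a placeholder). Because the text is scanned once from left to right, a 100K panel doesn't cause the backtracking that the all-in-one pattern would.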

The HTML filter is merely a nice wrapper for the normal search/replace functions (it hides some pretty nasty pattern matching expressions). You can always write your own search/replace and control the maximum amount of text that is matched - 100Kbytes is fine but it could be slower unless you write your pattern carefully.

Re: Find a tag pair and enclose between more material...?

Posted: Sat Mar 21, 2009 6:35 am
by Philip Goddard
Thank you, Simon. Yes, really I came to the conclusion that what I was after simply couldn't be done, because some of my web page mainTextPanel divs do contain various numbers of nested divs, including on a few occasions a further level of nesting (ouch!).

I had quite an extensive Google search to see if I could find another search/replace program that might be able to do something of the sort, but I drew a blank on that (not wanting to pay a lot of money for a prestige program that just might possibly do it) - though I did at last find a program (##) that has a considerable choice of search/replace modes, including regex, and also has a find-only mode that enables me to go through the list of files containing matches, editing each one in KompoZer to do the necessary in situations where a straight search and replace can't safely do the job. TextPipe doesn't give me that functionality, although it's undoubtedly very good at what it does, so actually from now on I shall most likely mostly use that program and only occasionally have cause to use TextPipe.

Annoyingly, though, ## seems not to support lookahead expressions, whereas I found that TextPipe does. A negative lookahead enables me to find divs that contain or don't contain nested divs - though unfortunately when I want to make that sort of distinction it's generally for some sort of editing operation that I have to do manually, and so TextPipe is no help there as it doesn't provide a list of the files containing matches, which one could then manually process in one's editor of choice.
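
To show the sort of thing I mean, here is a rough sketch in Python rather than in TextPipe or ##, so the exact syntax would need adapting for either program. It uses a negative lookahead to separate pages whose mainTextPanel div has no nested divs from those where it does, and returns the file names so they can be worked through by hand. The function name split_by_nesting and the idea of passing the pages in as a name-to-text dictionary are my own inventions for the example.

```python
import re

# Opening tag of the panel, used to skip pages that don't have one at all.
PANEL_OPEN = re.compile(r'<div id="mainTextPanel"', re.IGNORECASE)

# After the opening tag, advance one character at a time, refusing to step
# onto another "<div", until the first "</div>". This only succeeds when the
# panel has no nested div.
FLAT_PANEL = re.compile(
    r'<div id="mainTextPanel"[^>]*>(?:(?!<div\b).)*?</div>',
    re.IGNORECASE | re.DOTALL,
)


def split_by_nesting(pages):
    """pages maps file name -> HTML text; returns (flat, nested) name lists."""
    flat, nested = [], []
    for name, html in sorted(pages.items()):
        if not PANEL_OPEN.search(html):
            continue  # no mainTextPanel div on this page
        if FLAT_PANEL.search(html):
            flat.append(name)
        else:
            nested.append(name)
    return flat, nested
```

The first list holds pages that a plain search/replace could probably handle safely; the second is the list to open one by one in an editor.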