
Word Pair or Word Triplet Extracts

Posted: Tue Oct 14, 2003 12:38 am
by jgomberg
I am trying to write a regular expression that matches all adjacent word pairs and triplets in any given string of words. For example, the sentence:

"registrars have been contracted to perform services at very low prices" would produce the following word pairs:

"registrars have", "have been", "been contracted", "contracted to", "to perform", "perform services", etc.

or the following triplets:

"registrars have been", "have been contracted", "been contracted to", "contracted to perform", etc.

I can extract the first two words from a search string with an expression such as
(.*Subject: )(\w* ){2} and then filter out the first back reference, but I am stuck writing an expression that will pull all of the adjacent word pairs from a string.

Any suggestions how this can be done with regex alone?

Thanks,

Jeff
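An expression like (\w* ){2} returns only one pair per position because a regex match consumes the characters it matches, so the next match cannot start inside the previous one. The sketch below uses Python's re module purely for illustration. It is not TextPipe's engine, and this thread does not say whether TextPipe supports lookahead. A plain pattern returns non-overlapping pairs, while a zero-width lookahead that captures the pair returns every adjacent pair. The helper name ngrams_by_lookahead is hypothetical.

```python
import re

sentence = ("registrars have been contracted to perform "
            "services at very low prices")

# A plain pattern consumes both words of each match, so the next match
# can only begin after the second word: the pairs never overlap.
naive_pairs = re.findall(r"\w+\s+\w+", sentence)

def ngrams_by_lookahead(text, n):
    """Hypothetical helper: return every run of n adjacent words.

    \\b anchors each attempt at a word boundary. The lookahead (?=...)
    consumes nothing, so the scan moves on one character at a time while
    the capture group records the n words that follow.
    """
    pattern = r"\b(?=(\w+" + r"\s+\w+" * (n - 1) + r"))"
    # findall returns the captured group for each zero-width match
    return [" ".join(m.split()) for m in re.findall(pattern, text)]

pairs = ngrams_by_lookahead(sentence, 2)
triplets = ngrams_by_lookahead(sentence, 3)

print(naive_pairs)
print(pairs)
print(triplets)
```

The naive pattern skips every other pair ("have been", "contracted to", and so on), which is exactly the gap described above; the lookahead version visits each word boundary in turn.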

Posted: Mon Oct 20, 2003 11:10 am
by DataMystic Support
Hi Jeff,

I'm pretty sure you can't do this with regex alone. You can, however, use a regex to match each word and then use a VBScript subfilter to process those words, keeping an array of the last two or three words and outputting them.

It looks like you're trying to generate a word concordance, which is something we started building into TP a long time ago but never finished to our satisfaction.
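The two-step approach described above (a regex that matches each word, then a small array of the last two or three words that is output each time it fills) can be sketched in Python rather than as a VBScript subfilter. The names WORD, word_ngrams and quoted_list are hypothetical, and treating apostrophes as part of a word is an assumption.

```python
import re
from collections import deque

# Assumption: a "word" is a run of letters, digits, underscores or apostrophes.
WORD = re.compile(r"[\w']+")

def word_ngrams(text, n):
    """Yield every run of n adjacent words in text, in order."""
    window = deque(maxlen=n)  # appending to a full deque drops the oldest word
    for match in WORD.finditer(text):
        window.append(match.group())
        if len(window) == n:  # emit only once the window holds n words
            yield " ".join(window)

def quoted_list(items):
    """Join items as "a","b","c" (double-quoted, comma-separated)."""
    return ",".join('"' + item + '"' for item in items)

sentence = ("registrars have been contracted to perform "
            "services at very low prices")
print(quoted_list(word_ngrams(sentence, 2)))
print(quoted_list(word_ngrams(sentence, 3)))
```

Using a fixed-length deque keeps the window logic to one line: each new word pushes out the oldest one, so pairs and triplets differ only in the value of n.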


Posted: Wed Oct 22, 2003 2:28 am
by jring
Hmmm, interesting Q&A. I don't know how to write VBScript, but I was able to create two filters that appear to give me a solid jump on the problem. Simon, you know my email: send me and the guy who asked the original question a note, and I'll reply with my filters. Perhaps we can nip this one. What do you think?

"registrars have been","have been contracted","been contracted to","contracted to perform","to perform services","perform services at","services at very","at very low","very low prices"


"It looks like","looks like youre","like youre trying","youre trying to","trying to generate","to generate a","generate a word","a word concordance","word concordance which","concordance which is","something we started","we started buiding","started buiding into","buiding into TP","into TP along","TP along time","along time ago","time ago but","ago but never","but never finished"

joseph ring