Word Pair or Word Triplet Extracts

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

jgomberg

Word Pair or Word Triplet Extracts

Post by jgomberg »

I am trying to write a regular expression to match all adjacent word pairs and triplets for any given string of words. For example, the sentence:

"registrars have been contracted to perform services at very low prices" would produce the following word pairs:

"registrars have", "have been", "been contracted", "contracted to", "to perform", "perform services", etc.

or the following triplets:

"registrars have been", "have been contracted", "been contracted to", "contracted to perform", etc.

I can extract the first two words from a search string, such as:
(.*Subject: )(\w* ){2}
and then filter out the first back reference, but I am stuck writing an expression that will pull all of the consecutive word pairs from a string.

Any suggestions how this can be done with regex alone?

Thanks,

Jeff
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Hi Jeff,

I'm pretty sure you can't do this with regex alone. You can, however, get a regex to match each word, and then use a VBScript subfilter to process those words, keeping an array of two or three words and outputting them.
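
The two-step idea above (a regex matches each word, then a running window of two or three words is written out) can be sketched as follows. This is only a minimal illustration in Python, not the VBScript subfilter mentioned in the post; the function name `word_ngrams` and the `\w+` word pattern are assumptions made for the example.

```python
import re

def word_ngrams(text, n):
    """Return every run of n adjacent words in text, in order.

    Hypothetical helper for illustration only.
    """
    # Step 1: a regex matches each individual word.
    # Assumption: a "word" is a run of \w characters.
    words = re.findall(r"\w+", text)
    # Step 2: slide a window of n words across the list of matches.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = ("registrars have been contracted to perform "
            "services at very low prices")
pairs = word_ngrams(sentence, 2)      # adjacent word pairs
triplets = word_ngrams(sentence, 3)   # adjacent word triplets

# Print each list as quoted, comma-separated items.
print(",".join(f'"{p}"' for p in pairs))
print(",".join(f'"{t}"' for t in triplets))
```

With `n` set to 2 the window yields pairs, and with 3 it yields triplets, so the same loop covers both cases from the original question.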

It looks like you're trying to generate a word concordance, which is something we started building into TP a long time ago but never finished to our satisfaction.
jring
Posts: 9
Joined: Tue Sep 23, 2003 3:13 am

reply

Post by jring »

Hmmmm, interesting Q/A. I don't know how to write VBScript, but I was able to create 2 filters that appear to have given me a solid jump on the problem. Simon - you know my email; send me and the guy who asked the original question a note, and I'll reply with my filters. Perhaps we can nip this one... what do you think?

"registrars have been","have been contracted","been contracted to","contracted to perform","to perform services","perform services at","services at very","at very low","very low prices"


"It looks like","looks like youre","like youre trying","youre trying to","trying to generate","to generate a","generate a word","a word concordance","word concordance which","concordance which is","something we started","we started buiding","started buiding into","buiding into TP","into TP along","TP along time","along time ago","time ago but","ago but never","but never finished"

joseph ring