Get keywords from text file

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

Richy20
Posts: 4
Joined: Mon Mar 21, 2011 3:30 am

Get keywords from text file

Post by Richy20 »

Hi, great program. So glad I got it. I am not one to rush to a forum and bother others with my questions, but I can't figure this out. It seems easy enough in concept, but I can't wrap my mind around it.

I need to get an output of the most common word phrases (keywords).

Example:
My dog polly got puppies. Dog polly is fine.

Output:
Dog polly - 2
Dog - 2
Polly - 2

So it would go over the text and find that the phrase "dog polly" appears 2 times, and that the words "dog" and "polly" each appear 2 times.
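For illustration, here is a minimal Python sketch of the counting described above; it is not how TextPipe does it. It assumes a word is a run of letters, digits, underscores or hyphens, that matching is case-insensitive, and that only phrases occurring at least twice are reported. `count_phrases`, `max_words` and `min_count` are hypothetical names.

```python
import re
from collections import Counter


def count_phrases(text, max_words=4, min_count=2):
    # Case-insensitive (assumption): "Dog" and "dog" count as the same word.
    words = re.findall(r"[\w-]+", text.lower())
    counts = Counter()
    # Count every contiguous run of 1..max_words words.
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Keep repeated phrases: most frequent first, then longer phrases, then A-Z.
    repeated = [(p, c) for p, c in counts.items() if c >= min_count]
    return sorted(repeated, key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))


for phrase, count in count_phrases("My dog polly got puppies. Dog polly is fine."):
    print(f"{phrase} - {count}")
# prints: dog polly - 2, dog - 2, polly - 2 (one per line)
```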
Richy20
Posts: 4
Joined: Mon Mar 21, 2011 3:30 am

Re: Get keywords from text file

Post by Richy20 »

I found this tool: hxxp://25yearsofprogramming.com/perl/phrasecounter.htm

It does exactly what I would like TextPipe to do.
The tool is perfect, but I would have to go over all my files one by one.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Get keywords from text file

Post by DataMystic Support »

A straight word count is easy using Filters\Convert\Text to word list followed by Filters\Special\Count duplicate lines.
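A rough Python analogue of that two-filter chain, for illustration only: the function names are hypothetical stand-ins for the TextPipe filters and do not reproduce their exact behaviour or output format, and the word pattern `[\w-]+` is an assumption.

```python
import re
from collections import Counter


def text_to_word_list(text):
    # Stage 1: put each word on its own line.
    return "\n".join(re.findall(r"[\w-]+", text))


def count_duplicate_lines(word_list):
    # Stage 2: tally identical lines (exact, case-sensitive comparison).
    return Counter(word_list.splitlines())


word_list = text_to_word_list("My dog polly got puppies. Dog polly is fine.")
counts = count_duplicate_lines(word_list)
for word, n in counts.most_common():
    print(n, word)
```

Because lines are compared exactly in this sketch, "Dog" and "dog" are tallied separately; lowercasing the text before stage 1 would merge them.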

But looking for phrases - is there a limit to the number of words in each phrase?

We did design a Google AdWords filter which would generate groups of 2, 3 and 4 words from the source text (in different orders), but the number of combinations you need depends on the maximum number of words you allow in a phrase.
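To see why the maximum phrase length matters, here is a small Python sketch that counts how many contiguous phrases a run of words produces. It ignores the reordering mentioned above (only phrases in their original order are counted), and `phrase_count` is a hypothetical helper.

```python
def phrase_count(num_words, max_len):
    # A phrase of n words can start at (num_words - n + 1) positions;
    # sum that over every allowed phrase length.
    return sum(num_words - n + 1 for n in range(1, min(max_len, num_words) + 1))


for k in (1, 2, 3, 4):
    print(k, phrase_count(1000, k))
# prints 1 1000, 2 1999, 3 2997, 4 3994 (one pair per line)
```

Each extra word allowed in a phrase adds roughly one more phrase per word of source text; allowing reordered combinations would grow much faster.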
Richy20
Posts: 4
Joined: Mon Mar 21, 2011 3:30 am

Re: Get keywords from text file

Post by Richy20 »

DataMystic Support wrote: A straight word count is easy using Filters\Convert\Text to word list followed by Filters\Special\Count duplicate lines.

But looking for phrases - is there a limit to the number of words in each phrase?

We did design a Google AdWords filter which would generate groups of 2, 3 and 4 words from the source text (in different orders), but the number of combinations you need depends on the maximum number of words you allow in a phrase.
Yeah, I had actually got as far as "turn into word list" -> "count how many words present", but that solved maybe only half of the problem, so I did not post it.

I would probably need phrases of at most 2-4 words. I would be so glad if this is achievable. Currently I am stuck copy-pasting text files one at a time into a program and checking the phrase/word frequency.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Get keywords from text file

Post by DataMystic Support »

Ok! I am fairly proud of this - it took some thinking.

It takes text like this:

Code:

my keyword list is special, and this is a highly useful test of this filter
and converts it to:

Code:

my
keyword
my keyword
list
keyword list
my keyword list
is
list is
keyword list is
my keyword list is
special
is special
list is special
keyword list is special
..

The first part is a regex to match words:

Code:

([\w-]*?)
Then a VBScript subfilter outputs the phrases:

Code:

'Output phrases 

dim a, b, c, d
dim vbCrLf
vbCrLf = chr(13) & chr(10)

function processLine(line, EOL)

  'new word arrives in line
  'shift old words along: a holds the oldest word, d the newest
  a = b  
  b = c
  c = d
  d = line
  
  'output the new word, then each phrase of up to 4 words that ends with it
  out = d & vbCrLf
  if c <> "" then out = out & c & " " & d & vbCrLf
  if b <> "" then out = out & b & " " & c & " " & d & vbCrLf
  if a <> "" then out = out & a & " " & b & " " & c & " " & d & vbCrLf

  processLine = out

end function


sub startJob()
end sub


sub endJob()
end sub


function startFile()
end function


function endFile()
end function
See attached filter.
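The VBScript keeps the last four words in a, b, c and d like a shift register. A Python port, offered only as an illustrative sketch (the class and method names are hypothetical, and it returns a list rather than a CRLF-joined string), reproduces the sample output above when fed the first five words:

```python
class PhraseWindow:
    """Sketch of the VBScript processLine logic: a 4-word shift register."""

    def __init__(self):
        self.a = self.b = self.c = self.d = ""

    def process_line(self, line):
        # Shift the older words along; the new word goes into d.
        self.a, self.b, self.c, self.d = self.b, self.c, self.d, line
        # Emit the word itself, then every phrase of up to 4 words ending with it.
        out = [self.d]
        if self.c:
            out.append(f"{self.c} {self.d}")
        if self.b:
            out.append(f"{self.b} {self.c} {self.d}")
        if self.a:
            out.append(f"{self.a} {self.b} {self.c} {self.d}")
        return out


window = PhraseWindow()
phrases = []
for word in "my keyword list is special".split():
    phrases.extend(window.process_line(word))
print("\n".join(phrases))
```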
Attachments
generate phrase list.zip
(841 Bytes) Downloaded 588 times
dfhtextpipe
Posts: 988
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Get keywords from text file

Post by dfhtextpipe »

You were rightly proud of this filter and the VBScript it exemplifies.

Cool yet simple.

David
dfhtextpipe
Posts: 988
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Get keywords from text file

Post by dfhtextpipe »

Simon,

How would you enhance this VBScript filter to ensure that key phrases do not cross line boundaries?

e.g. Input

Code:

Now is the time for all good men to come to the aid of the party.

It was the best of times. It was the worst of times.
Output must not include these phrases:

Code:

party It
the party It
of the party It
party It was
the party It was
party It was the
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Get keywords from text file

Post by DataMystic Support »

Ok,

Change the pattern to this:

Code:

([\w-]*?)\.??
and the VBScript to this:

Code:

'Output phrases 

dim a, b, c, d
dim vbCrLf
vbCrLf = chr(13) & chr(10)

function processLine(line, EOL)

  'new word arrives in line
  'shift old words along
  a = b  
  b = c
  c = d
  d = line
  
  out = d & vbCrLf
  if c <> "" then out = out & c & " " & d & vbCrLf
  if b <> "" then out = out & b & " " & c & " " & d & vbCrLf
  if a <> "" then out = out & a & " " & b & " " & c & " " & d & vbCrLf

  'start a new phrase if this ends in a period
  if right(line,1) = "." then
    a = ""  
    b = ""
    c = ""
    d = ""
  end if

  processLine = out

end function


sub startJob()
end sub


sub endJob()
end sub


function startFile()
end function


function endFile()
end function
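The same idea in Python, again only a sketch: `SentencePhraseWindow` and `generate_phrases` are hypothetical names, and the word pattern is an assumption. In Python's `re`, a lazy `\.??` at the very end of a pattern never consumes the period (the match succeeds first without it), so this sketch uses a greedy `\.?` to keep the trailing period that triggers the reset.

```python
import re


class SentencePhraseWindow:
    """Sketch of the Mark 2 logic: 4-word window, reset after a period."""

    def __init__(self):
        self.a = self.b = self.c = self.d = ""

    def process_line(self, line):
        # Shift the older words along; the new word goes into d.
        self.a, self.b, self.c, self.d = self.b, self.c, self.d, line
        out = [self.d]
        if self.c:
            out.append(f"{self.c} {self.d}")
        if self.b:
            out.append(f"{self.b} {self.c} {self.d}")
        if self.a:
            out.append(f"{self.a} {self.b} {self.c} {self.d}")
        # Start a new phrase if this word ends in a period.
        if line.endswith("."):
            self.a = self.b = self.c = self.d = ""
        return out


def generate_phrases(text):
    # Greedy optional period, so "party." keeps the period that triggers the reset.
    window = SentencePhraseWindow()
    phrases = []
    for word in re.findall(r"[\w-]+\.?", text):
        phrases.extend(window.process_line(word))
    return phrases


text = ("Now is the time for all good men to come to the aid of the party.\n\n"
        "It was the best of times. It was the worst of times.")
phrases = generate_phrases(text)
```

Every phrase it produces has a period, if at all, only on its last word, so none of the phrases David listed (such as "party It was") can appear.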
Attachments
generate phrase list.zip
Mark 2
(938 Bytes) Downloaded 558 times
ezinestein
Posts: 13
Joined: Sat Nov 10, 2007 11:10 am

Re: Get keywords from text file

Post by ezinestein »

Hello,

This thread was great, and just what I needed! However, the last download and code change with the 'period' don't work for me. The filter seems to behave just like the previous version, as if the period weren't there at all. I'm not sure why, because I know nothing about VBScript, but thought I'd mention it. Thanks again.

ed
Richy20
Posts: 4
Joined: Mon Mar 21, 2011 3:30 am

Re: Get keywords from text file

Post by Richy20 »

Holy mother of god, you people are amazing.

I totally forgot I had made this thread and was already searching other forums for help and doing a lot of research myself. I probably searched about 300 pages and tried 10-20 programs. I do value every helpful comment (or, in this case, being spoon-fed). I had actually ended my search and, frustrated, accepted that with my skill level it couldn't be done.

I will get through this material and try it out. Thank you DataMystic, I love you! :mrgreen:

edit: IT WORKS! IT REALLY WORKS! :D I fed it an article file and got keywords back. I added a "count duplicate lines" filter to count each keyword/phrase and a "descending numeric sort" filter. Now I need to work out how to remove one-word lines (at first I thought I needed them, but I probably won't) and make sure it can process a lot of files (and how it outputs the results, so they are easier for me to get at). This should be easier, though, as the hardest part, for me at least, was getting this program (or anything, for that matter) to give me a list of words and phrases. THANK YOU ALL!!!
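The post-processing described here (count duplicates, drop one-word lines, sort by count descending, run over many files) could be sketched in Python as follows. This is an illustration under assumptions, not a TextPipe filter: phrases are contiguous, case-folded runs of words, the folder and `*.txt` pattern are hypothetical, and `phrases_in`, `keyword_report` and `report_for_folder` are made-up names.

```python
import re
from collections import Counter
from pathlib import Path


def phrases_in(text, min_words=2, max_words=4):
    # Contiguous, case-folded runs of min_words..max_words words (assumption).
    words = re.findall(r"[\w-]+", text.lower())
    for n in range(min_words, max_words + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def keyword_report(texts, min_words=2, max_words=4):
    # Count every phrase across all texts; one-word phrases are skipped
    # because min_words defaults to 2.
    counts = Counter()
    for text in texts:
        counts.update(phrases_in(text, min_words, max_words))
    # Highest count first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda pc: (-pc[1], pc[0]))


def report_for_folder(folder, pattern="*.txt"):
    # Hypothetical helper: reads every matching file in one folder.
    files = sorted(Path(folder).glob(pattern))
    texts = (f.read_text(encoding="utf-8", errors="replace") for f in files)
    return keyword_report(texts)
```

For example, looping over `report_for_folder("articles")[:20]` and printing each count and phrase would list the 20 most frequent phrases across every `.txt` file in a folder named `articles`.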