Get keywords from text file
Posted: Mon Mar 21, 2011 3:37 am
by Richy20
Hi, great program. So glad I got it. I am not one to rush to a forum and bother others with my questions, but I can't figure this out. It seems easy enough in concept, but I can't wrap my mind around it.
I need to get a list of the most common words and phrases (keywords) in a text.
Example:
My dog polly got puppies. Dog polly is fine.
Output:
Dog polly - 2
Dog - 2
Polly - 2
So it would go over the text and find that the phrase "dog polly" is mentioned 2 times in the text. Also that the words "dog" and "Polly" are mentioned 2 times.
Re: Get keywords from text file
Posted: Mon Mar 21, 2011 9:09 pm
by Richy20
I found this tool: hxxp://25yearsofprogramming.com/perl/phrasecounter.htm
It does exactly what I would like TextPipe to do.
The tool is perfect, but I would have to go over all my files one by one.
Re: Get keywords from text file
Posted: Wed Mar 23, 2011 8:04 am
by DataMystic Support
A straight word count is easy using Filters\Convert\Text to word list followed by Filters\Special\Count duplicate lines.
But looking for phrases - is there a limit to the number of words in each phrase?
We did design a Google AdWords filter which would generate groups of 2, 3 and 4 words from the source text (in different orders), but the number of combinations you need depends on the maximum number of words you allow in a phrase.
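For comparison outside TextPipe, the two-step word count described above (split the text into one word per line, then count duplicate lines) can be sketched in Python. This is only an illustration of the idea, not the TextPipe filters themselves; the function name `word_counts` and the word pattern are assumptions.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how often each word occurs, ignoring case."""
    # Step 1: split into individual words (the "text to word list" idea).
    # Assumed definition of a word: letters, digits and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Step 2: tally identical entries (the "count duplicate lines" idea).
    return Counter(words)

counts = word_counts("My dog polly got puppies. Dog polly is fine.")
for word, n in counts.most_common(2):
    print(word, "-", n)
```

As the reply notes, this only covers single words; phrases need the words grouped first.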
Re: Get keywords from text file
Posted: Wed Mar 23, 2011 11:23 pm
by Richy20
Yeah, I actually managed to get as far as "turn into word list" -> "count how many words present", but it solved only maybe half of the problem, so I did not want to post it.
I would probably need a maximum of 2-4 words in a phrase. I would be so glad if this is achievable. Currently I am left copying and pasting text files one at a time into a program and checking the phrase/word frequency.
Re: Get keywords from text file
Posted: Thu Mar 24, 2011 9:35 am
by DataMystic Support
Ok! I am fairly proud of this - it took some thinking.
It takes text like this:
Code:
my keyword list is special, and this is a highly useful test of this filter
and converts it to:
Code:
my
keyword
my keyword
list
keyword list
my keyword list
is
list is
keyword list is
my keyword list is
special
is special
list is special
keyword list is special
..
The first part is a regex to match words:
Then a vbscript subfilter to output phrases:
Code:
'Output phrases
dim a, b, c, d
dim vbCrLf
vbCrLf = chr(13) & chr(10)
function processLine(line, EOL)
'new word arrives in line
'shift old words along
a = b
b = c
c = d
d = line
out = d & vbCrLf
if c <> "" then out = out & c & " " & d & vbCrLf
if b <> "" then out = out & b & " " & c & " " & d & vbCrLf
if a <> "" then out = out & a & " " & b & " " & c & " " & d & vbCrLf
processLine = out
end function
sub startJob()
end sub
sub endJob()
end sub
function startFile()
end function
function endFile()
end function
See attached filter.
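For readers without TextPipe, the same sliding-window idea can be sketched in Python: keep the last four words and, for each new word, emit the phrases of one to four words that end on it. This is a rough equivalent under assumptions, not the attached filter; the function name `phrases` and the word pattern are made up for the example, since the regex used in the filter is not shown in the post.

```python
import re

def phrases(text, max_words=4):
    """Return every run of 1..max_words consecutive words, in the order
    the VBScript above emits them (shortest phrase first for each word)."""
    window = []  # most recent words, oldest first (a, b, c, d in the script)
    out = []
    # Assumed word pattern: letters, digits and apostrophes.
    for word in re.findall(r"[A-Za-z0-9']+", text):
        window.append(word)
        window = window[-max_words:]  # drop words that fell out of range
        for size in range(1, len(window) + 1):
            out.append(" ".join(window[-size:]))
    return out

result = phrases("my keyword list is special")
print("\n".join(result))
```

Raising or lowering `max_words` changes the longest phrase produced, which the VBScript version fixes at four by using four variables.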
Re: Get keywords from text file
Posted: Tue Mar 29, 2011 1:13 am
by dfhtextpipe
You were rightly proud of this filter and the VBScript it exemplifies.
Cool yet simple.
David
Re: Get keywords from text file
Posted: Tue Mar 29, 2011 2:13 am
by dfhtextpipe
Simon,
How would you enhance this VBScript filter to ensure that key phrases do not cross line boundaries?
e.g. Input
Code:
Now is the time for all good men to come to the aid of the party.
It was the best of times. It was the worst of times.
Output must not include these phrases:
Code:
party It
the party It
of the party It
party It was
the party It was
party It was the
David
Re: Get keywords from text file
Posted: Tue Mar 29, 2011 1:53 pm
by DataMystic Support
Ok,
Change the pattern to this:
and the VBScript to this:
Code:
'Output phrases
dim a, b, c, d
dim vbCrLf
vbCrLf = chr(13) & chr(10)
function processLine(line, EOL)
'new word arrives in line
'shift old words along
a = b
b = c
c = d
d = line
out = d & vbCrLf
if c <> "" then out = out & c & " " & d & vbCrLf
if b <> "" then out = out & b & " " & c & " " & d & vbCrLf
if a <> "" then out = out & a & " " & b & " " & c & " " & d & vbCrLf
'start a new phrase if this ends in a period
if right(line,1) = "." then
a = ""
b = ""
c = ""
d = ""
end if
processLine = out
end function
sub startJob()
end sub
sub endJob()
end sub
function startFile()
end function
function endFile()
end function
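One thing to check with this version: the test on the last character can only fire if the word pattern keeps the period attached to the word it follows. A Python sketch of the same reset-on-period logic is below, under that assumption. The function name and the pattern are hypothetical (the pattern from the post is not shown), and unlike the script the sketch strips the period before emitting the phrase.

```python
import re

def phrases_within_sentences(text, max_words=4):
    """Emit 1..max_words word phrases that never span a period."""
    window = []
    out = []
    # Assumed pattern: a word with an optional trailing period kept attached,
    # so the end of a sentence is still visible when the word arrives.
    for token in re.findall(r"[A-Za-z0-9']+\.?", text):
        ends_sentence = token.endswith(".")
        word = token.rstrip(".")
        window.append(word)
        window = window[-max_words:]
        for size in range(1, len(window) + 1):
            out.append(" ".join(window[-size:]))
        if ends_sentence:
            window = []  # start a new phrase after the period

    return out

result = phrases_within_sentences(
    "to come to the aid of the party.\nIt was the best of times."
)
print("\n".join(result))
```

If the pattern matched bare words only, `ends_sentence` would never be true and the output would be the same as without the check.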
Re: Get keywords from text file
Posted: Thu Apr 07, 2011 11:51 am
by ezinestein
Hello,
This thread was great, and just what I needed! The last download and the code change with the 'period' don't work for me. It seems to work just like the previous code and acts as if the period isn't even there. Not sure why, because I know nothing about VBScript, but I thought I'd mention it. Thanks again.
ed
Re: Get keywords from text file
Posted: Sat Apr 23, 2011 10:20 pm
by Richy20
Holy mother of god you people are amazing.
I totally forgot I made this thread and was already searching other forums for help and doing a lot of research myself. I probably searched about 300 pages, tried 10-20 programs etc. I do value every helpful comment (or in this case being spoonfed). I actually concluded my search and frustratingly accepted that with my skill level it can't be done.
I will get through this material and try it out. Thank you DataMystic, I love you!
edit: IT WORKS! IT REALLY WORKS!
I fed it an article file and it came back with keywords. I added "count duplicate lines" to get a count for each keyword/phrase, followed by a "descending numeric sort" filter. Now I need to work out how to remove the 1-word lines (at first I thought I needed them, but I probably won't) and make sure it can process a lot of files (and output the results in a way that is easier for me to get at). This should be easier though, as the hardest part, for me at least, was getting this program (or anything, for that matter) to give me a list of words and phrases. THANK YOU ALL!!!
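The remaining steps (count the phrases, sort by descending count, drop the 1-word lines, handle several texts) can be sketched in Python for comparison. This is a standalone illustration, not a TextPipe filter list; `rank_phrases`, its parameters and the word pattern are assumptions.

```python
import re
from collections import Counter

def rank_phrases(texts, min_words=2, max_words=4):
    """Count phrases of min_words..max_words words across all texts."""
    counts = Counter()
    for text in texts:
        window = []
        # Assumed pattern: lowercase word with an optional trailing period.
        for token in re.findall(r"[a-z0-9']+\.?", text.lower()):
            window.append(token.rstrip("."))
            window = window[-max_words:]
            # Starting at min_words skips the 1-word lines.
            for size in range(min_words, len(window) + 1):
                counts[" ".join(window[-size:])] += 1
            if token.endswith("."):
                window = []  # do not build phrases across sentences
    # Most frequent first; ties are ordered alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranked = rank_phrases(["My dog polly got puppies. Dog polly is fine."])
for phrase, n in ranked[:3]:
    print(phrase, "-", n)
```

Each entry in `texts` would be the contents of one file; how the files are read and where the results are written is left out here.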