Get keywords from text file
Moderators: DataMystic Support, Moderators
Get keywords from text file
Hi, great program. So glad I got it. I am not one to rush to a forum and bother others with my questions, but I can't figure this one out. It seems easy enough in concept, but I can't wrap my mind around it.
I need to get a list of the most common words and phrases (keywords).
Example:
My dog polly got puppies. Dog polly is fine.
Output:
Dog polly - 2
Dog - 2
Polly - 2
So it would go over the text and find that the phrase "dog polly" occurs 2 times, and that the words "dog" and "polly" each occur 2 times.
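For reference, the counting being asked for can be sketched outside TextPipe. This is a minimal Python sketch, not a TextPipe feature; it assumes a "word" is a run of letters, digits, underscores or hyphens, that case is ignored, and that a phrase is a run of consecutive words. The function name `phrase_counts` is hypothetical. Phrases here may run across the full stop ("puppies dog"); later posts in the thread deal with that.

```python
import re
from collections import Counter

def phrase_counts(text, max_words=2):
    """Count every run of 1..max_words consecutive words, ignoring case."""
    # Assumption: a word is a run of letters, digits, underscores or hyphens.
    words = re.findall(r"[\w-]+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

counts = phrase_counts("My dog polly got puppies. Dog polly is fine.")
for phrase, count in counts.most_common():
    if count > 1:  # only report words/phrases that repeat
        print(f"{phrase} - {count}")
```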
Re: Get keywords from text file
I found this tool: hxxp://25yearsofprogramming.com/perl/phrasecounter.htm
It does exactly what I would like textpipe to do.
The tool is perfect, but I would have to run it on my files one by one.
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Get keywords from text file
A straight word count is easy using Filters\Convert\Text to word list followed by Filters\Special\Count duplicate lines.
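The same two-step idea (one word per line, then count identical lines) looks roughly like this in Python. It is only an analogue of the two TextPipe filters; the helper names are hypothetical, and lower-casing the text first is an assumption so that "Dog" and "dog" are counted together.

```python
import re
from collections import Counter

def text_to_word_list(text):
    """Analogue of Text to word list: put each word on its own line."""
    # Assumption: words are runs of letters, digits, underscores or hyphens.
    return "\n".join(re.findall(r"[\w-]+", text))

def count_duplicate_lines(text):
    """Analogue of Count duplicate lines: map each distinct line to its count."""
    return Counter(text.splitlines())

words = text_to_word_list("My dog polly got puppies. Dog polly is fine.".lower())
for word, count in count_duplicate_lines(words).most_common():
    print(count, word)
```

As the next question in the post points out, this only counts single words; phrases need more work.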
But looking for phrases - is there a limit to the number of words in each phrase?
We did design a Google AdWords filter which would generate groups of 2, 3 and 4 words from the source text (in different orders), but the number of combinations you need depends on the maximum number of words you allow in a phrase.
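To see why the maximum phrase length matters, here is a small Python sketch that counts how many phrases a text produces. The first function counts runs of consecutive words, which grows slowly. The thread does not describe how the AdWords filter orders its word groups, so the second function (every ordering of distinct words) is only an assumed upper bound. Both function names are hypothetical.

```python
import math

def contiguous_phrase_count(n_words, max_len):
    """Phrases of 1..max_len consecutive words in a text of n_words words."""
    return sum(n_words - n + 1 for n in range(1, min(max_len, n_words) + 1))

def ordered_combination_count(n_distinct, max_len):
    """Groups of 1..max_len distinct words taken in every possible order."""
    return sum(math.perm(n_distinct, n) for n in range(1, max_len + 1))

for k in (2, 3, 4):
    print(k, contiguous_phrase_count(10, k), ordered_combination_count(10, k))
```

For 10 distinct words, going from 2-word to 4-word groups takes the consecutive count from 19 to 34, but the any-order count from 100 to 5860.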
Re: Get keywords from text file
Yeah, I actually managed to get as far as "turn into word list" -> "count how many words are present", but that only solved maybe half of the problem, so I did not post it.
DataMystic Support wrote: A straight word count is easy using Filters\Convert\Text to word list followed by Filters\Special\Count duplicate lines.
But looking for phrases - is there a limit to the number of words in each phrase?
We did design a google adwords filter which would generate groups of 2, 3 and 4 words from the source text (in different orders), but the number of combinations you need depends on the maximum number of words you allow in a phrase.
I would probably need at most 2-4 words in a phrase. I would be so glad if this is achievable. Currently I am stuck copy-pasting text files one at a time into a program and checking the phrase/word frequency.
- DataMystic Support
Re: Get keywords from text file
Ok! I am fairly proud of this - it took some thinking.
See attached filter.
It takes text like this:
Code: Select all
my keyword list is special, and this is a highly useful test of this filter
and converts it to:
Code: Select all
my
keyword
my keyword
list
keyword list
my keyword list
is
list is
keyword list is
my keyword list is
special
is special
list is special
keyword list is special
..
The first part is a regex to match words:
Code: Select all
([\w-]*?)
Then a VBScript subfilter to output phrases:
Code: Select all
'Output phrases
dim a, b, c, d
dim vbCrLf
vbCrLf = chr(13) & chr(10)
function processLine(line, EOL)
'new word arrives in line
'shift old words along
a = b
b = c
c = d
d = line
out = d & vbCrLf
if c <> "" then out = out & c & " " & d & vbCrLf
if b <> "" then out = out & b & " " & c & " " & d & vbCrLf
if a <> "" then out = out & a & " " & b & " " & c & " " & d & vbCrLf
processLine = out
end function
sub startJob()
end sub
sub endJob()
end sub
function startFile()
end function
function endFile()
end function
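To check what the subfilter does, here is a line-by-line Python port of `processLine`. It assumes the words arrive one at a time, as they do after the regex step, and it uses a greedy `[\w-]+` to split the sample text into words, which may not match exactly how TextPipe's pattern behaves. The function name `phrase_lines` is hypothetical.

```python
import re

def phrase_lines(words):
    """Python port of processLine: for each incoming word, emit the word itself
    and then the 2-, 3- and 4-word phrases that end with it."""
    a = b = c = d = ""  # last four words seen, oldest first (as in the VBScript)
    out = []
    for word in words:
        a, b, c, d = b, c, d, word  # shift old words along
        out.append(d)
        if c:
            out.append(f"{c} {d}")
        if b:
            out.append(f"{b} {c} {d}")
        if a:
            out.append(f"{a} {b} {c} {d}")
    return out

# Assumption: a greedy [\w-]+ stands in for the TextPipe word-matching step.
text = "my keyword list is special, and this is a highly useful test of this filter"
for phrase in phrase_lines(re.findall(r"[\w-]+", text)):
    print(phrase)
```

The first 14 lines printed match the sample output in the post above. Because the window is never cleared, phrases can run straight across sentence and line breaks.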
- Attachments
-
- generate phrase list.zip
- (841 Bytes) Downloaded 587 times
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Get keywords from text file
You were rightly proud of this filter and the VBScript it exemplifies.
Cool yet simple.
David
Re: Get keywords from text file
Simon,
How would you enhance this VBScript filter to ensure that key phrases do not cross line boundaries?
e.g. Input
Code: Select all
Now is the time for all good men to come to the aid of the party.
It was the best of times. It was the worst of times.
Output must not include these phrases:
Code: Select all
party It
the party It
of the party It
party It was
the party It was
party It was the
David
- DataMystic Support
Re: Get keywords from text file
Ok,
Change the pattern to this:
Code: Select all
([\w-]*?)\.??
and the VBScript to this:
Code: Select all
'Output phrases
dim a, b, c, d
dim vbCrLf
vbCrLf = chr(13) & chr(10)
function processLine(line, EOL)
'new word arrives in line
'shift old words along
a = b
b = c
c = d
d = line
out = d & vbCrLf
if c <> "" then out = out & c & " " & d & vbCrLf
if b <> "" then out = out & b & " " & c & " " & d & vbCrLf
if a <> "" then out = out & a & " " & b & " " & c & " " & d & vbCrLf
'start a new phrase if this ends in a period
if right(line,1) = "." then
a = ""
b = ""
c = ""
d = ""
end if
processLine = out
end function
sub startJob()
end sub
sub endJob()
end sub
function startFile()
end function
function endFile()
end function
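David's line-boundary question and the period reset above can both be modelled in a short Python sketch: the window of recent words is cleared at the start of every line and after any word ending in a period. This is a re-implementation for illustration, not the attached filter. It assumes a greedy `[\w-]+\.?` pattern so that the period stays attached to the word it ends, and strips that period before output; the function name is hypothetical.

```python
import re

def phrases_within_sentences(text):
    """Phrases of up to 4 words that never cross a line break or a full stop."""
    out = []
    for line in text.splitlines():
        window = []  # reset at each line: phrases never span two lines
        # Assumption: greedy \.? keeps a trailing period on the word it ends.
        for token in re.findall(r"[\w-]+\.?", line):
            window = (window + [token.rstrip(".")])[-4:]
            # newest word alone, then the 2-, 3- and 4-word phrases ending with it
            for n in range(1, len(window) + 1):
                out.append(" ".join(window[-n:]))
            if token.endswith("."):
                window = []  # start a new phrase after a period
    return out

sample = ("Now is the time for all good men to come to the aid of the party.\n"
          "It was the best of times. It was the worst of times.")
phrases = phrases_within_sentences(sample)
print("party It" in phrases, "times It" in phrases)
```

Resetting per line on its own would answer David's question; the period check additionally stops phrases such as "times It" inside a single line.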
- Attachments
-
- generate phrase list.zip
- Mark 2
- (938 Bytes) Downloaded 558 times
-
- Posts: 13
- Joined: Sat Nov 10, 2007 11:10 am
Re: Get keywords from text file
Hello,
This thread was great, and just what I needed! However, the last download and the code change for the 'period' don't work for me. The filter behaves exactly like the previous version, as if the period isn't there at all. I'm not sure why, since I know nothing about scripting, but thought I'd mention it. Thanks again.
ed
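One possible cause, assuming TextPipe's regex engine treats lazy quantifiers the way Perl-style engines such as Python's `re` do: in `\.??` the `??` is lazy, so the engine prefers to match no period at all. The matched word then never ends in ".", and the `right(line,1) = "."` check never fires. The sketch below shows this in Python. Note that in Python the full pattern `([\w-]*?)\.??` matches the empty string, while the filter clearly does extract words in TextPipe, so TextPipe's matching may differ; this is only a hypothesis, and a greedy `\.?` is one candidate fix, not a confirmed one.

```python
import re

word = "party."
original = re.match(r"([\w-]*?)\.??", word).group(0)     # lazy word, lazy period
lazy_period = re.match(r"([\w-]+)\.??", word).group(0)   # '??' prefers no period
greedy_period = re.match(r"([\w-]+)\.?", word).group(0)  # '?' takes the period
print(repr(original), repr(lazy_period), repr(greedy_period))
```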
Re: Get keywords from text file
Holy mother of god, you people are amazing.
I totally forgot I had made this thread and was already searching other forums for help and doing a lot of research myself. I probably searched about 300 pages and tried 10-20 programs. I do value every helpful comment (or in this case, being spoon-fed). I had actually ended my search and, frustrated, accepted that it can't be done at my skill level.
I will get through this material and try it out. Thank you DataMystic, I love you!
edit: IT WORKS! IT REALLY WORKS! I fed it an article file and got keywords back. I added a "count duplicate lines" filter to count each keyword/phrase and a "descending numeric sort" filter. Now I need to work out how to remove one-word lines (at first I thought I needed them, but I probably won't) and make sure it can process a lot of files (and how it outputs the results, so they are easier to get at). This should be easier, though, as the hardest part, for me at least, was getting this program (or anything, for that matter) to give me a list of words and phrases. THANK YOU ALL!!!
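The remaining steps described in the edit (drop one-word lines, count duplicates, sort by count descending, handle many files) can be sketched as a short Python script. This is not a TextPipe filter; it assumes UTF-8 text files, lower-cases everything, and keeps phrases within a single line. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

def count_phrases(texts, min_words=2, max_words=4):
    """Count phrases of min_words..max_words words, never crossing a line break."""
    counts = Counter()
    for text in texts:
        for line in text.lower().splitlines():
            words = re.findall(r"[\w-]+", line)
            for n in range(min_words, max_words + 1):  # min_words=2 skips single words
                for i in range(len(words) - n + 1):
                    counts[" ".join(words[i:i + n])] += 1
    return counts

def top_phrases(paths):
    """Read each file (assumed UTF-8) and return (phrase, count), most frequent first."""
    texts = (Path(p).read_text(encoding="utf-8") for p in paths)
    return count_phrases(texts).most_common()
```

For example, `top_phrases(sorted(Path("articles").glob("*.txt")))` would return one combined ranking for every `.txt` file in a hypothetical `articles` folder.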