Replace given keywords from csv file into clickable URLs

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Replace given keywords from csv file into clickable URLs

Post by gerd »

I do not know whether the following is possible with TextPipe; I suspect I will have to program it in PHP or Perl. But here is the replacement task, because I think many TextPipe users might be interested in a TextPipe solution:

I have a flat text file database (CSV file) consisting of 2 columns with a pipe as the delimiter:

1st column=keyword and 2nd column= URL
Example lines
Textpipe |http://www.datamystic.com
Google |http://www.google.com
etc.

All text files to be processed should be searched for all the keywords (strings) in the first column of this CSV file. Please note the leading and trailing blank in the first column (so the entire word, not part of a word).

My request is to a certain degree comparable with the internal links in Wikipedia.

1. All files within a given directory should be searched for the keywords from the CSV file. Example: with 10 lines in the CSV file, there may be up to 10 replacements if every keyword matches.
2. Only the FIRST occurrence of each keyword should be replaced (as on Wikipedia). Example: if the string Textpipe occurs 5 times in the text file being searched, then ONLY the first match should be replaced.
3. The keyword (string) from the first column should not be replaced if it is part of an HTML tag (e.g. an a href or header tag), so a specific regex would be helpful. Maybe subfilters can help. Example: the keyword (string) could appear as:
Textpipe
Textpipe.
Textpipe?
Ideally only the first occurrence of any of these 3 strings should be turned into a clickable URL.
I know this is more a task for a PHP or Perl program, but I felt it would not hurt to bring it up.
gerd
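For readers who do take the scripting route mentioned above, here is a minimal sketch of the task in Python rather than PHP or Perl. It is not from the thread and not a TextPipe feature: the function names `load_keywords` and `link_first_occurrence` are hypothetical, the sample sentence is made up, and looping over the files of a directory is left out. It does not yet protect text that is already inside HTML tags.

```python
import re

def load_keywords(lines):
    """Parse 'keyword |URL' lines into (keyword, url) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        keyword, url = line.split("|", 1)
        pairs.append((keyword.strip(), url.strip()))
    return pairs

def link_first_occurrence(text, pairs):
    """Replace only the first whole-word occurrence of each keyword with a link."""
    for keyword, url in pairs:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        link = '<a href="{}">{}</a>'.format(url, keyword)
        # A function replacement avoids backslash handling in the link text.
        text = re.sub(pattern, lambda m, link=link: link, text, count=1)
    return text

csv_lines = [
    "Textpipe |http://www.datamystic.com",
    "Google |http://www.google.com",
]
sample = "Textpipe is handy. I found Textpipe via Google."
result = link_first_occurrence(sample, load_keywords(csv_lines))
print(result)
```

The word boundaries (`\b`) stand in for the blanks around the keyword in the CSV file, and `count=1` limits each keyword to its first match.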
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Hi Gerd,

This is pretty simple with TextPipe. First, convert your pipe-delimited file to a CSV file, and ensure the spaces after the words are removed - they are not needed.

Then, refer to this file with the Filters\Replace\Search/replace list filter.
Check 'Replace first only' and 'find whole words only'. Set the Find type to 'Pattern (perl)'.

If you want to allow optional punctuation after the word, then you can pre-process the CSV list first.

Code: Select all

Restrict fields:1 Comma-delimited fields starting at field 1
|  [X] Process fields individually
|    [X] Exclude delimiter
|      [X] Exclude quotes (if present)
|  Delimiter Type: 0
|  Custom delimiter: 
|  [ ] Has Header
|
+--Add right margin [[\.\?]?>]
Adding the text

Code: Select all

[\.\?]??
allows for an optional full stop (period) or question mark after the word.
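As an illustration outside TextPipe, the effect of appending an optional-punctuation pattern to each keyword can be sketched in Python. The helper name `keyword_pattern` is hypothetical. The sketch uses the greedy form `[\.\?]?`, so the punctuation becomes part of the match; the `[\.\?]??` shown above is the lazy variant of the same construct.

```python
import re

OPTIONAL_PUNCT = r"[\.\?]?"  # optional full stop or question mark

def keyword_pattern(keyword):
    """Whole-word keyword, optionally followed by '.' or '?'."""
    return re.compile(r"\b" + re.escape(keyword) + r"\b" + OPTIONAL_PUNCT)

pattern = keyword_pattern("Textpipe")
matches = [m.group(0) for m in pattern.finditer("Textpipe Textpipe. Textpipe? Textpipes")]
print(matches)
```

The plural "Textpipes" is not matched because of the word boundary after the keyword.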
gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Post by gerd »

Simon,
I created a simple TAB-delimited file and it works well. Example line from it:
TextPipe <a href="http://www.datamystic.com">TextPipe</a>

There is just one problem, and I do not know whether it can be solved with TextPipe and, if so, how. Objective: do not make the replacement if the keyword is already part of an <a ...> tag, which means the replacement has already been made (by TextPipe, manually or by other means).

3 text example sentences:
1. The software TextPipe is a very useful and powerful tool but it requires some time to take advantage of its power.
2. The software <a href="http://www.datamystic.com">TextPipe</a> is a very useful and powerful tool but it requires some time to take advantage of its power.
3. <a href="http://www.datamystic.com">TextPipe </a> is a very useful and powerful tool but it requires some time to take advantage of its power.

Objective:
The replacement should be made in example 1 (that works - no problem).
No replacements should be made in examples 2 and 3. Can this be achieved somehow?
Usually the > and < come directly before and after the keyword (see example 2). Sometimes there is an additional blank character before or after the keyword inside the <a href> tag (see example 3).

I do not understand a single line of your code example
Restrict fields:1 Comma-delimited fields starting at field 1
| [X] Process fields individually
...
or what the string "[\.\?]??" stands for.
As the trailing punctuation is not that important, I do not want to make things more complicated, so I will skip it.
The more I work with TextPipe by trying to replicate examples from the forum, the more I recognize its power.
gerd
gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Post by gerd »

A follow-up to my posting

I have now "played" for some time with <a[^>]*> from page 27 of "Welcome to Text Pipe", so far without success, but in doing so I encountered one more problem.
The keyword to be searched for and replaced can be a string WITHIN the HTML tag <a ... title="keyword"...>...</a> as well as BETWEEN the <a...>keyword</a> tags as link text.

This means: ideally no replacements should be made if the keyword is between or within an <a...>...</a>. In other words, the replacement should be made only if the keyword is not part, at any place, of an <a..>..</a>.
Thanks again and I will not bother you with any other request.
gerd
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

You'll love this :-)

Using a structure like this, and setting the Action field to 'Send non matching text to subfilter' - you can get TextPipe to only process text NOT inside <a>...</a> tags.

Code: Select all

Perl pattern [<a[^>]*>[^<>]*</a>] with []
|
+--Perl pattern [TextPipe] with [<a href="http://www.datamystic.com">TextPipe</a>]
      [X] Prompt on replace
With another layer of restriction, you should also be able to prevent it picking up keywords inside tags.
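For readers without TextPipe, the idea of sending only the non-matching text to a subfilter can be sketched in Python. This is an illustration under assumptions, not TextPipe's implementation: the function name `replace_outside` is hypothetical, and the `ANY_TAG` pattern is one possible reading of the "another layer of restriction" remark, not something given in the thread.

```python
import re

# Same pattern as in the filter list above: a complete <a ...>...</a> element.
ANCHOR = re.compile(r"(<a[^>]*>[^<>]*</a>)")
# Assumption: additionally protect the inside of any other tag.
ANY_TAG = re.compile(r"(<a[^>]*>[^<>]*</a>|<[^>]*>)")

def replace_outside(text, keyword, replacement, protected=ANCHOR):
    """Replace the first whole-word keyword found outside protected spans."""
    # With one capturing group, split() alternates: plain text at even
    # indexes, protected spans at odd indexes.
    parts = protected.split(text)
    word = re.compile(r"\b" + re.escape(keyword) + r"\b")
    for i in range(0, len(parts), 2):
        new_part, n = word.subn(lambda m: replacement, parts[i], count=1)
        if n:
            parts[i] = new_part
            break  # only the first occurrence is linked
    return "".join(parts)

link = '<a href="http://www.datamystic.com">TextPipe</a>'
s1 = "The software TextPipe is a very useful tool."
s2 = 'The software <a href="http://www.datamystic.com">TextPipe</a> is a very useful tool.'
s3 = '<a href="http://www.datamystic.com">TextPipe </a> is a very useful tool.'
s4 = '<img title="TextPipe logo"> TextPipe rocks'

out1 = replace_outside(s1, "TextPipe", link)
out2 = replace_outside(s2, "TextPipe", link)
out3 = replace_outside(s3, "TextPipe", link)
out4 = replace_outside(s4, "TextPipe", link, protected=ANY_TAG)
print(out1, out2, out3, out4, sep="\n")
```

Sentences shaped like examples 2 and 3 from earlier in the thread come back unchanged, while the plain sentence gets its link. Note that `<a[^>]*>` also matches other tags whose names begin with "a", such as `<abbr>`.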
gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Post by gerd »

Simon,
you are right. I love it.
It's just great.
I also played with different patterns and found that <a[^>]*>.*</a> replaces ALL occurrences, while your hint <a[^>]*>[^<>]*</a> replaces only the first one, which is what I needed.
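One general difference between these two patterns can be shown outside TextPipe with a small Python sketch. The sample text is made up, and whether this difference accounts for the behaviour described above is not confirmed in the thread.

```python
import re

text = ('<a href="http://www.datamystic.com">TextPipe</a> and TextPipe and '
        '<a href="http://www.google.com">Google</a>')

# Greedy .* runs from the first <a ...> to the LAST </a> on the line.
greedy = re.findall(r"<a[^>]*>.*</a>", text)
# [^<>]* cannot cross another tag, so each link is matched on its own.
restricted = re.findall(r"<a[^>]*>[^<>]*</a>", text)

print(len(greedy), len(restricted))
```

With the greedy pattern the plain "TextPipe" between the two links ends up inside the single match; with the restricted pattern it stays outside both matches.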

One hint on upgrades. Each time I upgrade TextPipe I do it contrary to the installation recommendation, that is, without uninstalling TextPipe first. I have had no problems with it (Windows XP).
Thanks a lot for your great support. Over the next weeks I will dig a bit deeper into TextPipe.
gerd
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Thanks Gerd. If you upgrade without removal, then the Windows Explorer context menu (right-click menu) DLL cannot be updated.
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Explorer shell

Post by dfhtextpipe »

If you uninstall TextPipe prior to installing the updated version, you lose all your own carefully managed Explorer shell tools!

Install instructions should at least include a hint on exporting the registry key before uninstallation, and then merging it again afterwards.

Why can't this be automated better?
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Good point - we'll look into it. We can actually do this pretty easily.
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Update procedure should not require an uninstall

Post by dfhtextpipe »

Nearly all of the other software I receive updates for (that runs under Windows) can perform the update without requiring the user to perform an uninstall beforehand.

I see no reason why all DataMystic software should not catch up with this industry trend.

Even when a running DLL has to be replaced, there are ways and means to accomplish the update seamlessly from the user's point of view. Go for it!