How to add headers by using conditions,search pattern, and loop in different folders

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

Typetoken
Posts: 7
Joined: Sat Apr 29, 2017 10:50 pm

How to add headers by using conditions,search pattern, and loop in different folders

Post by Typetoken »

Hello All,

May I ask how to solve the following problem: adding a header to each file based on a matching condition, in a loop over many files?

I have two folders of txt files. Folder A contains the files that need a header. Each file in Folder A needs a different header, taken from Folder B by pattern search and a matching condition. Folder B contains the header information, in one or more files, which should be searched and copied based on patterns.

Files in Folder A look like this:
Text_e_001.txt
Text_e_002.txt


The contents of the file(s) in Folder B look like this:
<line number="1">
<Text_ID>Text_e_001</Text_ID>
<Author>Tony Cheung</Author>
<Date>3/28/2016</Date>
<Topic>‘That’s irresponsible’: Hong Kong’s top Anglican rejects calls to give up Christian seats on body electing city’s chief executive</Topic>
<Source>South China Morning Post</Source>
<Register_Genre>Newspaper</Register_Genre>
</line>


<line number="2">
<Text_ID>Text_e_002</Text_ID>
<Author>Ho Lok-Sang</Author>
<Date>9/2/2014</Date>
<Topic>National security always remains a vital concern </Topic>
<Source>China Daily Hong Kong Edition | P10</Source>
<Register_Genre>Newspaper</Register_Genre>
</line>




I would like to find the sections in Folder B between <line number=".*?"> and </line>, and add each matched section as a header to a file in Folder A, by matching the file name in Folder A (e.g. Text_e_001.txt) against the Text_ID (e.g. <Text_ID>Text_e_001</Text_ID>) inside the matched section in Folder B.

The difficulty is that Folder B contains many different sections matching <line number="[0-9]*?">.*?</line>.

How can I connect Folder A and Folder B? The idea is to search Folder B for sections (<line number=".*?"> .*? </line>), each of which contains a different <Text_ID>.*?</Text_ID>. Then, based on the Text_ID in each matched section, find the corresponding txt filename in Folder A. If a filename in Folder A matches the Text_ID of a section, that section (<line number=".*?"> .*? </line>) from Folder B is added as a header to that file in Folder A.
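
For comparison outside TextPipe, the matching step described above can be sketched in Python. This is only an illustration, not a TextPipe filter: the function name `index_headers` and the shortened sample text are assumptions made for the example, and the pattern uses `[0-9]+` rather than `.*?` for the line number.

```python
import re

# Mirrors the pattern from the post: one <line number="..."> ... </line> section.
# re.DOTALL lets '.' cross line breaks; the non-greedy '.*?' stops at the first </line>.
SECTION_RE = re.compile(r'<line number="[0-9]+">.*?</line>', re.DOTALL)
TEXT_ID_RE = re.compile(r"<Text_ID>(.*?)</Text_ID>")


def index_headers(metadata_text):
    """Map each Text_ID to its complete <line ...>...</line> section."""
    headers = {}
    for match in SECTION_RE.finditer(metadata_text):
        section = match.group(0)
        id_match = TEXT_ID_RE.search(section)
        if id_match:  # sections without a Text_ID are skipped
            headers[id_match.group(1).strip()] = section
    return headers


# Shortened sample in the shape of the Folder B file (hypothetical test data).
sample = """<line number="1">
<Text_ID>Text_e_001</Text_ID>
<Author>Tony Cheung</Author>
</line>

<line number="2">
<Text_ID>Text_e_002</Text_ID>
<Author>Ho Lok-Sang</Author>
</line>
"""

headers = index_headers(sample)
print(sorted(headers))
```

Keying the sections by Text_ID means each file in Folder A needs only one dictionary lookup by its name without the .txt extension.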

The test Folders A and B are attached here. I have been trying different filters in TextPipe for a long time, but I have not found a filter or combination of filters that does this.

Thank you very much for your kind help and tips.

Sincerely
Typetoken
Attachments
Folder B.zip
(50.35 KiB) Downloaded 800 times
Folder A.zip
(7.3 KiB) Downloaded 751 times
Last edited by Typetoken on Mon May 08, 2017 11:45 pm, edited 4 times in total.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by DataMystic Support »

The easiest way is to use 'Restrict to filenames matching' and inside this filter, put the relevant search/replace filters.
One Restrict to filenames matching filter for each folder.
Typetoken
Posts: 7
Joined: Sat Apr 29, 2017 10:50 pm

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by Typetoken »

DataMystic Support wrote: The easiest way is to use 'Restrict to filenames matching' and inside this filter, put the relevant search/replace filters.
One Restrict to filenames matching filter for each folder.
Thank you very much indeed for your tips, Simon.

I am still very puzzled. There is only one txt file in Folder B, but this file contains all the headers needed for the files in Folder A, so there seems to be no need for a 'Restrict to filenames matching' filter for Folder B. Moreover, how can we grep each section between <line number=".*?"> and </line> and paste it into the relevant file in Folder A as that file's header? Each file name in Folder A (e.g. Text_e_001) is the bridge to the matched information in Folder B, since each matched section in the Folder B file contains the file's ID (e.g. <Text_ID>Text_e_001</Text_ID>):

<line number="1">
<Text_ID>Text_e_001</Text_ID>
<Author>Tony Cheung</Author>
<Date>3/28/2016</Date>
<Topic>‘That’s irresponsible’: Hong Kong’s top Anglican rejects calls to give up Christian seats on body electing city’s chief executive</Topic>
<Source>South China Morning Post</Source>
<Register_Genre>Newspaper</Register_Genre>
</line>


The section above, from the txt file in Folder B, will be grepped and put at the beginning of the file Text_e_001.txt in Folder A as a header.

Then, similarly, the matched section (not a file name) in the txt file in Folder B

<line number="2">
<Text_ID>Text_e_002</Text_ID>
......
</line>

will be grepped and pasted at the beginning of the following txt file in Folder A:
Text_e_002.txt

Grateful for your kind help and guidance!

Sincerely
Typetoken
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by DataMystic Support »

Sorry, I mis-read your detailed requirements.

Correct me if I am wrong, but you are just inserting files based on a search/replace condition?

In that case, a search/replace for the condition, with an Add Header filter, using the external file option, should work fine.
Failing that, the VBScript filter (TextPipe x32 only at this stage) has a sample filter called replace filename with file contents.fll which can be used.
Typetoken
Posts: 7
Joined: Sat Apr 29, 2017 10:50 pm

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by Typetoken »

Thank you so much for your guidance.

Following your tips, I have created the attached filter.

However, it seems to add all the search results from Folder B to every file in Folder A, instead of adding each search result from Folder B only to the relevant file in Folder A, based on matching the filename in Folder A against the Text_ID found in Folder B.

VB seems a formidable challenge to me since I've never learned VB before.

Grateful for your kind help!
Attachments
add header based on condition to match two files.7z
(854 Bytes) Downloaded 730 times
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by DataMystic Support »

Your problem can be greatly simplified by converting file B into a CSV file, where column 1 is the filename, and column 2 is the complete replacement header for that filename.

You can use TextPipe to reformat the file in this way.

Once you have this, you can use it as the search/replace list in the filter below (assuming you have escaped the " characters correctly).

The Add Header filter adds the input filename to the file, and then sends this value to the search/replace list filter. Done!


Add file header [@inputfilename@]
|
+--Replace list:  Perl pattern
      [X] Match case
      [ ] Whole words only
      [ ] Case sensitive replace
      [X] Prompt on replace
      [ ] Skip prompt if identical
      [ ] First only
      [ ] Extract matches
          Maximum text buffer size 4096
      [ ] Maximum match (greedy)
      [ ] Allow comments
      [X] '.' matches newline
      [X] UTF-8 Support

      [ ] Process longest strings first
      [ ] Simultaneous search
      [ ] Log summary only
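
As one possible way to prepare the two-column file outside TextPipe, the conversion could be sketched in Python as follows. This is an assumption-laden illustration: the function name `sections_to_csv` is hypothetical, the CSV dialect (every field quoted, embedded " doubled) is Python's, and whether TextPipe's list import expects exactly this dialect should be checked against its documentation.

```python
import csv
import io
import re


def sections_to_csv(metadata_text):
    """Return CSV text with column 1 = filename, column 2 = full header section."""
    buffer = io.StringIO(newline="")
    # QUOTE_ALL wraps every field in "..." and doubles any embedded " characters.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    sections = re.findall(
        r'<line number="[0-9]+">.*?</line>', metadata_text, flags=re.DOTALL
    )
    for section in sections:
        found = re.search(r"<Text_ID>(.*?)</Text_ID>", section)
        if found:
            # Assumes the target file is named <Text_ID>.txt, as in the thread.
            writer.writerow([found.group(1).strip() + ".txt", section])
    return buffer.getvalue()


section = '<line number="1">\n<Text_ID>Text_e_001</Text_ID>\n</line>'
csv_text = sections_to_csv(section + "\n")
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
print(csv_text)
```

Reading the text back with `csv.reader` returns the section unchanged, which is a quick way to confirm that the quoting did not damage the tags.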
    
Typetoken
Posts: 7
Joined: Sat Apr 29, 2017 10:50 pm

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by Typetoken »

Thank you so much indeed.

The situation is like this.

The file in Folder B is in XML format. Each search result in Folder B, including the tags, will be added to the relevant file in Folder A based on the filename in Folder A. If the file in Folder B is converted to CSV, will the tags be kept?

Moreover, Folder A will contain more than one thousand files with different filenames. Will this need a loop or something similar?

My flow chart is as follows:

1) Take the first file in Folder A (e.g. Text_e_001.txt) and capture the filename Text_e_001.
2) Search for every header section in Folder B with <line number=".*?">.*?</line>.
3) Condition: check whether the filename captured in step 1) (e.g. Text_e_001) matches the element <Text_ID>.*?</Text_ID> in a section from step 2) (e.g. <Text_ID>Text_e_001</Text_ID>).

4) If it matches, only the following section in Folder B (found by <line number=".*?">.*?</line> and containing <Text_ID>Text_e_001</Text_ID>) will be added to the file named Text_e_001.txt as a header.

e.g.
<line number="1">
<Text_ID>Text_e_001</Text_ID>
<Author>Tony Cheung</Author>
<Date>3/28/2016</Date>
<Topic>‘That’s irresponsible’: Hong Kong’s top Anglican rejects calls to give up Christian seats on body electing city’s chief executive</Topic>
<Source>South China Morning Post</Source>
<Register_Genre>Newspaper</Register_Genre>
</line>

5) The same process will be applied to each file in Folder A.
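
The five steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a TextPipe solution: the function name `add_headers`, the UTF-8 encoding, the single metadata file, and the separate output folder are all choices made for the example.

```python
import re
import tempfile
from pathlib import Path


def add_headers(folder_a, metadata_file, out_dir):
    """Prepend the matching <line>...</line> section to each .txt file in folder_a."""
    text = Path(metadata_file).read_text(encoding="utf-8")  # encoding is an assumption
    headers = {}
    # Step 2: collect every header section; step 3: key it by its Text_ID.
    for section in re.findall(r'<line number="[0-9]+">.*?</line>', text, flags=re.DOTALL):
        found = re.search(r"<Text_ID>(.*?)</Text_ID>", section)
        if found:
            headers[found.group(1).strip()] = section
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Steps 1 and 5: visit every file in Folder A; path.stem drops the ".txt".
    for path in sorted(Path(folder_a).glob("*.txt")):
        header = headers.get(path.stem)
        if header is None:
            continue  # no section with this Text_ID, so the file is left alone
        body = path.read_text(encoding="utf-8")
        # Step 4: write header + original body to a separate folder.
        (out_dir / path.name).write_text(header + "\n" + body, encoding="utf-8")
        written.append(path.name)
    return written


# Small self-contained demonstration with temporary, made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    folder_a = root / "A"
    folder_a.mkdir()
    (folder_a / "Text_e_001.txt").write_text("Body one\n", encoding="utf-8")
    (folder_a / "Text_e_003.txt").write_text("No metadata\n", encoding="utf-8")
    meta = root / "headers.txt"
    meta.write_text(
        '<line number="1">\n<Text_ID>Text_e_001</Text_ID>\n</line>\n', encoding="utf-8"
    )
    written = add_headers(folder_a, meta, root / "out")
    result = (root / "out" / "Text_e_001.txt").read_text(encoding="utf-8")

print(written)
```

Because the headers are indexed once, no explicit nested loop over Folder B is needed for each of the thousand-plus files; the originals in Folder A stay untouched.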

A thousand thanks indeed, Simon!

Sincerely
Typetoken
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by DataMystic Support »

The CSV conversion will work fine IF you correctly escape the " in the HTML. You can also change them to ' to ensure it works, and put the entire group of tags into "..." - TextPipe will parse the new lines in the replace field correctly.

You just need a search/replace list. Far easier than the other approach.
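
To see what the two quoting options look like in practice, here is a small Python sketch. It only demonstrates general CSV quoting with Python's `csv` module; the variable names are hypothetical, and it does not prove how TextPipe itself parses the list.

```python
import csv
import io

header = '<line number="1">\n<Text_ID>Text_e_001</Text_ID>\n</line>'

# Option 1: keep the double quotes and let the CSV writer double them ("").
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["Text_e_001.txt", header])
escaped_row = buffer.getvalue()

# Option 2: swap " for ' so that nothing inside the field needs escaping.
single_quoted = header.replace('"', "'")

# Reading the row back returns the header unchanged, tags and line breaks included.
parsed = next(csv.reader(io.StringIO(escaped_row, newline="")))
print(escaped_row)
```

In both options the whole group of tags sits inside one quoted field, so the line breaks between the tags are part of the field rather than row separators.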
Typetoken
Posts: 7
Joined: Sat Apr 29, 2017 10:50 pm

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by Typetoken »

Thank you so much again, Simon.

Do you mean the following procedure?
1) I create a CSV file containing two columns:
Column 1 is the filename (e.g. Text_e_001.txt)
Column 2 is the header list with tags
(e.g. <line number="1">
<Text_ID>Text_e_001</Text_ID>
<Author>Tony Cheung</Author>
<Date>3/28/2016</Date>
<Topic>‘That’s irresponsible’: Hong Kong’s top Anglican rejects calls to give up Christian seats on body electing city’s chief executive</Topic>
<Source>South China Morning Post</Source>
<Register_Genre>Newspaper</Register_Genre>
</line>)

Question: Can several line breaks be kept in Column 2 as above example?

2) Then, in "Files to be process", choose Folder A, which contains all the files (more than 1,000) that need headers added

3) Then insert an "add file header" filter and choose "From file" to load the prepared CSV file, which is a list of headers

4) Create a search/replace list filter and choose "From file" to load the prepared CSV file.

Question: Should I nest the search/replace list filter as a subfilter of "add file header"?

5) Then in the search/replace list filter, search for "Text_e_.*?.txt" in column 1 of the CSV file and replace the whole line with the header from the corresponding second column.

Questions: Will it search for each "Text_e_.*?.txt" in column 1 of the CSV file, or will it look for it in the filenames in Folder A? I am lost. As I understand it, one file in Folder A is processed at a time, on the condition that the file's name can be found in column 1 of the CSV replace list.

Do I need to search for each specific filename in column 1 (e.g. Text_e_001.txt) and process one replacement at a time? Or can it search with a pattern and process all the files in Folder A (more than 1,000 files) in one batch?


I am so unskilled with TextPipe. Thanks again for your kind guidance and help.

Sincerely
Typetoken
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by DataMystic Support »

Almost.

1. Create CSV as specified
2. Add an 'Add Header' filter, with text of @inputFilename@. This will result in Text_e_001.txt, Text_e_002.txt etc. in the file.
3. Add a search/replace list filter. Specify your external search/replace list.

Wherever the name of a file appears in any input file, it will be replaced with the header from the CSV.

So if your Text_e_001.txt accidentally contains Text_e_002.txt somewhere, this will also get replaced (you can change this, but you're just inserting a header, correct?).
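
The effect of these steps, including the caveat about a filename appearing inside a file, can be simulated in Python. This is a sketch of the idea only: `apply_pipeline` and its `header_line_only` switch are hypothetical names, the placeholder headers are made up, and the assumption that the inserted filename sits on its own first line is not taken from the TextPipe documentation.

```python
def apply_pipeline(filename, body, replacements, header_line_only=False):
    """Simulate: add the filename as a header line, then run the replace list."""
    # Step 2: the filename becomes the first line (the newline is an assumption).
    text = filename + "\n" + body
    if header_line_only:
        # Hypothetical stricter variant: only the inserted first line is replaced.
        first_line, rest = text.split("\n", 1)
        return replacements.get(first_line, first_line) + "\n" + rest
    # Step 3: every search term is replaced wherever it occurs in the text.
    for search, replace in replacements.items():
        text = text.replace(search, replace)
    return text


# Placeholder headers stand in for the real <line>...</line> sections.
replacements = {"Text_e_001.txt": "HEADER-1", "Text_e_002.txt": "HEADER-2"}
body = "See also Text_e_002.txt for a follow-up.\n"

everywhere = apply_pipeline("Text_e_001.txt", body, replacements)
anchored = apply_pipeline("Text_e_001.txt", body, replacements, header_line_only=True)
print(everywhere)
print(anchored)
```

The first call shows the accidental replacement inside the body; the second leaves the body alone because only the inserted first line is looked up in the list.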
Typetoken
Posts: 7
Joined: Sat Apr 29, 2017 10:50 pm

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by Typetoken »

Thank you very much indeed.

Though I read the help file for the Add Header filter this afternoon, I couldn't figure out how to use it with the input filename. There seem to be no examples there to illustrate the usage.

Now I understand, thanks to your specific instruction "Add a 'Add Header' filter, with text of @inputFilename@": putting @inputFilename@ into the text box captures each filename and adds it as the first line of that file.

By the way, when I looked into the Add Header filter, it seemed that there are only two options: add a header from an external file, or add a header from the text entered in the box.

Will there be a pattern function in the Add Header filter, so that we can search for a pattern and add the match as the header? If so, the Add Header filter would be very powerful.

I hope such a flexible function can be integrated into the Add Header filter.

I really appreciate your guidance and help in the past days.

Sincerely
Typetoken
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How to add headers by using conditions,search pattern, and loop in different folders

Post by DataMystic Support »

Simply paste the text
@inputFilename@
into the Add Header Filter's text box.

Please test what this does to each output file and you will see what is happening.

Ensure you set output files to a new extension or new folder for testing, using the output filter.