Adding @inputFilename

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Post by dfhtextpipe »

If the input filename contains an accented character, how can one ensure that adding @inputFilename does not corrupt the UTF-8 encoding for the rest of the file?

Example: filename is "44 Oberoù an Ebestel.txt"

Comment...
|  Add id from filename
|
|--Add file header [\\id @inputFilename ]
|   
|--Comment...
|     Process inserted filename
|   
+--Restrict lines:Line 1 .. line 1
   |
   |--Remove multiple whitespace
   |   
   |--Replace [.txt] with []
   |     [X] Match case
   |     [ ] Whole words only
   |     [ ] Case sensitive replace
   |     [ ] Prompt on replace
   |     [ ] Skip prompt if identical
   |     [ ] First only
   |     [ ] Extract matches
   |   
   +--Perl pattern [^(\\id) (\d+) (\.+)$] with [$1 $2 $3\r\n\\h $3]
         [ ] Match case
         [ ] Whole words only
         [ ] Case sensitive replace
         [ ] Prompt on replace
         [ ] Skip prompt if identical
         [ ] First only
         [ ] Extract matches
         Maximum text buffer size 4096
         [X] Maximum match (greedy)
         [ ] Allow comments
         [ ] '.' matches newline
         [X] UTF-8 Support
       
This subfilter corrupts the rest of the file. Is this a bug?
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Adding @inputFilename in UTF-8

Post by DataMystic Support »

It's not a bug. TextPipe neither knows nor cares about your file's encoding, so it makes no attempt to preserve it; you can do whatever you like to the file.

In this case, I suggest you add markers around the input filename in your Add file header, such as:

#### @inputFilename ####
Then use a Perl pattern search/replace for:

#### (.*) ####
Replace with:

$1
and then use a Convert ASCII to UTF-8 subfilter to convert just that piece of text.
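
For readers who want to see the idea outside TextPipe, here is a rough Python sketch of the same marker approach. It is not TextPipe's implementation. It assumes the filename is inserted as Windows ANSI (cp1252) bytes while the rest of the file is already UTF-8; the function names add_marked_header and convert_marked_filename are made up for this illustration.

```python
import re

# Assumption: the filename arrives in the Windows ANSI code page (cp1252),
# while the body of the file is already valid UTF-8.
ANSI = "cp1252"


def add_marked_header(filename: str, body_utf8: bytes) -> bytes:
    """Prepend '#### <filename> ####' with the filename as ANSI bytes."""
    return b"#### " + filename.encode(ANSI) + b" ####\r\n" + body_utf8


def convert_marked_filename(data: bytes) -> bytes:
    """Re-encode only the text between the markers and drop the markers."""
    def to_utf8(match):
        # Only the captured filename is converted; the body is never touched.
        return match.group(1).decode(ANSI).encode("utf-8")

    # Same pattern as in the post, applied to bytes; '.' does not cross lines.
    return re.sub(rb"#### (.*) ####", to_utf8, data, count=1)


if __name__ == "__main__":
    body = "Première ligne\r\n".encode("utf-8")
    mixed = add_marked_header("44 Oberoù an Ebestel.txt", body)
    fixed = convert_marked_filename(mixed)
    print(fixed.decode("utf-8"))
```

The point of the markers is that the conversion is limited to the inserted filename. Converting the whole file from ANSI to UTF-8 would double-encode the accented characters that were already correct in the body.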