Adding @inputFilename

Posted: Tue May 18, 2010 5:06 am
by dfhtextpipe
If the input filename contains an accented character,
how can one ensure that adding @inputFilename does not corrupt the UTF-8 encoding for the rest of the file?

Example: filename is "44 Oberoù an Ebestel.txt"

Code:

Comment...
|  Add id from filename
|
|--Add file header [\\id @inputFilename ]
|   
|--Comment...
|     Process inserted filename
|   
+--Restrict lines:Line 1 .. line 1
   |
   |--Remove multiple whitespace
   |   
   |--Replace [.txt] with []
   |     [X] Match case
   |     [ ] Whole words only
   |     [ ] Case sensitive replace
   |     [ ] Prompt on replace
   |     [ ] Skip prompt if identical
   |     [ ] First only
   |     [ ] Extract matches
   |   
   +--Perl pattern [^(\\id) (\d+) (\.+)$] with [$1 $2 $3\r\n\\h $3]
         [ ] Match case
         [ ] Whole words only
         [ ] Case sensitive replace
         [ ] Prompt on replace
         [ ] Skip prompt if identical
         [ ] First only
         [ ] Extract matches
         Maximum text buffer size 4096
         [X] Maximum match (greedy)
         [ ] Allow comments
         [ ] '.' matches newline
         [X] UTF-8 Support
       
This subfilter corrupts the rest of the file. Is this a bug?

Re: Adding @inputFilename in UTF-8

Posted: Wed May 19, 2010 11:32 am
by DataMystic Support
It's not a bug: TextPipe doesn't know (or care) about your file's encoding, so it makes no attempt to preserve it - you can do whatever you like to the file.

In this case, I suggest you add markers around the input filename in your Add file header such as

Code:

#### @inputFilename ####
and then use a Perl search/replace for

Code:

#### (.*) ####
Replace with

Code:

$1
and then use a Convert ASCII to UTF-8 subfilter to convert just that piece of text.
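
For anyone who wants to see the idea outside TextPipe, here is a rough Python sketch of the same marker approach. It is not TextPipe's own implementation. It assumes the filename was inserted in a Windows ANSI code page (cp1252 is a guess; yours may differ) while the rest of the file is UTF-8. The names ANSI_CODEPAGE and fix_marked_filename, and the sample header and body, are made up for illustration.

```python
import re

# Assumption: the inserted filename is in a Windows ANSI code page
# (cp1252 here), while the rest of the file is already UTF-8.
ANSI_CODEPAGE = "cp1252"

# The marker pair from the reply, matched on raw bytes.
# Non-greedy so the match stops at the first closing marker.
MARKED = re.compile(rb"#### (.*?) ####")


def fix_marked_filename(data: bytes) -> bytes:
    """Re-encode only the text between the markers as UTF-8 and drop the markers."""
    def reencode(match):
        return match.group(1).decode(ANSI_CODEPAGE).encode("utf-8")

    # count=1: only the header line carries the markers.
    return MARKED.sub(reencode, data, count=1)


# Hypothetical sample: an ANSI-encoded header followed by a UTF-8 body.
header = "\\id #### 44 Oberoù an Ebestel ####\r\n".encode(ANSI_CODEPAGE)
body = "\\h Oberoù an Ebestel\r\n".encode("utf-8")

fixed = fix_marked_filename(header + body)
print(fixed.decode("utf-8"))
```

Only the bytes between the markers are touched, so the UTF-8 text in the rest of the file passes through unchanged. Without the markers there would be no safe way to tell which bytes need converting.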