TextPipe and Unicode (UTF-16LE) files
Moderators: DataMystic Support, Moderators
I've heard a lot about TextPipe and decided to try it. I downloaded 7.63 t&b, tried to do simple things with Unicode files, and couldn't. It seems it doesn't fully understand them. I tried to remove trailing spaces: nothing. I tried doing that with \t+\n (they are mostly tabs): nothing again. None of the other Perl patterns work here either, though they worked without any problem in Uedit and EmEditor. Why is that? Please don't suggest converting the files to ANSI, because the files contain characters from three character sets: non-standard Western, Cyrillic, and Greek.
The help, where it covers working with Unicode files, is worse than very bad.
Maybe a Unicodepipe is needed?
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Hi there,
TextPipe has specific filters to deal with Unicode (UTF-16LE) data, such as the Unicode search/replace and Unicode pattern filters. For backward compatibility, the original ANSI/ASCII-based filters have not been modified.
So, if you'd like to use the Remove Trailing Spaces filter (which is ASCII), first convert the file to UTF-8, apply the filters, then convert it back.
The initial conversion to UTF-8 is the key here. TextPipe is used for a lot of mainframe data files, so converting EBCDIC to Unicode for internal processing is not an option until the Mainframe record structure has been unravelled.
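For readers who want to see why the round trip through UTF-8 helps, here is a minimal sketch in Python (not TextPipe itself). It assumes the input is UTF-16LE, and the function name `strip_trailing_ws_via_utf8` is made up for illustration. The point is that in UTF-16LE every ASCII character is followed by a zero byte, so a byte-oriented pattern such as \t+\n never matches, whereas in UTF-8 the ASCII bytes stand alone and never occur inside a multi-byte character.

```python
import re


def strip_trailing_ws_via_utf8(utf16le_bytes: bytes) -> bytes:
    """Remove trailing spaces/tabs from UTF-16LE data via a UTF-8 round trip."""
    # Step 1: convert UTF-16LE to UTF-8.
    utf8 = utf16le_bytes.decode("utf-16-le").encode("utf-8")
    # Step 2: apply a byte-oriented (ASCII) filter. This is safe on UTF-8
    # because space, tab, CR and LF bytes never appear inside a multi-byte
    # sequence.
    utf8 = re.sub(rb"[ \t]+(?=\r?\n|\Z)", b"", utf8)
    # Step 3: convert back to UTF-16LE.
    return utf8.decode("utf-8").encode("utf-16-le")


# The same kind of byte pattern applied directly to UTF-16LE finds nothing,
# because the bytes are 09 00 0A 00 rather than 09 0A.
raw = "a\t\n".encode("utf-16-le")
direct_match = re.search(rb"\t+\n", raw)

sample = "Größe \t\r\nслово\t\t\nλόγος  ".encode("utf-16-le")
cleaned = strip_trailing_ws_via_utf8(sample).decode("utf-16-le")
```

A leading BOM, if the file has one, survives the round trip unchanged in this sketch because it is decoded and re-encoded like any other character.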
See no progress for multilanguage files in 8
I have downloaded the trial version of TextPipe 8.
Task
I need to create a sorted word list from a UTF-8 (I've taken your previous recommendations into account) txt file containing German and Russian words (umlauts and Cyrillic).
Use extract matches \w+
Sort ANSI
And the result?
In the trial output everything seems OK, but the resulting file has an unknown encoding.
Opening it as ANSI makes the Russian text completely unreadable. Opening it as UTF-8 shows that all Cyrillic words are damaged and can't be used.
What the hell? Who is wrong here, me or the program?
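For comparison, this is roughly what the task looks like when \w+ is applied to decoded text rather than to raw bytes. It is a Python sketch, not a TextPipe filter list, and it does not establish what caused the corruption described above. The function name `sorted_wordlist` and the sort key (case-folded, then code point order) are assumptions chosen for illustration.

```python
import re


def sorted_wordlist(utf8_bytes: bytes) -> list[str]:
    """Return the unique words of a UTF-8 text, sorted case-insensitively."""
    # "utf-8-sig" accepts the input with or without a leading BOM.
    text = utf8_bytes.decode("utf-8-sig")
    # On decoded text, \w matches letters of any script (plus digits and
    # underscore), so umlauts and Cyrillic stay inside their words.
    words = set(re.findall(r"\w+", text))
    # Simple ordering: case-folded form first, original spelling as tie-break.
    # This is code point order, not a language-aware collation.
    return sorted(words, key=lambda w: (w.casefold(), w))


sample = "Über das Слово und слово, über".encode("utf-8")
wordlist = sorted_wordlist(sample)

# Encode explicitly as UTF-8 when saving so the result has a known encoding.
output_bytes = "\n".join(wordlist).encode("utf-8")
```

If a proper German or Russian dictionary order is needed, a locale-aware collation would have to replace the sort key used here.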
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
You may need to add a new UTF-8 BOM to the resulting file - use
Filters\Add\File Header
with text of
\xEF\xBB\xBF
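Outside TextPipe, the same header can be added with a few lines of Python. This is only a sketch of the idea: the three bytes EF BB BF are the UTF-8 BOM, and the helper name `add_utf8_bom` is made up for illustration.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM (EF BB BF) unless the data already starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


with_bom = add_utf8_bom("слово\nÜber\n".encode("utf-8"))
```

The BOM only tells an editor how to interpret the file. It does not repair bytes that were already damaged by an earlier step.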
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Standard or Pro?
Is niccolo using TextPipe Standard or TextPipe Professional?
For the task in hand, does it matter which?
DFH - TextPipe Pro trial 8
Here are the sample, the filters used (the 1st with sorting, the 2nd a simple word list creation), and the results. In both result files the Cyrillic words are corrupted, but everything is OK in the trial run window. It's not a BOM problem.
http://rapidshare.com/files/76100564/pack.zip.html
I've solved this problem with other software, but why the hell, whenever I decide to try TextPipe, is there always a problem with this? When will native Unicode support be implemented, with regexes etc.?
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
I have been using TextPipe to process lots of UTF-8 files
I have been using TextPipe Standard to process lots of UTF-8 files, all with success, including many with non-Latin characters, such as Cyrillic, Chinese, Thai, Amharic, Japanese, Hebrew.
Only the trial area has those restrictions, just as Simon already explained.
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Don't have a rapidshare account
The link you posted took me to a page wanting me to pay for an account. Please make it easier for other members to help you.
DFH - if you don't use a proxy there should be no problem getting the file.
Copy the link into your browser and press Enter. On the screen that opens, press FREE.
Then another window appears where you are asked to enter the code shown in a small picture (No premium Please enter). Type it in the box below and press the Download via ....... button.
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Downloaded it now, thanks !
I didn't see the buttons before - thanks for the help.