Non standard characters in Search/Replace List

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

buddhist108
Posts: 9
Joined: Thu Jul 18, 2013 12:18 am

Post by buddhist108 »

Hi

I'm using Filter > Replace > Search/Replace List.

I have a big list of thousands of keywords (an Excel file), including words with non-standard characters like Adhiṭṭhāna, Aññāta-Kondañña, etc.

But the problem is that it gives ? marks instead of these non-standard letters.

Is there a way to make it replace them as they are?

For example, Adhiṭṭhāna should be replaced with [[Adhiṭṭhāna]].

I'm working on the Chinese Buddhist Encyclopedia project (http://www.chinabuddhismencyclopedia.com/en), which has thousands of articles, and I use TextPipe to create internal links. I copy each article into an RTF document and then let TextPipe go through it and make the replacements.

It would be much appreciated if somebody could help me out, because these words are badly needed as links.

Can somebody give me instructions on how to create additional filters or functions to do that?
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

With your search/replace list, are you specifying the Unicode search/replace type? And are your files UTF-8 or UTF-16LE? (They need to be UTF-16LE.)

Post by buddhist108 »

Sorry, but I'm quite new to all this.

No, I'm only using one filter, and that's Search/Replace List.

How can I specify the Unicode search/replace type for these characters?

And how can I convert my .rtf text file to UTF-16LE?

Thank you!!!

Post by DataMystic Support »

For Unicode in the RTF file (from http://en.wikipedia.org/wiki/Rich_Text_Format):
For a Unicode escape the control word \u is used, followed by a 16-bit signed decimal integer giving the Unicode UTF-16 code unit number. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, \u1576? would give the Arabic letter bāʼ ب, specifying that older programs which do not have Unicode support should render it as a question mark instead.
Don't change the search type (which is a setting of the search/replace filter); instead, specify your search/replace pairs using \uN? syntax, where N is the signed decimal UTF-16 code unit described above.

To convert your existing list to the \uN? form, use the attached filter. It converts a CSV replacement list to the \uN? format that will work.
Attachments
convert to rtf unicode.zip

Post by buddhist108 »

Hi

Does it mean I have to manually replace all the non-standard letters with their \u values in the .rtf file?

Or is there some way to do it automatically? I have thousands of articles to go through.

Also, when I run the filter you sent on my list, it gives me output like PK ... [Content_Types].xml ... followed by unreadable binary characters.

I've attached my list; maybe you can have a look.
my list files.zip
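The PK at the very start of that output is the signature of a ZIP archive, and [Content_Types].xml is a member of every Office Open XML file, such as an Excel .xlsx workbook. That suggests the filter read the workbook itself rather than a plain-text (CSV or tab-separated) export of the list. A quick Python check for this situation, as a sketch (`looks_like_xlsx` is a hypothetical name):

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """True if data is a ZIP archive containing [Content_Types].xml (Office Open XML)."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as zf:
        return "[Content_Types].xml" in zf.namelist()


# Demo: a tiny in-memory archive shaped like an .xlsx versus a tab-separated text list
fake = io.BytesIO()
with zipfile.ZipFile(fake, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")

print(looks_like_xlsx(fake.getvalue()))                      # True
print(looks_like_xlsx("avīci\t[[avīci]]\n".encode("utf-8")))  # False
```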
tumtum
Posts: 9
Joined: Wed Jun 29, 2011 5:34 pm

Post by tumtum »

Please tell me what output format you want:

.rtf or .txt? The output is different for the two formats.

Post by buddhist108 »

It doesn't matter much which output; probably .rtf, because a plain .txt doesn't hold the non-standard characters.

The main issue is getting TextPipe to replace the words with non-standard characters as well.

Thanks

Post by tumtum »

Sorry for missing the main issue.

You can use the filter Replace -> Search/replace list and enter the path to the replace list.
You can use a UTF-8 encoded *.tab file for the replacements.

(The first line must be blank, because the first line holds the Byte Order Mark (BOM).)
Attachments
examplescript.JPG
Last edited by tumtum on Sat Jul 20, 2013 3:36 pm, edited 1 time in total.
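If the .tab list is generated by a script instead of by hand, here is a minimal Python sketch of the format described above: UTF-8 with a byte order mark, a blank first line, and one tab-separated find/replace pair per line. The Windows-style \r\n line endings are an assumption, and `write_tab_list` is a hypothetical helper.

```python
import codecs


def write_tab_list(path: str, pairs) -> None:
    """Write (find, replace) pairs as a UTF-8 .tab list with a BOM and a blank first line."""
    # "utf-8-sig" emits the BOM (EF BB BF) before the first character written;
    # newline="" stops Python from translating the explicit \r\n endings
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        # Blank first line, as advised above (presumably so the BOM is not
        # glued to the first search string)
        f.write("\r\n")
        for find, replace in pairs:
            f.write(f"{find}\t{replace}\r\n")


write_tab_list("replacelist.tab", [("Adhiṭṭhāna", "[[Adhiṭṭhāna]]"),
                                   ("Aññāta-Kondañña", "[[Aññāta-Kondañña]]")])

with open("replacelist.tab", "rb") as f:
    print(f.read(5) == codecs.BOM_UTF8 + b"\r\n")  # True
```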

Post by tumtum »

Here is an example replace list:
Attachments
replacelist.zip

Post by buddhist108 »

I tried running the .tab list you sent, but there is still no effect on the non-standard characters in the output file.

Output file (.rtf) text (words in [[ ]] are the ones TextPipe replaced):
------------
Abhayamātā is the most recent system of development of tantra in gelug tradition and Abhayanagara is nyingma [[yogacara]] Abhayanāga.

Brahmavihāras are the most [[Buddha]] Śākyamuni mañjuśrīkumārabhūta and mañjuśrīmitra in gohō dōji.
------------------------

The words that were marked in bold are included in the list, but TextPipe won't wrap them in [[ ]] as needed.

Is there something else I should be aware of?

Post by tumtum »

Is there something else i should be aware of?
You should know that the input text in the .rtf file does not contain the special characters themselves. If you want to see how the "special characters" are stored in an .rtf file, open the file in Notepad or another text editor, and you will see:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033\deflangfe1033{\fonttbl{\f0\fswiss\fprq2\fcharset186 Calibri Baltic;}{\f1\fswiss\fprq2\fcharset0 Calibri;}{\f2\fswiss\fprq2\fcharset186 Calibri;}{\f3\fswiss\fprq2\fcharset238 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\nowidctlpar\sa200\sl276\slmult1\lang9\f0\fs22 Abhayam\'e2t\'e2\lang1033\f1 is the most recent system of development of tantra in gelug tradition and \lang9 Abhayanagara\lang1033 is nyingma yogacara \lang9\f0 Abhayan\'e2ga\f1 .\par
\par
Brahmavih\f2\'e2ras\lang1033\f1 are the most Buddha \f3\'8c\u257?kyamuni\f1 ma\'f1ju\f3\'9cr\u299?kum\u257?rabh\u363?ta\f1 and ma\'f1ju\f3\'9cr\u299?mitra\f1 in goh\f2\'f4 d\'f4ji\f1 \u238? \f3\'8c \u257? .\lang9\par
\par
}
This input .rtf does not contain any special characters, right? The letters are stored as escapes such as \'e2 and \u257? instead.

That is why I asked you in my first comment what output format you want. If you only want to replace the special characters themselves, as in your topic "Non standard characters in Search/Replace List", TextPipe can do it very easily with my replace list.

But your input (.rtf) doesn't contain the special characters themselves, which is why the replace list (.tab) I sent doesn't find any of them.

The first thing you should be aware of is which input characters you want to find and replace.
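One more detail in the RTF shown above: the same letter is not always stored the same way. Inside a \'hh escape, the byte is interpreted through the code page of the current font's \fcharset (0 = Western, 186 = Baltic, 238 = Central European in this file), so ā appears as \'e2 in the Baltic font \f0 but as \u257? inside \f3. A small Python sketch of that decoding; the dictionary covers only the three charsets used in this file, and `decode_hex_escape` is a hypothetical helper:

```python
# RTF \fcharset values -> Windows code pages (only the three used in the file above)
FCHARSET_TO_CODEPAGE = {0: "cp1252", 186: "cp1257", 238: "cp1250"}


def decode_hex_escape(hex_byte: str, fcharset: int) -> str:
    """Decode one \\'hh escape using the code page of the font that is active."""
    return bytes([int(hex_byte, 16)]).decode(FCHARSET_TO_CODEPAGE[fcharset])


# Same byte, different letters, depending on the font's charset:
print(decode_hex_escape("e2", 186))  # ā  (as in Abhayam\'e2t\'e2, font \f0)
print(decode_hex_escape("e2", 0))    # â
print(decode_hex_escape("8c", 238))  # Ś  (as in \f3\'8c\u257?kyamuni)
print(decode_hex_escape("f1", 0))    # ñ  (as in ma\'f1ju)
```

This is why a search string written only in \uN? form can still miss words whose letters were saved as \'hh escapes.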


The second thing you should be aware of is the order of the replace list, because the replace list is read and applied line by line.

Example replace list:

book	red book
this is a book	this is my book.


The first find-and-replace pair is "book -> red book".

Even though your input text is "this is a book", it cannot be replaced with "this is my book.",

because the first pair already replaces "book", so you will get the output:

this is a red book

Regards,
Panupong Sanprasit
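The ordering problem can be reproduced in a few lines of Python. `replace_sequential` applies the pairs one after another, like the line-by-line behaviour described above; `replace_simultaneous` is one common alternative, a single pass in which the longest find string wins at each position. Both are illustrations under those assumptions, not TextPipe's own code.

```python
import re


def replace_sequential(text, pairs):
    # Each pair runs over the output of the previous one, so order matters
    for find, repl in pairs:
        text = text.replace(find, repl)
    return text


def replace_simultaneous(text, pairs):
    # One pass over the text; alternatives are tried longest first,
    # so "this is a book" is preferred over "book" at the same position
    table = dict(pairs)
    alternation = "|".join(re.escape(f) for f in sorted(table, key=len, reverse=True))
    return re.sub(alternation, lambda m: table[m.group(0)], text)


pairs = [("book", "red book"), ("this is a book", "this is my book.")]
print(replace_sequential("this is a book", pairs))    # this is a red book
print(replace_simultaneous("this is a book", pairs))  # this is my book.
```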

Post by buddhist108 »

OK, let me put my question this way.

I have this text:
Abhayamātā is the most recent system of development of tantra in gelug tradition and Abhayanagara is nyingma [[yogacara]] Abhayanāga.

Brahmavihāras are the most Buddha Śākyamuni mañjuśrīkumārabhūta and mañjuśrīmitra in gohō dōji.
and the outcome has to be:
[[Abhayamātā]] is the most recent system of development of [[tantra]] in [[gelug]] tradition and [[Abhayanagara]] is [[nyingma]] [[yogacara]] [[Abhayanāga]].

[[Brahmavihāras]] are the most [[Buddha Śākyamuni]] [[mañjuśrīkumārabhūta]] and [[mañjuśrīmitra]] in [[gohō dōji]].
Is this possible or not?

At the moment I can replace multi-word strings like Buddha Sakyamuni with [[Buddha Sakyamuni]], and even find strings of more than three words, using the Search/Replace List with the Simultaneous search option.

But the problem is that it won't replace find strings (including multi-word ones) that contain letters like ñ, ā, ō, ī, ś, etc.

I have a list of approx. 12,000 find strings.

And I have to go through thousands of articles, ranging in length from a couple of hundred to 4,000 words or more.

I copy each article from the web page and paste it into an .rtf document for TextPipe to process.

Can somebody tell me how to compose a filter that can perform this process, or is it impossible?
Attachments
filter.png
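As a rough sketch of the whole job on plain UTF-8 text (not RTF source), the Python below wraps every listed term in [[ ]] in a single pass. Assumptions: terms are matched longest first so Buddha Śākyamuni wins over Buddha; a term must not be glued to other letters or digits; existing [[...]] links are left alone; and both text and list are normalized to Unicode NFC, because text pasted from web pages can encode a letter such as ā either as one precomposed character or as a plus a combining macron. `build_linker` is a hypothetical helper.

```python
import re
import unicodedata


def build_linker(terms):
    """Return a function that wraps every listed term in [[ ]] in one pass."""
    # NFC so precomposed and combining forms of ā, ñ, ś compare equal
    words = {unicodedata.normalize("NFC", t) for t in terms}
    # Longest first: "Buddha Śākyamuni" must be tried before "Buddha"
    alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    # Alternative 1 swallows existing [[...]] links; alternative 2 is a listed term
    # not preceded or followed by another letter/digit (\w is Unicode-aware here)
    pattern = re.compile(r"\[\[.*?\]\]|(?<!\w)(?:" + alternation + r")(?!\w)")

    def link(text: str) -> str:
        text = unicodedata.normalize("NFC", text)
        return pattern.sub(
            lambda m: m.group(0) if m.group(0).startswith("[[") else f"[[{m.group(0)}]]",
            text,
        )

    return link


link = build_linker(["Abhayamātā", "tantra", "yogacara", "Buddha",
                     "Buddha Śākyamuni", "gohō dōji"])
print(link("Abhayamātā is tantra and [[yogacara]], Buddha Śākyamuni in gohō dōji."))
# [[Abhayamātā]] is [[tantra]] and [[yogacara]], [[Buddha Śākyamuni]] in [[gohō dōji]].
```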

Post by DataMystic Support »

It will work but you need to convert your replace list to the right format first.

Since RTF does not store Unicode text directly, but instead stores Unicode characters as \uN? escapes (N being a signed decimal number), this adds extra complication to the steps.

1. Instead of Excel .xls, use a .TAB format file saved in UTF-16 (which you can still edit in Excel).
2. So the input file goes from:

Code:

avīci	[[avīci]]
to:

Code:

av\u299?ci	[[av\u299?ci]]
3. Now you can search an RTF file using this search/replace list, in Exact mode (using pattern mode would require the \ and ? to be escaped).
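To illustrate the escaping point in step 3, here is Python's own regex engine used as a stand-in (TextPipe's pattern syntax may differ in details). The search string for avīci in RTF escape form (ī is U+012B, decimal 299) only works as a pattern once \ and ? are escaped, which `re.escape` does.

```python
import re

search = r"av\u299?ci"  # avīci written the way RTF stores it

try:
    re.compile(search)  # unescaped: Python reads \u299? as a broken \uXXXX escape
except re.error as err:
    print("pattern mode fails:", err)

pattern = re.escape(search)
print(pattern)  # av\\u299\?ci
print(re.search(pattern, r"\pard av\u299?ci\par") is not None)  # True
```

In Exact mode no escaping is needed, which is why that mode is the simpler choice here.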

The attached filter converts a Unicode UTF-16 .tab file to a format suitable for a TextPipe search/replace list to be applied to an RTF file with embedded Unicode characters.
Attachments
convert to rtf unicode.zip
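For anyone who prefers a script to the attached filter, here is a rough Python equivalent of the conversion: read a UTF-16 .tab list and write each entry in RTF escape form. It follows the RTF rule quoted earlier in the thread (signed decimal UTF-16 code units with a ? fallback) and also backslash-escapes \, { and }, which RTF treats as control characters. An unescaped {{Wiki|Greek}} is read by RTF as two nested groups, which would likely explain why only Wiki|Greek shows up in the output reported further down. `to_rtf_escapes` and `convert_tab_list` are hypothetical names, and this is not the code of the attached filter.

```python
def to_rtf_escapes(text: str) -> str:
    """Rewrite text the way RTF stores it: ASCII as-is, everything else as \\uN?."""
    out = []
    data = text.encode("utf-16-le")  # work on 16-bit code units, as RTF does
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")
        if unit < 128:
            ch = chr(unit)
            # \, { and } are RTF control characters and must be backslash-escaped
            out.append("\\" + ch if ch in "\\{}" else ch)
        else:
            # RTF writes the 16-bit code unit as a *signed* decimal number
            signed = unit - 65536 if unit > 32767 else unit
            out.append(f"\\u{signed}?")
    return "".join(out)


def convert_tab_list(src_path: str, dst_path: str) -> None:
    """Convert a UTF-16 .tab search/replace list to plain-ASCII \\uN? form."""
    with open(src_path, encoding="utf-16") as src:
        with open(dst_path, "w", encoding="ascii") as dst:
            for line in src:
                dst.write(to_rtf_escapes(line))


print(to_rtf_escapes("avīci\t[[avīci]]"))  # av\u299?ci<TAB>[[av\u299?ci]]
print(to_rtf_escapes("{{Wiki|Greek}}"))   # \{\{Wiki|Greek\}\}
```

For example, `convert_tab_list("list.tab", "list_rtf.tab")` (hypothetical file names) would turn a line avīci<TAB>[[avīci]] into av\u299?ci<TAB>[[av\u299?ci]].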

Post by buddhist108 »

How exactly can I save an Excel list as a UTF-16 .tab file?

I used the filter's test area, which converted the text as you showed, then put it into an Excel file and saved it as .tab.

But when I ran it on the .rtf file, there were still no results.

I have attached my full list and some text files as examples. Maybe you can have a look and test exactly how it would work.

Also, I found that when my list contains a replace string like {{Wiki|Greek}}, the output .rtf file doesn't show the right result; instead it gives only Wiki|Greek.

I tried saving my text as a plain .txt file in UTF-8 encoding, and there it was replaced as it should be. So maybe I can use the list with plain .txt files as well. It doesn't matter to me whether it's .rtf or something else, as long as the job gets done.

It would be much appreciated if you could have a look, see if you can make it work, and tell me exactly what I should do.

We are building an encyclopedia, and it's very useful when the text is full of links so people can read further.

Unfortunately, we don't have the money to hire programmers to write a script or program; we are just two poor Buddhists who sit at the computer day and night trying to create a big encyclopedia that everyone can use.

Post by buddhist108 »

Sorry, I forgot to add the files.
myfiles.part2.rar
myfiles.part1.rar