Non standard characters in Search/Replace List
Moderators: DataMystic Support, Moderators
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Non standard characters in Search/Replace List
Hi
I'm using Filter > Replace > Search/Replace List.
I have a big list of thousands of keywords (an Excel file), including words with non-standard characters such as Adhiṭṭhāna, Aññāta-Kondañña, etc.
The problem is that it gives ? marks instead of these non-standard letters.
Is there a way to have it replace them as they are?
Example: Adhiṭṭhāna should be replaced with [[Adhiṭṭhāna]].
I'm working on the Chinese Buddhist Encyclopedia project (http://www.chinabuddhismencyclopedia.com/en), which has thousands of articles, and I use TextPipe to create internal links. I copy each article into an .rtf document and then let TextPipe go through it and make the replacements.
I would very much appreciate it if somebody could help me out, because these words are badly needed as links.
Can somebody give me instructions on how to create additional filters or functions to do that?
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Non standard characters in Search/Replace List
With your search/replace list, are you specifying the Unicode search/replace type? And are your files UTF-8 or UTF-16LE? (they need to be UTF-16LE).
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Re: Non standard characters in Search/Replace List
Sorry, but I'm quite new to all this.
No, I'm only using one filter, and that's Search/Replace List.
How can I specify the Unicode search/replace type for these characters?
And how can I convert my .rtf text file to UTF-16LE?
Thank you!!!
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Non standard characters in Search/Replace List
Don't change the search type (which is a setting of the search/replace filter), but instead, specify your search/replace pairs using \uHHHH? syntax.
For Unicode in the RTF file (from http://en.wikipedia.org/wiki/Rich_Text_Format):
"For a Unicode escape the control word \u is used, followed by a 16-bit signed decimal integer giving the Unicode UTF-16 code unit number. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, \u1576? would give the Arabic letter bāʼ ب, specifying that older programs which do not have Unicode support should render it as a question mark instead."
To convert your existing list to the \uHHHH? form, use the attached filter. It converts a CSV replacement list to the \uHHHH? format that will work.
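To make the quoted escape format concrete, here is a minimal Python sketch that turns \uN? escapes back into the characters they stand for. It is only an illustration and not part of TextPipe; the function name `rtf_unescape` is made up for this example, and it assumes the fallback character after each escape is a single "?", as in the quoted passage.

```python
import re

# Matches \u followed by a signed decimal number and a single "?" fallback.
_U_ESCAPE = re.compile(r"\\u(-?\d+)\?")


def rtf_unescape(text: str) -> str:
    """Replace RTF \\uN? escapes with the characters they encode (hypothetical helper)."""

    def to_char(match: re.Match) -> str:
        # RTF stores a signed 16-bit value; map negatives back to 0..65535.
        unit = int(match.group(1)) % 0x10000
        return chr(unit)

    raw = _U_ESCAPE.sub(to_char, text)
    # Join any UTF-16 surrogate pairs that were written as two escapes.
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")


print(rtf_unescape(r"\u1576?"))
```

Note that the number after \u is decimal, so the value in the quoted example, 1576, is the same code point as hexadecimal 0628.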
- Attachments
-
- convert to rtf unicode.zip
- (811 Bytes) Downloaded 915 times
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Re: Non standard characters in Search/Replace List
Hi
Does this mean I have to manually replace all the non-standard letters with their \u values in the .rtf file?
Or is there some way to do it automatically? I have thousands of articles that I need to go through.
Also, when I run the filter you sent on my list, it gives me output like this: PKsŠòBKjõ§¤º[Content_Types].xmlíXË’œ0¼§*ÿb,[‚cÀæ¸ñÿß©eÃ>RIv¶fköBQB´ZݲÇ̺
I have attached my list; maybe you can have a look.
Re: Non standard characters in Search/Replace List
Please tell me which output format you want: ".rtf" or ".txt"? The output differs between the two formats.
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Re: Non standard characters in Search/Replace List
It doesn't matter much which output format; probably .rtf, because a simple .txt doesn't hold the non-standard characters.
The main issue is getting TextPipe to replace the words with non-standard characters as well.
Thanks
Re: Non standard characters in Search/Replace List
Sorry for missing the main issue.
You can use the filter Replace -> Search/replace list and enter the path to the replace list.
You can use a *.tab file saved as UTF-8 as the replace list
(the first line must be blank because it holds the Byte Order Marker (BOM)).
Last edited by tumtum on Sat Jul 20, 2013 3:36 pm, edited 1 time in total.
Re: Non standard characters in Search/Replace List
Example replace list
- Attachments
-
- replacelist.zip
- (3.11 KiB) Downloaded 972 times
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Re: Non standard characters in Search/Replace List
I tried running the .tab list you sent, but it still has no effect on the non-standard characters in the output file.
Output file (.rtf) text (words with [[ ]] are what TextPipe replaced):
------------
Abhayamātā is the most recent system of development of tantra in gelug tradition and Abhayanagara is nyingma [[yogacara]] Abhayanāga.
Brahmavihāras are the most [[Buddha]] Śākyamuni mañjuśrīkumārabhūta and mañjuśrīmitra in gohō dōji.
------------------------
The words marked in bold are included in the list, but it won't replace them with [[ ]] on each side as needed.
Is there something else I should be aware of?
Re: Non standard characters in Search/Replace List
"Is there something else i should be aware of?"
You should know that the input text in the ".rtf" file does not contain the special characters themselves. If you want to see how a "special character" is stored in an ".rtf" file, open the file in Notepad or another text editor and you will see the following.
This ".rtf" input does not contain any special characters, right?
{\rtf1\ansi\ansicpg1252\deff0\deflang1033\deflangfe1033{\fonttbl{\f0\fswiss\fprq2\fcharset186 Calibri Baltic;}{\f1\fswiss\fprq2\fcharset0 Calibri;}{\f2\fswiss\fprq2\fcharset186 Calibri;}{\f3\fswiss\fprq2\fcharset238 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\nowidctlpar\sa200\sl276\slmult1\lang9\f0\fs22 Abhayam\'e2t\'e2\lang1033\f1 is the most recent system of development of tantra in gelug tradition and \lang9 Abhayanagara\lang1033 is nyingma yogacara \lang9\f0 Abhayan\'e2ga\f1 .\par
\par
Brahmavih\f2\'e2ras\lang1033\f1 are the most Buddha \f3\'8c\u257?kyamuni\f1 ma\'f1ju\f3\'9cr\u299?kum\u257?rabh\u363?ta\f1 and ma\'f1ju\f3\'9cr\u299?mitra\f1 in goh\f2\'f4 d\'f4ji\f1 \u238? \f3\'8c \u257? .\lang9\par
\par
}
That is why I asked in my first comment:
"what is output format what you want ?"
If you only want to replace the special characters from your topic, "Non standard characters in Search/Replace List", my replace list can do that very easily in TextPipe.
But your input (".rtf") does not contain any special characters as such,
which is why the replace list (".tab") file I sent did not find any.
The first thing to be aware of is which input characters you want to find and replace.
The second thing to be aware of is the order of the replace list,
because the replace list file is read and applied line by line.
Example replace list (search text, then replacement):
book red book
this is a book this is my book.
The first find and replace is "book -> red book".
Even though your input text is "this is a book", it cannot be replaced with "this is my book.",
because "book" is found and replaced first. You will get the output:
this is a red book .
Regards,
Panupong Sanprasit
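The ordering effect described in the post above can be reproduced with a few lines of Python. This is only a sketch that models a replace list as pairs applied one after another, as the post describes; it does not call TextPipe, and the function name `apply_in_order` is invented for the example.

```python
def apply_in_order(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the whole text, in list order."""
    for search, replacement in pairs:
        text = text.replace(search, replacement)
    return text


# The replace list from the post: the short entry comes first.
short_first = [("book", "red book"), ("this is a book", "this is my book.")]

# The same entries with the longer search text first.
long_first = [("this is a book", "this is my book."), ("book", "red book")]

print(apply_in_order("this is a book", short_first))  # this is a red book
print(apply_in_order("this is a book", long_first))   # this is my red book.
```

With the short entry first, the longer phrase no longer exists by the time its line is reached. Putting the longer entry first lets it match, but a later line can still rewrite its output, so both the order and the overlap between entries matter.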
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Re: Non standard characters in Search/Replace List
OK, let me put my question this way.
I have a text:
Abhayamātā is the most recent system of development of tantra in gelug tradition and Abhayanagara is nyingma [[yogacara]] Abhayanāga.
Brahmavihāras are the most Buddha Śākyamuni mañjuśrīkumārabhūta and mañjuśrīmitra in gohō dōji.
and the outcome has to be:
[[Abhayamātā]] is the most recent system of development of [[tantra]] in [[gelug]] tradition and [[Abhayanagara]] is [[nyingma]] [[yogacara]] [[Abhayanāga]].
[[Brahmavihāras]] are the most [[Buddha Śākyamuni]] [[mañjuśrīkumārabhūta]] and [[mañjuśrīmitra]] in [[gohō dōji]].
Is this possible or not?
At the moment I can replace multi-word strings such as Buddha Sakyamuni with [[Buddha Sakyamuni]], and even find strings containing more than three words, using the search/replace list with the Simultaneous search function.
But the problem is that it won't replace find strings (consisting of multiple words) that contain letters like ñ, ā, ō, ī, ś, etc.
I have a list which contains approx. 12000 find strings.
And I have to go through thousands of articles, whose length ranges from a couple of hundred up to 4000 words or more.
I copy the article from the web page and insert it into an .rtf document for TextPipe to process.
Can somebody tell me how to compose a filter that can perform this process, or is it impossible?
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Non standard characters in Search/Replace List
It will work but you need to convert your replace list to the right format first.
Since RTF does not store Unicode, but instead stores unicode chars as \Uhhhh?, this adds extra complication to the steps.
1. Instead of Excel .xls, use a .TAB format file saved in UTF16 (which you can still edit in Excel).
2. So the input files goes from:
Code: Select all
avīci [[avīci]]
to:
Code: Select all
av\U2B01?ci [[av\U2B01?ci]]
3. Now you can search an RTF file using this search/replace list, in Exact mode (using pattern mode would require the \ and ? to be escaped).
The attached filter converts a Unicode UTF16 .tab file to a format suitable for a TextPipe search/replace list
to be applied to an RTF file with Unicode embedded characters.
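For readers who want to see what such a conversion involves, here is a rough Python sketch of the same idea. It is not the attached filter and has not been checked against it. It follows the decimal form of the escape quoted from the RTF description earlier in the thread and seen in the user's own file (for example \u299? for ī), which may differ from the filter's output. The names `rtf_escape`, `convert_line` and `convert_file` are hypothetical.

```python
def rtf_escape(text: str) -> str:
    """Rewrite text the way it would appear inside the raw RTF file (sketch)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax, so literal ones are escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # One \uN? escape per UTF-16 code unit, N as a signed 16-bit decimal.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def convert_line(line: str) -> str:
    """Escape every tab-separated field of one replace-list line."""
    fields = line.rstrip("\r\n").split("\t")
    return "\t".join(rtf_escape(field) for field in fields)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-16 .tab list and write an ASCII-only list for RTF input."""
    with open(src_path, encoding="utf-16") as src, \
            open(dst_path, "w", encoding="ascii", newline="\n") as dst:
        for line in src:
            dst.write(convert_line(line) + "\n")


print(convert_line("avīci\t[[avīci]]"))
```

Two caveats, both assumptions drawn from the sample file posted earlier in the thread: some letters there are stored as code-page escapes with font switches (such as `Abhayam\'e2t\'e2`) rather than as \u escapes, and a list converted this way would not match those. The brace escaping may also be relevant when a replacement contains braces, since unescaped braces are read as RTF group markers.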
- Attachments
-
- convert to rtf unicode.zip
- (966 Bytes) Downloaded 897 times
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Re: Non standard characters in Search/Replace List
How exactly can I save an Excel list as a UTF-16 .tab file?
I tried the filter in the test area, which converted the text as you showed; then I put it in an Excel file and saved it as a tab file.
But when I ran it against the .rtf file, there were still no results.
I have attached my full list and text files as an example. Maybe you can have a look and test how exactly it would work.
I also found that when my list contains a replacement string like {{Wiki|Greek}}, the output .rtf file does not show the right result; it shows only Wiki|Greek.
I tried saving my text in a simple .txt file in UTF-8 encoding, and there the replacement worked as it should. So maybe I can somehow use the list with a plain .txt file as well. For me it doesn't matter whether it's .rtf or something else, as long as the job gets done.
I would much appreciate it if you could have a look, see if you can make it work, and tell me exactly what I should do.
We are building an encyclopedia, and it's very useful when the text is full of links so people can read further.
Unfortunately we don't have money to hire programmers to write a script or program; we are just two poor Buddhists who sit behind a computer day and night trying to create a big encyclopedia that everyone can use.
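As a point of comparison for the plain-text route mentioned above, the following Python sketch wraps every listed term in [[ ]] in a single pass over UTF-8 text. It is an illustration only, not a TextPipe filter; the function name `link_terms` and the sample terms are assumptions, and it has not been tried on a 12000-entry list.

```python
import re
import unicodedata


def link_terms(text: str, terms: list[str]) -> str:
    """Wrap each listed term in [[ ]] in one pass, preferring longer terms."""
    # Normalize so composed and decomposed accents compare equal.
    text = unicodedata.normalize("NFC", text)
    normalized = {unicodedata.normalize("NFC", t) for t in terms if t}
    # Longest first, so "Buddha Śākyamuni" wins over "Buddha".
    ordered = sorted(normalized, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda m: f"[[{m.group(0)}]]", text)


terms = ["Buddha", "Buddha Śākyamuni", "mañjuśrīmitra", "gohō dōji"]
sample = "Buddha Śākyamuni and mañjuśrīmitra in gohō dōji."
print(link_terms(sample, terms))
```

Because all terms are matched in one pass, a replacement is never rewritten by a later entry. The sketch does not check word boundaries and would wrap a term that is already inside [[ ]] a second time, so real input would need extra handling for both cases.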
-
- Posts: 9
- Joined: Thu Jul 18, 2013 12:18 am
Re: Non standard characters in Search/Replace List
Sorry, I forgot to add the files.