From https://www.mobileread.com/forums/showthread.php?s=0158626d388cfb56d4726c23d8691804&p=3548394#post3548394
Using any PDF-to-HTML converter and TextPipe Pro.
TextPipe Pro is a pattern-based text processing tool, so no matter how poor the conversion is, you can bring your text to the look, style, and format you want. For PDF to HTML, sticking with what you know is the easiest and best approach. I usually use Mobipocket Creator for conversion, TextPipe for reformatting and styling, and Mobipocket or calibre for producing the ebooks.
If you want complete control over your ebook's style, are picky about quality, hate reading poorly formatted text, like your ebooks to have a TOC, want to clean up the messy HTML produced by word processors like MS Word, or insist on clean HTML before conversion, TextPipe is the right tool for you.
TextPipe Pro can do pattern-based search and replace along with other jobs, and the options are endless, but here is a brief list of things you can do with TextPipe for ebook reformatting and styling.
1. You can add or remove HTML tags, classes, or attributes all at once, with or without their text. For example, if your converted HTML has tags like <p style="..."> or <p class="..." style="...">, a plain find and replace will not work, since the attribute values vary from tag to tag, and you would have to clean up manually. With TextPipe it only takes seconds. You can also remove every element of a given class together with its text, such as <p class="myclass"> myText </p>.
2. You can remove specific HTML tags, classes, or attributes while keeping others. For example, you may want to strip everything except italic and bold.
3. You can remove page numbers or titles all at once.
4. You can convert certain tags into other tags, e.g. h2 >> p.
5. You can change case within a restricted part of the text, such as the text inside a certain tag or class.
6. Since some ebook readers do not support small caps, you can mimic them as S<small>MALL</small>. First restrict the text by pattern, such as text between certain tags or in a certain class. Then wrap every letter of each word except the first in <small>. It sounds complicated, but it is really very easy with subfilters and takes seconds.
7. Joining ruptured paragraphs and sentences.
8. Removing extra spaces and tabs.
9. Shifting and swapping text.
10. Splitting and joining multiple HTML files.
11. Changing the text encoding (ANSI, Unicode, UTF-8).
12. Adding or removing italics and bold.
The learning curve may look a bit steep, but it is not. Just take a look and play around.
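For anyone who wants to see what these pattern-based edits look like in practice, here is a rough sketch of a few items from the list (1, 2, 4, 6 and 7). It is written with Python's re module, not TextPipe's own filters, which the post does not show. The function names, the "smallcaps" class, and the sample tag lists are made up for illustration, and the patterns assume simple, well-formed HTML with no ">" inside attribute values and no nested tags inside the small-caps element.

```python
import re


def strip_attributes(html, tags=("p", "span", "div")):
    """Item 1: drop every attribute from the listed opening tags."""
    pattern = re.compile(r"<(%s)\b[^>]*>" % "|".join(tags), re.IGNORECASE)
    return pattern.sub(lambda m: "<%s>" % m.group(1).lower(), html)


def remove_class_with_text(html, class_name):
    """Item 1: delete <p class="..."> elements together with their text."""
    pattern = re.compile(
        r'<p\b[^>]*\bclass="%s"[^>]*>.*?</p>\s*' % re.escape(class_name),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", html)


def strip_tags_except(html, keep=("p", "i", "b")):
    """Item 2: remove every tag (not its text) unless its name is in keep."""
    def keep_or_drop(m):
        return m.group(0) if m.group(1).lower() in keep else ""
    return re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", keep_or_drop, html)


def retag(html, old="h2", new="p"):
    """Item 4: rename a tag (h2 >> p), keeping its attributes and text."""
    return re.sub(r"<(/?)%s\b" % re.escape(old), r"<\g<1>%s" % new,
                  html, flags=re.IGNORECASE)


def fake_small_caps(html, tag="span", class_name="smallcaps"):
    """Item 6: turn the text of the chosen element into W<small>ORD</small>.

    The wrapping element is dropped. Single-letter words are left as they are.
    """
    element = re.compile(
        r'<%s\b[^>]*\bclass="%s"[^>]*>(.*?)</%s>'
        % (tag, re.escape(class_name), tag),
        re.IGNORECASE | re.DOTALL,
    )

    def convert_word(m):
        word = m.group(0)
        return word[0].upper() + "<small>" + word[1:].upper() + "</small>"

    def convert_element(m):
        # Only runs of two or more letters are rewritten.
        return re.sub(r"[^\W\d_]{2,}", convert_word, m.group(1))

    return element.sub(convert_element, html)


def join_ruptured_paragraphs(html):
    """Item 7: merge a paragraph that ends without sentence punctuation
    into the next one when the next one starts with a lowercase letter."""
    return re.sub(r"(?<![.!?:])</p>\s*<p>(?=[a-z])", " ", html)


if __name__ == "__main__":
    print(strip_attributes('<p class="a" style="b">Hi</p>'))
    print(retag('<h2 class="x">Title</h2>'))
    print(fake_small_caps('<span class="smallcaps">Small caps</span>'))
    print(join_ruptured_paragraphs("<p>This sentence was</p>\n<p>split in two.</p>"))
```

Each function is one restrict-then-replace step, which is roughly how the small caps trick in item 6 is described above: first narrow down to the element, then work on the words inside it. Real converter output is usually messier than these samples, so expect to adjust the patterns to whatever your converter produces.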
Epub/Mobi text preparation tips
Moderators: DataMystic Support, Moderators
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Epub/Mobi text preparation tips
TextPipe Pro edition is only really required for mainframe filters.
TextPipe Standard suffices for most users.
David