I've just paid for TextPipe Pro, despite getting no response to either my pre-sales support questions or the "in-software chat," and despite the fact that neither telephone number (US toll-free or Australian direct-dial) is answered with any company name. It is clearly very useful software, even if I never get support from DataMystic.
Here's the question I didn't get answered: What is the best way to convert UPPER CASE TEXT to "Proper" or "Title" case text? I am working with a large number of merchant datafeeds, some of which contain product titles in ALL CAPS, and some of which contain titles that have individual words in CAPS. I want to "fix what's broken, and not fix what's not broken." So what I want is:
(1) If the title contains only UPPER CASE letters, then convert the entire title to Proper case.
(2) If the title contains some lower-case letters, but also contains any ALL-CAPS words longer than __ characters, convert those words to Proper case. (I'm not sure whether to say 4 or 5 characters.)
It's important to me that I not attempt to convert listings already in "reasonably OK" case; if I just convert everything to Proper Case, then I'd need more filters to fix capitalization of articles (e.g. "Gone With The Wind").
Examples:
"TOP GUN MOVIE POSTER" --> "Top Gun Movie Poster"
"REEBOK Air Jordan Basketball Shoes" --> "Reebok Air Jordan Basketball Shoes"
"Gone with the Wind MOVIE POSTER" --> "Gone With the Wind Movie Poster" (not "Gone With The Wind Movie Poster")
"Chrommatic ARG 368D digital camera" --> unchanged
"OSHA Compliance Manual" --> unchanged
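For readers who want to prototype rules (1) and (2) outside TextPipe, here is a minimal Python sketch. The function name `fix_title` and the `min_len` parameter are made up for illustration, and the default threshold of 5 letters is only one of the two values considered above. This is not TextPipe's own method.

```python
import re

def fix_title(title: str, min_len: int = 5) -> str:
    """Hypothetical helper: apply rules (1) and (2) to one product title."""
    letters = [c for c in title if c.isalpha()]
    # Rule 1: no lower-case letters at all -> capitalize every word.
    if letters and all(c.isupper() for c in letters):
        return re.sub(r"[A-Z][A-Z']*",
                      lambda m: m.group(0).capitalize(), title)
    # Rule 2: mixed case -> touch only all-caps words of min_len+ letters.
    long_caps = re.compile(r"\b[A-Z]{%d,}\b" % min_len)
    return long_caps.sub(lambda m: m.group(0).capitalize(), title)

for t in ["TOP GUN MOVIE POSTER",
          "REEBOK Air Jordan Basketball Shoes",
          "Chrommatic ARG 368D digital camera",
          "OSHA Compliance Manual"]:
    print(fix_title(t))
```

With the default threshold, short designations such as "ARG" and "OSHA" are left alone in mixed-case titles, and words that are already lower case (such as "with") are never changed.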
Thanks for any help.
Best way to replace upper-case with title/proper case
Moderators: DataMystic Support, Moderators
Re: Best way to replace upper-case with title/proper case
Clarification: capitalization of articles is not the only reason I don't want to convert everything to Proper case. There are thousands of acronyms, abbreviations, and designations that should remain all-upper-case or "special case." Most of these are 2 to 4 characters long (GB, MHz, dB, mA, OSHA, IBM, FBI), but of course some are longer (MRMIP, HICAP). I'm trying to find a "happy medium" that will improve the overall quality of the data.
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Best way to replace upper-case with title/proper case
Hi Mark,
We have holidays here in Australia too. The Aus phone number (at least) has DataMystic branding; the US phone number probably has LeapFrog branding.
Anyway, the best approach is to use a Perl search/replace as a restriction, with search text of
[A-Z ]{5,}
and replacement text of
$0
Ensure Match Case is checked.
Add a Convert to Title Case filter as a subfilter.
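To see roughly what that restriction selects, here is a small Python sketch that applies the same pattern and title-cases only the matched runs. It approximates the idea with Python's `re` module and `str.title()`; the name `fix_caps_runs` is made up, and TextPipe's Perl-pattern engine and its Convert to Title Case filter may behave differently.

```python
import re

# Case-sensitive, like having Match Case checked: runs of five or more
# characters drawn from capital letters and spaces.
CAPS_RUN = re.compile(r"[A-Z ]{5,}")

def fix_caps_runs(title: str) -> str:
    """Hypothetical helper: title-case the matched runs, leave the rest."""
    return CAPS_RUN.sub(lambda m: m.group(0).title(), title)

print(fix_caps_runs("TOP GUN MOVIE POSTER"))
print(fix_caps_runs("Gone with the Wind MOVIE POSTER"))
print(fix_caps_runs("OSHA Compliance Manual"))
```

Because the character class includes the space, a short acronym followed by a capitalized word (the run "OSHA C") is long enough to match in this sketch, so "OSHA" comes out as "Osha". If that matters for your feeds, a word-bounded pattern such as `\b[A-Z]{5,}\b` is one thing to try, at the cost of handling fully upper-case titles separately.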