Data Mining General Question

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

randersoniii
Posts: 4
Joined: Thu May 12, 2005 5:34 am


Post by randersoniii »

Does anyone have general guidance on data mining reports in which some fields can be missing? For example, I need to mine a plain ASCII report of addresses laid out like this:

Person Name
Title
Company Name
Street1
Street2
City, State Zip

where the title, company name, and street2 fields are sometimes missing. The goal is to end up with a CSV with blanks in any missing fields.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

See the data mining white paper at:

http://www.datamystic.com/docs
randersoniii
Posts: 4
Joined: Thu May 12, 2005 5:34 am

Post by randersoniii »

Actually, I have read both the data mining and report mining white papers. Neither, however, seems to talk about this issue of missing data fields in some records.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Ok.

Let's say you end up with lines like this:

##LINE1 ..data..
##LINE2 ..data..
##LINE3 ..data..

where ##LINE2 is optional.

Just use an EasyPattern like this:

Code:

[ '##LINE1', capture( 1+ linechar ), cr, lf,
  optional( '##LINE2', capture( 1+ linechar ), cr, lf ),
  '##LINE3', capture( 1+ linechar ), cr, lf ]
Replace with:

Code:

"$1","$2","$3"
Does that make sense?
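
If it helps to see the same idea outside EasyPatterns, here is a rough Python sketch of the optional-line approach using the standard re module. It is only an illustration, not how the product does it internally. The function name to_csv and the sample names are made up for the example, and it assumes one space after each ##LINEn tag, which it drops from the captured value. An absent ##LINE2 group becomes an empty string, so the field comes out as "".

```python
import re

# Regex analogue of the EasyPattern above: the whole ##LINE2 line sits in an
# optional non-capturing group, so its inner capture is None when absent.
# Assumption: tags are followed by an optional single space, which is skipped.
RECORD = re.compile(
    r"##LINE1 ?([^\r\n]+)\r?\n"
    r"(?:##LINE2 ?([^\r\n]+)\r?\n)?"
    r"##LINE3 ?([^\r\n]+)\r?\n"
)


def to_csv(text: str) -> str:
    """Replace each tagged record with one quoted CSV row (hypothetical helper)."""

    def row(match: re.Match) -> str:
        # A missing optional group is None; turn it into an empty field.
        # Double any embedded quotes so the CSV stays valid.
        fields = [(g or "").replace('"', '""') for g in match.groups()]
        return ",".join(f'"{f}"' for f in fields) + "\n"

    return RECORD.sub(row, text)


sample = (
    "##LINE1 Ann\r\n##LINE2 CEO\r\n##LINE3 Acme\r\n"
    "##LINE1 Bob\r\n##LINE3 Initech\r\n"
)
print(to_csv(sample), end="")
```

The second record has no ##LINE2 line, so its middle field is left blank, which is the same effect as "$2" being empty in the replacement above.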