
Data Mining General Question

Posted: Thu May 12, 2005 10:12 am
by randersoniii
Does anyone have some general guidance on data mining reports in which some fields may be missing? For example, I need to mine a plain ASCII report of addresses laid out like this:

Person Name
Title
Company Name
Street1
Street2
City, State Zip

where the title, company name, and street2 fields are sometimes missing. The goal is to end up with a CSV with blanks in any missing fields.
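For readers who want to see one way of doing this outside TextPipe, here is a minimal Python sketch. It assumes records are separated by blank lines, that the first line is always the person's name, and that the last line is always "City, State Zip". The optional lines are placed with two hypothetical heuristics that are not part of the original report format: a street line starts with a digit or "PO Box", and a lone line between the name and the street is treated as a company only if it contains a suffix such as Inc, LLC, Ltd, Corp or Co. The function names and the sample data are illustrative only.

```python
import csv
import io
import re

# Hypothetical heuristics, not part of the report format; tune them to your data.
STREET_RE = re.compile(r"^(?:\d|P\.?\s*O\.?\s+Box)", re.IGNORECASE)
COMPANY_RE = re.compile(r"\b(?:Inc|LLC|Ltd|Corp|Co)\b")

FIELDS = ["name", "title", "company", "street1", "street2", "city_state_zip"]


def split_records(text):
    """Yield each blank-line-separated address block as a list of stripped lines."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            yield lines


def parse_record(lines):
    """Map one address block onto FIELDS, leaving missing fields blank."""
    row = dict.fromkeys(FIELDS, "")
    row["name"] = lines[0]
    row["city_state_zip"] = lines[-1]
    middle = lines[1:-1]

    # Everything before the first street-looking line is title and/or company.
    first_street = next(
        (i for i, line in enumerate(middle) if STREET_RE.match(line)), len(middle)
    )
    header, streets = middle[:first_street], middle[first_street:]

    if len(header) >= 2:
        # Title then company; any further header lines are ignored.
        row["title"], row["company"] = header[0], header[1]
    elif len(header) == 1:
        # A single line is guessed to be a company only if it has a company suffix.
        key = "company" if COMPANY_RE.search(header[0]) else "title"
        row[key] = header[0]

    if streets:
        row["street1"] = streets[0]
    if len(streets) > 1:
        row["street2"] = streets[1]
    return row


def report_to_csv(text):
    """Convert the whole report to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for lines in split_records(text):
        writer.writerow(parse_record(lines))
    return out.getvalue()


sample = """Jane Doe
Sales Manager
Acme Corp
123 Main St
Suite 4
Springfield, IL 62701

John Smith
456 Oak Ave
Portland, OR 97201

Bob Lee
Acme Inc
9 Elm Rd
Dover, DE 19901
"""

print(report_to_csv(sample))
```

The `csv` module quotes the "City, State Zip" field automatically because it contains a comma. The weak point is the heuristics: a company whose name starts with a digit, or a title with no company suffix sitting alone, will be misfiled, which is why tagging the lines explicitly (as in the reply below) is more reliable when the report allows it.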

Posted: Thu May 12, 2005 11:24 pm
by DataMystic Support
See the data mining white paper at:

http://www.datamystic.com/docs

Posted: Fri May 13, 2005 2:08 am
by randersoniii
Actually, I have read both the data mining and report mining white papers. Neither, however, seems to address the issue of some records having missing fields.

Posted: Fri May 13, 2005 5:40 pm
by DataMystic Support
OK.

Let's say you end up with lines like this:

##LINE1 ..data..
##LINE2 ..data..
##LINE3 ..data..

where ##LINE2 is optional.

Just use an EasyPattern like this:


[ '##LINE1', capture( 1+ linechar ), cr, lf,
  optional( '##LINE2', capture( 1+ linechar ), cr, lf ),
  '##LINE3', capture( 1+ linechar ), cr, lf ]
Replace with:


"$1","$2","$3"
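For readers more familiar with regular expressions, the same idea can be sketched with Python's `re` module. This is an approximation of the EasyPattern, not TextPipe itself: the whole `##LINE2` line sits in an optional non-capturing group, so when that line is absent its capture group is `None` and is written out as an empty quoted field, matching `"$1","$2","$3"`. The helper name `tagged_to_csv` is hypothetical, and the optional space after each tag is an assumption based on the sample lines above.

```python
import csv
import io
import re

# ##LINE1 and ##LINE3 are required; the whole ##LINE2 line is optional.
# [^\r\n]+ plays the role of "1+ linechar"; \r?\n accepts CR LF or bare LF.
RECORD_RE = re.compile(
    r"##LINE1 ?([^\r\n]+)\r?\n"
    r"(?:##LINE2 ?([^\r\n]+)\r?\n)?"
    r"##LINE3 ?([^\r\n]+)\r?\n"
)


def tagged_to_csv(text):
    """Emit one fully quoted CSV row per tagged record."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for m in RECORD_RE.finditer(text):
        # A skipped optional group yields None; write it as an empty field.
        writer.writerow(g or "" for g in m.groups())
    return out.getvalue()


tagged = (
    "##LINE1 Jane Doe\n##LINE2 Acme Corp\n##LINE3 Springfield\n"
    "##LINE1 John Smith\n##LINE3 Portland\n"
)
print(tagged_to_csv(tagged), end="")
```

As with `cr, lf` in the EasyPattern, every tagged line, including the last one in the file, must end with a line break to match. Using `csv.writer` rather than plain string formatting also doubles any embedded quote characters, which a literal `"$1","$2","$3"` replacement would not do.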
Does that make sense?