CSV-like data merging

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

simicar
Posts: 5
Joined: Thu Feb 15, 2007 7:25 am

CSV-like data merging

Post by simicar »

Hi.

I ran into a problem using "extract matching lines" with context lines selected (1 line before and 1 line after). I get this:

Code: Select all

;Trial Input;noxious gases away from the users of the machine. ;
;Trial Input;Indoor generators and furnaces can quickly fill an enclosed s;
;Trial Input;pace with carbon monoxide or other poisonous exhaust gases ;
The problem is that I need those three lines to end up as a single row in Excel, not three. For example, when extracting the word (*enclosed*), I want to get:

Code: Select all

;Trial Input;noxious gases away from the users of the machine.
Indoor generators and furnaces can quickly fill an enclosed s
pace with carbon monoxide or other poisonous exhaust gases ;
How can I do this?
I've already tried the "convert end of lines" option and using headers and footers. With headers/footers, though, I get ';' only at the beginning and end of the whole extracted file, not around each group of rows (3 rows here) as needed.

I generated the example above with:
  • File input:..
    Extract matching [*enclosed*]
    Replace ; with ,
    Insert column 1 [;@inputFilename;]
    Insert column 0 [;]
    Merge to file...
Please help
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

To join every 3 lines into one, use an EasyPattern like this:

Code: Select all

 [ capture(0+ not cr or lf), cr, lf,
    capture(0+ not cr or lf), cr, lf,
    capture(0+ not cr or lf), cr, lf ]
Replace with

Code: Select all

  $1 $2 $3
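
For anyone who wants to check the idea outside TextPipe, here is a rough Python equivalent of the pattern above. It is only a sketch: the function name `join_every_three` is made up for illustration, and it assumes Windows (CR LF) line endings, as the EasyPattern does.

```python
import re

# Three captures, each "zero or more characters that are not CR or LF",
# each followed by a CR LF pair (modelled on the EasyPattern above).
THREE_LINES = re.compile(r"([^\r\n]*)\r\n([^\r\n]*)\r\n([^\r\n]*)\r\n")


def join_every_three(text: str) -> str:
    """Join each run of three CRLF-terminated lines with single spaces."""
    # Like the "$1 $2 $3" replacement, this drops the line breaks entirely,
    # including the one after the third line.
    return THREE_LINES.sub(r"\1 \2 \3", text)


if __name__ == "__main__":
    sample = "first\r\nsecond\r\nthird\r\n"
    print(join_every_three(sample))
```

Because the replacement contains no line break, consecutive groups of three end up on the same line. If each group should stay on its own row, the replacement would need a line break at the end.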
simicar
Posts: 5
Joined: Thu Feb 15, 2007 7:25 am

Not quite..

Post by simicar »

Unfortunately, when I used the suggested filter list:
  • File input:..
    Extract matching [*tool*] - (here with 1 context line before and after)
    Replace ; with ,
    Insert column 1 [;@inputFilename;]
    Insert column 0 [;]
    EasyPattern [...as given above...]
    Merge to file...
on the trial input:

Code: Select all

TextPipe provides a single point of maintenance for all your text processing tasks. 
You learn one tool, rather than learning 4 or more - and their associated languages, 
command line options, debugging schemes, idiosyncrasies and operating system differences and dependencies.
I got this result:

Code: Select all

;Trial Input;TextPipe provides a single point of maintenance for all your text processing tasks. ;
;Trial Input;You learn one tool, rather than learning 4 or more - and their associated languages, ;
;Trial Input;command line options, debugging schemes, idiosyncrasies and operating system differences and dependencies.;
instead of:

Code: Select all

;Trial Input;TextPipe provides a single point of maintenance for all your text processing tasks.
You learn one tool, rather than learning 4 or more - and their associated languages,
command line options, debugging schemes, idiosyncrasies and operating system differences and dependencies.;
Maybe the position of the Replace -> Find EasyPattern filter is wrong.
I still can't sort this out...
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

If you only want the input filename shown once, then move the EasyPattern up just underneath the Extract filter.
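
To illustrate why the order matters, here is a small Python sketch. It does not reproduce TextPipe's actual behaviour; `join_three` and `tag_lines` are hypothetical stand-ins for the EasyPattern and the Insert column steps, and the CR LF line endings are an assumption.

```python
import re

THREE_LINES = re.compile(r"([^\r\n]*)\r\n([^\r\n]*)\r\n([^\r\n]*)\r\n")


def join_three(text: str) -> str:
    # Join each group of three CRLF-terminated lines into one line.
    # Keeps a line break after each group (an assumption made here,
    # unlike the "$1 $2 $3" replacement given earlier).
    return THREE_LINES.sub(r"\1 \2 \3\r\n", text)


def tag_lines(text: str, name: str) -> str:
    # Wrap every line as ";name;line;" (hypothetical stand-in for
    # the two Insert column steps).
    return "".join(f";{name};{line};\r\n" for line in text.splitlines())


if __name__ == "__main__":
    extracted = "line one\r\nline two\r\nline three\r\n"
    # Tag first, then join: the filename is repeated three times.
    print(join_three(tag_lines(extracted, "Trial Input")))
    # Join first, then tag: the filename appears once per group.
    print(tag_lines(join_three(extracted), "Trial Input"))
```

Joining before tagging means the filename is attached to the already merged row, which is the effect of moving the EasyPattern up.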
simicar
Posts: 5
Joined: Thu Feb 15, 2007 7:25 am

How about duplicates

Post by simicar »

Thanks. It really works, but only when the keyword appears once per file. The problem shows up when it appears two or more times.

For example, from this input (extracting the word tool):

Code: Select all

TextPipe provides a single...
...one tool, rather than
command line options, debugging schemes,

TextPipe provides a single...
...one tool, rather than
command line options, debugging schemes,

TextPipe provides a single...
...one tool, rather than
command line options, debugging schemes,
I get:

Code: Select all

;Trial Input;TextPipe provides a single... ...one tool, rather than command line options, debugging schemes,TextPipe provides a single... ...one tool, rather than command line options, debugging schemes,TextPipe provides a single...;
;Trial Input;...one tool, rather than;
;Trial Input;command line options, debugging schemes,;
instead of (where the Trial Input should be the same file):

Code: Select all

;Trial Input;TextPipe provides a single... ...one tool, rather than command line options, debugging schemes,;
;Trial Input;TextPipe provides a single... ...one tool, rather than command line options, debugging schemes,;
;Trial Input;TextPipe provides a single... ...one tool, rather than command line options, debugging schemes,;
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Try this filter (note - this text comes from File\Export\Export to Clipboard):

Code: Select all

|--Extract lines matching [tool]
|     [ ] Include line numbers
|     [ ] Include filename
|     [ ] Match case
|     [ ] Count matches
|     Pattern type: 0
|     Context before: 1
|     Context after: 1
|   
|--EasyPattern [[ capture(0+ not cr or lf), cr, lf,\r\n    capture(0+ not cr or lf), cr, lf,\r\n    capture(longest 0+ not cr or lf), longest optional( cr, lf)  ]] with [@inputFilename;$1 $2 $3;\r\n]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [X] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [ ] Extract matches
|     Maximum text buffer size 4096
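
As a rough cross-check of what the EasyPattern step in this filter does, here is a Python sketch of the same replacement. The function name `merge_groups` and its `input_filename` parameter are made up for illustration (the parameter stands in for @inputFilename), and the input is assumed to be the already extracted lines with CR LF endings, in consecutive groups of three.

```python
import re

# Two lines ending in CR LF, then a third line whose trailing CR LF is
# optional, so the last group still matches at the end of the text.
GROUP = re.compile(r"([^\r\n]*)\r\n([^\r\n]*)\r\n([^\r\n]*)(?:\r\n)?")


def merge_groups(text: str, input_filename: str) -> str:
    """Turn each group of three lines into 'filename;l1 l2 l3;' plus CR LF."""
    def replacement(match: re.Match) -> str:
        joined = " ".join(match.group(1, 2, 3))
        # The line break in the replacement keeps each group on its own row.
        return f"{input_filename};{joined};\r\n"

    return GROUP.sub(replacement, text)


if __name__ == "__main__":
    extracted = (
        "TextPipe provides a single...\r\n"
        "...one tool, rather than\r\n"
        "command line options, debugging schemes,\r\n"
    ) * 3
    print(merge_groups(extracted, "Trial Input"), end="")
```

The difference from the earlier pattern is the line break at the end of the replacement, which stops repeated matches in one file from running together on a single line.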
