How to Remove Lines if Identical up to Second Comma?


Moderators: DataMystic Support, Moderators

jorgejulio
Posts: 2
Joined: Tue Jan 10, 2006 6:28 am

How to Remove Lines if Identical up to Second Comma?

Post by jorgejulio »

I have Perl code that removes lines in a .CSV (comma-separated values) file that are identical up to the first comma (i.e. when the first "key" is identical). Only the first line with each key is kept.

EXAMPLE INPUT (the line numbers are only for identification; they are not part of the input):
1: 123,abc,XYZ
2: 123,def,UVW
3: 456,abc,XYZ
4: 456,def,UVW
5: 123,abc,QRS
6: 789,abc,XYZ

OUTPUT:
1: 123,abc,XYZ
3: 456,abc,XYZ
6: 789,abc,XYZ

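use strict;
use warnings;

# Keep only the first line seen for each value of the first field.
open(my $fh, '<', 'mycsv.csv') or die "Cannot open mycsv.csv: $!";
my (%seen, @newfile);
while (my $line = <$fh>)
{
    my ($first) = split(/,/, $line);
    if (!$seen{$first})
    {
        push(@newfile, $line);
        $seen{$first} = 1;
    }
}
close($fh);
print @newfile;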

How does one skip a line if it is identical to an earlier line up to the second comma (i.e. if the first two keys are identical)?

"TASK 2" EXAMPLE INPUT (the line numbers are only for identification; they are not part of the input):
1: 123,abc,XYZ
2: 123,def,UVW
3: 456,abc,XYZ
4: 456,def,UVW
5: 123,abc,QRS
6: 789,abc,XYZ

NEEDED OUTPUT:
1: 123,abc,XYZ
2: 123,def,UVW
3: 456,abc,XYZ
4: 456,def,UVW
6: 789,abc,XYZ

Only line 5 should be skipped, because its first two keys ("123,abc") are identical to those in line 1.

Thanks for any help!
j2
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Use a CSV field restriction to pad the first field out to the maximum width (say 30 characters), then use a Remove Duplicate lines filter, comparing from character 1 to 30.
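
For anyone who would rather stay with the Perl approach from the question, here is a minimal sketch that keys the "seen" hash on the first two fields instead of only the first. It assumes plain comma-separated fields with no quoted or embedded commas. The helper name dedupe_by_first_two and the inline sample data are illustrative only and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line seen for each
# (first field, second field) pair. Lines are returned unchanged.
sub dedupe_by_first_two {
    my @lines = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        # Work on a copy without the line ending so that a two-field
        # line does not carry "\n" into its key.
        (my $record = $line) =~ s/\r?\n\z//;
        # A limit of 3 leaves everything after the second comma in one piece.
        my ($first, $second) = split /,/, $record, 3;
        # Treat missing fields as empty strings (assumption for short lines).
        my $key = join ',', map { defined $_ ? $_ : '' } $first, $second;
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

# Sample data taken from the "TASK 2" example in the question.
my @input = (
    "123,abc,XYZ\n",
    "123,def,UVW\n",
    "456,abc,XYZ\n",
    "456,def,UVW\n",
    "123,abc,QRS\n",
    "789,abc,XYZ\n",
);
print dedupe_by_first_two(@input);
```

With the sample data this prints lines 1, 2, 3, 4 and 6 of the input, which matches the needed output in the question. To run it against a file, the list could be replaced by the lines read from mycsv.csv, as in the original script.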