Remove partial duplicate lines from list

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

alaltaieri
Posts: 9
Joined: Wed Apr 17, 2013 9:12 am

Remove partial duplicate lines from list

Post by alaltaieri »

Is it possible to remove partial duplicate lines from a list? Let's say I have a file with some sentences (one per line):


"Hi there. I want to go to school because I like it"
"Hi there. I want to go to school because I like it a lot"
"Hi there. I want to go to school because I like it a lot and I don't care about what others are saying"

After applying the filter, I want only line 1 to remain. Lines 2 and 3 should be deleted because they start with the same sentence plus some extra words.

Can I do this with TextPipe Pro?

Thanks
nikolas1612
Posts: 22
Joined: Tue May 12, 2015 3:57 am

Re: Remove partial duplicate lines from list

Post by nikolas1612 »

I have just solved a similar task:
viewtopic.php?f=17&t=2195
The only difference is that I needed to find all unique abbreviations, while you need to find all lines whose first 20 (for instance) characters are unique.
So try this attachment. It's an attempt to adapt my filter to your needs. Look for the explanation in the filter's comments.
Attachments
removepartialduplicates.rar
(1.43 KiB) Downloaded 1056 times
alaltaieri
Posts: 9
Joined: Wed Apr 17, 2013 9:12 am

Re: Remove partial duplicate lines from list

Post by alaltaieri »

Thank you. I will try it and get back to you.

Later edit: It seems I have some problems with the downloaded file. It's not recognized by TextPipe. Can you please enable PMs in your profile settings so I can send you a PM? :D Thanks
nikolas1612
Posts: 22
Joined: Tue May 12, 2015 3:57 am

Re: Remove partial duplicate lines from list

Post by nikolas1612 »

My PM is already activated.
By the way, did you unpack the rar before loading it into TP? ;)
alaltaieri
Posts: 9
Joined: Wed Apr 17, 2013 9:12 am

Re: Remove partial duplicate lines from list

Post by alaltaieri »

Well, it seems I cannot PM you.
Anyway, I have an old version of TextPipe Pro. When I load the filter I get this error: http://prntscr.com/7a4xzt. After pressing OK I get this error: http://prntscr.com/7a4y85 and then it crashes: http://prntscr.com/7a4yig

Is there any way to save the filter for an older version, TextPipe Pro 9.1? :D
nikolas1612
Posts: 22
Joined: Tue May 12, 2015 3:57 am

Re: Remove partial duplicate lines from list

Post by nikolas1612 »

No. You can download the TP Pro 9.9 trial and everything will work fine. Then you can either process your text with the trial version or just use it to look inside the attached filter. I can also reassure you that your task seems to be solved there.
alaltaieri
Posts: 9
Joined: Wed Apr 17, 2013 9:12 am

Re: Remove partial duplicate lines from list

Post by alaltaieri »

I installed the 9.9 trial and have been struggling for the last 3 days to make it work, but without success.
I tried every combination I could think of and followed the logic, but I couldn't get the filter to work properly.
If you are kind enough and have some spare time, could you please look at the file I want to remove partial duplicates from? It would be a great help.
Thank you, much appreciated.
https://www.sendspace.com/file/sb8v4t
nikolas1612
Posts: 22
Joined: Tue May 12, 2015 3:57 am

Re: Remove partial duplicate lines from list

Post by nikolas1612 »

I dropped the file onto the TP window (with the filter already loaded) and pressed F9 to start processing. It finished in about 4 minutes.
The result produced by the filter is attached. Look inside. Is that what you want?

Everything worked as planned. Keep in mind, though, that everything depends on what you personally mean by "partial duplicates". The current filter counts a line as a "partial duplicate" if its first 20 characters are not unique.
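
For anyone following along without TextPipe, here is a rough Python sketch of the two readings of "partial duplicate". It is only an illustration, not the attached filter: the function names are made up, and keeping the first occurrence of each group is an assumption. The first function compares a fixed number of leading characters (20 here); the second follows the stricter reading from the original question, where a line is dropped only if it begins with a line that was already kept.

```python
def dedupe_by_prefix_length(lines, length=20):
    """Keep the first line seen for each distinct run of `length` leading characters."""
    seen = set()
    kept = []
    for line in lines:
        key = line[:length]  # shorter lines are compared in full
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


def dedupe_extensions(lines):
    """Drop a line only if it starts with a non-empty line that was already kept."""
    kept = []
    for line in lines:
        # `prev and ...` skips empty lines, which every string would "start with".
        if not any(prev and line.startswith(prev) for prev in kept):
            kept.append(line)
    return kept


# Sample sentences from the question (without the quotes), plus one extra
# line that shares only the first 20 characters with the others.
sample = [
    "Hi there. I want to go to school because I like it",
    "Hi there. I want to go to school because I like it a lot",
    "Hi there. I want to go to school because I like it a lot "
    "and I don't care about what others are saying",
    "Hi there. I want to stay home today",
]

print(dedupe_by_prefix_length(sample, 20))  # keeps only the first line
print(dedupe_extensions(sample))            # keeps the first and the last line
```

The fixed-length version is fast (one set lookup per line) but also removes the last sample line, which merely shares its opening words. The stricter version keeps it, at the cost of comparing each line against every kept line, and it only works when the shorter line comes first in the file.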

https://www.sendspace.com/file/ljaahy

P.S. I've just found a much faster equivalent of my filter, already built into the program ;)
Look for it in the "Remove" block. It's called "Remove duplicate lines". I never looked inside it, thinking that it compares only complete lines, but it has a "length" option that defines the number of characters to compare (just set it to 20 and you get the same result in 5 seconds).
So you may try this filter:
Attachments
remove-dups.rar
(858 Bytes) Downloaded 1072 times
alaltaieri
Posts: 9
Joined: Wed Apr 17, 2013 9:12 am

Re: Remove partial duplicate lines from list

Post by alaltaieri »

Damn, I can't thank you enough. You are a great person for taking the time to help me with this problem and creating that filter. And the "Remove duplicate lines" option was right in front of me, it's crazy. I have used that filter forever but I never knew that "length" is actually an option for partial duplicates.

Thanks again man :D :D :D