Remove partial duplicate lines from list
Posted: Wed May 27, 2015 8:45 am
by alaltaieri
Is it possible to remove partial duplicate lines from a list? Let's say I have a file with some sentences (one per line):
"Hi there. I want to go to school because I like it"
"Hi there. I want to go to school because I like it a lot"
"Hi there. I want to go to school because I like it a lot and I don't care about what others are saying"
After applying the filter, I want only line 1 to remain. Lines 2 and 3 should be deleted because they start with the same sentence plus some extra words.
Can I do this with TextPipe Pro?
Thanks
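For readers following along without TextPipe, the request above can be read as: drop any line that begins with a line already kept. Here is a minimal Python sketch of that reading. It assumes the shortest version of each sentence appears first (as in the example), that the surrounding quotes are not part of the file, and that blank lines can be skipped. The function name and the extra "Something unrelated" line are made up for illustration.

```python
def remove_prefix_extensions(lines):
    """Keep a line only if it does not start with a line that was already kept."""
    kept = []
    for raw in lines:
        text = raw.rstrip("\r\n")
        if not text:
            continue  # skip blank lines; an empty string is a prefix of everything
        if any(text.startswith(prev) for prev in kept):
            continue  # exact duplicate, or an earlier line plus extra words
        kept.append(text)
    return kept


sample = [
    "Hi there. I want to go to school because I like it",
    "Hi there. I want to go to school because I like it a lot",
    "Hi there. I want to go to school because I like it a lot and I don't care about what others are saying",
    "Something unrelated",  # hypothetical extra line, not from the original post
]
print(remove_prefix_extensions(sample))
```

Each line is compared against every kept line, so this is quadratic in the worst case and probably only reasonable for small to medium files.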
Re: Remove partial duplicate lines from list
Posted: Wed May 27, 2015 3:10 pm
by nikolas1612
I have just solved a similar task.
viewtopic.php?f=17&t=2195
The only difference is that I needed to find all unique abbreviations, while you need to find all lines whose first 20 (for instance) characters are unique.
So try this attachment. It's an attempt to adapt my filter to your needs. See the filter's comments for an explanation.
Re: Remove partial duplicate lines from list
Posted: Thu May 28, 2015 5:54 am
by alaltaieri
Thank you. I will try it and get back to you.
Later edit: It seems I have some problems with the downloaded file. It's not recognized by TextPipe. Can you please activate the PM settings in your profile so I can send you a PM?
Thanks
Re: Remove partial duplicate lines from list
Posted: Thu May 28, 2015 6:11 am
by nikolas1612
My PM is already activated.
By the way, did you unpack the RAR before loading it into TP?
Re: Remove partial duplicate lines from list
Posted: Thu May 28, 2015 6:53 am
by alaltaieri
Well, it seems I cannot PM you.
Anyway, I have an old version of TextPipe Pro. When I load the filter I get this error:
http://prntscr.com/7a4xzt
After pressing OK, I get this error:
http://prntscr.com/7a4y85
and then it crashes:
http://prntscr.com/7a4yig
Is there any way to save the filter for an older version, TextPipe Pro 9.1? :D
Re: Remove partial duplicate lines from list
Posted: Thu May 28, 2015 2:03 pm
by nikolas1612
No. You can download the TP Pro 9.9 trial and everything will work fine. Then you can either process your text with the trial version or just use it to look inside the attached filter. As further encouragement, your task seems to be solved there.
Re: Remove partial duplicate lines from list
Posted: Thu Jun 04, 2015 8:54 am
by alaltaieri
I installed the 9.9 trial and have been struggling for the last 3 days to make it work, without success.
I tried every combination I could think of, following the logic, but I couldn't get the filter to work properly.
If you are kind enough and have some spare time, could you please look at the file I want to remove partial duplicates from? It would be a great help.
Thank you, much appreciated.
https://www.sendspace.com/file/sb8v4t
Re: Remove partial duplicate lines from list
Posted: Fri Jun 05, 2015 12:28 am
by nikolas1612
I dropped the file onto the TP window (with the filter already loaded) and pressed F9 to start processing. It finished successfully in about 4 minutes.
The result obtained by the filter is attached. Look inside. Is that what you want?
Everything worked as planned. Keep in mind, though, that everything depends on what you personally mean by "partial duplicates". The current filter counts a line as a "partial duplicate" if its first 20 characters are not unique.
https://www.sendspace.com/file/ljaahy
P.S. I've just found a much faster equivalent of my filter that is already built into the program.
Look for it in the "Remove" block. It's called "Remove duplicate lines". I never looked inside it, thinking that it compares only complete lines, but it has a "length" option defining the number of characters to compare (just set it to 20 and you get the same result in 5 seconds).
So you may try this filter.
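The "first 20 characters" rule is easy to try outside TextPipe too. Below is a minimal Python sketch of the idea, not a description of how TextPipe implements it. It assumes the first occurrence of each prefix is the one to keep and that comparison is case-sensitive. The function name and sample lines are made up for illustration.

```python
def remove_duplicates_by_prefix(lines, length=20):
    """Keep only the first line for each distinct leading `length` characters."""
    seen = set()
    unique = []
    for line in lines:
        key = line[:length]  # lines shorter than `length` are compared whole
        if key in seen:
            continue  # another line with the same leading characters was already kept
        seen.add(key)
        unique.append(line)
    return unique


lines = [
    "Hi there. I want to go to school",
    "Hi there. I want to go home",
    "Bye",
]
print(remove_duplicates_by_prefix(lines, length=20))
```

Because the seen prefixes are stored in a set, this is a single pass over the file. The catch is the same as with the filter: two unrelated lines that happen to share their first 20 characters are treated as duplicates, so the length has to be chosen to suit the data.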
Re: Remove partial duplicate lines from list
Posted: Fri Jun 05, 2015 10:26 am
by alaltaieri
Damn, I can't thank you enough. You are a great person for taking the time to help me with this problem and create that filter. And the "Remove duplicate lines" filter was right in front of me, it's crazy. I've used that filter forever but never knew that "length" is actually an option for partial duplicates.
Thanks again, man