split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 5:42 am
by ramin2000
Hi gurus!
I have a very large pipe-delimited file which is sorted on the first column.
I need to split this large file into many files according to only the first 3 bytes of the first column.
For example:
All references starting with AAA should go to the text file AAA.TXT
All references starting with AAB should go to the text file AAB.TXT
And so on.
Please keep in mind that I am a Newbie...
Any ideas?
Re: split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 7:19 am
by DataMystic Support
How is this different to your other questions about splitting files by rug type?
Re: split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 10:29 am
by ramin2000
I was successful in getting it done the other way. The problem is that it takes 10 days of computer time to do the extraction.
My hope is that by using split instead I could cut the time down to a few hours.
It would be great to have this split-by-column function built into the program at some point, but in the meantime I still need help!
Re: split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 11:04 am
by DataMystic Support
Here are the basics of detecting a change in the value of column 1.
Code:
'detect changes - mark where a new file would start
dim oldfield

'Called for every line in the file
'EOL contains the end of line characters (Unix, DOS or Mac) that must be
'appended to each line
function processLine(line, EOL)
  dim field
  'text before the first comma (use "|" for a pipe-delimited file)
  field = mid(line, 1, instr(line, ",") - 1)
  if oldfield <> field then
    'column 1 changed: emit a marker line ahead of the data line
    processLine = "--- new file ---" & EOL & line & EOL
  else
    'same value as the previous line: prefix the line with the field
    processLine = field & "*" & line & EOL
  end if
  oldfield = field
end function

sub startJob()
end sub

sub endJob()
end sub

function startFile()
  startFile = ""
  oldfield = ""
end function

function endFile()
  endFile = ""
end function
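For readers who want to try the change-detection idea outside the tool, here is a minimal standalone Python sketch of the same logic. It is not the script above: the function name `mark_prefix_changes`, the `width` parameter and the sample rows are made up for illustration, and it compares the first 3 characters of each line (which equals the first 3 bytes only for single-byte text) rather than the whole first column.

```python
from typing import Iterable, Iterator


def mark_prefix_changes(lines: Iterable[str], width: int = 3,
                        marker: str = "--- new file ---") -> Iterator[str]:
    """Yield every line, preceded by a marker whenever its prefix changes."""
    previous = None  # no prefix seen yet, so the first line always gets a marker
    for line in lines:
        prefix = line[:width]
        if prefix != previous:
            yield marker
        yield line
        previous = prefix


# Hypothetical sample rows, already sorted on the first column.
rows = ["AAA1|x", "AAA2|y", "AAB1|z"]
marked = list(mark_prefix_changes(rows))
```

Because the input is sorted, each prefix forms one contiguous run, so a single "previous value" variable is enough to find every boundary.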
And here is the code to write this data to a file:
Code:
'detect changes - write to new file
dim oldfield
dim fso
dim TextStream

'Called for every line in the file
'EOL contains the end of line characters (Unix, DOS or Mac) that must be
'appended to each line
function processLine(line, EOL)
  dim field
  'text before the first comma (use "|" for a pipe-delimited file)
  field = mid(line, 1, instr(line, ",") - 1)
  if oldfield <> field then
    'a comparison with Null is never True, so test the object with Is Nothing
    if not TextStream is Nothing then TextStream.close
    '8 = open for appending, True = create the file if it does not exist
    Set TextStream = fso.OpenTextFile("C:\" & field & ".txt", 8, True)
  end if
  TextStream.writeLine line
  oldfield = field
  processLine = ""
end function

sub startJob()
  Set fso = CreateObject("Scripting.FileSystemObject")
  Set TextStream = Nothing
end sub

sub endJob()
  Set fso = Nothing
end sub

function startFile()
  startFile = ""
  oldfield = ""
end function

function endFile()
  'close the last output file, if one was opened
  if not TextStream is Nothing then
    TextStream.close
    Set TextStream = Nothing
  end if
  endFile = ""
end function
This doesn't quite work for me - not sure why. Thoughts anyone?
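As a point of comparison, the same "open a new output file whenever the key changes" approach can be sketched as a standalone Python script. This is an illustration under assumptions, not a drop-in replacement for the script above: `split_by_prefix`, the sample file name and the sample rows are hypothetical, the key is the first 3 characters of each line, and the text is read as latin-1 so that one character is one byte. It also assumes the prefix only contains characters that are legal in a file name.

```python
import os
import tempfile


def split_by_prefix(source_path, out_dir, width=3):
    """Append each run of lines sharing a prefix to <prefix>.TXT in out_dir."""
    written = []
    out = None
    previous = None
    try:
        # newline="" keeps the original line endings untouched.
        with open(source_path, "r", encoding="latin-1", newline="") as src:
            for line in src:
                prefix = line[:width]
                if prefix != previous:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, prefix + ".TXT")
                    # Append mode, so re-running adds to files that already exist.
                    out = open(path, "a", encoding="latin-1", newline="")
                    written.append(path)
                    previous = prefix
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return written


# Small demonstration on a made-up, already sorted pipe-delimited file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "rugs.psv")
    with open(sample, "w", encoding="latin-1", newline="") as f:
        f.write("AAA001|Mashad\nAAA002|Mashad\nAAB001|Bidjar\n")
    files = split_by_prefix(sample, tmp)
    names = [os.path.basename(p) for p in files]
    with open(files[0], encoding="latin-1", newline="") as f:
        first_contents = f.read()
```

Only one output file is open at any time, which works because the input is sorted; the source is read once from start to finish.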
Re: split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 11:40 am
by ramin2000
I may not know what I am talking about here…but…
How about going at it in 2 steps?
1. Marking the split points by adding a divider line
Something like:
Mashad
Mashad
Mashad
=====Mas=====
Bidjar
Bidjar
Bidjar
=====Bid=====
2. Splitting the files to:
Mas.txt
Bid.txt
Maybe the above can be done without VBScript (which I know nothing about) hehehehe
Re: split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 11:52 am
by DataMystic Support
Adding the split points is easy - naming each output file after the data it contains is not.
Re: split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 12:03 pm
by ramin2000
Let's split the files without the names first…
And then once we have the split files we can rename them according to the content of one of the columns in the split files.
(In my case, I will include another column in the files with the Mas, Bid and so on info for the file name replace filter to look at.)
That would make it into 3 steps:
1. Mark a divider at split points.
2. Split the files according to the divider
3. Rename the files according to a different column (of the new files)
Could this be done?
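Steps 2 and 3 of this plan can be sketched in standalone Python as follows. Everything here is an assumption made for illustration: the divider is taken to be a line containing exactly `=====` that step 1 has already inserted after each group, the file name comes from the second pipe-delimited column of the first line of each part, and the names `split_on_divider`, `rename_by_column` and `part_N.txt` are hypothetical.

```python
import os
import tempfile

DIVIDER = "====="  # hypothetical divider line written by step 1


def split_on_divider(lines, out_dir):
    """Step 2: write the lines between dividers to part_0.txt, part_1.txt, ..."""
    paths = []
    out = None
    try:
        for line in lines:
            if line.rstrip("\r\n") == DIVIDER:
                if out is not None:
                    out.close()
                    out = None
                continue
            if out is None:
                path = os.path.join(out_dir, "part_%d.txt" % len(paths))
                out = open(path, "w", newline="")
                paths.append(path)
            out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths


def rename_by_column(paths, column=1, sep="|"):
    """Step 3: rename each part after one column of its first line."""
    renamed = []
    for path in paths:
        with open(path, newline="") as f:
            first = f.readline().rstrip("\r\n")
        target = os.path.join(os.path.dirname(path),
                              first.split(sep)[column] + ".txt")
        os.replace(path, target)
        renamed.append(target)
    return renamed


# Made-up sample: column 2 carries the short name used for the file.
sample = ["Mashad|Mas\n", "Mashad|Mas\n", "=====\n", "Bidjar|Bid\n", "=====\n"]
with tempfile.TemporaryDirectory() as tmp:
    parts = split_on_divider(sample, tmp)
    final = rename_by_column(parts)
    names = [os.path.basename(p) for p in final]
    with open(final[0], newline="") as f:
        mas_contents = f.read()
```

The rename step only reads the first line of each part, so it stays cheap even when the parts themselves are large.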
Re: split according to only first 3 bytes of the first column
Posted: Tue Jun 22, 2010 4:44 pm
by DataMystic Support
Attached is a script that does not use VBScript. It identifies the split positions and splits the files.