split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 5:42 am
by ramin2000
Hi gurus!
I have a very large pipe file which is sorted by the first column.
I need to split this large file into many files according to only the first 3 bytes of the first column.

For example:
All references starting with AAA should go to the text file AAA.TXT
All references starting with AAB should go to the text file AAB.TXT
And so on.

Please keep in mind that I am a Newbie...
Any ideas?

Re: split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 7:19 am
by DataMystic Support
How is this different to your other questions about splitting files by rug type?

Re: split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 10:29 am
by ramin2000
I was successful in getting it done the other way. The problem is that the extraction takes 10 days of computer time.
My hope is that by using split instead I could cut the time down to a few hours.
It would be great to have this split-by-column function built into the program at some point, but in the meantime I still need help!

Re: split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 11:04 am
by DataMystic Support
Here are the basics of detecting a change in column 1.

Code:

'detect changes - write to new file

dim name
dim oldfield

'Called for every line in the file
'EOL contains the end of line characters (Unix, DOS or Mac) that must be
'appended to each line
function processLine(line, EOL)
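  'field is the first column, up to and including the first comma
  field = mid(line,1,instr(line,","))

  if oldfield <> field then
    'first column changed: mark where a new file would start
    processLine = "--- new file ---" & EOL & line & EOL
  else
    'same first column: prefix the line with the field so it can be checked
    processLine = field & "*" & line & EOL
  end if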
  
  oldfield = field
  
end function


sub startJob()
end sub


sub endJob()
end sub


function startFile()
  startFile = ""
  oldfield = ""
end function


function endFile()
  endFile = ""
end function

And here is the code to write this data to a file:

Code:

'detect changes - write to new file

dim name
dim oldfield
dim fso
dim TextStream

'Called for every line in the file
'EOL contains the end of line characters (Unix, DOS or Mac) that must be
'appended to each line
function processLine(line, EOL)
  field = mid(line,1,instr(line,",") - 1)

  if oldfield <> field then
    if TextStream <> Null then TextStream.close
    Set TextStream = fso.OpenTextFile( "C:\" & field & ".txt", 8, True)
    TextStream.writeLine( line )
  else
    TextStream.writeLine( line )
  end if
  
  oldfield = field
  processLine = ""
  
end function


sub startJob()
  Set fso = CreateObject("Scripting.FileSystemObject")
end sub


sub endJob()
  Set fso = Nothing
end sub


function startFile()
  startFile = ""
  oldfield = ""
end function


function endFile()
  if TextStream <> Null then TextStream.close
  endFile = ""
end function

This doesn't quite work for me - not sure why. Thoughts, anyone?
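
One possible cause, offered as a guess rather than a confirmed diagnosis: in VBScript a comparison with Null never evaluates to True, so "if TextStream <> Null" cannot tell whether a file is open, and once TextStream holds an object the comparison itself may raise an error. The sketch below keeps the same script callbacks but tests the reference with "Is Nothing", and it keys on the first 3 characters of each line, as the original question asked. The C:\ output folder is carried over from the script above. The closeOutput helper and the oldkey variable are illustrative names, and the sketch assumes startJob is called before any line is processed and that the first 3 characters are valid in a file name.

```vbscript
'split on the first 3 characters of each line - one output file per group

dim oldkey
dim fso
dim TextStream

'close the current output file, if one is open
sub closeOutput()
  if not TextStream is Nothing then
    TextStream.Close
    Set TextStream = Nothing
  end if
end sub

'Called for every line in the file
function processLine(line, EOL)
  dim key
  key = left(line, 3)

  'open a new file when the key changes, or when no file is open yet
  if key <> oldkey or TextStream is Nothing then
    closeOutput
    '8 = ForAppending, True = create the file if it does not exist
    Set TextStream = fso.OpenTextFile("C:\" & key & ".txt", 8, True)
  end if

  TextStream.WriteLine line
  oldkey = key
  processLine = ""   'nothing is passed on to the normal output
end function

sub startJob()
  Set fso = CreateObject("Scripting.FileSystemObject")
  Set TextStream = Nothing
end sub

sub endJob()
  closeOutput
  Set fso = Nothing
end sub

function startFile()
  oldkey = ""
  startFile = ""
end function

function endFile()
  closeOutput
  endFile = ""
end function
```

Because the files are opened for appending, running the job twice adds the lines a second time, so any old output files should be deleted first.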

Re: split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 11:40 am
by ramin2000
I may not know what I am talking about here…but…
How about going at it in 2 steps?
1. Marking the split points by adding a divider line

Something like:

Mashad
Mashad
Mashad
=====Mas=====
Bidjar
Bidjar
Bidjar
=====Bid=====

2. Splitting the files to:
Mas.txt
Bid.txt

Maybe the above can be done without JavaScript (which I know nothing about), hehehehe.
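
Step 1 above could be sketched with the same script callbacks that appear earlier in the thread. This is only an illustration: it assumes the string returned by endFile is written to the output, which is not confirmed in this thread, and the divider helper is a made-up name. The divider format is copied from the example.

```vbscript
'add a divider line after each run of lines that share the same first 3 characters

dim oldkey

'build the divider for a group, e.g. =====Mas=====
function divider(key)
  divider = "=====" & key & "====="
end function

'Called for every line in the file
function processLine(line, EOL)
  dim key
  key = left(line, 3)

  if oldkey <> "" and key <> oldkey then
    'the previous group has ended: close it with its divider
    processLine = divider(oldkey) & EOL & line & EOL
  else
    processLine = line & EOL
  end if

  oldkey = key
end function

sub startJob()
end sub

sub endJob()
end sub

function startFile()
  oldkey = ""
  startFile = ""
end function

function endFile()
  'close the last group (assumes this return value is written to the output)
  if oldkey <> "" then
    endFile = divider(oldkey)
  else
    endFile = ""
  end if
end function
```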

Re: split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 11:52 am
by DataMystic Support
Adding the split points is easy - setting the filename for the split data is not.

Re: split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 12:03 pm
by ramin2000
Let's split the files without the names first…
Then, once we have the split files, we can rename them according to the content of one of the columns in the split files.
(In my case, I will include another column in the files with the Mas, Bid and so on info for the file name replace filter to look at.)


That would make it into 3 steps:
1. Mark a divider at the split points.
2. Split the files according to the divider.
3. Rename the files according to a different column's content (in the new files).


Could this be done?
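
Step 3 could be sketched as a standalone Windows Script Host script, run after the split. Everything specific here is an assumption made for illustration: the C:\split folder is hypothetical, and the new name is taken from the first 3 characters of each file's first line rather than from a separate column.

```vbscript
'rename each split file after the first 3 characters of its first line
'run from a command prompt with cscript; C:\split is a hypothetical folder

Option Explicit
Const ForReading = 1
Dim fso, folder, file, paths, path, stream, key, target

Set fso = CreateObject("Scripting.FileSystemObject")
Set folder = fso.GetFolder("C:\split")

'collect the paths first so that renaming does not disturb the enumeration
Set paths = CreateObject("Scripting.Dictionary")
For Each file In folder.Files
  paths.Add file.Path, True
Next

For Each path In paths.Keys
  'read the first line of the file, if it has one
  Set stream = fso.OpenTextFile(path, ForReading)
  key = ""
  If Not stream.AtEndOfStream Then key = Left(stream.ReadLine, 3)
  stream.Close

  If key <> "" Then
    target = fso.BuildPath(folder.Path, key & ".txt")
    'skip files already named correctly, and never overwrite an existing file
    If Not fso.FileExists(target) Then fso.MoveFile path, target
  End If
Next
```

If two split files start with the same 3 characters, only the first is renamed and the other keeps its original name.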

Re: split according to only first 3 bytes of the first column

Posted: Tue Jun 22, 2010 4:44 pm
by DataMystic Support
Attached is a script that does not use VBScript. It identifies the split positions and splits the files.