split according to only first 3 bytes of the first column

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

ramin2000
Posts: 11
Joined: Wed Jun 02, 2010 12:02 am

split according to only first 3 bytes of the first column

Post by ramin2000 »

Hi gurus!
I have a very large pipe file which is sorted on the first column.
I need to split this large file into many files based on only the first 3 bytes of the first column.

For example:
All references starting with AAA should go to the text file AAA.TXT
All references starting with AAB should go to the text file AAB.TXT
And so on.

Please keep in mind that I am a Newbie...
Any ideas?
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: split according to only first 3 bytes of the first column

Post by DataMystic Support »

How is this different to your other questions about splitting files by rug type?
ramin2000
Posts: 11
Joined: Wed Jun 02, 2010 12:02 am

Re: split according to only first 3 bytes of the first column

Post by ramin2000 »

I was successful in getting it done the other way. The problem is that the extraction takes 10 days of computer time.
My hope is that by using split instead I could cut the time down to a few hours.
It would be great to have this split-by-column function built into the program at some point, but in the meantime I still need help!
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: split according to only first 3 bytes of the first column

Post by DataMystic Support »

Here are the basics of detecting a change in column 1.

Code:

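'detect changes - write to new file

dim name
dim oldfield

'Called for every line in the file
'EOL contains the end of line characters (Unix, DOS or Mac) that must be
'appended to each line
function processLine(line, EOL)
  'Column 1 is everything before the first comma (whole line if none)
  pos = instr(line, ",")
  if pos > 0 then
    field = mid(line, 1, pos - 1)
  else
    field = line
  end if

  if oldfield <> field then
    'column 1 changed - mark the start of a new file
    processLine = "--- new file ---" & EOL & line & EOL
  else
    'same value as the previous line - prefix it so the grouping is visible
    processLine = field & "*" & line & EOL
  end if
  
  oldfield = field
  
end function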


sub startJob()
end sub


sub endJob()
end sub


function startFile()
  startFile = ""
  oldfield = ""
end function


function endFile()
  endFile = ""
end function

And here is the code to write this data to a file:

Code:

'detect changes - write to new file

dim name
dim oldfield
dim fso
dim TextStream

'Called for every line in the file
'EOL contains the end of line characters (Unix, DOS or Mac) that must be
'appended to each line
function processLine(line, EOL)
  'Column 1 is everything before the first comma (whole line if none)
  'To split on only the first 3 characters, follow with field = left(field, 3)
  pos = instr(line, ",")
  if pos > 0 then
    field = mid(line, 1, pos - 1)
  else
    field = line
  end if

  if TextStream is Nothing or oldfield <> field then
    'An object cannot be tested with "<> Null"; use "Is Nothing" instead
    if not TextStream is Nothing then TextStream.close
    '8 = ForAppending, True = create the file if it does not exist
    Set TextStream = fso.OpenTextFile( "C:\" & field & ".txt", 8, True)
  end if
  TextStream.writeLine( line )
  
  oldfield = field
  processLine = ""
  
end function


sub startJob()
  Set fso = CreateObject("Scripting.FileSystemObject")
  Set TextStream = Nothing
end sub


sub endJob()
  Set fso = Nothing
end sub


function startFile()
  startFile = ""
  oldfield = ""
end function


function endFile()
  if not TextStream is Nothing then
    TextStream.close
    Set TextStream = Nothing
  end if
  endFile = ""
end function

This doesn't quite work for me - not sure why. Thoughts anyone?
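
For comparison, here is a minimal sketch of the same close-and-reopen idea outside TextPipe, written in Python. It is only an illustration under assumptions: the function name split_by_prefix, the out_dir parameter and the "|" delimiter are made up for this example, the input is assumed to be sorted on column 1 already, and the first 3 characters are assumed to be valid in a file name.

```python
import os


def split_by_prefix(lines, out_dir, prefix_len=3, delimiter="|"):
    """Write each line to <prefix>.TXT, keyed on the start of column 1.

    Assumes the input is already sorted on column 1, so only one output
    file has to be open at any time.
    """
    current_key = None
    out = None
    written = []  # names of the files opened, in order
    try:
        for line in lines:
            first_column = line.split(delimiter, 1)[0]
            key = first_column[:prefix_len]
            if key != current_key:
                # Prefix changed: close the previous file, open the next one.
                if out is not None:
                    out.close()
                name = key + ".TXT"
                out = open(os.path.join(out_dir, name), "a", encoding="utf-8")
                written.append(name)
                current_key = key
            out.write(line.rstrip("\r\n") + "\n")
    finally:
        if out is not None:
            out.close()
    return written
```

Because the file is only reopened when the prefix changes, a sorted input is read once from start to end, which is the property the original request relies on.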
ramin2000
Posts: 11
Joined: Wed Jun 02, 2010 12:02 am

Re: split according to only first 3 bytes of the first column

Post by ramin2000 »

I may not know what I am talking about here…but…
How about going at it in 2 steps?
1. Marking the split points by adding a divider line

Something like:

Mashad
Mashad
Mashad
=====Mas=====
Bidjar
Bidjar
Bidjar
=====Bid=====

2. Splitting the files to:
Mas.txt
Bid.txt

Maybe the above can be done without VBScript (which I know nothing about) hehehehe
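
To make step 1 concrete, here is a rough sketch in Python of adding a divider line after each group. It is not a TextPipe filter; the function name mark_split_points is hypothetical, and it assumes the lines are already sorted so that each 3-character prefix forms one contiguous group.

```python
def mark_split_points(lines, prefix_len=3):
    """Yield the input lines, adding a divider line after each group."""
    current_key = None
    for line in lines:
        text = line.rstrip("\r\n")
        key = text[:prefix_len]
        if current_key is not None and key != current_key:
            # The prefix changed, so close off the previous group.
            yield "=====" + current_key + "====="
        current_key = key
        yield text
    if current_key is not None:
        # Close off the last group as well.
        yield "=====" + current_key + "====="
```

Fed the six lines above (three Mashad, three Bidjar), this yields the same marked-up listing, with =====Mas===== and =====Bid===== after their groups.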
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: split according to only first 3 bytes of the first column

Post by DataMystic Support »

Adding the split points is easy - setting the filename for the split data is not.
ramin2000
Posts: 11
Joined: Wed Jun 02, 2010 12:02 am

Re: split according to only first 3 bytes of the first column

Post by ramin2000 »

Let's split the files without the names first...
Then, once we have the split files, we can rename them according to the content of one of the columns in each split file
(in my case, I will include another column in the files with the Mas, Bid and so on info for the file name replace filter to look at).


That would make it into 3 steps:
1. Mark a divider at the split points.
2. Split the files according to the divider.
3. Rename the files according to a different column (of the new files).


Could this be done?
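
As a sketch of what steps 2 and 3 could look like outside TextPipe, here is a small Python example. Everything in it is an assumption made for illustration: the names split_on_dividers and name_chunks, the "|" delimiter, the =====xxx===== divider format from the earlier post, and the idea that the naming column is the second column of every line.

```python
import re

# Divider lines look like =====Mas===== (format assumed from the post above).
DIVIDER = re.compile(r"^=====(.*)=====$")


def split_on_dividers(lines):
    """Step 2: cut the marked lines into unnamed chunks, one per divider."""
    chunks, current = [], []
    for line in lines:
        if DIVIDER.match(line):
            chunks.append(current)
            current = []
        else:
            current.append(line)
    if current:
        # Lines after the last divider still form a chunk.
        chunks.append(current)
    return chunks


def name_chunks(chunks, name_column=1, delimiter="|"):
    """Step 3: name each chunk from a column of its first line."""
    named = {}
    for chunk in chunks:
        if not chunk:
            continue
        key = chunk[0].split(delimiter)[name_column]
        named.setdefault(key + ".txt", []).extend(chunk)
    return named
```

The result maps each file name to its lines, so writing the files is a separate, simple loop. A line that lacks the naming column would raise an IndexError here, which is probably what you want for a first pass over real data.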
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: split according to only first 3 bytes of the first column

Post by DataMystic Support »

Attached is a filter that does not use VBScript. It identifies the split position, and splits the files.
Attachments
detect changes to a field-no vbscript.zip
(973 Bytes) Downloaded 418 times