correcting the alignment of table data

Get help with installation and running here.

Moderators: DataMystic Support, Moderators, DataMystic Support, Moderators, DataMystic Support, Moderators

Post Reply
Rob Nevin
Posts: 3
Joined: Mon Sep 29, 2003 4:34 am
Location: Ontario, Canada

correcting the alignment of table data

Post by Rob Nevin »

I am trying to improve the alignment of data first sourced in an HTML table. TextPipe was great about stripping off the HTML tags and has left me with the data I'm interested in but I would like to see the data line up in columns or tabbed properly.

My source looks like this:

Total USDA - Subsidies
Subtotal, Farming Subsidies
Production Flexibility Contracts
Production Flexibility - Corn
Production Flexibility - Wheat

The table above didn't illustrate properly so I substituted "......" in the illustration above to represent the six spaces actually occuring.

I would like to align the columns of numbers either on the $ sign, or on the decimal.

Any assistance would be appreciated!
User avatar
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Hi Rob,

I'd replace each set of 6 spaces with a tab, or import it into Excel.
Rob Nevin
Posts: 3
Joined: Mon Sep 29, 2003 4:34 am
Location: Ontario, Canada

Round two - isolating columns in a table

Post by Rob Nevin »

Hey Simon,

Thanks for the reply.

1. The number of records I have to process exceeds Excel's capacity.
2. I tried a) replacing the six spaces between the numbers with tabs but the alignment of the columns was still incorrect. The size of the numbers (ranging from "$0" to $9,999,999) was greater than the width of the tab stop interval (which I believe is eight).

Is there a way to recognize a multiple number patterns? (ie "Dollar sign and a single number"; "Dollar sign with two number"; "Dollar sign with three numbers" "Dollar sign with one number a comma and three numbers" etc? If so, I could "capture" each possiblity and handle it appropriately with multiple tabs (as would be required for $0) or padding it with extra spaces.

Alternatively, if there is a way to isolate $ followed by numbers (with commas or not) AND a CR/LF I could recurse through the table first replacing the LAST record in the row (any ocurrance of $numberCRLF) with (CR/LF fieldname-2001[content]). THEN do it again (on the same row) for each of the 7 columns changing the "field name" for each successive pass. THEN .. proceed to the next row and continue. This would give me the abilty to deal with each column as content in a field name of it's own. *shrug* :cry:

The challenge for me is coming up with the pattern that would recognize the $0; $000; $0,000; $00,000; $000,000; $0,000,000; type combinations.

I hope this makes (more) sense.
If not, please let me know and I'll provide better examples or files.


Post by Guest »

Hi Rob,

Try this pattern


This will match dollar sign followed by the maximum number of digits and commas.

Dollar sign and single \$\d
\$\d{3} ....
Rob Nevin
Posts: 3
Joined: Mon Sep 29, 2003 4:34 am
Location: Ontario, Canada

Post by Rob Nevin »

To "Guest":

Thanks for the feedback. The following are the results:

The pattern \$[\d,]+? did great job of finding all the combinations of the dollars ($1; $12; $123; $1,234; $12,345; $123,456).

The pattern \$\d{2} (etc) works great as long as I include a "space" character in the pattern. This is fine for all but the last set of numbers as they don't have a trailing space (rather a CR/LF). If I don't include a space it recognizes the first "n" characters in the number set. For example \$\d{3} without a space will recognize the following:

$1; $12; $123; $1,234; $12,345; $123,456

This presents a problem for the last set of numbers in the row as it does NOT have a trailing space. It is followed by a CR/LF.

I changed the pattern to accept either a trailing space or CR LF characters by adding a Sub Pattern as follows:

\$[\d,]+?( |\r\n)

Likewise for the other suggestion provided:

\$\d{3}( |\r\n)

The pattern had to be changed to accomodate the comma for numbers greater than 999.

\$\d,\d{3}( |\r\n)

THANK YOU (so much) for the help. It put me well down the path of the solution.

I'm grateful.
Post Reply