Page 1 of 1

correcting the alignment of table data

Posted: Mon Sep 29, 2003 4:38 am
by Rob Nevin
I am trying to improve the alignment of data first sourced in an HTML table. TextPipe was great about stripping off the HTML tags and has left me with the data I'm interested in but I would like to see the data line up in columns or tabbed properly.

My source looks like this:

________________________________________
Total USDA - Subsidies
......$0......$0......$107,863......$282,455......$163,739......$234,400......$788,458
Subtotal, Farming Subsidies
......$0......$0......$107,863......$282,455......$163,572......$216,740......$770,631
Production Flexibility Contracts
......$0......$0......$64,717......$62,288......$58,899......$53,909......$239,813
Production Flexibility - Corn
......$0......$0......$64,717......$62,113......$58,731......$53,774......$239,335
Production Flexibility - Wheat
......$0......$0......$0......$175......$168......$135......$478
__________________________________


The table above didn't illustrate properly so I substituted "......" in the illustration above to represent the six spaces actually occuring.

I would like to align the columns of numbers either on the $ sign, or on the decimal.

Any assistance would be appreciated!

Posted: Fri Oct 03, 2003 5:27 am
by DataMystic Support
Hi Rob,

I'd replace each set of 6 spaces with a tab, or import it into Excel.

Round two - isolating columns in a table

Posted: Fri Oct 03, 2003 6:37 am
by Rob Nevin
Hey Simon,

Thanks for the reply.

1. The number of records I have to process exceeds Excel's capacity.
2. I tried a) replacing the six spaces between the numbers with tabs but the alignment of the columns was still incorrect. The size of the numbers (ranging from "$0" to $9,999,999) was greater than the width of the tab stop interval (which I believe is eight).

Is there a way to recognize a multiple number patterns? (ie "Dollar sign and a single number"; "Dollar sign with two number"; "Dollar sign with three numbers" "Dollar sign with one number a comma and three numbers" etc? If so, I could "capture" each possiblity and handle it appropriately with multiple tabs (as would be required for $0) or padding it with extra spaces.

Alternatively, if there is a way to isolate $ followed by numbers (with commas or not) AND a CR/LF I could recurse through the table first replacing the LAST record in the row (any ocurrance of $numberCRLF) with (CR/LF fieldname-2001[content]). THEN do it again (on the same row) for each of the 7 columns changing the "field name" for each successive pass. THEN .. proceed to the next row and continue. This would give me the abilty to deal with each column as content in a field name of it's own. *shrug* :cry:

The challenge for me is coming up with the pattern that would recognize the $0; $000; $0,000; $00,000; $000,000; $0,000,000; type combinations.

I hope this makes (more) sense.
If not, please let me know and I'll provide better examples or files.

Rob

Posted: Sat Oct 04, 2003 12:56 am
by Guest
Hi Rob,

Try this pattern

\$[\d,]+?

This will match dollar sign followed by the maximum number of digits and commas.

Dollar sign and single \$\d
\$\d{2}
\$\d{3} ....

Posted: Sat Oct 04, 2003 4:43 am
by Rob Nevin
To "Guest":

Thanks for the feedback. The following are the results:

The pattern \$[\d,]+? did great job of finding all the combinations of the dollars ($1; $12; $123; $1,234; $12,345; $123,456).

The pattern \$\d{2} (etc) works great as long as I include a "space" character in the pattern. This is fine for all but the last set of numbers as they don't have a trailing space (rather a CR/LF). If I don't include a space it recognizes the first "n" characters in the number set. For example \$\d{3} without a space will recognize the following:

$1; $12; $123; $1,234; $12,345; $123,456

This presents a problem for the last set of numbers in the row as it does NOT have a trailing space. It is followed by a CR/LF.

I changed the pattern to accept either a trailing space or CR LF characters by adding a Sub Pattern as follows:

\$[\d,]+?( |\r\n)

Likewise for the other suggestion provided:

\$\d{3}( |\r\n)

The pattern had to be changed to accomodate the comma for numbers greater than 999.

\$\d,\d{3}( |\r\n)

THANK YOU (so much) for the help. It put me well down the path of the solution.

I'm grateful.