The right tool for the right job

The problem? Delete a few repeating patterns from a few text files. Except that each text file is several hundred megabytes in size. For some weird reason, I spent *quite* some time fiddling around with sed (yes, *that* sed), trying to get multi-line patterns to work.

And then it hit me.

Five minutes and an equal number of lines of Python later, I had something that did the job just as well.
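For the curious, here is a minimal sketch of that kind of script, pieced together from my replies in the comments: stream through the file a line at a time and use `.find()` to drop any line that contains one of the patterns. It assumes the patterns are plain strings (not regular expressions) and that matching lines are dropped whole. The pattern list and the script name are hypothetical placeholders, not the real ones.

```python
import sys

# Hypothetical placeholder patterns; substitute the text you want gone.
PATTERNS = ["<junk>", "DEBUG:"]


def filter_lines(lines, patterns=PATTERNS):
    """Yield only the lines that contain none of the patterns.

    Accepts any iterable of lines, so an open file is consumed one line
    at a time instead of being read into memory all at once.
    """
    for line in lines:
        # str.find() returns -1 when the substring does not occur.
        if all(line.find(p) == -1 for p in patterns):
            yield line


def main(src_path, dst_path):
    # Both files are streamed; only one line is held in memory at a time.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(filter_lines(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as `python strip_patterns.py input.txt output.txt` (the script name is made up). Writing `p in line` instead of `line.find(p) == -1` would work the same way; `.find()` just mirrors what I described in the comments.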

There's a moral in this tale somewhere. I'm just not sure what it is.


Comments:
Just wondering: do you use MSYS or Cygwin?
 
Cygwin - though I do have a few native ports lying around my Windows machine somewhere.

I really should try SFU (Microsoft Windows Services for UNIX).
 
A file several hundred megabytes in size? How about telling us how long it took to process? :-)
 
Less than a minute. I have nice hardware :-)
 
Do you load the entire file into memory before working on it?
 
Nope, I stream through it.
 
I take it you don't use regular expressions then.

Just curious: how do you go about looking for the pattern?
 
Since the data is line-delimited, I just need to call .find() to check whether a particular piece of text appears in a line or not.
 
Hi Sriram, this is Guru (HP, Chennai)! I was also working on a text file (a report generated by the sales team) that needed to be formatted for Excel, but a fixed/delimited import takes manual effort, so they asked me for a macro. I used VB.NET code with Contains(), since they were clever enough to tell me that only "--" and " " should be deleted, and I was very happy to tell my manager that .NET helped me solve the issue within minutes!
 
Vista is really good! I am waiting for the retail version. Do you have any idea (a time frame) when it will be available in India?
(Vista Ultimate!) - Guru, Chennai

And (belated) Deepavali wishes to you!
 
It's a shame to hear that the video's song was censored.
 