Web Ripper

A ripper is a program that downloads HTML content to your hard disk. Doing so involves rewriting link and image locations so that they point to the local copies on your hard disk.

You can write rippers easily with the parser. Here's one way to do it:

Parser parser = new Parser(..);
UrlModifyingVisitor visitor = new UrlModifyingVisitor(parser,"c:\\webpages\\mylocation");
parser.visitAllNodesWith(visitor);
writeToFile(visitor.getModifiedResult()); // writeToFile is a method you must define in your own application

This visitor simply prepends the prefix you provide to each link it finds in the page. It then passes back the modified representation of the page via getModifiedResult().
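The snippet above leaves writeToFile to your application. A minimal sketch of one way to write it follows, assuming the modified page is available as a String and should be saved as UTF-8. The class name RipperOutput and the extra target-path parameter are hypothetical and are not part of the parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RipperOutput {
    // Hypothetical helper (not part of the parser): saves the modified page
    // to the given path, creating any missing parent directories first.
    static void writeToFile(String html, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Assumption: the page is written as UTF-8; adjust for other encodings.
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Demo only: write a tiny page into a fresh temporary directory.
        Path dir = Files.createTempDirectory("ripper");
        Path target = dir.resolve("mylocation").resolve("index.html");
        writeToFile("<html><body>saved page</body></html>", target);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

In the ripper itself, the call would pass the result of visitor.getModifiedResult() together with whatever output path your application chooses.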

If you're dealing with frames, you might want to enhance this visitor so that it also modifies the links in frame tags. In that case, override visitTag() and check whether the tag is a FrameTag. (Note that UrlModifyingVisitor registers link and image scanners only, so you will need to register the frame scanner separately.) You can then modify the src attribute using Tag.setAttribute().
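To make the intended result concrete, here is a library-free sketch of the rewrite a frame-aware visitor would perform: the prefix is prepended to the src attribute of each frame tag. It deliberately does not use the parser's FrameTag or Tag.setAttribute(); the class FrameSrcRewriter and its regular expression are illustrative assumptions, and they only handle double-quoted src values. In a real visitor, the same concatenation would be applied inside visitTag().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FrameSrcRewriter {
    // Matches a <frame ...> tag up to and including the opening quote of its
    // src attribute (group 1), the src value (group 2), and the closing quote
    // (group 3). The \b after "frame" keeps <frameset> from matching.
    private static final Pattern FRAME_SRC = Pattern.compile(
            "(<frame\\b[^>]*?\\bsrc\\s*=\\s*\")([^\"]*)(\")",
            Pattern.CASE_INSENSITIVE);

    // Returns html with the given prefix prepended to every frame src value.
    static String prefixFrameSources(String html, String prefix) {
        Matcher m = FRAME_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String rewritten = m.group(1) + prefix + m.group(2) + m.group(3);
            // quoteReplacement keeps backslashes in Windows-style prefixes literal.
            m.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<frameset><frame name=\"nav\" src=\"menu.html\"></frameset>";
        System.out.println(prefixFrameSources(page, "c:\\webpages\\mylocation\\"));
    }
}
```

A regular expression is used here only to keep the example self-contained. For real pages, modifying the attribute through the parser as described above is the more robust choice.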

--SomikRaha


Last edited on Sunday, February 23, 2003 5:32:39 pm.