Download a Webpage with Java



Updated June 13, 2017

If you want to work with URLs in Java, you can do it easily with the URL class. The class is part of the java.net package and can be created with a simple constructor:

public URL(String url) throws MalformedURLException
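To make the constructor a little more concrete, here is a minimal sketch that builds a URL and reads a couple of its parts. The class name UrlParts and the helper describe are made up for this illustration. Note also that on recent JDKs (20 and later) this constructor is marked deprecated, though it still works.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, for illustration only.
class UrlParts {
    // Returns "protocol://host" for a URL string, or null if it cannot be parsed.
    static String describe(String spec) {
        try {
            URL u = new URL(spec);
            return u.getProtocol() + "://" + u.getHost();
        } catch (MalformedURLException ex) {
            // Thrown when the string has no protocol or an unknown one.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("http://www.nibb.co")); // prints http://www.nibb.co
        System.out.println(describe("not a url"));          // prints null
    }
}
```

The constructor only parses the string. Nothing is fetched from the network until you open a stream or a connection.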

Start with a URL object and open a stream from it:

URL u = new URL("http://www.nibb.co");
InputStream webInput = u.openStream();

Now wrap the InputStream in a BufferedInputStream.

Buffering is not always necessary, but in most cases it gives a boost to performance.

webInput = new BufferedInputStream(webInput);

Next, employ a reader to read in the data.

Reader rd = new InputStreamReader(webInput);
int c;
while ((c = rd.read()) != -1) {
    System.out.print((char) c);
}

The reader is constructed, and then the while loop keeps reading data until read() returns -1, which signals the end of the stream. Notice that read() returns an int rather than a char, so the value can be compared to -1 immediately. Only after that check do we cast "c" to a char.
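One thing the loop above leaves implicit is the character encoding: an InputStreamReader built without a charset uses the platform default. As a rough sketch, assuming the page is UTF-8 (an assumption, since the real encoding depends on the server), the same loop can collect the text into a String. The class ReadAll and its readAll method are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ReadAll {
    // Reads every character from the stream, decoding the bytes as UTF-8 (assumed).
    static String readAll(InputStream in) throws IOException {
        Reader rd = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        int c;
        // read() returns an int so that -1 can signal the end of the stream.
        while ((c = rd.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the web stream here.
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```

Any InputStream works here, so the stream returned by openStream() can be passed in directly.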

Last but not least, close your InputStream.

webInput.close();

That's the basic code you need. However, let's wrap this in a try/catch block to handle any exceptions your environment might throw at you.

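import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;

// The class name is arbitrary; any name will do.
public class DownloadPage {
	public static void main(String[] args) {
		InputStream webInput = null;
		try {
			URL u = new URL("http://www.nibb.co");
			webInput = u.openStream();
			webInput = new BufferedInputStream(webInput);
			Reader rd = new InputStreamReader(webInput);
			int c;
			// read() returns -1 at the end of the stream
			while ((c = rd.read()) != -1) {
				System.out.print((char) c);
			}
		} catch (MalformedURLException ex) {
			ex.printStackTrace();
		} catch (IOException ex) {
			ex.printStackTrace();
		} finally {
			// Close the stream whether or not the read succeeded
			if (webInput != null) {
				try {
					webInput.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
}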
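If you are on Java 7 or later, the finally block can be replaced with try-with-resources, which closes the reader (and the stream it wraps) automatically. The following is only a sketch of that alternative, not a replacement for the code above. The class PageFetcher and its fetch method are names invented here, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class PageFetcher {
    // Downloads the content behind the URL and returns it as a String.
    static String fetch(URL u) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The reader is closed automatically when the try block exits,
        // which also closes the underlying streams.
        try (Reader rd = new InputStreamReader(
                new BufferedInputStream(u.openStream()), StandardCharsets.UTF_8)) {
            int c;
            while ((c = rd.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(fetch(new URL("http://www.nibb.co")));
    }
}
```

Here the exceptions are passed on to the caller instead of being printed, which lets the calling code decide how to handle a failed download.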

The error you might run into here is the MalformedURLException. I'm going to guess you can figure out what that means. You won't get too far into this program if the URL is malformed.

I've also noticed that some URLs seem to do some redirecting or forwarding that leaves the InputStream with nothing in it. twitter.com is a good example of this.
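One possible cause, though I can't say for certain it is what happens with every such site, is that HttpURLConnection does not automatically follow a redirect that switches protocols, such as http to https. As a rough sketch, you can turn off automatic redirects and follow the Location header yourself. The class RedirectFollower, its methods, and the hop limit of 5 are all choices made up for this example, and it assumes an http or https URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, for illustration only.
class RedirectFollower {
    static final int MAX_HOPS = 5; // arbitrary limit to avoid redirect loops

    // True for the HTTP status codes that indicate a redirect.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location header value, which may be relative, against the current URL.
    static URL resolve(URL base, String location) throws MalformedURLException {
        return new URL(base, location);
    }

    // Opens the URL, following redirects manually (including http to https).
    static InputStream open(URL start) throws IOException {
        URL current = start;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            // Assumes an http or https URL; other protocols would fail this cast.
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + current);
            }
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects starting at " + start);
    }
}
```

The stream returned by open can be read with the same reader loop as before, and it should be closed when you are done with it.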

Hopefully this gives you an idea of how to use the URL class. Thanks!

