The way I would approach this is to start by attempting to fetch the same page using a web browser. If you cannot get that to work, it is probably safe to conclude that the real problem is with the server, and you'll need to talk to the server's support staff.
If a browser works, try to repeat the process using the wget utility. If wget gives you problems, go back to your browser and find out exactly what headers the browser is sending in the HTTP request and try to get wget to use the same headers. Once you've got wget to work, make a note of the headers.
Finally return to your Java code, and modify it so that the HTTP request headers it sends are the same as those that work for wget.
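For that last step, here is a minimal sketch of copying a set of known-good headers onto a request, assuming your code uses the standard java.net.HttpURLConnection. The class name HeaderReplay, the helper openWithHeaders, and the header values are placeholders I made up; substitute whatever headers actually worked for wget.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderReplay {

    // Hypothetical helper: opens a connection (without sending anything yet)
    // and copies each header onto it. Headers must be set before connecting.
    static HttpURLConnection openWithHeaders(String address, Map<String, String> headers)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            conn.setRequestProperty(header.getKey(), header.getValue());
        }
        return conn;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeaderReplay <url>");
            return;
        }
        // Placeholder values: replace with the headers noted from wget.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (placeholder)");
        headers.put("Accept", "text/html");

        HttpURLConnection conn = openWithHeaders(args[0], headers);
        // getResponseCode() is what actually sends the request.
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
```

Be aware that HttpURLConnection silently ignores a few restricted headers (Host and Connection, for example), so if one of those turns out to matter you may need a different HTTP client.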
Yes, I have to authenticate using the proxy of my university and then I am able to access all the data. The proxy authentication is working flawlessly for the 'journal page' and even for other sites, so I'd exclude that the problem is related to that.
I think you may have excluded the real problem. @BalasC is not talking about proxy authentication. Rather he is talking about authentication at the IEEE site. And just because one part of the site appears to work without authentication does not mean it all will. (However, I'd have thought that the site would respond with a "FORBIDDEN" or "AUTHORIZATION REQUIRED" error rather than delivering strange content.)
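One quick way to tell these cases apart is to look at the HTTP status code your Java code gets back before you read the body. This is only a rough sketch; the class StatusCheck and the wording of the messages are mine, and the status you pass in would come from HttpURLConnection.getResponseCode().

```java
import java.net.HttpURLConnection;

public class StatusCheck {

    // Hypothetical helper: maps a status code to a hint about authorization.
    static String describe(int status) {
        switch (status) {
            case HttpURLConnection.HTTP_UNAUTHORIZED: // 401
                return "401: the site wants credentials";
            case HttpURLConnection.HTTP_FORBIDDEN:    // 403
                return "403: the site refuses this request";
            default:
                return status + ": no explicit authorization error";
        }
    }

    public static void main(String[] args) {
        // Example call with a hard-coded status, purely for illustration.
        System.out.println(describe(HttpURLConnection.HTTP_FORBIDDEN));
    }
}
```

If you get a plain 200 together with the strange content, the site is not reporting an authorization failure through the status code, and you will have to inspect the body itself.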
Another possibility is that the site is trying to prevent "screen scraping" of its content by automated tools. Check the "Terms of Service" for the site to see if what you are trying to do is allowed. (You may choose to ignore the ToS and circumvent the technical measures, but then you might find your own or your organization's IP address blocked, or you might be on the receiving end of cease-and-desist letters talking about copyright violation.)