How to do screen scraping where the site requires a log in  
Alan Silver
Posted: 2006-9-1 0:24:28

dotnet-framework-aspnet, How to do screen scraping where the site requires a log in

Hello,

I would like to pull some information off a site that requires a login.
I have a subscription to a premium content site, and I would like to be
able to make a few automated requests instead of having to load the site
manually in a browser.

I have seen plenty of articles that explain how to do screen scraping in
.NET, and others that describe how to do it via a POST, but I couldn't
find any that covered my scenario.

Basically, the problem is that the code would first have to request the
home page, then fill in the login fields and post the page back. After
that, the code would need to hang on to the cookie (which is what I
assume they are using) so that when it makes another request (a GET
would be fine here), the site accepts the request and recognizes the
requester as logged in.
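The "fill in the login fields and post the page back" step could be sketched as follows, using Python's standard library for brevity. It is a minimal sketch, assuming the login form is the first `<form>` on the page and that the credentials go in plain `<input>` fields; the sample page and the field names `txtUser`, `txtPass` and `btnLogin` are hypothetical. Every existing input is copied, hidden ones included, because a site built with ASP.NET WebForms typically expects its hidden `__VIEWSTATE` field (and the submit button's name) to come back with the post. Checkboxes, radio buttons and `<select>` elements are not handled here.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class LoginFormParser(HTMLParser):
    """Collects the action URL and the <input> name/value pairs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep hidden inputs (e.g. ASP.NET's __VIEWSTATE) and the submit
            # button exactly as served; the server may expect them back.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms on the page


def build_login_body(login_page_html, credentials):
    """Return (form action, urlencoded POST body) with the credentials filled in."""
    parser = LoginFormParser()
    parser.feed(login_page_html)
    parser.close()
    fields = dict(parser.fields)
    fields.update(credentials)  # overwrite the empty username/password inputs
    return parser.action, urlencode(fields)


# Hypothetical login page; real field names must be read from the site's HTML.
SAMPLE_PAGE = """<form method="post" action="/login.aspx">
<input type="hidden" name="__VIEWSTATE" value="dDwtMTA=" />
<input type="text" name="txtUser" />
<input type="password" name="txtPass" />
<input type="submit" name="btnLogin" value="Log in" />
</form>"""

if __name__ == "__main__":
    action, body = build_login_body(
        SAMPLE_PAGE, {"txtUser": "alan", "txtPass": "secret"})
    print(action, body)
```

The returned action may be relative; `urllib.parse.urljoin(login_page_url, action)` would turn it into the absolute URL to post the body to.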

This all works fine in a browser, as the browser handles the cookie for
you, but the code examples I have seen seem to use completely stateless
requests (i.e. no cookies preserved), so they wouldn't work for a site
like this.
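The difference between a stateless request and a browser-like one comes down to whether something stores the `Set-Cookie` header from the login response and sends it back as a `Cookie` header on later requests. The sketch below shows this with Python's standard library, where an `http.cookiejar.CookieJar` attached through `urllib.request.HTTPCookieProcessor` plays the browser's role; in .NET the comparable approach is to share one `CookieContainer` instance across `HttpWebRequest` objects via their `CookieContainer` property. The local `FakeSite` server, its `/login` and `/premium` paths, the form field names and the `session` cookie are all hypothetical stand-ins for the real subscription site, so the example runs without network access.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the subscription site: POST /login sets a
    session cookie, and GET /premium only succeeds if that cookie comes back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        if form.get("user") == ["alan"] and form.get("password") == ["secret"]:
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        else:
            self.send_response(403)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        logged_in = "session=abc123" in (self.headers.get("Cookie") or "")
        body = b"premium content" if logged_in else b"please log in"
        self.send_response(200 if logged_in else 401)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch(opener, url, data=None):
    """Send a GET (or a POST when data is given) and return (status, body)."""
    try:
        with opener.open(url, data) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:  # non-2xx responses are raised
        return err.code, err.read().decode()


def demo():
    server = HTTPServer(("127.0.0.1", 0), FakeSite)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    login = urllib.parse.urlencode({"user": "alan", "password": "secret"}).encode()

    # Stateless: nothing keeps the Set-Cookie from the login, so the GET is anonymous.
    # ProxyHandler({}) bypasses any system proxy so we reach the local server.
    plain = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    fetch(plain, base + "/login", login)
    stateless = fetch(plain, base + "/premium")

    # Stateful: the CookieJar stores the session cookie and replays it afterwards.
    jar = http.cookiejar.CookieJar()
    session = urllib.request.build_opener(
        urllib.request.ProxyHandler({}), urllib.request.HTTPCookieProcessor(jar))
    fetch(session, base + "/login", login)
    stateful = fetch(session, base + "/premium")

    server.shutdown()
    server.server_close()
    return stateless, stateful


if __name__ == "__main__":
    print(demo())
```

Against the real site, the same `session` opener would post the body built from its login form and then issue the GET requests; nothing else changes, provided the site really does track the login with a cookie.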

Any ideas? TIA

--
Alan Silver
(anything added below this line is nothing to do with me)
 
alex_f_il
Posted: 2006-9-6 1:23:00

You can try SWExplorerAutomation (SWEA) (http://webunittesting.com).

Alan Silver
Posted: 2006-9-6 5:56:00

In article <email***@***.com>,
email***@***.com writes
>You can try SWExplorerAutomation (SWEA) (http://webunittesting.com).

Thanks, that looks interesting. The only drawback is that I prefer to
write my own code rather than use someone else's. You don't get to
understand what's going on when you use a third-party app to do the
grunt work.
