No. 28
File: 129921394998.png - (577.34KB, 947x547, Picture 1.png)
I don't know much about Python, but HTML is just plain text. If you know which tag holds the content you want, you should be able to request the page's HTML and pull that content out with a regular expression.
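For what it's worth, here is a rough sketch of that idea using only Python's standard library. The function names (`fetch_html`, `extract_tag_contents`), the `blockquote` tag, and the sample HTML are all made up for illustration, so swap in whatever tag your target page actually uses. A regex like this breaks on nested tags of the same name, so treat it as a quick hack rather than a real parser.

```python
import re
from urllib.request import urlopen


def fetch_html(url, encoding="utf-8"):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


def extract_tag_contents(html, tag):
    """Return the inner text of every <tag ...>...</tag> pair found in html."""
    pattern = re.compile(
        r"<{0}\b[^>]*>(.*?)</{0}>".format(re.escape(tag)),
        re.IGNORECASE | re.DOTALL,  # tags may be upper case or span lines
    )
    return [inner.strip() for inner in pattern.findall(html)]


# Offline demonstration on an invented snippet of HTML.
sample = (
    '<html><body>'
    '<blockquote class="post">Hello, world</blockquote>'
    '<blockquote>Second post</blockquote>'
    '</body></html>'
)
posts = extract_tag_contents(sample, "blockquote")
print(posts)
```

On a live page you would call something like `extract_tag_contents(fetch_html(url), "blockquote")`, with `url` being whatever page you are after.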
That's assuming you want textual content. Images will either be provided as direct links, or will point to some server-side code that serves the image back to you. That code might check the HTTP Referer header (the header name really is spelled with a single "r") to see where the request came from. If you're coming from off-site, it might serve up an alternate image instead. So basically, you need to spoof the Referer header. Fortunately, HTTP is also plain text.
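Spoofing the header could look something like the sketch below, which uses `urllib.request` from the standard library. The URLs, the User-Agent string, and the helper names `build_image_request` and `download_image` are placeholders I invented; the only real requirement is that the `Referer` value looks like a page on the site that hosts the image.

```python
from urllib.request import Request, urlopen


def build_image_request(image_url, referring_page):
    """Build a request that claims to come from a page on the same site."""
    return Request(
        image_url,
        headers={
            "Referer": referring_page,  # the spoofed header
            "User-Agent": "Mozilla/5.0 (compatible; example-scraper)",
        },
    )


def download_image(image_url, referring_page, destination):
    """Fetch the image with the spoofed Referer and save it to disk."""
    request = build_image_request(image_url, referring_page)
    with urlopen(request) as response, open(destination, "wb") as out:
        out.write(response.read())


# Build (but do not send) a request against invented example URLs.
request = build_image_request(
    "http://example.com/src/picture.png",
    "http://example.com/thread/28",
)
print(request.get_header("Referer"))
```

Whether the server checks anything beyond `Referer` is something you would have to find out by trial and error.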
I might be making this more complicated than it needs to be, but it seems like you'll need to know HTTP, HTTP headers, cookies (in case the server requires them), HTML, and regexps. You'll also need to know which section or sections of the page house the content you're after. Basically, you're building a primitive web browser from scratch.
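To show how primitive that "browser" can be, here is a minimal sketch that writes an HTTP request by hand and sends it over a plain socket. Everything named here (`build_request`, `split_response`, `fetch`, the host, the path, the cookie value) is an assumption for illustration. It ignores chunked transfer encoding, HTTPS, redirects, and repeated headers, so it only demonstrates that the protocol really is just text.

```python
import socket


def build_request(host, path, extra_headers=None):
    """Assemble a plain-text HTTP/1.1 GET request as bytes."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Connection: close",  # ask the server to close when it is done
    ]
    for name, value in (extra_headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # A blank line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_response(raw):
    """Split raw response bytes into (status line, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # last duplicate wins
    return status_line, headers, body


def fetch(host, path, extra_headers=None, port=80):
    """Send the request over a socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path, extra_headers))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))


# Offline demonstration: build a request and parse a canned response.
request_bytes = build_request(
    "example.com",
    "/thread/28",
    {"Referer": "http://example.com/", "Cookie": "session=abc123"},
)
canned = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Set-Cookie: session=abc123\r\n"
    b"\r\n"
    b"<p>hi</p>"
)
status_line, headers, body = split_response(canned)
print(request_bytes.decode("ascii"))
print(status_line, headers, body)
```

The `Set-Cookie` value from one response is what you would send back in the `Cookie` header of the next request if the server insists on cookies.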
I don't know what Python offers in that regard, but it's too popular not to offer something in this area. Look for anything dealing with sockets, HTTP connections, HTTP parsing, and regexps.
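As it turns out, Python 3's standard library seems to cover most of that list: `socket`, `urllib.request`, `http.cookiejar`, `html.parser`, and `re`. Below is a small sketch, under my own assumptions, that combines a cookie-aware opener with an HTML parser that collects image links. The class name `ImageLinkCollector`, the helper `make_cookie_opener`, and the sample markup are invented for the example.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


class ImageLinkCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def make_cookie_opener():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


# Offline demonstration on invented markup.
collector = ImageLinkCollector()
collector.feed('<div><img src="/thumb/1.png" alt="a"><IMG SRC="/thumb/2.png"></div>')
image_sources = collector.sources
print(image_sources)

opener, jar = make_cookie_opener()  # opener.open(url) would keep cookies in jar
```

A real parser like this copes with nesting and odd attribute order better than a regex does, so it is probably the safer choice once the page gets complicated.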