• Welcome to Theos PowerBasic Museum 2017.

get plain text from webpage

Started by Edwin Knoppert, September 14, 2009, 02:41:56 PM

Edwin Knoppert

I need to save the HTML from my web control as text.
It has to be saved to a predetermined file, with no user interaction.
Do I need the document stuff?
I don't mind if it saves to a temp file first, but streaming would be nice.

Edwin Knoppert

The simplest approach would be:
Object Get pWB.Document.body.innertext To v
but this may skip the head section?
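
As an illustration of the concern above, here is a minimal sketch in Python (not PowerBASIC, and not the web control's own API) of turning a full HTML string into plain text so that text in the head section, such as the title, is kept while script and style content is dropped. The names `TextExtractor`, `html_to_text` and `save_text` are hypothetical helpers made up for this example; it assumes the HTML source is already available as a string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Return the text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def save_text(html, path):
    """Write the extracted text to a fixed file, with no user interaction."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))
```

Calling `save_text(html, "page.txt")` (the file name is only a placeholder) would then write the text to that file without any dialog. Unlike `body.innertext`, this walks the whole document, so the title is included.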

José Roca

 
You can call the Document property and then query for the IPersistFile interface.

The ExecWB method, with the OLECMDID_SAVEAS command, can be used, but for security reasons it shows a save dialog even if you use the OLECMDEXECOPT_DONTPROMPTUSER flag, so you will need to install a hook.

See: http://www.codeproject.com/KB/shell/iesaveas.aspx

There have also been suggestions to use UrlDownloadToFile or INet for this purpose:

http://support.microsoft.com/kb/q244757/
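
The download-to-file idea can be sketched as follows. This is a Python illustration of the general technique, not the UrlDownloadToFile API itself; `download_to_file` is a hypothetical helper name. Note that this approach fetches the page again from its URL, so the result may differ from what the web control is currently displaying (for example, pages built by script or tied to a session).

```python
import shutil
import urllib.request


def download_to_file(url, dest_path):
    """Fetch url and write the raw response bytes to dest_path, with no UI."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Stream the response to disk instead of loading it all into memory.
        shutil.copyfileobj(response, out)
    return dest_path
```

The saved file contains the raw HTML, so it would still need a tag-stripping step if plain text is wanted.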

Edwin Knoppert

These are all rather poor solutions, haha :)
For the moment I'll stick with my solution.
If that fails I'll try to parse the stream with a regular expression; I have seen one on the PB forum.
Maybe that helps.
The IPersist tip is a good one (and simple to do).

Thanks,

José Roca

 
If they are poor, it is because the good ones are forbidden. Otherwise, a malicious web page could use a script to save anything it wants on your computer without you ever noticing.

If the IPersistFile way is still available it is only because scripts can't use it.