Archive for May, 2006

Skrape – Update

The first comment revealed a bug. It was easy to correct, so I have posted a new zip file with the fix. If you downloaded Skrape before May 29th, you should download the update. I also improved the debug mode (run_skrape.php?debug=1).

Let me know how this works.

Skrape comments

Approximately 15 people have downloaded the plugin so far, but no one has left any comments. What gives? Tell me about your experience with it. Did it work for you? Did it not work? Do you have a different scenario that Skrape could be used in with slight modification? Let me know.

In response to Mike: off the top of my head, I am not sure what the differences are between the readme file and this page: Read Me
The best way I can think of to describe what this plugin does, besides what is in the first paragraph of the previously mentioned post: it makes a remote HTTP request to a site, logs in (if login is turned on) via a form GET or POST, fetches the HTML from a URL, and extracts the part of that HTML that matches a regular expression pattern you supply. It then inserts the extracted text/HTML into WordPress as a post in the category you choose. So if you wanted to grab a block of content from somebody’s website every so often, you could. If you want to take everything on a page, just use the pattern /.*/smU, or if the content you want always sits between marker comments, you could use something like /<!-- Start content -->(.*)^<!-- End content -->/smU
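
To make the comment-marker pattern concrete, here is a minimal sketch in Python rather than in the plugin's PHP. It is not Skrape's code: the function name, the sample page, and the marker text are assumptions for illustration. Python has no /U flag, so the lazy match is written as (.*?) instead.

```python
import re

# Python equivalent of /<!-- Start content -->(.*)^<!-- End content -->/smU.
# re.S lets "." match newlines, re.M lets "^" match at the start of any line,
# and the "?" after ".*" stands in for PCRE's U (ungreedy) flag.
CONTENT_PATTERN = re.compile(
    r"<!-- Start content -->(.*?)^<!-- End content -->",
    re.S | re.M,
)


def extract_content(html):
    """Return the text between the marker comments, or None if they are absent."""
    match = CONTENT_PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical page used only for this example.
page = (
    "<html><body>\n"
    "<!-- Start content -->\n"
    "<p>Hello from foo.bar</p>\n"
    "<!-- End content -->\n"
    "</body></html>\n"
)

print(repr(extract_content(page)))
```

Because of the ^ in the pattern, the end marker only matches when it begins a line, so a page that puts both markers on one line would yield no match.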

Skrape – additional note about url variables

I forgot to mention the concept of variables in the URL. I only created a few date-related variables, which change the value of the URL each day. So, for example, if your Skrape URL is http://www.foo.bar/%numericyear%/%stringmonthshort%%numericmonth%/%numericday%.html

run_skrape will replace the variables with today’s values, so the Skrape URL would then become http://www.foo.bar/2006/MAY05/11.html
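
The substitution above can be sketched in Python as follows. This is only an illustration, not the plugin's PHP: the function name is made up, and the exact formats (upper-case three-letter month, zero-padded month and day) are assumptions inferred from the single example URL.

```python
from datetime import date

# Fixed list so the result does not depend on the system locale.
MONTH_ABBREVIATIONS = (
    "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
)


def expand_url_variables(url_template, today=None):
    """Replace the date variables in a Skrape-style URL with values for `today`."""
    today = today or date.today()
    replacements = {
        "%numericyear%": f"{today.year:04d}",
        "%stringmonthshort%": MONTH_ABBREVIATIONS[today.month - 1],
        "%numericmonth%": f"{today.month:02d}",  # assumed zero-padded
        "%numericday%": f"{today.day:02d}",      # assumed zero-padded
    }
    for name, value in replacements.items():
        url_template = url_template.replace(name, value)
    return url_template


template = (
    "http://www.foo.bar/%numericyear%/"
    "%stringmonthshort%%numericmonth%/%numericday%.html"
)
print(expand_url_variables(template, date(2006, 5, 11)))
# prints http://www.foo.bar/2006/MAY05/11.html
```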

Let me know if you need another variable to make this plugin work for you.

Cheers!

SKRAPE – A data harvesting WordPress plugin

SKRAPE – A WordPress plugin

What does it do? It uses the PHP cURL extension to log in to a remote website, harvest (aka scrape) content, and import it into WordPress from a scheduled task.
INITIAL RELEASE AVAILABLE. Yeah! The first release. Does it work? Well, it works for me. What I would like to know is whether it works for you.
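
For readers who want to see the log-in-then-fetch idea in code, here is a rough Python sketch using only the standard library. It is not how Skrape itself is written (the plugin uses PHP cURL); the function names, the login URL, and the form field names are all hypothetical.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(login_url, fields, method="POST"):
    """Build the login request: fields go in the body (POST) or query string (GET)."""
    encoded = urllib.parse.urlencode(fields)
    if method.upper() == "POST":
        return urllib.request.Request(login_url, data=encoded.encode("ascii"))
    separator = "&" if "?" in login_url else "?"
    return urllib.request.Request(login_url + separator + encoded)


def fetch_after_login(login_url, fields, content_url, method="POST"):
    """Submit the login form, then return the HTML of content_url as text."""
    # The cookie-aware opener keeps whatever session cookie the login sets,
    # so the second request is made as the logged-in user.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    opener.open(build_login_request(login_url, fields, method)).close()
    with opener.open(content_url) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical login form; building the request does not touch the network.
request = build_login_request(
    "http://www.foo.bar/login.php", {"user": "alex", "pass": "secret"}
)
print(request.get_method(), request.data)
```

Only fetch_after_login makes network calls; the returned HTML would then be matched against your regular expression and the result inserted as a post.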

I just posted a zip file of the first release of Skrape here. There is a README file in the zip that describes how to install and configure Skrape in your WordPress installation. It is a very unobtrusive plugin in the sense that it really is only a configuration page and an external script that inserts post content through the WordPress function.

– hope to hear your feedback! (INITIAL RELEASE May 6 2006 11:57pm)

Here is the readme:

SKRAPE has been created by Alex Barger
http://www.4devz.com/skrape/

This plugin was specifically designed to run from a scheduled task, log in to a remote website,
gather content from a specific URL, and insert it as a WordPress post. If this is not what
you had in mind, then maybe this is not the right plugin for you.

REQUIREMENTS:
Your server will need to have the PHP cURL extension installed in order for this plugin to work correctly. If your
hosting provider will not give it to you, it may be time to start shopping for a new webhost.

INSTALLATION:
In order to install this plugin into WordPress 2.0, you will need to copy this directory into your WordPress plugins directory.
Example: ./wp-content/plugins

ACTIVATE:
Log in to your WordPress “Site Admin”, go to “Plugins”, and then click “Activate” next to the Skrape plugin listing.

CONFIGURATION:
This plugin will not do anything until you configure it and create a scheduled task to run the script.
In your “Site Admin”, click on “Options”, then click on “Configure Skrape”.

Fill in all the information about the URL you plan to skrape.
If you don’t know what REGEX means, then Google “regular expression patterns”.

In order to create a scheduled task in a *nix environment, I recommend going into the shell and editing your crontab:
# crontab -e

# Enter something to the effect of: (i to go to edit mode)
10 5 * * * /usr/bin/wget -q --delete-after http://www.foo.bar/wp-content/plugins/skrape/run_skrape.php

(esc)
:wq

This would set the task to run every day at 5:10am.

SECURING run_skrape.php

I like to create .htaccess files to prevent access to my script by others. I have included a sample .htaccess file.
You would just need to copy sample.htaccess to .htaccess and change the “allow from” IP addresses to the address of
the machine running the scheduled task.