Watching The Web
The problem, of course, appeared when I actually started work on her request. I had a vague idea how this might work: all I had to do, I reasoned, was write a little script that woke up each morning, scanned her list of URLs, downloaded the contents of each, compared those contents with the versions downloaded previously, and sent out an email alert if there was a change.
Seemed simple - but how hard would it be to implement? I didn't really like the thought of downloading and saving different versions of each page on a daily basis, or of creating a comparison algorithm to test Web pages against each other.
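For reference, the brute-force approach described above can be sketched roughly as follows. This is a minimal Python illustration, not the script that was eventually written: the names `find_changed`, `fetch`, and `previous_digests` are hypothetical, and it stores a SHA-256 hash of each page rather than a full saved copy, which is a simplification of the "save every version" idea.

```python
import hashlib


def find_changed(urls, fetch, previous_digests):
    """Return (changed_urls, new_digests) by comparing content hashes.

    urls: iterable of URL strings to check.
    fetch: callable taking a URL and returning the page body as bytes
           (hypothetical; for example a wrapper around urllib.request.urlopen).
    previous_digests: dict mapping URL -> hex digest recorded on the last run.
    """
    changed = []
    new_digests = {}
    for url in urls:
        digest = hashlib.sha256(fetch(url)).hexdigest()
        new_digests[url] = digest
        # A URL seen for the first time has no baseline, so it is not reported.
        if url in previous_digests and previous_digests[url] != digest:
            changed.append(url)
    return changed, new_digests
```

Even in this reduced form, every page has to be downloaded in full on every run, which is the cost an easier way would need to avoid.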
I thought there ought to be an easier way. Maybe the Web server had a way of telling me if a Web page had been modified recently - and all I had to do was read that data and use it in a script. Accordingly, my first step was to hit the W3C Web site, download a copy of the HTTP/1.1 protocol specification (RFC 2616) from ftp://ftp.isi.edu/in-notes/rfc2616.txt, and print it out for a little bedside reading. Here's what I found, halfway through:
The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified.
There we go, I thought - the guys who came up with the protocol obviously anticipated this requirement and built it into the protocol headers. Now to see if it worked...
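To make the idea concrete: a "Last-Modified" value is an HTTP date string, so a script can parse it and compare it with the time of its previous check. The following is a minimal Python sketch under that assumption; the function name `was_modified_since` and the check times are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def was_modified_since(last_modified_header, last_checked):
    """Return True if the header's timestamp is later than last_checked.

    last_modified_header: value of a Last-Modified header (HTTP date string).
    last_checked: timezone-aware datetime of the previous check (hypothetical).
    """
    modified = parsedate_to_datetime(last_modified_header)
    return modified > last_checked


# Example with an HTTP date string and two made-up check times.
header = "Wed, 09 Oct 2002 11:27:23 GMT"
print(was_modified_since(header, datetime(2002, 10, 8, tzinfo=timezone.utc)))   # True
print(was_modified_since(header, datetime(2002, 10, 10, tzinfo=timezone.utc)))  # False
```

Note that the header only reports what the origin server believes, so a comparison like this is only as reliable as the server's own bookkeeping.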
The next day at work, I fired up my trusty telnet client and tried to connect to our intranet Web server and request a page. Here's the session dump:
$ telnet darkstar 80
Connected to darkstar.melonfire.com.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Fri, 18 Oct 2002 08:47:57 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.2
Last-Modified: Wed, 09 Oct 2002 11:27:23 GMT

Connection closed by foreign host.
As you can see, the Web server answered the HEAD request with a "Last-Modified" header giving the date and time at which the requested file was last changed. So far so good.
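The same check can be made from a script instead of by hand. Below is a minimal Python sketch using the standard library's `http.client`; the function name `fetch_last_modified` and its default port and timeout are assumptions for illustration, and the code simply returns None when the server sends no "Last-Modified" header.

```python
import http.client


def fetch_last_modified(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return the Last-Modified header value, or None.

    A HEAD request asks for the response headers only, so the page body
    is never downloaded.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.getheader("Last-Modified")
    finally:
        conn.close()


# Hypothetical usage against the intranet server from the session above:
#   fetch_last_modified("darkstar")
```

The value comes back as a plain string, so it still needs to be parsed before it can be compared with an earlier timestamp.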
© Melonfire, 2005. All rights reserved.