
Wednesday, September 21, 2005

Web Server Access Log Parsing Part I - Using Perl's Split Function To Extract Specific Fields In A Record

In the last Perl-Tips post, I discussed the NCSA Extended Log Format for web servers. Please read that post before continuing with this one. The discussion here assumes that the web server access log uses the Extended Log Format.

Let's review the problem at hand. We have a website that is getting visitors, and we want to do some analysis (web metrics, web analytics) on it: who is visiting, how often, and which pages? To do this, our secondary goal has to be to transfer the information from the website's web server access log into a database. We are a few posts away from this goal. We have devised a temporary XML format, which we'll use later to transfer the access log data into a database.

To create the WSML (Web Server Markup Language) XML output file, we need to parse the access log. To do that, we need to come up with appropriate Perl regular expressions to extract the fields of each entry in the access.log. A regular expression is a kind of wildcard/pattern rule that we can specify to extract all or some of the "fields" in a line of data. I can't give you a full discussion of regular expressions here. (Mastering Regular Expressions, Second Edition and Regular Expression Pocket Reference are two of the best books available on the topic.) You'll have to at least understand the basics of Perl pattern matching before continuing. (Try checking the perldoc documentation that should have come with your Perl distribution first.)
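To make the "pattern rule" idea concrete, here is a minimal sketch of a single Perl pattern match. It is only an illustration: the sample record is made up, and the pattern pulls out just two fields (the quoted request and the status code), not every field in the record.

```perl
use strict;
use warnings;

# Hypothetical sample record (not taken from a real log).
my $line = '192.0.2.10 - - [21/Sep/2005:10:15:32 -0400] '
         . '"GET /index.html HTTP/1.1" 200 5120';

# "([^"]*)"  captures everything between the first pair of double quotes.
# (\d{3})    captures the three-digit status code that follows.
my ($request, $status);
if ($line =~ /"([^"]*)" (\d{3})/) {
    ($request, $status) = ($1, $2);
    print "request: $request\n";
    print "status:  $status\n";
}
```

The parentheses mark the parts of the line we want to keep; Perl stores them in $1 and $2 after a successful match.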

However, before we get into true regular expressions, let me show you a way to extract some of the information in the web server access log using Perl's split() function. With split() we cannot accurately extract every field in every record, but we can extract some important ones. The rest of this post is in PDF format [176 KB]. (Before you read the PDF file, please read the previous posts. I also assume that you know enough Perl to follow along.) The post following this one will get into using regular expressions to extract all of the fields in each access log record.
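As a rough illustration of the split() idea (this is not the code from the PDF), here is a minimal sketch. It assumes a made-up sample record in the Extended Log Format, a request made of exactly three space-separated parts, and no spaces in the user name. Under those assumptions the fixed positions below line up. The variable names are mine.

```perl
use strict;
use warnings;

# Hypothetical sample record in NCSA Extended Log Format.
my $line = '192.0.2.10 - - [21/Sep/2005:10:15:32 -0400] '
         . '"GET /index.html HTTP/1.1" 200 5120 '
         . '"http://www.example.com/" "Mozilla/5.0 (X11; Linux)"';

# Split on single spaces. Quoted fields that contain spaces
# (the request, the user agent) are broken into several pieces.
my @f = split / /, $line;

# Fields that sit at fixed positions under the assumptions above.
my ($host, $status, $bytes) = @f[0, 8, 9];
my $path = $f[6];

# The method still carries the opening quote; tr///d deletes it.
my $method = $f[5];
$method =~ tr/"//d;

# The timestamp was split in two; rejoin it and delete the brackets.
my $time = "$f[3] $f[4]";
$time =~ tr/[]//d;

print "$host $time $method $path $status $bytes\n";
```

The user agent shows the limitation: it ends up scattered across the last three elements of @f, and the number of pieces varies from browser to browser. That is why split() alone cannot accurately recover every field.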

(c) Copyright 2005-present, Raj Kumar Dash, http://perl-tips.blogspot.com


About me

  • I'm blogslinger
  • From Canada
  • Writer, author, former magazine editor and publisher, amateur photog, amateur composer, online writer/ blogger, online publisher, freelancer
