Follow these directions to set up Piwik to track visitors to one of your websites using the Apache server’s access log files. This tutorial also shows how to import older Apache access log files so that you are tracking your previous, current, and future website visitors.
Step 1: Create a New Piwik Website
Create a new Piwik website. If you already track your website with JavaScript tracking, it is highly recommended to add a separate Piwik website for log file tracking. Each method (JavaScript tracking and server log file tracking) has its pros and cons (I’ll try to write a separate article about that), and ideally you’ll want to use both so you can track your website visitors as comprehensively as possible and give them the best experience.
After logging in to Piwik, click the link for Administration at the top right.
Then click Websites on the left side.
Now click Add a new website.
For the Name, enter the name of the website you’d like to track, with some indication that this is the version tracked from the Apache log files (as opposed to JavaScript tracking). For example: Website to Track (log files)
For URLs, put the URL of the website: http://websitetotrack.com
Scroll down and click Save.
Step 2: Import Old Apache Server Monthly Access Logs
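SSH into your server and run the following to import one log file into your Piwik website. Replace the --idsite value with the ID of the website you created in Step 1, and adjust the paths and URL to match your server: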
python /home/username/public_html/piwik/misc/log-analytics/import_logs.py --idsite=1 --url=http://domain.com/piwik logs/domain.com-Jun-2015.gz
You can import several types of files, including raw access log files and compressed log files such as the .gz archive above.
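If you have several archived monthly logs, running the command once per file gets tedious. The following is a minimal sketch of a loop over a directory of .gz archives, reusing the example paths, URL, and site ID from the command above. The function name import_archived_logs and the DRY_RUN switch are made up for this illustration and are not part of Piwik.

```shell
#!/bin/sh
# Sketch: import every .gz archive in a directory, one file at a time.
# The defaults below mirror the example command; override them for your server.
IMPORTER="${IMPORTER:-/home/username/public_html/piwik/misc/log-analytics/import_logs.py}"
PIWIK_URL="${PIWIK_URL:-http://domain.com/piwik}"
ID_SITE="${ID_SITE:-1}"

# import_archived_logs is a hypothetical helper name.
# DRY_RUN=1 (also hypothetical) prints each command instead of running it.
import_archived_logs() {
    log_dir="$1"
    for log_file in "$log_dir"/*.gz; do
        # Skip the unexpanded pattern when the directory has no archives.
        [ -e "$log_file" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "python $IMPORTER --idsite=$ID_SITE --url=$PIWIK_URL $log_file"
        else
            python "$IMPORTER" --idsite="$ID_SITE" --url="$PIWIK_URL" "$log_file" || return 1
        fi
    done
}

# Usage: DRY_RUN=1 import_archived_logs logs
```

Running it with DRY_RUN=1 first lets you check the generated commands before anything is imported. Note that the shell expands the glob alphabetically, so files named by month (Apr, Aug, Dec, ...) will not be processed in calendar order.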
Step 3: Set up Piwik to Import Apache Server Access Logs Hourly without Duplicates
Step 3a: Create a File to Store Commands
Create a new file called piwiklogimport.sh. Open it and type in the following:
eval $(awk '{ print "count="$1}' /home/username/public_html/piwik/numberoflines.txt) && python /home/username/public_html/piwik/misc/log-analytics/import_logs.py --token-auth=73952ab983b94872q2368q9ndu2ole6e --idsite=2 --skip=$count --url=http://domain.com/piwik /usr/local/apache/domlogs/username/domain.com && wc /usr/local/apache/domlogs/username/domain.com > /home/username/public_html/piwik/numberoflines.txt
Save it and upload it inside your Piwik directory. Change this file’s permissions to 400.
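The one-liner above works by remembering how many lines the log had after the last import and passing that number to --skip on the next run. Here is a small sketch of that counter logic on its own, with one addition that is my assumption rather than part of the script above: if the live log is ever rotated or truncated, the stored count will be larger than the file, so the sketch falls back to skipping nothing. The helper names count_lines and next_skip are hypothetical.

```shell
#!/bin/sh
# Sketch of the skip-counter logic only; it does not run the import itself.

# Print the number of lines currently in a log file.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Print how many lines to skip: the stored count, or 0 when the log is now
# shorter than the stored count (which suggests it was rotated or truncated).
next_skip() {
    stored="$1"
    current="$2"
    if [ "$current" -lt "$stored" ]; then
        echo 0
    else
        echo "$stored"
    fi
}

# Usage inside a script like piwiklogimport.sh (paths as in the command above):
#   log=/usr/local/apache/domlogs/username/domain.com
#   state=/home/username/public_html/piwik/numberoflines.txt
#   skip=$(next_skip "$(cat "$state")" "$(count_lines "$log")")
#   ...run the import command with --skip="$skip", and on success:
#   count_lines "$log" > "$state"
```

Unlike the plain wc call in the one-liner, this variant stores only the line count in numberoflines.txt, so the two approaches should not be mixed against the same file.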
Step 3b: Create a File to Track Imports
Create a new file called numberoflines.txt. Open it and type in 0. Save it and upload it inside your Piwik directory. Change this file’s permissions to 600.
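If you already have SSH access, you can create the counter file and set both files’ permissions from the command line instead of uploading and editing them by hand. This is only a sketch: init_import_files is a hypothetical helper name, and it expects the path to your Piwik directory as its argument.

```shell
#!/bin/sh
# Sketch: create numberoflines.txt and apply the permissions described above.
# init_import_files is a hypothetical helper; pass your Piwik directory.
init_import_files() {
    piwik_dir="$1"
    # Start the counter at 0 so the first run imports the whole log.
    echo 0 > "$piwik_dir/numberoflines.txt"
    # Owner may read and write; the import script rewrites this file each run.
    chmod 600 "$piwik_dir/numberoflines.txt"
    # Owner may only read the script; running it through /bin/sh needs no more.
    if [ -f "$piwik_dir/piwiklogimport.sh" ]; then
        chmod 400 "$piwik_dir/piwiklogimport.sh"
    fi
}

# Usage: init_import_files /home/username/public_html/piwik
```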
Step 3c: Create a Cron Job for Importing Apache Access Logs
Create a new cron job. Set it to run at the 59-minute mark of every hour, like this:
59 * * * *
For the cron command, use the following:
/bin/sh /home/username/public_html/piwik/piwiklogimport.sh >/dev/null 2>&1
Go ahead and save this cron job.
Now the Apache access logs should be imported automatically at the 59-minute mark of every hour, skipping any lines that were already imported.
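If your host does not provide a cron form in its control panel, the same job can usually be added from the command line. The sketch below assumes a cron implementation whose crontab command accepts a new table on standard input (crontab -), which is common but worth checking on your system. The helper name cron_line is hypothetical.

```shell
#!/bin/sh
# Sketch: build the crontab entry for the hourly import.
# cron_line is a hypothetical helper; pass the full path to piwiklogimport.sh.
cron_line() {
    # Minute 59 of every hour; all output is discarded, as in the command above.
    printf '59 * * * * /bin/sh %s >/dev/null 2>&1\n' "$1"
}

# Usage (appends the entry to your existing crontab, assuming "crontab -" is supported):
#   SCRIPT_PATH=/full/path/to/piwiklogimport.sh
#   ( crontab -l 2>/dev/null; cron_line "$SCRIPT_PATH" ) | crontab -
```

Because every run’s output goes to /dev/null, it can help to temporarily redirect it to a file while you confirm the first few imports work.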
That’s all! You should now have a new Piwik website that tracks your visitors from your Apache server access logs, including the full history of everyone who has previously visited your website, assuming you haven’t deleted any old Apache access log files. If you have deleted old log files, you’ll want to recover them. You can do this by retrieving them from older backups (you do regularly back up your server, right?) or by attempting some kind of file recovery (this is not a surefire method).