
Retrieving Log Files for Remote Hosted Web Sites

[Editor’s note: This post is part 6 of a series of posts discussing Log File Management. For more on this topic, be sure to read Tyler’s other posts.]

The previous entry in this series [Best Practices for Log File Management], entitled ‘Remote Hosted Sites and ISP Policies’, discussed how ISP policies can impact your ability to get good data. The simple answer is to ensure that the ISP provides you access to your log files and that you warehouse them as part of your IT processes. Since log files provide an abundance of information, and as we discussed earlier they can be a source of ‘intellectual property’, a simple script can spare you a lot of data problems.

In order to get your logs on a regular basis, you’ll need 3 things:

  • An FTP address (e.g. ftp.yourwebsite.com)
  • A username for the FTP account
  • A password for the FTP account

Once you have this information, with just a few lines in a DOS-based batch file and a scheduler on a server, you can download the log from ‘yesterday’ on a nightly basis. In short form, the following roughly represents a simple batch file, and the FTP command script it runs, to get the log file for today’s date minus one day.

File named: Log-download.bat

ftp -s:ftp-commands.txt

File named: ftp-commands.txt

open ftp.yourwebsite.com
[username]
[password]
binary
lcd c:\logfiles
get access-dd-mm-yyyy.gz
bye

This simple batch file can be run automatically by a task scheduler or cron job on a nightly basis.
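On Windows, for example, a scheduled task can be created from the command line with schtasks; the task name, path, and start time below are just placeholders for illustration:

schtasks /create /tn "Nightly Log Download" /tr "c:\Scripts\Log-download.bat" /sc daily /st 01:00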

Now, depending on how your ISP names and inventories these log files, matters can become complicated because the date is often part of the file name. In order to dynamically grab the log for “today’s date minus 1 day”, additional scripting is required. While I consider this to require slightly more advanced knowledge of scripting and DOS, there is a small program called ‘doff’ that I found online which lets you calculate this date offset in DOS; a good IT person can manage this aspect of the process for you.
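As used here, doff simply prints a date, offset by the requested number of days, in the format you ask for. Asking it for ‘yesterday’ in mm/dd/yy format produces output along these lines (the exact value obviously depends on the day you run it):

doff mm/dd/yy -1
04/14/09

It is this slash-delimited output that the batch file below picks apart in order to build the log file name.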

In order to accomplish this, I find the simplest solution is to create a primary batch file that writes out a secondary command file, with the name of the log file in the ‘get’ command dynamically set to yesterday’s date. The primary batch file can be run automatically at 1:00am, for example, and the secondary file (which is actually the output of the primary batch file) run at 1:05am, or, as in the sample below, called directly at the end of the primary script.

Sample code for the primary batch file might look something like the following in order to generate, and then run, the FTP command script (logs.txt) as outlined above.

File named: Log-download-script.bat


@echo off
rem Build the FTP command script (logs.txt) in c:\Scripts
set script=c:\Scripts\logs.txt

echo open [ftp.site.ca] > %script%
echo [username] >> %script%
echo [password] >> %script%
echo binary >> %script%
echo cd [root folder] >> %script%
echo lcd "[destination folder]" >> %script%
echo prompt >> %script%

rem Get today's date from doff (mm/dd/yy) and queue a 'get' for today's log
for /f "tokens=1-3 delims=/ " %%a in ('doff mm/dd/yy') do (
  set mm=%%a
  set dd=%%b
  set yy=%%c
)
echo get ex%yy%%mm%%dd%.log >> %script%

rem Run doff again with an offset of -1 day and queue a 'get' for yesterday's log
for /f "tokens=1-3 delims=/ " %%d in ('doff mm/dd/yy -1') do (
  set aa=%%d
  set bb=%%e
  set cc=%%f
)
echo get ex%cc%%aa%%bb%.log >> %script%

echo bye >> %script%

rem Run the FTP session against the generated command script
ftp.exe -s:%script%

Finally, as there are many components and factors involved in even this simple task, I recommend that the nightly process download the last 3, 5, or even 10 days of logs, depending on their size. This is a fail-safe way to avoid losing data to an internet outage, a server failure, or any number of other factors.
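As a rough sketch of that idea (assuming the doff utility accepts other negative day offsets such as -2 and -3), the single ‘get’ lines in the script above could be replaced with a loop that queues one ‘get’ per day:

rem Queue a 'get' for each of the last 3 days
rem (assumes doff accepts day offsets beyond -1, e.g. -2 and -3)
for %%n in (-1 -2 -3) do (
  for /f "tokens=1-3 delims=/ " %%a in ('doff mm/dd/yy %%n') do (
    echo get ex%%c%%a%%b.log >> %script%
  )
)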

In short, it is critical to download and warehouse your logs on a regular basis, and this can very easily be automated so that it does not become yet another item on your daily or weekly to-do list.

[Editor’s note: For more information on log file management, be sure to read Tyler’s ongoing series of blog posts on the topic starting with Best Practices for Log File Management.]

Tyler Gibbs
