Fixing Kaltura CE5 analytics (Continued)

We previously wrote about fixing some issues with Kaltura CE5 analytics.

However, if you are using Ubuntu Server, there is another issue that must be fixed for the analytics to work.

On Ubuntu server, the Apache access log format specified in the Kaltura virtual host file is overridden by the main Apache configuration file. Below we list the steps to identify and fix the issue.

If you haven’t looked at our earlier post yet, make sure you have a look and fix the problems discussed there before trying the following steps.

Identifying the problem

First, we will check if the events (plays, impressions, etc.) are read from the Apache access logs into the database.

  1. Open /opt/kaltura/log/kaltura_apache_access.log
  2. If the log lines are in the following format (note the domain name and port in the first column), then there is a problem with the Apache access log format.
    www.example.com:80 127.0.0.1 - - [14/Jan/2013:07:04:52 +0000] "POST /api_v3/index.php?service=batch&action=getExclusiveAlmostDoneConvertJobs HTTP/1.1" 200 276 "-" "-"
    www.example.com:80 127.0.0.1 - - [14/Jan/2013:07:04:53 +0000] "POST /api_v3/index.php?service=batch&action=getQueueSize HTTP/1.1" 200 274 "-" "-"
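
The check above can also be scripted. The following is a minimal sketch, not part of the Kaltura tooling: the function name has_vhost_prefix and the LOG_FILE variable are our own, and the pattern simply looks for a "domain:port" token at the start of a line, as in the sample lines above.

```shell
#!/bin/sh
# Log path taken from this post; set LOG_FILE to point at a different file.
LOG_FILE="${LOG_FILE:-/opt/kaltura/log/kaltura_apache_access.log}"

# Hypothetical helper: succeeds (exit 0) if any line of the given file
# starts with "host:port ", e.g. "www.example.com:80 ".
has_vhost_prefix() {
    grep -Eq '^[A-Za-z0-9.-]+:[0-9]+ ' "$1"
}

if [ -r "$LOG_FILE" ]; then
    if has_vhost_prefix "$LOG_FILE"; then
        echo "Problem: log lines start with domain:port"
    else
        echo "OK: no domain:port prefix found"
    fi
else
    echo "Cannot read $LOG_FILE"
fi
```

If the script reports a problem, continue with the fix in the next section.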

Solving the problem

The Kaltura virtual host file defines the log format to be used in the Kaltura Apache access logs. This is the format the Kaltura data warehouse knows to work with. The file is found in /opt/kaltura/app/configurations/apache/my_kaltura.conf.

However, Ubuntu’s apache2 package overrides the Kaltura virtual host settings. The main Apache configuration file is found at /etc/apache2/apache2.conf.
The part that causes the problem is found near the end of that file:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

To solve the issue:

  1. Comment out the first LogFormat line (the one named vhost_combined) in the above configuration file.
  2. Delete the /opt/kaltura/log/kaltura_apache_access.log file.
  3. Restart Apache.
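
For those who prefer the command line, the three steps could be sketched roughly as follows. This is an illustration under assumptions: the function name comment_out_vhost_combined is our own, it relies on sed's in-place editing with a backup suffix (as shipped with Ubuntu), and the restart command assumes the stock Ubuntu apache2 service. The commands that touch the live system are left commented out so that pasting the block changes nothing by itself.

```shell
#!/bin/sh
# Hypothetical helper: prefix the vhost_combined LogFormat line in the given
# Apache config with "#". The original file is kept next to it as <file>.bak.
# Lines that are already commented out are left untouched.
comment_out_vhost_combined() {
    sed -i.bak '/^[[:space:]]*LogFormat .* vhost_combined[[:space:]]*$/ s/^/#/' "$1"
}

# Intended usage on the server (run as root); review before uncommenting:
#   comment_out_vhost_combined /etc/apache2/apache2.conf
#   rm /opt/kaltura/log/kaltura_apache_access.log
#   service apache2 restart
```

Keeping the .bak copy makes it easy to restore the original configuration if Apache fails to restart.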

Validating

To make sure the issue is solved, perform the following steps:

  1. Enter the KMC and perform some plays.
  2. Look at /opt/kaltura/log/kaltura_apache_access.log. The lines should no longer begin with the domain name and port.
  3. Copy /opt/kaltura/log/kaltura_apache_access.log to /opt/kaltura/log/kaltura_apache_access.log-[current date], where the current date is in the format YYYYMMDD. For example, kaltura_apache_access.log-20130321 for March 21, 2013.
  4. Gzip the file you just copied.
  5. Run the hourly analytics script:
    sudo sh /opt/kaltura/dwh/etlsource/execute/etl_hourly.sh
  6. On your Kaltura machine, log into the database and select the kalturadw database. That’s the database that stores most of the analytics. Run the following query:
    SELECT * FROM dwh_fact_events

    You should now see at least one line in the table for the entry you played.
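
The copy and gzip steps above are easy to get wrong by hand because of the date suffix, so here is a small sketch that builds the file name automatically. The function name archive_access_log is our own invention, and the sketch assumes the standard date, cp and gzip utilities; the commands against the real log and the ETL script are left commented out.

```shell
#!/bin/sh
# Hypothetical helper: copy the given log to <log>-YYYYMMDD (today's date),
# gzip the copy, and print the path of the resulting .gz file.
# The original log file is left in place.
archive_access_log() {
    src="$1"
    dest="$src-$(date +%Y%m%d)"
    cp "$src" "$dest" && gzip -f "$dest" && echo "$dest.gz"
}

# Intended usage on the server; review before uncommenting:
#   archive_access_log /opt/kaltura/log/kaltura_apache_access.log
#   sudo sh /opt/kaltura/dwh/etlsource/execute/etl_hourly.sh
```

After the ETL script finishes, run the query above to confirm that the events were loaded.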

Remember that for the analytics data to appear in the KMC, you have to let the analytics run for a full cycle, which takes about 24 hours.