Although the Internet made the world flat, geography still matters. Knowing which countries your users live in could provide business opportunities to localize your services and increase profits. The only question is how in the world to do it.

Luckily, user locations can be discovered from IP addresses via geolocation services. Since visitor IPs are stored in web server logs, all that's left to do is run over the logs, geolocate the addresses, and aggregate and store the results. Sounds like a job for Xplenty!

In this post we’ll show how Xplenty’s data integration on the cloud can process web server logs, extract IP addresses, and discover user geolocations. We will use it to calculate web visitors per country, and then drill down to check the number of visitors per city.

The Data

For this demo, we will use 1.5 GB of public domain ‘Star Wars Kid’ web server logs. Example log lines:

    - - [14/Sep/2003:14:30:14 -0700] "GET / HTTP/1.1" 200 39101 ",1284,58881,00.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    - - [14/Sep/2003:14:30:18 -0700] "GET /archive/cat/image/index.shtml HTTP/1.1" 200 18267 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    - - [14/Sep/2003:14:30:21 -0700] "GET /archive/cat/events/index.shtml HTTP/1.1" 200 27686 "-" "LinkWalker"

Log format:

  1. Source IP
  2. User Identifier (blank)
  3. UserID (blank)
  4. Date - in the format of dd/MMM/yyyy:HH:mm:ss Z
  5. HTTP request - type, URL, HTTP version
  6. HTTP code
  7. Bytes transferred
  8. Referrer
  9. User agent
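
As a rough illustration of the format above, here is a minimal Python sketch that splits one log line into its nine fields. It is not part of the Xplenty dataflow; the regular expression, the `parse_log_line` helper, and the field names are assumptions made for this example, and since the sample lines above show no source IP, the address 192.0.2.1 (reserved for documentation) is used as a placeholder.

```python
import re
from datetime import datetime

# Field order: ip, identifier, user ID, [date], "request", code, bytes,
# "referrer", "user agent". The pattern is an assumption for this sketch.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of the nine fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # dd/MMM/yyyy:HH:mm:ss Z corresponds to this strptime pattern
    fields["date"] = datetime.strptime(fields["date"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields

# Placeholder IP (documentation range); the rest is the third sample line
sample = ('192.0.2.1 - - [14/Sep/2003:14:30:21 -0700] '
          '"GET /archive/cat/events/index.shtml HTTP/1.1" 200 27686 "-" "LinkWalker"')
record = parse_log_line(sample)
print(record["ip"], record["status"], record["bytes"])
```

Only the `ip` field is needed for geolocation; the other fields are parsed here just to show how the format maps to named values.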

Note that IP-to-location mappings change over time as IP address ranges are obtained and released, so geolocating these 2003 logs today is done for demo purposes only.

Visitors per Country


The following dataflow can be created using Xplenty’s visual editor. It loads the data, determines unique visitors, converts IPs to countries, counts the number of visitors per country, sorts the data, and then stores the results.

visitors per country Xplenty dataflow

  1. Source - loads the data from the relevant S3 bucket/path. Once all the relevant options are set, the circular arrows button at the top right can auto-detect the schema and fill in the field names.

  2. Select - keeps IP addresses and gets rid of the rest of the data.

  3. Distinct - removes duplicate IPs to get unique visitors (this component doesn’t have any options).

  4. Select - converts IP addresses to countries using the CountryNameFromIP(ip) function.

  5. Aggregate - counts the number of unique visitors per country. This is achieved by grouping the data by country, and then counting the number of times each country name appears.

  6. Sort - sorts results by the number of visitors in descending order.

  7. Destination - saves the results back to Amazon S3.
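
For readers who think in code, the same sequence of steps can be sketched in a few lines of Python. This is only an analogy to the dataflow, not how Xplenty implements it: `visitors_per_country` is a hypothetical helper, the `country_name_from_ip` argument is a caller-supplied lookup standing in for the CountryNameFromIP(ip) function, and the toy lookup table is made-up data using documentation-range addresses.

```python
from collections import Counter

def visitors_per_country(ips, country_name_from_ip):
    """Count unique visitor IPs per country, highest count first.

    country_name_from_ip is a hypothetical, caller-supplied lookup; it stands
    in for whatever geolocation service is available.
    """
    unique_ips = set(ips)                                  # Distinct
    countries = (country_name_from_ip(ip) for ip in unique_ips)  # Select
    counts = Counter(countries)                            # Aggregate
    return counts.most_common()                            # Sort, descending

# Toy lookup table; not real geolocation data
toy_lookup = {
    "192.0.2.1": "United States",
    "192.0.2.2": "United States",
    "198.51.100.7": "Canada",
}
ips = ["192.0.2.1", "192.0.2.1", "192.0.2.2", "198.51.100.7"]
result = visitors_per_country(ips, lambda ip: toy_lookup.get(ip, "Unknown"))
print(result)  # [('United States', 2), ('Canada', 1)]
```

Removing duplicate IPs before the lookup matters for two reasons: it makes the count reflect unique visitors rather than requests, and it avoids geolocating the same address repeatedly.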


Here’s an interactive map showing the number of visitors per country (see the next section for full results):

  • Total visitors - 1,387,000
  • Countries - 200
  • Top countries - United States (62%, 863,000), Canada (9%, 126,000), United Kingdom (5%, 75,000).
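
The percentages are simply each country's visitors divided by the total. A quick check in Python, using the rounded figures quoted above (the variable names are chosen for this example):

```python
total_visitors = 1_387_000
top_countries = {
    "United States": 863_000,
    "Canada": 126_000,
    "United Kingdom": 75_000,
}

# Share of all visitors, rounded to whole percent
shares = {country: round(100 * count / total_visitors)
          for country, count in top_countries.items()}
print(shares)  # {'United States': 62, 'Canada': 9, 'United Kingdom': 5}
```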

Visitors per City in the UK


Let’s say that we want to dig deeper into the UK market and find out which cities our website visitors come from. We will use a similar dataflow with two additions: a filter that keeps only visitors from the UK, and a step that geolocates the cities they come from.

Xplenty visitors per city in the UK dataflow

  1. Source - no changes.

  2. Select - no changes.

  3. Distinct - no changes.

  4. Select - keeps IPs for later use and geolocates the countries.

  5. Filter - keeps only visitors from the United Kingdom.

  6. Select - geolocates cities using the CityNameFromIP(ip) function.

  7. Aggregate - same as before, except that now the ‘city’ field is used.

  8. Sort - sorts the data by the number of visitors.

  9. Destination - stores results in a different path.
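
The drill-down can be sketched in Python in the same spirit. Again, this is an illustration under assumptions rather than Xplenty's implementation: `visitors_per_city` is a hypothetical helper, the two lookup arguments stand in for the CountryNameFromIP(ip) and CityNameFromIP(ip) functions, and the toy table is made-up data.

```python
from collections import Counter

def visitors_per_city(ips, country_name_from_ip, city_name_from_ip,
                      country="United Kingdom"):
    """Count unique visitor IPs per city within one country, highest first.

    Both lookups are hypothetical, caller-supplied functions.
    """
    unique_ips = set(ips)                                        # Distinct
    in_country = [ip for ip in unique_ips
                  if country_name_from_ip(ip) == country]        # Filter
    counts = Counter(city_name_from_ip(ip) for ip in in_country)  # Select + Aggregate
    return counts.most_common()                                  # Sort, descending

# Toy lookup table: IP -> (country, city); not real geolocation data
toy_lookup = {
    "192.0.2.1": ("United Kingdom", "London"),
    "192.0.2.2": ("United Kingdom", "London"),
    "192.0.2.3": ("United Kingdom", "Bristol"),
    "198.51.100.7": ("Canada", "Toronto"),
}
ips = ["192.0.2.1", "192.0.2.2", "192.0.2.2", "192.0.2.3", "198.51.100.7"]
uk_cities = visitors_per_city(
    ips,
    lambda ip: toy_lookup[ip][0],
    lambda ip: toy_lookup[ip][1],
)
print(uk_cities)  # [('London', 2), ('Bristol', 1)]
```

Filtering by country before looking up cities mirrors the order of the dataflow: the city lookup runs only on the addresses that survive the filter.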


Here are the top UK cities where visitors come from, as well as full results from the previous section:

  • UK visitors - 75,000
  • UK cities - almost 1,090
  • Top UK cities - London (4.51%, 3,380), Bristol (0.94%, 700), Birmingham (0.85%, 640)


User locations can be easily extracted from IP addresses with a bit of help from Xplenty. Get your free account and start geolocating your data.