Although the Internet made the world flat, geography still matters. Knowing which countries your users live in can reveal opportunities to localize your services and increase profits. The only question is how in the world to find out where they are.
Luckily, user locations can be discovered from IP addresses via geolocation services. Since visitor IPs are stored in web server logs, all that's left to do is run over the logs, geolocate the addresses, and aggregate and store the results. Sounds like a job for Xplenty!
In this post we’ll show how Xplenty’s data integration in the cloud can process web server logs, extract IP addresses, and discover user geolocations. We will use it to calculate web visitors per country, and then drill down to check the number of visitors per city.
For this demo, we will use 1.5 GB of public domain ‘Star Wars Kid’ web server logs. Example log lines:
188.8.131.52 - - [14/Sep/2003:14:30:14 -0700] "GET / HTTP/1.1" 200 39101 "http://www.wired.com/news/culture/0,1284,58881,00.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
184.108.40.206 - - [14/Sep/2003:14:30:18 -0700] "GET /archive/cat/image/index.shtml HTTP/1.1" 200 18267 "http://www.waxy.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
220.127.116.11 - - [14/Sep/2003:14:30:21 -0700] "GET /archive/cat/events/index.shtml HTTP/1.1" 200 27686 "-" "LinkWalker"
Each log line contains the following fields:
- Source IP
- User Identifier (blank)
- UserID (blank)
- Date - in the format of dd/MMM/yyyy:HH:mm:ss Z
- HTTP request - type, URL, HTTP version
- HTTP code
- Bytes transferred
- Referrer
- User agent
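To make the field layout concrete, here is a minimal Python sketch that splits one of the example lines into its fields. It is not part of the Xplenty dataflow. The regular expression, the field names, and the `parse_line` helper are our own assumptions, based on the example lines matching the common "combined" access-log layout.

```python
import re
from datetime import datetime

# Assumed pattern for the log layout shown above (not taken from Xplenty).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # dd/MMM/yyyy:HH:mm:ss Z maps to this strptime pattern
    # (%b expects English month abbreviations under the default locale).
    fields["date"] = datetime.strptime(fields["date"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

sample = (
    '188.8.131.52 - - [14/Sep/2003:14:30:14 -0700] "GET / HTTP/1.1" 200 39101 '
    '"http://www.wired.com/news/culture/0,1284,58881,00.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)
record = parse_line(sample)
print(record["ip"], record["status"], record["date"].isoformat())
```

Only the source IP is needed for geolocation, but parsing the whole line makes it easy to skip malformed entries.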
Note that IP-to-location mappings change over time as IP address ranges are obtained and released, so geolocating these 2003 logs is done for demo purposes only.
Visitors per Country
The following dataflow can be created using Xplenty’s visual editor. It loads the data, determines unique visitors, converts IPs to countries, counts the number of visitors per country, sorts the data, and then stores the results.
Distinct - removes duplicate IPs to get unique visitors (this component doesn’t have any options).
Results:
- Total visitors - 1,387,000
- Countries - 200
- Top countries - United States (62%, 863,000), Canada (9%, 126,000), United Kingdom (5%, 75,000)
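For readers who want to see the logic outside the visual editor, the steps of this dataflow (distinct, geolocate, count, sort) can be approximated in a few lines of Python. This is only a sketch: `SAMPLE_GEO` and `ip_to_country` are hypothetical stand-ins for a real geolocation lookup, and the addresses and their countries are invented for illustration.

```python
from collections import Counter

# Hypothetical lookup table standing in for a real geolocation service.
# The addresses come from documentation-reserved ranges; the mapping is made up.
SAMPLE_GEO = {
    "192.0.2.1": "United States",
    "192.0.2.2": "United States",
    "198.51.100.7": "Canada",
    "203.0.113.9": "United Kingdom",
}

def ip_to_country(ip):
    """Return the country for an IP, or 'Unknown' if it is not in the table."""
    return SAMPLE_GEO.get(ip, "Unknown")

def visitors_per_country(ips, lookup=ip_to_country):
    unique_ips = set(ips)                                # distinct: one entry per visitor IP
    counts = Counter(lookup(ip) for ip in unique_ips)    # geolocate and count per country
    return counts.most_common()                          # sort by visitor count, descending

# The repeated address is counted once.
ips = ["192.0.2.1", "192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"]
result = visitors_per_country(ips)
for country, visitors in result:
    print(country, visitors)
```

Deduplicating before the lookup mirrors the order of the dataflow and keeps the number of geolocation calls down to one per unique IP.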
Visitors per City in the UK
Let’s say that we want to dig deeper into the UK market and find out which cities our website visitors come from. We will use a similar dataflow with two additions: filtering visitors from the UK and then geolocating the cities they come from.
Source - no changes.
Select - no changes.
Distinct - no changes.
Sort - sorts the data by the number of visitors.
Destination - stores results in a different path.
Here are the top UK cities where visitors come from. Full results, including those from the previous section, are available in this spreadsheet: https://docs.google.com/spreadsheets/d/1jvFnYsWb6Qfl_NHg0JCUGU1e9JVIAdlJyNgGPlAby1I/pubhtml?widget=true&headers=false
- UK Visitors - 75,000
- UK Cities - almost 1,090
- Top UK cities - London (4.51%, 3,380), Bristol (0.94%, 700), Birmingham (0.85%, 640)
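The drill-down follows the same pattern with a filter on the country and a count per city. Here is a rough Python sketch of that idea, again not the Xplenty implementation: `SAMPLE_GEO`, `locate`, and `visitors_per_city` are hypothetical names, and the sample addresses and locations are invented. Percentages are each city's share of visitors from the selected country, as in the list above.

```python
from collections import Counter

# Hypothetical (country, city) lookup table; addresses and locations are made up.
SAMPLE_GEO = {
    "192.0.2.10": ("United Kingdom", "London"),
    "192.0.2.11": ("United Kingdom", "London"),
    "192.0.2.12": ("United Kingdom", "Bristol"),
    "198.51.100.20": ("United States", "Seattle"),
}

def locate(ip):
    """Return (country, city) for an IP, or ('Unknown', 'Unknown')."""
    return SAMPLE_GEO.get(ip, ("Unknown", "Unknown"))

def visitors_per_city(ips, country, lookup=locate):
    locations = [lookup(ip) for ip in set(ips)]          # distinct, then geolocate
    cities = Counter(city for found_country, city in locations
                     if found_country == country)        # keep one country, count per city
    total = sum(cities.values())
    # Each row: city, visitors, share of that country's visitors in percent.
    return [(city, n, round(100 * n / total, 2)) for city, n in cities.most_common()]

uk_cities = visitors_per_city(list(SAMPLE_GEO) + ["192.0.2.10"], "United Kingdom")
for city, visitors, share in uk_cities:
    print(city, visitors, share)
```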
User locations can be easily extracted from IP addresses with a bit of help from Xplenty. Get your free account and start geolocating your data.