[Feature Request] Integrate IPinfo's IP-to-Country ASN database. #765

Open
abdullahdevrel opened this issue Aug 31, 2024 · 4 comments

@abdullahdevrel

Hi,

I work for IPinfo, but I have been using Goatcounter for my personal projects for several years and have been exploring self-hosting it recently.

I would like to request the integration of the IPinfo IP to Country or IP to Country ASN/ISP database for Goatcounter. I believe that, from a development-philosophy standpoint, IPinfo’s free IP database is a good fit for Goatcounter. There are technical benefits as well.

Goatcounter specific benefits

Binary distribution issues and "MaxMind®️'s EULA"

I have not made much progress in self-hosting it yet, but I believe the binary includes MaxMind’s country database, which creates a tricky situation. As far as I know, they do not allow redistribution of their databases, even the free ones. Their EULA requires users to download their own copy of the database using their own access tokens.

The value proposition of IPinfo's database is that it is simply CC-BY-SA 4.0. You can do whatever you want with it as long as you give attribution. Commercial usage is allowed as well. Librespeed is using our data by packaging it directly in the repo: librespeed/speedtest#641 (comment)

ASN/ISP data

You have mentioned that city-level data is too granular, so maybe you can add the ASN/ISP data from the IP to Country ASN database as an additional data source. The ASN/ISP detection is based on network routing data.

Our country-level data, even though free, is a zero-compromise, fully accurate database. We support daily updates and do not apply range clustering. It is a pure subset of our IP geolocation database that omits the more granular location information and provides only country-level data.

General Technical benefits

The database has the following features:

  • It includes country and ASN information in the same database.
  • It is updated daily, with zero compromise to accuracy. There is no range clustering, and the database provides full accuracy.
  • The data granularity reaches individual IP level.
  • The database comes in MMDB database format.
  • It is licensed under CC-BY-SA 4.0, permitting commercial usage.
  • Available file formats include: CSV, MMDB, JSON
  • The data is tabular and unnested, making it very easy to use. The dataset includes both IPv4 and IPv6 in a single file.

Database schema

| Field Name | Example | Data Type | Description |
|---|---|---|---|
| start_ip | 1.0.16.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.0.31.255 | TEXT | Ending IP address of an IP address range |
| country | JP | TEXT | ISO 3166 country code of the location |
| country_name | Japan | TEXT | Name of the country |
| continent | AS | TEXT | Continent code of the country |
| continent_name | Asia | TEXT | Name of the continent |
| asn | AS2519 | TEXT | Autonomous System Number |
| as_name | ARTERIA Networks Corporation | TEXT | Name of the AS (Autonomous System) organization |
| as_domain | arteria-net.com | TEXT | Official domain or website of the AS organization |
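
As a rough illustration of how flat this schema is, here is a minimal Go sketch that reads the CSV form of a record into a plain struct. The `Record` type, the `parseRecords` helper and the exact CSV header line are assumptions made for this example (the header is taken from the field names in the schema above), not something shipped by IPinfo or GoatCounter.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Record mirrors the flat schema above; every field is plain text.
// The type name is hypothetical and exists only for this sketch.
type Record struct {
	StartIP, EndIP           string
	Country, CountryName     string
	Continent, ContinentName string
	ASN, ASName, ASDomain    string
}

// sample holds an assumed header row plus the example row from the schema.
const sample = "start_ip,end_ip,country,country_name,continent,continent_name,asn,as_name,as_domain\n" +
	"1.0.16.0,1.0.31.255,JP,Japan,AS,Asia,AS2519,ARTERIA Networks Corporation,arteria-net.com\n"

// parseRecords reads CSV data with a header row and maps columns by name,
// so the column order in the file does not matter.
func parseRecords(data string) ([]Record, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, nil
	}
	// Index the header so fields can be looked up by name.
	col := make(map[string]int, len(rows[0]))
	for i, name := range rows[0] {
		col[name] = i
	}
	// get returns an empty string when a column is absent.
	get := func(row []string, name string) string {
		if i, ok := col[name]; ok && i < len(row) {
			return row[i]
		}
		return ""
	}
	var out []Record
	for _, row := range rows[1:] {
		out = append(out, Record{
			StartIP:       get(row, "start_ip"),
			EndIP:         get(row, "end_ip"),
			Country:       get(row, "country"),
			CountryName:   get(row, "country_name"),
			Continent:     get(row, "continent"),
			ContinentName: get(row, "continent_name"),
			ASN:           get(row, "asn"),
			ASName:        get(row, "as_name"),
			ASDomain:      get(row, "as_domain"),
		})
	}
	return out, nil
}

func main() {
	recs, err := parseRecords(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, r := range recs {
		fmt.Printf("%s-%s %s %s (%s)\n", r.StartIP, r.EndIP, r.Country, r.ASN, r.ASName)
	}
}
```

Because there is no nesting, a missing value is simply an empty string, and no per-field type switch is needed.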

Documentation: https://ipinfo.io/developers/ip-to-country-asn-database

Samples are available here: https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country%20ASN

The database can be downloaded simply by accessing the storage URI with an access token.

curl -L "https://ipinfo.io/data/free/country_asn.mmdb?token=<YOUR_TOKEN>" -o country_asn.mmdb

My apologies for the wall of text. Let me know what you think. Thank you!

@arp242
Owner

arp242 commented Aug 31, 2024

I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

Your databases seem way larger; "IP to Country Database" is ~38M. That's far too large to include in the GoatCounter binary. The "Geolite countries" is ~3.7M. I don't know why it's so much larger? People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

@abdullahdevrel
Author

Thank you for reviewing the request.

> I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

The challenge is that they explicitly have a commercial distribution license for these free databases, so I am not sure what the consequences of this are, to be honest. I am not sure if those Linux distros have their own licensing terms with them that permit the distribution like that.

> Your databases seem way larger; "IP to Country Database" is ~38M.

That is because our database provides full accuracy. The accuracy extends down to the individual IP level, even for a country database. When you download an IP database, compromise happens in two ways: with infrequent updates and range clustering. However, because we are providing full accuracy, the resulting database is larger.

Another idea: since the database can be downloaded directly via a URI, users could download it during installation. This would eliminate the need to bundle a database within the binary in the first place. The same download mechanism could also handle database updates.
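
A minimal sketch of what such an install-time download could look like in Go, assuming the same URL as the curl command earlier in this thread. The helper names `downloadURL` and `fetchIfMissing` and the `IPINFO_TOKEN` environment variable are hypothetical and used only for illustration; this is not how GoatCounter currently behaves.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// baseURL is the download location quoted earlier in this thread.
const baseURL = "https://ipinfo.io/data/free/country_asn.mmdb"

// downloadURL appends the access token as a query parameter.
func downloadURL(token string) string {
	return baseURL + "?" + url.Values{"token": {token}}.Encode()
}

// fetchIfMissing downloads the database to path unless a file already
// exists there. It writes to a temporary file first and renames it, so an
// interrupted download does not leave a truncated database behind.
func fetchIfMissing(path, token string) error {
	if _, err := os.Stat(path); err == nil {
		return nil // already present, nothing to do
	}
	resp, err := http.Get(downloadURL(token))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("download failed: %s", resp.Status)
	}
	tmp := path + ".tmp"
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}

func main() {
	// IPINFO_TOKEN is an assumed variable name for this sketch.
	token := os.Getenv("IPINFO_TOKEN")
	if token == "" {
		fmt.Fprintln(os.Stderr, "set IPINFO_TOKEN to download the database")
		return
	}
	if err := fetchIfMissing("country_asn.mmdb", token); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Supporting updates would mean replacing the "file exists" check with an age check on the file, which is left out here to keep the sketch short.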

> People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

At a cursory glance, it seems like the lookup mechanism is not database agnostic, but I could be wrong. There are structural differences between our database and MaxMind's (https://ipinfo.io/blog/migrating-from-maxmind-to-ipinfo/). Mainly:

  • We have the location built in, while they provide the geoname_id and a complementary geoname database
  • Our database structure is flat/tabular, while they opt for a nested database structure.

Let me know what you think.

@arp242
Owner

arp242 commented Aug 31, 2024

I want GoatCounter to be a "Just Works" binary without external dependencies, so people can easily self-host with a minimum of fuss. Dealing with GeoIP database downloads rather goes against that.

I don't mind providing compatibility with it, but I don't think it will be the default if it's so much larger.


However, if I try to use it, it errors out with:

maxminddb: cannot unmarshal EU into type struct { Names map[string]string "maxminddb:\"names\""; Code string "maxminddb:\"code\""; GeoNameID uint "maxminddb:\"geoname_id\"" }

So I guess the database structure is different.

I don't want to "migrate to" anything, I want to be compatible with both. I don't understand why you don't just provide a "Maxmind-compatible database" as an option.

Going from country = maxmind_data['country']['iso_code'] to country = ipinfo_data['country'] is a silly change and it doesn't really matter all that much which one is used. Maybe one is marginally better, but not at least providing a compatible database is rather lacking in pragmatism.
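
To make the "compatible with both" idea concrete, here is a small Go sketch that accepts either layout. It assumes the MMDB record has already been decoded into a generic `map[string]any`; the `countryCode` helper is hypothetical and is not GoatCounter's actual lookup code.

```go
package main

import "fmt"

// countryCode returns the ISO country code from a decoded record,
// accepting either layout discussed in this thread:
//   - nested: country = {"iso_code": "JP", ...}
//   - flat:   country = "JP"
//
// The second return value is false when no code could be found.
func countryCode(rec map[string]any) (string, bool) {
	switch v := rec["country"].(type) {
	case string: // flat layout
		return v, v != ""
	case map[string]any: // nested layout
		code, ok := v["iso_code"].(string)
		return code, ok && code != ""
	}
	return "", false
}

func main() {
	nested := map[string]any{"country": map[string]any{"iso_code": "JP"}}
	flat := map[string]any{"country": "JP"}

	fmt.Println(countryCode(nested))
	fmt.Println(countryCode(flat))
}
```

The error above comes from decoding into a fixed struct that expects a nested value where the other database stores a plain string; checking the dynamic type first, as in this sketch, is one way to avoid committing to either shape.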

@abdullahdevrel
Author

Thank you for reviewing. I understand that MaxMind's database is deeply integrated into the project and would require some engineering investment to adopt. We tried our best to provide the simplest and best data to use out there. Because of the ease of use and the quality of the data, it usually justifies making the engineering investment to adopt.

Due to the unpredictable nature of MaxMind's database structure, you have to wrap every call to get a value in switch/case statements. In our case, if we do not have the data, we simply return an empty string. In my personal opinion, making a drop-in, MaxMind-compatible database would essentially be a compromise, because it requires creating a nested version of the database, which would increase its size.
