IP Database Testing Data

This repository automatically builds both IPv4 and IPv6 datasets for testing IP address databases. Because the data comes directly from the providers themselves, it may also be valuable as supplemental data when building a database and can be considered known-good.

Data sources utilized

The data is built from information self-published by various providers. No 3rd-party data is used, as it is considered inherently unreliable for this purpose.

Data sources with custom parsers

Self-published Geofeeds

  • Linode
    • IP address types: IPv4, IPv6
    • Data available: Country Code, Subdivision Code, City Name, Postal Code
  • DigitalOcean
    • IP address types: IPv4, IPv6
    • Data available: Country Code, Subdivision Code, City Name, Postal Code
  • Vultr
    • IP address types: IPv4, IPv6
    • Data available: Country Code, Subdivision Code, City Name, Postal Code
  • Starlink
    • IP address types: IPv4, IPv6
    • Data available: Country Code, Subdivision Code, City Name
  • Google Cloud
    • IP address types: IPv4, IPv6
    • Data available: Country Code, Subdivision Code, City Name
  • AWS
    • IP address types: IPv4, IPv6
    • Data available: Country Code, Subdivision Code, City Name
  • Ting Fiber
    • IP address types: IPv4, IPv6
    • Data available: Country Code, Subdivision Code, City Name
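
Self-published geofeeds generally follow the RFC 8805 CSV layout (prefix, country, subdivision, city, postal code). As a hedged illustration of what a per-provider parser has to handle, here is a minimal sketch; the sample rows and function names are hypothetical, not taken from this repository:

```python
import csv
import ipaddress
from io import StringIO

# Hypothetical rows in RFC 8805 geofeed format:
# prefix,country,region,city,postal
SAMPLE_FEED = """\
192.0.2.0/24,US,US-CA,San Jose,95134
2001:db8::/32,DE,DE-BE,Berlin,
# comment lines and empty fields are permitted by RFC 8805
"""

def parse_geofeed(text):
    """Yield (network, country, subdivision, city, postal) tuples."""
    for row in csv.reader(StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        row += [""] * (5 - len(row))  # trailing empty fields may be omitted
        prefix, country, region, city, postal = (f.strip() for f in row[:5])
        yield ipaddress.ip_network(prefix), country, region, city, postal

for entry in parse_geofeed(SAMPLE_FEED):
    print(entry)
```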

Data Processing

Each release goes through a few processing steps to ensure the generated data is of good quality.
The order of processing is as follows:

  1. During each parsing step, deduplication is performed: identical CIDRs are merged when their shared properties match; if they conflict, the currently existing entry is retained. (A sketch of steps 1–5 follows this list.)
  2. The complete list is then sorted in descending order by the quantity of IP addresses in each CIDR.
  3. Any CIDRs which are private networks are discarded.
  4. Any CIDRs which have no data associated with them are discarded.
  5. Any 3-letter country codes are converted to 2-letter country codes.
  6. Next, all CIDRs are looped through and compared against previously accepted CIDRs to identify any overlaps / subnets (see the second sketch after this list).
    • A subnet is retained, and any of its data that differs from the parent (supernet) network is considered valid.
    • Any overlapping CIDRs are, at the moment, simply discarded with a message.
    • If a subnet has identical information to its supernet, it is removed from the dataset.
  7. The final dataset is written to a JSON file, which is then uploaded to the release.
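
Steps 1–5 map naturally onto Python's ipaddress module. Below is a minimal, hedged sketch; the record layout, field names, and the tiny alpha-3 table are assumptions for illustration, not the repository's actual code:

```python
import ipaddress

# Assumed record shape: {"cidr": ip_network, "country": ..., "subdivision": ..., ...}
FIELDS = ("country", "subdivision", "city", "postal")

# Hypothetical subset; a real build would carry a full ISO 3166-1 table.
ALPHA3_TO_ALPHA2 = {"USA": "US", "DEU": "DE", "GBR": "GB"}

def deduplicate(records):
    """Step 1: merge identical CIDRs when their shared properties match;
    on a conflict, the currently existing entry is retained."""
    seen = {}
    for rec in records:
        existing = seen.get(rec["cidr"])
        if existing is None:
            seen[rec["cidr"]] = rec
        elif all(existing[f] == rec[f] for f in FIELDS if existing[f] and rec[f]):
            for f in FIELDS:  # fill in any fields the existing entry was missing
                existing[f] = existing[f] or rec[f]
        # otherwise: conflicting data, keep the existing record unchanged
    return list(seen.values())

def process(records):
    recs = deduplicate(records)
    # Step 2: sort descending by address count so supernets precede subnets.
    recs.sort(key=lambda r: r["cidr"].num_addresses, reverse=True)
    out = []
    for rec in recs:
        if rec["cidr"].is_private:  # step 3: discard private networks
            continue
        if not any(rec[f] for f in FIELDS):  # step 4: discard empty records
            continue
        if len(rec["country"]) == 3:  # step 5: normalize alpha-3 codes
            rec["country"] = ALPHA3_TO_ALPHA2.get(rec["country"], rec["country"])
        out.append(rec)
    return out
```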
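
Step 6 is the pairwise subnet/overlap comparison. Continuing the sketch above with the same assumed record shape (again, not the actual implementation):

```python
def resolve_overlaps(records):
    """Step 6 sketch: records arrive sorted largest-first, so a parent
    network is always seen before any of its subnets."""
    accepted = []
    for rec in records:
        keep = True
        for prev in accepted:
            net, parent = rec["cidr"], prev["cidr"]
            if net.version != parent.version:
                continue  # IPv4 and IPv6 never nest
            if net.subnet_of(parent):
                if all(rec[f] == prev[f] for f in FIELDS):
                    keep = False  # identical to its supernet: redundant
                # otherwise keep the subnet; its differing data is valid
                break
            if net.overlaps(parent):
                print(f"Discarding overlapping CIDR: {net} (overlaps {parent})")
                keep = False
                break
        if keep:
            accepted.append(rec)
    return accepted
```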

Unfortunately, this final comparison step is proving to be quite slow due to its time complexity, which limits the dataset size we can easily build. If you have ideas on how to optimize it, please share! One possible direction is sketched below.
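
Offered purely as an illustrative sketch, not a drop-in fix: since each CIDR only ever needs to be checked against its enclosing networks, sorting prefixes by starting address and keeping a stack of "open" ancestors replaces the pairwise comparison with roughly O(n log n) work overall (dominated by the sort):

```python
import ipaddress

def nearest_supernets(networks):
    """Map each network to its nearest enclosing supernet (or omit it if
    it has none), touching each network only a constant number of times."""
    nets = sorted(networks,
                  key=lambda n: (n.version, int(n.network_address), n.prefixlen))
    stack, parents = [], {}
    for net in nets:
        # Pop ancestors that cannot contain this network.
        while stack and (stack[-1].version != net.version
                         or int(stack[-1].broadcast_address) < int(net.network_address)):
            stack.pop()
        if stack and net.subnet_of(stack[-1]):
            parents[net] = stack[-1]  # nearest enclosing supernet
        stack.append(net)
    return parents
```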