Use the Tor DB as GeoIP database
a) The DB file size is bigger. Current IpToCountry.dat is 1.2 MiB, Tor
DB is 4 MiB, optimized Tor DB is 2 MiB. "Optimized" means I used 'sed'
to remove an unnecessary column in the DB. If you really want to go
for size, you can zip it to ~700 KiB. This increases runtime a bit,
but it's still ~15x faster than the old method.

b) Tor uses "??" instead of "ZZ" for unknown codes, and still uses
"CS" which stands for Serbia and Montenegro - the country stopped
existing in 2006. Maybe someone could ask them why their DB uses
"CS"... This can be solved easily by just replacing them with 'sed'.

c) The new DB is sorted in ascending order, but the existing binary
search expects descending order, so right now I simply reverse the
array. Changing the binary search function to work on ascending data
would save another ~6 ms, but I don't know how to do this.

human-readable text file in a zip.

Advantages:
- It's human readable
- It's easy to update because we can use Tor geoip
- It's a lot faster than the base85 approach
- It has a smaller file size when zipped (~0.7 MiB vs. 1.2 MiB for IpToCountry.dat)

==== New zip file ====

The code no longer uses the IpToCountry.dat file and instead uses a zip file called IpToCountry.zip. This zip file is expected to contain exactly one text file in the Tor geoip format according to the spec below.
The zip file should be compressed to save space (~2 MiB uncompressed -> 0.7 MiB compressed).
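
A minimal sketch of reading such a zip, assuming the first entry is the only one and contains plain ASCII text. The class name GeoIpZipReader and the method readLines are hypothetical and are not taken from IPConverter.java:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, for illustration only.
final class GeoIpZipReader {
    /** Reads all lines of the first (expected: only) entry in the zip file. */
    static List<String> readLines(String zipPath) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            if (!entries.hasMoreElements()) {
                throw new IOException("zip file contains no entries: " + zipPath);
            }
            ZipEntry entry = entries.nextElement();
            List<String> lines = new ArrayList<>();
            // Assumption: the DB text is plain ASCII (digits, commas, country codes).
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    zip.getInputStream(entry), StandardCharsets.US_ASCII))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        }
    }
}
```

ZipFile decompresses the entry transparently, so the caller sees the same lines whether the entry was stored or deflated.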

==== New IpToCountry.txt file ====

Format for each line: <fromIP>,<ISO 3166-1 alpha-2 country code>
Example: 16781312,JP
This is like the old format, but not base85 encoded.

Empty lines are allowed.
Comments may start with any symbol other than a number.
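
The line rules above can be sketched as follows. GeoIpLineParser and Entry are hypothetical names used for illustration, and returning null for skipped lines is an assumption of this sketch, not necessarily what IPConverter.java does:

```java
// Hypothetical parser for one line of IpToCountry.txt, for illustration only.
final class GeoIpLineParser {
    static final class Entry {
        final long fromIp;
        final String country;

        Entry(long fromIp, String country) {
            this.fromIp = fromIp;
            this.country = country;
        }
    }

    /**
     * Parses "<fromIP>,<country code>". Returns null for empty lines,
     * for comment lines (anything not starting with an ASCII digit),
     * and for lines without a comma.
     */
    static Entry parseLine(String line) {
        if (line.isEmpty()) {
            return null;
        }
        char first = line.charAt(0);
        if (first < '0' || first > '9') {
            return null; // comment line
        }
        int comma = line.indexOf(',');
        if (comma < 0) {
            return null;
        }
        // Long is needed here: IPv4 values above 2^31 - 1 do not fit a signed int.
        long fromIp = Long.parseLong(line.substring(0, comma));
        String country = line.substring(comma + 1).trim();
        return new Entry(fromIp, country);
    }
}
```

For the example line, parseLine("16781312,JP") yields fromIp 16781312 and country "JP". A line such as "12abc,JP" would throw NumberFormatException in this sketch.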

----------------------------------

Get the raw .txt file here: https://github.com/torproject/tor/raw/main/src/config/geoip

The file has to be processed with the following three 'sed' commands:

sed -E -i 's/([0-9]*),[0-9]*,([A-Z]*)/\1,\2/g' IpToCountry.txt && sed -E -i 's/,\?\?/,ZZ/g' IpToCountry.txt && sed -E -i 's/,CS/,RS/g' IpToCountry.txt

1) Remove the middle column (toIP), because the Tor geoip format is: fromIP,toIP,countryCode. Freenet does not need the toIP value; the binary search algorithm will take care of this.
2) Replace '??' with 'ZZ' for unknown countries, because '??' is not in the ISO 3166 standard.
3) Replace 'CS' with 'RS' because 'CS' (Serbia and Montenegro) is no longer a current ISO 3166-1 code.
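
For anyone who prefers not to use 'sed', the three steps can be approximated per line in Java. This is a sketch only and not part of the commit: TorGeoIpNormalizer is a hypothetical name, and unlike the 'sed' regex it leaves any line that is not exactly three comma-separated fields with a numeric first field untouched.

```java
// Hypothetical per-line equivalent of the three sed steps, for illustration only.
final class TorGeoIpNormalizer {
    /** Converts "fromIP,toIP,countryCode" into "fromIP,countryCode". */
    static String normalize(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3 || !parts[0].matches("[0-9]+")) {
            return line; // comments, blank lines, unexpected input: keep as-is
        }
        String country = parts[2];
        if (country.equals("??")) {
            country = "ZZ"; // step 2: unknown country
        } else if (country.equals("CS")) {
            country = "RS"; // step 3: Serbia and Montenegro -> Serbia
        }
        return parts[0] + "," + country; // step 1: toIP (parts[1]) is dropped
    }
}
```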

Zip this text file into IpToCountry.zip and place it in the main Freenet folder.

==== Code changes ====

The base85 code is left in the source as well as the file reader for the old format.

- src/freenet/clients/http/geoip/IPConverter.java
-- Added a zip reader so the DB can be stored compressed to save space.
-- ArrayList is allocated with 180000 slots so that it does not have to resize many times (this makes no measurable difference to speed, though).
-- Ignore empty lines and lines that start with anything but a number (comments).
-- Cast the Long value to (int), exactly like the old code did.
-- Get country, identical to old code.
-- Reverse the List, because the binary search expects the list to be in descending order. Takes <10 ms.
-- Convert the List<Integer/Short> to int[]/short[] to save lots of memory. See below for explanation. Takes <10 ms.
-- Catch all possible errors.

-- I did not feel confident in messing with the binary search because I might overlook some edge case where indexes would no longer match, so I left it alone. Reversing both arrays takes less than 10 ms combined.
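
For reference, a lookup that works directly on ascending data (and so would not need the reversal) could look roughly like the sketch below. It is not the binary search in IPConverter.java: AscendingRangeLookup and floorIndex are hypothetical names, and the unsigned comparison is an assumption made here because addresses above 2^31 - 1 become negative once cast to int.

```java
// Hypothetical floor search over ascending range starts, for illustration only.
final class AscendingRangeLookup {
    /**
     * Returns the index of the last element of sortedStarts that is <= ip,
     * comparing both as unsigned 32-bit values, or -1 if ip lies below the
     * first range start. sortedStarts must be ascending in unsigned order.
     */
    static int floorIndex(int[] sortedStarts, int ip) {
        int lo = 0;
        int hi = sortedStarts.length - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Integer.compareUnsigned(sortedStarts[mid], ip) <= 0) {
                result = mid; // candidate; a later start may still be <= ip
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }
}
```

The returned index would then be used to read the country code from the parallel short[] array, since each range implicitly ends where the next one starts.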

- src/freenet/node/NodeFile.java
-- Changed default location from 'IpToCountry.dat' to 'IpToCountry.zip'.

Memory from heap dump according to VisualVM:
List<Integer> vs int[]: 3.3 MiB vs 660 KiB
List<Short> vs short[]: 2.0 MiB vs 330 KiB
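
The saving comes from dropping the per-element wrapper objects: an int[] needs 4 bytes per entry and a short[] 2 bytes, so 660 KiB and 330 KiB both correspond to roughly 169,000 entries. A sketch of the conversion, combined with the reversal in a single pass, is shown below. PrimitiveArrays and its methods are hypothetical names and not the code from the commit:

```java
import java.util.List;

// Hypothetical conversion helpers, for illustration only.
final class PrimitiveArrays {
    /** Copies the list into an int[] in reverse order (ascending -> descending). */
    static int[] toReversedIntArray(List<Integer> values) {
        int n = values.size();
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[n - 1 - i] = values.get(i); // auto-unboxing Integer -> int
        }
        return out;
    }

    /** Copies the list into a short[] in reverse order. */
    static short[] toReversedShortArray(List<Short> values) {
        int n = values.size();
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            out[n - 1 - i] = values.get(i); // auto-unboxing Short -> short
        }
        return out;
    }
}
```

Indexed access with get(i) is constant-time for an ArrayList, which is the list type the commit message mentions.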

==== Further changes (aka 'more stuff to do for Arne' :) ) ====

https://github.com/freenet/scripts#releasing-stable-freenet-builds
The FAQ link has to be removed as the old IP DB site is no longer used.

/scripts/setup-release-environment
Has to be adjusted. Arne, how did it work in the past few years, given that the website has been offline for a while?

The new zip file has to be added to the insert/release script.
naejadu authored and ArneBab committed Sep 9, 2023
1 parent 5bda2ac commit 89fdffe
Showing 2 changed files with 10 additions and 3 deletions.
11 changes: 9 additions & 2 deletions src/freenet/clients/http/geoip/IPConverter.java
@@ -1,15 +1,22 @@
package freenet.clients.http.geoip;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.nio.file.NoSuchFileException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

import freenet.clients.http.StaticToadlet;
import freenet.node.Node;
2 changes: 1 addition & 1 deletion src/freenet/node/NodeFile.java
@@ -9,7 +9,7 @@ public enum NodeFile {
Seednodes(InstallDirectory.Node, "seednodes.fref"),
InstallerWindows(InstallDirectory.Run, "freenet-latest-installer-windows.exe"),
InstallerNonWindows(InstallDirectory.Run, "freenet-latest-installer-nonwindows.jar"),
-IPv4ToCountry(InstallDirectory.Run, "IpToCountry.dat");
+IPv4ToCountry(InstallDirectory.Run, "IpToCountry.zip");

private final InstallDirectory dir;
private final String filename;
