There seems to be a memory leak in libpostal_parse_address: memory usage keeps increasing over time when the same address is parsed repeatedly.
My country is
This issue is not specific to any country or address. I tried other addresses and random strings, and the issue remains.
Here's how I'm using libpostal
The program below parses the example address 10M times and uses Linux pmap to dump its memory map every 100,000 iterations.
// gcc -o app app.c $(pkg-config --cflags --libs libpostal)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libpostal/libpostal.h>

int main(int argc, char **argv) {
    // Load libpostal's base data and the address parser model once.
    if (!libpostal_setup() || !libpostal_setup_parser()) {
        exit(EXIT_FAILURE);
    }

    libpostal_address_parser_options_t options = libpostal_get_address_parser_default_options();

    int count = 10000000;  // total number of parses
    int batch = 100000;    // take a pmap snapshot every `batch` parses

    for (int i = 0; i < count; i++) {
        // Parse the same address and free the response immediately.
        libpostal_address_parser_response_t *parsed = libpostal_parse_address("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA", options);
        libpostal_address_parser_response_destroy(parsed);

        if (i % batch == 0) {
            // Write this process's memory map to 1.txt, 2.txt, ..., 100.txt.
            char command[256];
            snprintf(command, sizeof(command), "pmap -x %d > %d.txt", (int)getpid(), i / batch + 1);
            puts(command);
            system(command);
        }
    }

    libpostal_teardown();
    libpostal_teardown_parser();
    return 0;
}
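If shelling out to pmap on every batch is inconvenient, a possible alternative (assuming Linux) is to read the resident set size directly from /proc/self/statm, whose second field is the number of resident pages. The helper name current_rss_kib is hypothetical; this is a sketch, not part of libpostal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return this process's resident set size in KiB,
 * read from the Linux-specific /proc/self/statm (field 2 = resident pages).
 * Returns -1 if the file cannot be read or parsed. */
long current_rss_kib(void) {
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL) {
        return -1;
    }
    long size_pages = 0, resident_pages = 0;
    int fields = fscanf(f, "%ld %ld", &size_pages, &resident_pages);
    fclose(f);
    if (fields != 2) {
        return -1;
    }
    long page_size = sysconf(_SC_PAGESIZE);  /* bytes per page, usually 4096 */
    assert(page_size > 0);
    return resident_pages * (page_size / 1024);
}
```

In the i % batch == 0 branch of the program above, something like printf("%d %ld\n", i, current_rss_kib()); would print one RSS sample per batch, which should be comparable to the RSS total in the pmap -x output.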
Here's what I did
See above.
Here's what I got
The memory usage increases over time.
echo "File Kbytes RSS Dirty"; for i in {5..100..5}; do echo -n "$i.txt: " && cat $i.txt | grep total; done
File Kbytes RSS Dirty
5.txt: total kB 1942360 1924872 1921816
10.txt: total kB 2007900 1960788 1957732
15.txt: total kB 2007900 1980316 1977260
20.txt: total kB 2073436 1999848 1996792
25.txt: total kB 2073436 2019380 2016324
30.txt: total kB 2073436 2038912 2035856
35.txt: total kB 2204508 2058444 2055388
40.txt: total kB 2204508 2077972 2074916
45.txt: total kB 2204508 2097504 2094448
50.txt: total kB 2204508 2117036 2113980
55.txt: total kB 2204508 2136568 2133512
60.txt: total kB 2204508 2156100 2153044
65.txt: total kB 2204508 2175632 2172576
70.txt: total kB 2466652 2195160 2192104
75.txt: total kB 2466652 2214692 2211636
80.txt: total kB 2466652 2234224 2231168
85.txt: total kB 2466652 2253756 2250700
90.txt: total kB 2466652 2273288 2270232
95.txt: total kB 2466652 2292816 2289760
100.txt: total kB 2466652 2312348 2309292
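Reading the RSS column: snapshot n.txt is taken at loop index i = (n - 1) * 100000, so between 5.txt and 100.txt there are 9,500,000 parses, over which RSS grows from 1,924,872 kB to 2,312,348 kB (387,476 kB). Assuming pmap's kB means KiB (1024 bytes), that is roughly 42 bytes of RSS growth per parse, and about 40 bytes per parse within any single 500,000-parse step. The hypothetical helper below just reproduces that arithmetic.

```c
#include <assert.h>

/* Average RSS growth per parse between two pmap snapshots.
 * Assumes pmap "kB" = KiB and that snapshot n.txt is taken at
 * i = (n - 1) * batch, so (n_end - n_start) * batch parses happen
 * between the two snapshots. */
double bytes_per_parse(long rss_start_kib, long rss_end_kib,
                       int n_start, int n_end, long batch) {
    assert(n_end > n_start && batch > 0);
    long parses = (long)(n_end - n_start) * batch;
    return (double)(rss_end_kib - rss_start_kib) * 1024.0 / (double)parses;
}
```

For example, bytes_per_parse(1924872, 2312348, 5, 100, 100000) is about 41.77, and bytes_per_parse(1999848, 2019380, 20, 25, 100000) is about 40.0.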
I also ran it 1M times under valgrind, but it does not report a memory leak.
valgrind ./app2
==3615986== Memcheck, a memory error detector
==3615986== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3615986== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==3615986== Command: ./app2
==3615986==
==3615986== Warning: set address range perms: large range [0x2f85a040, 0x3fa385f0) (undefined)
==3615986== Warning: set address range perms: large range [0x3fa39040, 0x4fc175f0) (undefined)
==3615986== Warning: set address range perms: large range [0x3fa391ca, 0x4fc171ca) (defined)
==3615986== Warning: set address range perms: large range [0x3fa39028, 0x4fc17608) (noaccess)
==3615986== Warning: set address range perms: large range [0x6577c040, 0x82a05c8c) (undefined)
==3615986== Warning: set address range perms: large range [0x2f85a028, 0x3fa38608) (noaccess)
==3615986== Warning: set address range perms: large range [0x6577c028, 0x82a05ca4) (noaccess)
==3615986==
==3615986== HEAP SUMMARY:
==3615986== in use at exit: 0 bytes in 0 blocks
==3615986== total heap usage: 71,539,052 allocs, 71,539,052 frees, 7,820,286,857 bytes allocated
==3615986==
==3615986== All heap blocks were freed -- no leaks are possible
==3615986==
==3615986== For lists of detected and suppressed errors, rerun with: -s
==3615986== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Here's what I was expecting
The memory usage should not increase over time.
That's likely memory fragmentation. Each of the ten million parses does some mallocs and frees. Most of our internals use dynamic arrays, and even for arrays of strings we keep an internal data structure that packs a bunch of C strings into one contiguous array, but the API call may create a lot of small strings via strdup, since we wanted to keep a simple API for the higher-level bindings that doesn't require knowing any fancier data structures. If valgrind doesn't report a leak, it's unlikely there truly is one (false positives are more common than false negatives), especially not one that would leak 50+ bytes per call.
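To illustrate the kind of structure described above (many C strings packed into one contiguous array), here is a minimal sketch. The type and function names (packed_strings, packed_push, and so on) are hypothetical and this is not libpostal's actual internal API; it only shows why such a layout needs a few reallocs and two frees instead of one malloc and one free per string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packed string array: strings are stored back to back,
 * NUL-terminated, in one growable buffer; offsets[i] is where string i starts. */
typedef struct {
    char *data;              /* "str0\0str1\0..." */
    size_t used, cap;        /* bytes used / allocated in data */
    size_t *offsets;         /* start of each string within data */
    size_t count, offsets_cap;
} packed_strings;

void packed_init(packed_strings *p) {
    memset(p, 0, sizeof(*p));
}

/* Append a copy of s. Returns 0 on success, -1 on allocation failure. */
int packed_push(packed_strings *p, const char *s) {
    size_t len = strlen(s) + 1;  /* include the terminating NUL */
    if (p->used + len > p->cap) {
        size_t new_cap = p->cap ? p->cap * 2 : 64;
        while (new_cap < p->used + len) {
            new_cap *= 2;
        }
        char *d = realloc(p->data, new_cap);
        if (d == NULL) return -1;
        p->data = d;
        p->cap = new_cap;
    }
    if (p->count == p->offsets_cap) {
        size_t new_cap = p->offsets_cap ? p->offsets_cap * 2 : 8;
        size_t *o = realloc(p->offsets, new_cap * sizeof(*o));
        if (o == NULL) return -1;
        p->offsets = o;
        p->offsets_cap = new_cap;
    }
    memcpy(p->data + p->used, s, len);
    p->offsets[p->count++] = p->used;
    p->used += len;
    return 0;
}

/* Pointer to string i; valid only until the next packed_push(),
 * which may move the buffer. */
const char *packed_get(const packed_strings *p, size_t i) {
    assert(i < p->count);
    return p->data + p->offsets[i];
}

/* Two frees release everything, however many strings were pushed. */
void packed_free(packed_strings *p) {
    free(p->data);
    free(p->offsets);
    packed_init(p);
}
```

By contrast, returning each component as its own strdup'd char * means one small heap block per string, which is the allocation pattern the reply suspects of fragmenting the heap.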
It might be worth checking with something like mtrace or malloc_stats, and/or you could probably reproduce the pattern by simply doing a strtok/strdup/free combination 10 million times on the input string without using libpostal at all. Reducing the memory growth might be possible with some kind of memory pooling or small-string caching à la what Python does, but the API contract is that you get back some char *s that you have to free, so at some point system malloc is involved. If you can confirm that something in libpostal itself is leaking memory, I'm happy to fix it, but otherwise I'm not sure there's much to be done there.
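The suggested libpostal-free reproduction could look roughly like this: split the example address with strtok, strdup every token, free them all, and repeat. It assumes glibc, since malloc_stats() (declared in <malloc.h>) is a glibc extension that prints allocator statistics to stderr; churn_tokens and MAX_TOKENS are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <malloc.h>   /* glibc-specific: malloc_stats() */
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 64  /* tokens beyond this per iteration are ignored */

/* Split `input` on spaces, strdup() every token, then free them all,
 * `iterations` times. If report_every > 0, print glibc allocator
 * statistics to stderr every `report_every` iterations.
 * Returns the total number of tokens duplicated. */
long churn_tokens(const char *input, long iterations, long report_every) {
    long total = 0;
    size_t len = strlen(input);
    char *scratch = malloc(len + 1);
    assert(scratch != NULL);

    for (long i = 0; i < iterations; i++) {
        char *tokens[MAX_TOKENS];
        int n = 0;

        memcpy(scratch, input, len + 1);  /* strtok modifies its argument */
        for (char *tok = strtok(scratch, " "); tok != NULL && n < MAX_TOKENS;
             tok = strtok(NULL, " ")) {
            tokens[n++] = strdup(tok);    /* one small heap block per token */
        }
        for (int j = 0; j < n; j++) {
            free(tokens[j]);
        }
        total += n;

        if (report_every > 0 && i % report_every == 0) {
            malloc_stats();
        }
    }
    free(scratch);
    return total;
}
```

Calling churn_tokens("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA", 10000000, 100000) from main and comparing its malloc_stats output (or pmap snapshots) with the libpostal run would help show whether plain strdup/free churn produces a similar growth pattern.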
For parsing issues, please answer "yes" or "no" to all that apply.
This is not a parsing issue.
Here's what I think could be improved
See above.
More information:
8f2066b1d30f4290adf59cacc429980f139b8545