Avoid more duplicate spots #93
Comments
Classic CS student. Maybe once we have 1 million spots. We could create a dedicated review page that quickly jumps from one possible duplicate spot to the next. |
I think a dedicated page does not fit my requirements. |
Somehow some users are not aware that reviewing instead of adding a spot is an option. |
I think I already added this before, but it's tough to explain without a lot of text. This should make deduplication unnecessary though: #46 |
You had already clustered the points, correct? I think we can use that clustering to merge points on the front end. If you have a script that outputs (lat, lon, cluster_id) for every point, that should be easy. |
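Merging clustered points for display could be as simple as the following minimal sketch. It assumes a hypothetical list of `(lat, lon, cluster_id)` tuples like the script output described above, and places one marker per cluster at the mean position of its members. The name `merge_by_cluster` is made up for illustration.

```python
from collections import defaultdict


def merge_by_cluster(points):
    """Collapse (lat, lon, cluster_id) rows into one marker per cluster.

    Returns {cluster_id: (mean_lat, mean_lon, n_points)}.
    """
    groups = defaultdict(list)
    for lat, lon, cluster_id in points:
        groups[cluster_id].append((lat, lon))
    merged = {}
    for cluster_id, members in groups.items():
        n = len(members)
        merged[cluster_id] = (
            sum(lat for lat, _ in members) / n,  # mean latitude
            sum(lon for _, lon in members) / n,  # mean longitude
            n,                                   # how many spots were merged
        )
    return merged


# Hypothetical sample: two spots in cluster 7, one spot in cluster 9
points = [
    (52.5200, 13.4050, 7),
    (52.5202, 13.4052, 7),
    (48.8566, 2.3522, 9),
]
markers = merge_by_cluster(points)
print(markers)
```

Plain averaging of coordinates is fine for clusters a few hundred meters wide, but it would misplace a cluster that straddles the antimeridian.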
I came here after reporting some duplicates (as can be seen at https://hitchmap.com/dashboard.html), saw a lot of recent duplicates as well, and feared that new ones would appear while we are still cleaning up the old ones. This is tough to solve with #46, since, e.g., spots on opposite sides of a road can be quite close. I'd like to avoid text as well. How about at the end of the process:
Could live without the 2nd option. |
I encountered this too, but it can be solved. I asked ChatGPT for a solution a few days ago; this is what it came up with:

```python
import numpy as np
import pandas as pd
import requests
from scipy.spatial import KDTree
from shapely.geometry import LineString, Point
from shapely.ops import nearest_points

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
EARTH_RADIUS_M = 6_371_000


def overpass(query):
    """Run an Overpass QL query and return its list of elements."""
    response = requests.get(OVERPASS_URL, params={"data": query}, timeout=60)
    response.raise_for_status()
    return response.json().get("elements", [])


# Query OSM for highways within 50 m and return the closest one with its geometry
def get_nearest_road_geometry(lat, lon):
    # `out geom;` attaches coordinates to every way, so only ways are returned
    ways = overpass(f"""
    [out:json];
    way(around:50,{lat},{lon})["highway"];
    out geom;
    """)
    point = Point(lon, lat)
    best_id, best_geom = None, None
    for way in ways:
        if "geometry" not in way:
            continue
        geom = LineString([(pt["lon"], pt["lat"]) for pt in way["geometry"]])
        # Distance in degrees; only used to rank roads that are all within 50 m
        if best_geom is None or point.distance(geom) < point.distance(best_geom):
            best_id, best_geom = way["id"], geom
    return best_id, best_geom


# Check if two points are on the same side of the road.
# Caveat: shapely's distance() is never negative, so this returns True
# for any two points that are not exactly on the road.
def are_points_on_same_side(point1, point2, road_geom):
    distance1 = point1.distance(nearest_points(point1, road_geom)[1])
    distance2 = point2.distance(nearest_points(point2, road_geom)[1])
    return (distance1 * distance2) > 0


# Query OSM for a service/rest area, fuel station or car park within 50 m
def get_service_area(lat, lon):
    # The two tag filters are alternatives (a union), not one combined filter
    elements = overpass(f"""
    [out:json];
    (
      nwr(around:50,{lat},{lon})["highway"~"^(services|rest_area)$"];
      nwr(around:50,{lat},{lon})["amenity"~"^(fuel|parking)$"];
    );
    out ids;
    """)
    # Identify the first match by (type, id), since node and way IDs can collide
    return (elements[0]["type"], elements[0]["id"]) if elements else None


# Sample spots a few meters apart
df = pd.DataFrame({
    "lat": [52.5200, 52.5201, 52.5202],
    "lon": [13.4050, 13.4051, 13.4052],
})

# KDTree distances are Euclidean, so place the spots on a sphere in meters;
# at 50 m the straight-line (chord) distance practically equals the ground distance
lat_rad = np.radians(df["lat"].to_numpy())
lon_rad = np.radians(df["lon"].to_numpy())
coords = EARTH_RADIUS_M * np.column_stack([
    np.cos(lat_rad) * np.cos(lon_rad),
    np.cos(lat_rad) * np.sin(lon_rad),
    np.sin(lat_rad),
])
tree = KDTree(coords)
distance_threshold = 50  # meters
# Every pair of spots closer than the threshold, once, as (i, j) with i < j
nearby_pairs = sorted(tree.query_pairs(distance_threshold))

# Query road and service area for each spot (two Overpass requests per spot,
# so mind the API's rate limits on real data)
road_ids, road_geoms, service_areas = [], [], []
for lat, lon in zip(df["lat"], df["lon"]):
    road_id, road_geom = get_nearest_road_geometry(lat, lon)
    road_ids.append(road_id)
    road_geoms.append(road_geom)
    service_areas.append(get_service_area(lat, lon))
df["road_id"] = road_ids
df["service_area"] = service_areas

# Keep pairs on the same side of the same road, or in the same service area
same_side_or_service_pairs = []
for i, j in nearby_pairs:
    point_i = Point(df.at[i, "lon"], df.at[i, "lat"])  # shapely uses (x=lon, y=lat)
    point_j = Point(df.at[j, "lon"], df.at[j, "lat"])
    same_road = road_ids[i] is not None and road_ids[i] == road_ids[j]
    same_service_area = service_areas[i] is not None and service_areas[i] == service_areas[j]
    if (same_road and are_points_on_same_side(point_i, point_j, road_geoms[i])) or same_service_area:
        same_side_or_service_pairs.append((i, j))

print("Pairs of nearby points on the same road side or service area:", same_side_or_service_pairs)
```

Dunno if it works, but something like it probably will. Even if we make the occasional mistake, as long as clicking the spot shows the data where it was reported, all's good. |
Note: to check whether two points are on the same side, I'd probably draw a line between the points and see if it intersects the road. I don't know if signed distances really exist. |
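The line-crossing check from the note above could look roughly like this. It is a dependency-free sketch, assuming spots and road vertices are `(lon, lat)` pairs treated as flat coordinates (reasonable over tens of meters) and that the road is one polyline, e.g. a single OSM way. The names `segments_intersect` and `same_side_of_road` are hypothetical. With shapely, the same test would be `not LineString([spot_a, spot_b]).intersects(road_geom)`.

```python
def _orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def _on_segment(a, b, c):
    """For a point c collinear with a-b: True if c lies between a and b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 touches or crosses segment q1-q2."""
    o1 = _orientation(p1, p2, q1)
    o2 = _orientation(p1, p2, q2)
    o3 = _orientation(q1, q2, p1)
    o4 = _orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True  # general case: each segment straddles the other
    # Collinear special cases: an endpoint lies on the other segment
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def same_side_of_road(spot_a, spot_b, road):
    """Same side if the straight line between the spots crosses no road segment."""
    return not any(segments_intersect(spot_a, spot_b, road[k], road[k + 1])
                   for k in range(len(road) - 1))


road = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # an L-shaped road
print(same_side_of_road((0.5, 0.5), (0.9, 0.9), road))  # True
print(same_side_of_road((0.5, 0.5), (1.5, 0.5), road))  # False
```

Two limitations follow from the approach itself: a pair of spots beyond the end of the polyline is reported as being on the same side, and a road split into several OSM ways needs all of its segments passed in.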
Yeah, signed distances definitely appear to be a hallucination; other than that, I think it's very close. |
I did similar things in the cleaning folder of hitchmap-data (https://github.com/Hitchwiki/hitchmap-data/tree/main/cleaning). In addition, we should come up with a way to educate users not to pollute the map further. |
It's not polluting if we can handle it :) I'm all for people logging exactly where they stood, as long as it doesn't mess up the map. |
Suggest reviewing an existing spot if a new spot is added within, e.g., 100 m of an existing one.
Maybe we need a clever data structure to quickly get all spots close to a new spot.
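One "clever data structure" that avoids scanning every spot is a uniform grid. This is a minimal sketch, assuming a 100 m radius and hypothetical names (`SpotIndex`, `add`, `nearby`). Spots are placed on a sphere in meters and bucketed into cubes whose side equals the radius. Because the straight chord between two points is never longer than their ground distance, every spot within the radius lies in the query's cube or one of its 26 neighbors, and a haversine check then filters the candidates exactly.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def to_xyz(lat, lon):
    """Position on a sphere with Earth's radius, in meters."""
    p, l = math.radians(lat), math.radians(lon)
    return (EARTH_RADIUS_M * math.cos(p) * math.cos(l),
            EARTH_RADIUS_M * math.cos(p) * math.sin(l),
            EARTH_RADIUS_M * math.sin(p))


class SpotIndex:
    """Hypothetical grid index: buckets spots into cubes of side radius_m."""

    def __init__(self, radius_m=100.0):
        self.radius_m = radius_m
        self.cells = defaultdict(list)  # cube key -> [(spot_id, lat, lon), ...]

    def _cell(self, lat, lon):
        return tuple(math.floor(c / self.radius_m) for c in to_xyz(lat, lon))

    def add(self, spot_id, lat, lon):
        self.cells[self._cell(lat, lon)].append((spot_id, lat, lon))

    def nearby(self, lat, lon):
        """IDs of spots within radius_m of (lat, lon), nearest first."""
        cx, cy, cz = self._cell(lat, lon)
        hits = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    # .get() avoids creating empty buckets in the defaultdict
                    for spot_id, s_lat, s_lon in self.cells.get((cx + dx, cy + dy, cz + dz), []):
                        d = haversine_m(lat, lon, s_lat, s_lon)
                        if d <= self.radius_m:
                            hits.append((d, spot_id))
        return [spot_id for _, spot_id in sorted(hits)]


index = SpotIndex(radius_m=100)
index.add("a", 52.5200, 13.4050)
index.add("b", 52.5205, 13.4050)  # about 56 m north of "a"
index.add("c", 52.5300, 13.4050)  # about 1.1 km north of "a"
print(index.nearby(52.5201, 13.4050))  # ['a', 'b']
```

When a new spot comes in, a non-empty `nearby` result would be the cue to suggest reviewing an existing spot instead. scipy's KDTree over the same 3D coordinates would work just as well; the grid only needs the standard library.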