Scrutins #115

Merged
merged 57 commits into from
Sep 25, 2018
Changes from 15 commits
Commits
5b6623c
Add utility script to download AN open data
njoyard Jun 28, 2018
86fd1cb
Formatting + dependencies
njoyard Jun 28, 2018
170d566
Extract reference tables from open data + scrutins presence script
njoyard Jun 28, 2018
b41aeb2
Scrutins schema
njoyard Jun 28, 2018
ef54d09
Update schema + complete scrutins parsing
njoyard Jun 28, 2018
f54d5cc
Generate Doctrine model
njoyard Jun 28, 2018
8d211ce
Add load scrutins task
njoyard Jun 29, 2018
fd40631
Add DB update script
njoyard Jun 29, 2018
6bcb38f
Look up séance + tag interventions with scrutin results
njoyard Jun 29, 2018
c72738d
Detect too many delegations
njoyard Jun 29, 2018
742c237
Typos
njoyard Jun 29, 2018
e9d1558
Add check on number of scrutins per séance, search scrutins by votes and more …
njoyard Jul 2, 2018
6917b48
Take delegation start date into account + typo
njoyard Jul 2, 2018
b1413b0
Stop using <table> elements to find scrutins
njoyard Jul 3, 2018
e0b58ae
Accept commas in scrutins
njoyard Jul 19, 2018
09b2789
Merge branch 'master' into scrutins
RouxRC Aug 16, 2018
32b3672
merge recent dep fix
RouxRC Aug 16, 2018
d7ef8c3
cleanup, cf comments
RouxRC Aug 16, 2018
fabdb2c
setup presences depending on position + no delegation
RouxRC Aug 16, 2018
6a6528c
handle one preuve présence by source even for same type except for in…
RouxRC Aug 16, 2018
5bb0aa2
lighter log for future crons
RouxRC Aug 16, 2018
93469c6
include load_scrutins in daily loads
RouxRC Aug 16, 2018
01619da
adjust groupes names from AN opendata to ND ones
RouxRC Aug 16, 2018
cf463a0
warn on missing scrutins in opendata AN
RouxRC Aug 16, 2018
e3621a1
opendata is updated only daily, no need to run load scrutins three times a day
RouxRC Aug 16, 2018
5a48a93
Adjust first date of delegations present
RouxRC Aug 16, 2018
3a827d6
add checks on missing fields
RouxRC Aug 16, 2018
2e36bec
hardfix missing idJO on some Seances from OpenData AN
RouxRC Aug 16, 2018
cc47b6b
first cleaning of demandeurs
RouxRC Aug 16, 2018
1ad9ef5
hardfix missing demandeurs
RouxRC Aug 16, 2018
8dc025b
fix warnings on missing fields
RouxRC Aug 16, 2018
b3dc009
change demandeur to plural in model + parse groupes
RouxRC Aug 16, 2018
9d8f02e
add loaded dir to gitignore
RouxRC Aug 16, 2018
44f726c
catch potential error
RouxRC Aug 16, 2018
bb48b49
get back from opendata missing groupe of votant when only in mise au …
RouxRC Aug 17, 2018
e1f1605
get position_groupe from groupe for mises au point
RouxRC Aug 17, 2018
91048c8
avoid saving a scrutin for which we can't find the intervention and r…
RouxRC Aug 17, 2018
6c45742
wrong link
RouxRC Aug 17, 2018
6464175
hardfix errors association seance from opendata AN
RouxRC Aug 18, 2018
91f1315
fix more cases of bad seances listed in opendata
RouxRC Aug 19, 2018
f090b3e
match latest table in an intervention
RouxRC Aug 19, 2018
3acfa52
oops, wrong interpretation of preg_match_all results structure
RouxRC Aug 19, 2018
82a2b5b
oops, debug
RouxRC Aug 19, 2018
7510acf
no need to search for scrutins in committees
RouxRC Aug 19, 2018
936666b
don't count 'NonVotant' votes as présents
RouxRC Aug 19, 2018
70f1755
update semaines d'activité legends to indicate scrutins are counted
RouxRC Aug 19, 2018
542308b
Merge branch 'master' into scrutins
RouxRC Aug 19, 2018
7a88333
refactor indicateurs titles and descriptions for shared use between synthese …
RouxRC Aug 19, 2018
de38e08
Merge branch 'master' into scrutins
RouxRC Sep 16, 2018
a6c6f20
don't integrate scrutins whose séances are missing
RouxRC Sep 16, 2018
8d8413f
Merge branch 'master' into scrutins
RouxRC Sep 23, 2018
8f4942e
add warnings on position identical to mise_au_point
RouxRC Sep 23, 2018
ebfb47f
adjust period of accounting presences via scrutins depending on deleg…
RouxRC Sep 23, 2018
fe7d5b4
adjust accounting presences for special cases of mise_au_point
RouxRC Sep 23, 2018
af19de5
whoops
RouxRC Sep 23, 2018
ba6abfc
set the field for all to avoid php warnings
RouxRC Sep 23, 2018
92a6c96
[WIP] Edit FAQ
RouxRC Sep 25, 2018
2 changes: 2 additions & 0 deletions .gitignore
@@ -36,6 +36,7 @@ batch/commission/tmp/
 batch/commission/out/
 batch/commission/loaded/
 batch/commission/presents/
+batch/common/opendata
 batch/depute/html/
 batch/depute/json/
 batch/depute/collabs.csv
@@ -65,6 +66,7 @@ batch/questions/test/
 batch/questions/dernier_numero.txt
 batch/questions/liste_sans_reponse.txt
 batch/sanctions/
+batch/scrutin/scrutins
 
 lib/filter/doctrine/
 lib/form/doctrine/
5 changes: 4 additions & 1 deletion ansible/roles/cpc.install/templates/web_Dockerfile.j2
@@ -9,7 +9,10 @@ RUN apt-get update && apt-get -my install --no-install-recommends \
 libjpeg62-turbo-dev \
 libmagickwand-dev \
 libpng{{ use_stretch | ternary('', '12') }}-dev \
-libwww-mechanize-perl
+libwww-mechanize-perl \
+python-bs4 \
+python-html5lib \
+python-requests
 
 RUN rm -rf /var/lib/apt/lists/*
 
4 changes: 4 additions & 0 deletions batch/common/__init__.py
@@ -0,0 +1,4 @@
import os

COMMON_DIR = os.path.abspath(os.path.dirname(__file__))
BATCH_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
220 changes: 220 additions & 0 deletions batch/common/opendata.py
@@ -0,0 +1,220 @@
# -*- coding: utf8 -*-
from __future__ import print_function, unicode_literals

import json
import os
import sys
from zipfile import ZipFile

from bs4 import BeautifulSoup
import requests

from . import COMMON_DIR

CACHE_DIR = os.path.join(COMMON_DIR, "opendata")

AN_BASE_URL = "http://data.assemblee-nationale.fr"
AN_ENTRYPOINTS = {
    "14": {
        "amo": "opendata-archives-xive/deputes-senateurs-et-ministres-xive-legislature",
        "reunions": "opendata-archives-xive/agendas-xive-legislature",
        "scrutins": "opendata-archives-xive/scrutins-xive-legislature",
    },
    "15": {
        "amo": "acteurs/deputes-en-exercice",
        "reunions": "reunions/reunions",
        "scrutins": "travaux-parlementaires/votes",
    },
}


def log(str):
    print(str, file=sys.stderr)


def fetch_an_jsonzip(legislature, objet):
    """
    Downloads the zipped JSON dump linked from an AN open data page, if it
    has been modified since the last download.

    Returns the local path of the downloaded zip file (stored in the cache
    directory) and a flag indicating whether it was modified.
    """

    if (
        str(legislature) not in AN_ENTRYPOINTS
        or objet not in AN_ENTRYPOINTS[str(legislature)]
    ):
        raise Exception(
            "Unknown object: %s (legislature %s)" % (objet, legislature)
        )

    if not os.path.exists(CACHE_DIR):
        os.makedirs(CACHE_DIR)

    localzip = os.path.join(CACHE_DIR, "%s_%s.zip" % (legislature, objet))
    localzip_lastmod = "%s.last_modified" % localzip

    url = "%s/%s" % (AN_BASE_URL, AN_ENTRYPOINTS[str(legislature)][objet])
    log("Downloading %s" % url)

    try:
        soup = BeautifulSoup(requests.get(url).content, "html5lib")
    except Exception:
        raise Exception("Cannot download %s" % url)

    def match_link(a):
        return a["href"].endswith(".json.zip") or a["href"].endswith(
            ".json.zip "
        )

    try:
        link = [a for a in soup.select("a[href]") if match_link(a)][0]
    except Exception:
        raise Exception("Link to .json.zip dump not found")

    jsonzip_url = link["href"].replace(".json.zip ", ".json.zip")
    if jsonzip_url.startswith("/"):
        jsonzip_url = "%s%s" % (AN_BASE_URL, jsonzip_url)

    log("Zipped JSON URL: %s" % jsonzip_url)

    try:
        lastmod = requests.head(jsonzip_url).headers["Last-Modified"]
    except Exception:
        raise Exception("Modification date of .json.zip dump not found")

    log("Modification date of .json.zip dump: %s" % lastmod)
    do_download = True

    if os.path.exists(localzip) and os.path.exists(localzip_lastmod):
        with open(localzip_lastmod, "r") as f:
            known_lastmod = f.read()

        log("Modification date of last download: %s" % known_lastmod)
        if known_lastmod == lastmod:
            do_download = False

    if do_download:
        log("Downloading .json.zip")

        try:
            with open(localzip, "wb") as out:
                r = requests.get(jsonzip_url, stream=True)
                for block in r.iter_content(1024):
                    out.write(block)
            with open(localzip_lastmod, "w") as f:
                f.write(lastmod)
        except Exception:
            raise Exception("Cannot download .json.zip")
    else:
        log("Download skipped, file not updated")

    return localzip, do_download


def fetch_an_json(legislature, objet):
    """
    Downloads the zipped JSON dump for (legislature, objet) from the AN open
    data site if it changed since the last download (see fetch_an_jsonzip).

    legislature, objet: keys into AN_ENTRYPOINTS, e.g. 15 and "scrutins"

    Returns the data of the first .json file in the zip and a flag
    indicating whether the file was modified.
    """

    localzip, updated = fetch_an_jsonzip(legislature, objet)
    with ZipFile(localzip, "r") as z:
        for f in [f for f in z.namelist() if f.endswith(".json")]:
            log("Extracted JSON: %s" % f)
            with z.open(f) as zf:
                return json.load(zf), updated


def _cached_ref(
    legislature, objet, id_mapping, extract_list, extract_id, extract_mapped
):
    """
    Builds and returns a cached identifier mapping from an open data JSON
    dump.

    legislature, objet: select the dump to use
    id_mapping: unique name of the mapping, used for its cache file
    extract_list: function extracting the list of items from the JSON dump
    extract_id: function extracting the identifier to map from an item
    extract_mapped: function extracting the mapped data from an item
    """

    data, updated = fetch_an_json(legislature, objet)
    cached_file = os.path.join(
        CACHE_DIR, "mapping_%s_%s.json" % (legislature, id_mapping)
    )

    if updated or not os.path.exists(cached_file):
        cache = {}
        for item in extract_list(data):
            id = extract_id(item)
            cache[id] = extract_mapped(item)

        with open(cached_file, "w") as f:
            json.dump(cache, f)
        return cache
    else:
        with open(cached_file) as f:
            return json.load(f)


def ref_groupes(legislature):
    """
    Returns a mapping from the open data IDs of parliamentary groups to
    their abbreviation
    """

    def _extract_list(data):
        return filter(
            lambda o: o["codeType"] == "GP",
            data["export"]["organes"]["organe"],
        )

    def _extract_id(organe):
        return organe["uid"]

    def _extract_mapped(organe):
        return organe["libelleAbrev"]

    return _cached_ref(
        legislature,
        "amo",
        "groupes",
        _extract_list,
        _extract_id,
        _extract_mapped,
    )


def ref_seances(legislature):
    """
    Returns a mapping from the open data IDs of séances to their idJO
    """

    def _extract_list(data):
        return filter(
            lambda reunion: "IDS" in reunion["uid"],
            data["reunions"]["reunion"],
        )

    def _extract_id(reunion):
        return reunion["uid"]

    def _extract_mapped(reunion):
        return reunion["identifiants"]["idJO"]

    return _cached_ref(
        legislature,
        "reunions",
        "seances",
        _extract_list,
        _extract_id,
        _extract_mapped,
    )
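
fetch_an_jsonzip decides whether to re-download by comparing the dump's Last-Modified header with the value saved in a `.last_modified` marker file next to the zip. Below is a minimal sketch of that decision, written for Python 3 so it can be tested offline. `download_if_modified` and its `fetch_chunks` parameter are hypothetical names; `fetch_chunks` stands in for `requests.get(url, stream=True).iter_content(1024)`. Unlike the code in the diff, this variant streams into a temporary file and renames it over the target, so a transfer that fails midway leaves the previous zip untouched.

```python
import os
import tempfile


def download_if_modified(local_path, remote_lastmod, fetch_chunks):
    """Re-download local_path only if remote_lastmod differs from the stored marker.

    fetch_chunks: zero-argument callable returning an iterable of bytes blocks
    (hypothetical stand-in for a streamed HTTP response body).
    Returns True if a download happened, False if the cached file was kept.
    """
    marker = local_path + ".last_modified"
    if os.path.exists(local_path) and os.path.exists(marker):
        with open(marker) as f:
            if f.read() == remote_lastmod:
                return False  # same Last-Modified as last time: keep the cached file

    # Write into a temporary file in the same directory, then rename it over the
    # target, so a failed transfer never replaces the previous complete file.
    directory = os.path.dirname(os.path.abspath(local_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            for block in fetch_chunks():
                out.write(block)
        os.replace(tmp_path, local_path)
    except BaseException:
        os.remove(tmp_path)
        raise

    # Record the new marker only once the file is complete.
    with open(marker, "w") as f:
        f.write(remote_lastmod)
    return True
```

In real use, the callable should call `raise_for_status()` on the response before iterating. `requests` does not raise on 4xx/5xx statuses by itself, so without that check an HTML error page could be saved as the zip and then recorded as up to date.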
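
fetch_an_json returns the parsed content of the first `.json` member of the zip. If there is no such member, it implicitly returns None, and the `data, updated = ...` unpacking in _cached_ref then fails with a less explicit error. The following Python 3 sketch of the extraction step fails loudly instead. `load_first_json` is a hypothetical helper name, and the archive in the test is built in memory rather than downloaded.

```python
import io
import json
from zipfile import ZipFile


def load_first_json(zip_source):
    """Return the parsed content of the first *.json member of a zip archive.

    zip_source may be a path or a binary file object. Raises ValueError when
    the archive has no JSON member instead of silently returning None.
    """
    with ZipFile(zip_source, "r") as z:
        names = [n for n in z.namelist() if n.endswith(".json")]
        if not names:
            raise ValueError("no .json member in archive")
        with z.open(names[0]) as member:
            # Decode explicitly so json.load receives text on any Python 3 version.
            return json.load(io.TextIOWrapper(member, encoding="utf-8"))
```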
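
_cached_ref rebuilds an `{id: value}` mapping whenever the dump was re-downloaded or the cache file is missing. Otherwise it reads the mapping back from `mapping_<legislature>_<id_mapping>.json`. The sketch below separates that logic from the network by taking the parsed dump and the `updated` flag as arguments. `cached_mapping` and `groupes_mapping` are hypothetical names. The two-organe dump in the test is toy data shaped like the `data["export"]["organes"]["organe"]` list that ref_groupes filters on `codeType == "GP"`.

```python
import json
import os


def cached_mapping(cache_file, data, updated, extract_list, extract_id, extract_mapped):
    """Return {id: mapped value}, rebuilding cache_file only when needed."""
    if not updated and os.path.exists(cache_file):
        # Dump unchanged since the last run: reuse the stored mapping.
        with open(cache_file) as f:
            return json.load(f)
    # Dump re-downloaded or cache missing: rebuild and store the mapping.
    mapping = {extract_id(item): extract_mapped(item) for item in extract_list(data)}
    with open(cache_file, "w") as f:
        json.dump(mapping, f)
    return mapping


def groupes_mapping(cache_file, data, updated):
    """Hypothetical equivalent of ref_groupes: group uid -> libelleAbrev."""
    return cached_mapping(
        cache_file,
        data,
        updated,
        lambda d: (o for o in d["export"]["organes"]["organe"] if o["codeType"] == "GP"),
        lambda o: o["uid"],
        lambda o: o["libelleAbrev"],
    )
```

In the diff, _cached_ref calls fetch_an_json before checking the cache, so the full dump is parsed even when the mapping is then read back from the cache file. Passing the data in, as above, makes that cost visible to the caller.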