Dynamic calibration framework for low-cost PM2.5 sensors
DCF
PM25
Calibration
Open
Latest Update: 2020.11.04
Dynamic Calibration Model Status Report: https://pm25.lass-net.org/DCF/
AirBox Status Report: https://pm25.lass-net.org/AirBox/
PM2.5 Open Data Portal: https://pm25.lass-net.org/
Model Info | Requires | Usage | Resource | Troubleshooting
-
Data Length:
7, 14, 21, 31 days
-
Feature:
PHTR, PHT, PHR, PTR, PH, PT, PR, P

| | PM2.5 | Hour at which data is sensed | Temperature | Relative humidity |
|---|---|---|---|---|
| abbreviation | P | H | T | R |
| type | float | int | float | float |
| example | 11.3 | 1 | 32.5 | 72.5 |

-
Method:
RR, LassoR, LinearR, BR, RFR, SVR, GAM

| abbreviation | full name | package |
|---|---|---|
| RR | Ridge Regression | sklearn.linear_model.Ridge |
| LassoR | Lasso Regression | sklearn.linear_model.Lasso |
| LinearR | LinearRegression | sklearn.linear_model.LinearRegression |
| BR | BayesianRidge | sklearn.linear_model.BayesianRidge |
| RFR | Random Forest Regression | sklearn.ensemble.RandomForestRegressor |
| SVR | Support Vector Regression | sklearn.svm.SVR |
| GAM | Generalized Additive Model | pygam |

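
The options listed above (data length, feature set, method) can be enumerated with a few lines of standard-library Python. This is only a sketch: it assumes a candidate model is one combination of the three options, which the lists suggest but do not state explicitly, and the names `feature_sets` and `candidates` are illustrative rather than part of the project.

```python
from itertools import combinations, product

DATA_LENGTHS = [7, 14, 21, 31]  # training window in days
METHODS = ["RR", "LassoR", "LinearR", "BR", "RFR", "SVR", "GAM"]


def feature_sets(optional="HTR"):
    """Return 'P' plus every subset of the optional fields, largest first."""
    sets = []
    for size in range(len(optional), -1, -1):
        for combo in combinations(optional, size):
            sets.append("P" + "".join(combo))
    return sets


FEATURES = feature_sets()

# Assumption: one candidate per (data length, feature set, method) triple.
candidates = list(product(DATA_LENGTHS, FEATURES, METHODS))

print(FEATURES)         # ['PHTR', 'PHT', 'PHR', 'PTR', 'PH', 'PT', 'PR', 'P']
print(len(candidates))  # 224
```

PM2.5 (`P`) is present in every feature set, so only the hour, temperature and relative humidity fields vary.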
DCF training dependencies
Python3.6
Package | Version | Link |
---|---|---|
joblib | 0.15.1 | https://joblib.readthedocs.io |
pygam | 0.8.0 | https://pygam.readthedocs.io/en/latest/ |
scikit-learn | 0.23.1 | https://scikit-learn.org/stable/ |
Python3
joblib
pygam
scikit-learn
numpy
pandas
math
pandas
import pandas as pd
import math

# convert degrees to radians
DegreeToRadians = lambda degree: degree * math.pi / 180

def distanceWithCoordinates(lat1, lon1, lat2, lon2):
    # great-circle (haversine) distance in kilometres
    RADIUS = 6371  # mean Earth radius in km
    dlat, dlon = DegreeToRadians(lat2-lat1), DegreeToRadians(lon2-lon1)
    lat1, lat2 = DegreeToRadians(lat1), DegreeToRadians(lat2)
    a = (math.sin(dlat/2) ** 2) + (math.sin(dlon/2) ** 2) * math.cos(lat1) * math.cos(lat2)
    return RADIUS * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

def find_site( device_lon, device_lat ):
    # daily status file listing every site that has a calibration model
    daily_status_url = "https://raw.githubusercontent.com/IISNRL/DCF-PM2.5/master/2020/20200620/20200620-PMS5003.json"
    models_info = pd.read_json(daily_status_url)
    CountingDistance = lambda row: distanceWithCoordinates( row['Latitude'], row['Longitude'], device_lat, device_lon )
    models_info[ 'distance' ] = models_info.apply( CountingDistance, axis=1 )
    # drop sites without coordinates, then take the closest one
    models_sort = models_info.dropna( subset=['distance'] ).sort_values( by="distance" ).reset_index()
    return models_sort['site'][0], models_sort['distance'][0]

# note the argument order: longitude first, then latitude
site, distance = find_site( 120.69, 23.99 )
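
The nearest-site lookup can also be tried offline, without pandas or a network connection. The sketch below uses the same haversine formula on an in-memory list; the site names and coordinates are hypothetical placeholders, and `haversine_km` and `nearest_site` are illustrative names, not functions from this project. Only the `site`, `Latitude` and `Longitude` keys follow the status file used above.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return radius_km * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Hypothetical rows: names and coordinates are placeholders, not real sites.
sites = [
    {"site": "site_a", "Latitude": 23.9, "Longitude": 120.7},
    {"site": "site_b", "Latitude": 25.0, "Longitude": 121.5},
    {"site": "site_c", "Latitude": None, "Longitude": None},  # no coordinates
]


def nearest_site(rows, device_lat, device_lon):
    """Return (site, distance_km) for the closest row that has coordinates."""
    scored = [
        (haversine_km(r["Latitude"], r["Longitude"], device_lat, device_lon), r["site"])
        for r in rows
        if r["Latitude"] is not None and r["Longitude"] is not None
    ]
    if not scored:
        raise ValueError("no site with usable coordinates")
    distance, site = min(scored)
    return site, distance


name, km = nearest_site(sites, device_lat=23.99, device_lon=120.69)
print(name, round(km, 1))
```

Unlike `find_site`, this helper takes latitude before longitude, so the keyword arguments are spelled out in the call.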
json
requests
urllib
import json
import requests
import urllib.request
## config
config_url = "https://raw.githubusercontent.com/IISNRL/DCF-PM2.5/master/2020/20200620/20200620-PMS5003-nantou.json"
r = requests.get(config_url)
content = r.content
config_dict = json.loads(content)
## joblib
model_url = "https://github.com/IISNRL/DCF-PM2.5/raw/master/2020/20200620/20200620-PMS5003-nantou.joblib"
urllib.request.urlretrieve(model_url, '20200620-PMS5003-nantou.joblib')
Import required packages
import pandas as pd
import numpy as np
import sklearn
import pygam
import joblib
Data preprocessing
# raw_value (Dict)
raw_value = {'P': 11, 'H': 1, 'T': 32.5, 'R': 72.5}
# raw_DF (DataFrame)
raw_DF = pd.DataFrame([{'P': 11, 'H': 1, 'T': 32.5, 'R': 72.5},{'P': 12.4, 'H': 2, 'T': 33.1, 'R': 75},{'P': 15.6, 'H': 3, 'T': 32.1, 'R': 70.3}])
# P H T R
# 0 11.0 1.0 32.5 72.5
# 1 12.4 2.0 33.1 75.0
# 2 15.6 3.0 32.1 70.3
## single datapoint
# select fields
X_test = [raw_value[field] for field in config_dict["Feature"]]
# reshape
X_test = np.array(X_test).reshape(1, -1)
## series datapoints
# select fields
X_test = raw_DF[ list(config_dict["Feature"]) ]
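
Because the model expects its inputs in the order given by `Feature`, it may help to validate a reading before reshaping it. The helper below is a minimal sketch that assumes `Feature` is a string of one-letter field codes such as `"PHTR"` (as the `list(config_dict["Feature"])` call above suggests); `build_row` is a hypothetical name, not part of this project.

```python
def build_row(raw_value, feature):
    """Return the values of raw_value in the order given by feature.

    feature is assumed to be a string of one-letter codes, e.g. "PTR".
    Raises KeyError if the reading lacks any required field.
    """
    missing = [code for code in feature if code not in raw_value]
    if missing:
        raise KeyError("reading is missing fields: " + ", ".join(missing))
    return [float(raw_value[code]) for code in feature]


reading = {'P': 11, 'H': 1, 'T': 32.5, 'R': 72.5}
print(build_row(reading, "PTR"))   # [11.0, 32.5, 72.5]
```

The returned list can be passed to `np.array(...).reshape(1, -1)` exactly as in the single-datapoint case above.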
Load calibration model and predict value
## load
lm = joblib.load( "20200620-PMS5003-nantou.joblib" )
## calibration
# "X_test" columns order should be the same as "Feature" in config
Y_pred = lm.predict( X_test )
## result
# single datapoint
# array([8.77330524])
# series datapoints
# array([ 8.77330524, 8.9214365 , 10.91355547])
Calibration Models Status Report
- status:
https://pm25.lass-net.org/DCF/latest.json
- model file:
latest-PMS5003-<sitename>.joblib
https://pm25.lass-net.org/DCF/model/latest-PMS5003-nantou.joblib
- model config:
latest-PMS5003-<sitename>.json
https://pm25.lass-net.org/DCF/model/latest-PMS5003-nantou.json
- daily folder:
<YYYY>/<YYYYMMDD>
https://github.com/IISNRL/DCF-PM2.5/tree/master/2020/20200620
- daily status:
<YYYYMMDD>-PMS5003.json
https://github.com/IISNRL/DCF-PM2.5/blob/master/2020/20200620/20200620-PMS5003.json
- model file:
<YYYYMMDD>-PMS5003-<sitename>.joblib
https://github.com/IISNRL/DCF-PM2.5/blob/master/2020/20200620/20200620-PMS5003-nantou.joblib
- model config:
<YYYYMMDD>-PMS5003-<sitename>.json
https://github.com/IISNRL/DCF-PM2.5/blob/master/2020/20200620/20200620-PMS5003-nantou.json
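
The daily naming pattern above can be turned into download URLs for any date and site. The sketch below follows the `<YYYY>/<YYYYMMDD>` layout and the raw-download URL forms that appear in the usage examples; `daily_urls` is a hypothetical helper, and it does not check that a file exists for the requested date or site.

```python
from datetime import date

# Download bases as they appear in the usage examples of this README.
RAW_BASE = "https://raw.githubusercontent.com/IISNRL/DCF-PM2.5/master"
MODEL_BASE = "https://github.com/IISNRL/DCF-PM2.5/raw/master"


def daily_urls(day, sitename, sensor="PMS5003"):
    """Build the daily status, model config and model file URLs for one site."""
    ymd = day.strftime("%Y%m%d")
    folder = day.strftime("%Y") + "/" + ymd
    stem = ymd + "-" + sensor + "-" + sitename
    return {
        "status": RAW_BASE + "/" + folder + "/" + ymd + "-" + sensor + ".json",
        "config": RAW_BASE + "/" + folder + "/" + stem + ".json",
        "model": MODEL_BASE + "/" + folder + "/" + stem + ".joblib",
    }


urls = daily_urls(date(2020, 6, 20), "nantou")
for kind, url in urls.items():
    print(kind, url)
```

For the most recent model, the `latest-PMS5003-<sitename>` links listed above avoid the need to know the date.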
1. SVR - AttributeError
Prediction can fail for a model loaded with joblib when the model is based on SVR, for instance:
File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/sklearn/svm/_base.py", line 317, in predict
return predict(X)
File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/sklearn/svm/_base.py", line 335, in _dense_predict
X, self.support_, self.support_vectors_, self._n_support,
AttributeError: 'SVR' object has no attribute '_n_support'
This happens because the models in this project were trained with scikit-learn 0.23.1. Although a model saved with one version of scikit-learn may load in another version, this is not supported. Please see the scikit-learn documentation on model persistence (pickle) for more information.
Please update the versions of scikit-learn and its dependencies:
- check details of the installed package
pip3 show scikit-learn
- upgrade python package
pip3 install -U scikit-learn
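
A version mismatch can also be detected from Python before calling `predict`. The check below is a sketch under assumptions: it compares only the numeric release part of the version string against 0.23.1, the training version stated above, and `release_tuple` and `version_warning` are hypothetical helper names.

```python
from itertools import takewhile

TRAINED_WITH = "0.23.1"  # scikit-learn version used for training, per this README


def release_tuple(version):
    """Parse the leading 'major.minor.patch' digits of a version string."""
    numbers = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def version_warning(installed, trained=TRAINED_WITH):
    """Return None when the versions match, otherwise a short warning message."""
    if release_tuple(installed) == release_tuple(trained):
        return None
    return ("scikit-learn " + installed + " is installed, but the models were "
            "trained with " + trained + "; loading or predicting may fail")


try:
    import sklearn
except ImportError:
    print("scikit-learn is not installed")
else:
    print(version_warning(sklearn.__version__) or "scikit-learn version matches")
```

If the warning appears, installing the training version explicitly with `pip3 install scikit-learn==0.23.1` is one option, assuming that release is available for your Python version.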
2. Other Error Message?
Please let us know if you have any further questions.