Select_by: snake_case and typehints (refer issue #53) #60

Tanvi-Jain01 · 2023-07-10T09:56:39Z

This PR extends the code from issue #53 and adds snake_case naming and type hints to the function.
@nipunbatra , @patel-zeel

BEFORE:

CODE:

Lines 16 to 23 in ef99aef

    
df.index = pd.to_datetime(df.date)
df = df.drop("date", axis=1)
df_n = df[year].resample("1D").mean()
df_n = df_n.fillna(method="ffill")
df_n["month"] = df_n.index.month
df_n.index.dayofweek
print(df_n)

AFTER:

CODE:

from typing import List, Optional

import pandas as pd


def select_by(
    df: pd.DataFrame,
    year: str,
    group: Optional[List[str]] = None,
    time_period: str = 'day',
) -> pd.DataFrame:
    """
    Utility function to cut a given dataframe by year and find the average value
    of each day, month, or year. Optionally, data can be grouped by specified columns.

    Parameters
    ----------
    df: pd.DataFrame
        A data frame containing a 'date' column and optional grouping columns.
    year: str
        The year used to filter the data, e.g. '2022'.
    group: list of str, optional
        A list of columns to group the data by. Default is None (no grouping).
    time_period: {'day', 'month', 'year'}, optional
        The time period over which to compute the average value. Default is 'day'.

    Returns
    -------
    pd.DataFrame
        A data frame with the average value of each day, month, or year.
        If group is specified, the data will be grouped accordingly.
    """

    # Parse dates on a copy so the caller's data frame is not modified.
    df = df.assign(date=pd.to_datetime(df['date']))
    df_year = df[df['date'].dt.year == int(year)]

    if group:
        # First letter of the period, upper-cased: 'day' -> 'D', 'month' -> 'M', 'year' -> 'Y'.
        freq = time_period[0].upper()
        df_grouped = df_year.groupby(group).resample(freq, on='date').mean(numeric_only=True)
        return df_grouped

    # Note: pandas 2.2+ spells the month-end alias 'ME' and the year-end alias 'YE'.
    if time_period == 'month':
        df_month = df_year.resample('M', on='date').mean(numeric_only=True)
        return df_month
    elif time_period == 'year':
        df_yearly = df_year.resample('Y', on='date').mean(numeric_only=True)
        return df_yearly

    # Default: daily averages.
    df_day = df_year.resample('D', on='date').mean(numeric_only=True)
    return df_day

USAGE:

df = pd.read_csv("mydata.csv")
select_by(df, '2022', group=['latitude', 'longitude', 'station'], time_period='month')

Additional Time Periods: The modified function can compute the average value of each month or year in addition to daily averages. This lets the same data be summarized at coarser resolutions than a single day.
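As a rough illustration of the three time periods, the sketch below averages a tiny made-up series at daily, monthly, and yearly frequency. The `pm25` column and its values are assumptions for the example. It also uses the start-anchored aliases `MS` and `YS` rather than the PR's `M` and `Y`, because pandas 2.2 renamed the end-anchored aliases to `ME` and `YE`.

```python
import pandas as pd

# Made-up readings (hypothetical column name and values).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2022-01-01 00:00",
                "2022-01-01 12:00",
                "2022-01-02 00:00",
                "2022-02-01 00:00",
            ]
        ),
        "pm25": [10.0, 20.0, 30.0, 40.0],
    }
)

# One mean per calendar day; days with no readings come out as NaN.
daily = df.resample("D", on="date")["pm25"].mean()
# One mean per month, labelled by the first day of the month.
monthly = df.resample("MS", on="date")["pm25"].mean()
# One mean per year, labelled by January 1.
yearly = df.resample("YS", on="date")["pm25"].mean()

print(daily.head(3))
print(monthly)
print(yearly)
```

The daily result has one row for every day between the first and last reading, while the monthly and yearly results collapse the same readings into two rows and one row.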

Grouping Support: The modified function allows optional grouping of the data by specified columns. Averages are then computed separately for each group, for example one series of averages per station.
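A minimal sketch of what grouping does, assuming a made-up frame with a `station` column and a `pm25` measurement (both hypothetical). It chains `groupby` and `resample` the same way the grouped branch of the function does, but with the `MS` alias so it runs on both older and newer pandas versions.

```python
import pandas as pd

# Made-up readings from two stations (hypothetical names and values).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-01-05", "2022-01-20", "2022-02-10", "2022-01-05", "2022-02-10"]
        ),
        "station": ["A", "A", "A", "B", "B"],
        "pm25": [10.0, 30.0, 50.0, 100.0, 200.0],
    }
)

# Split by station first, then resample each station's rows by month.
grouped = df.groupby("station").resample("MS", on="date").mean(numeric_only=True)

# The result is indexed by (station, month start), one row per pair.
print(grouped)
```

Each station gets its own monthly averages, so station A's January value is the mean of its two January readings and is unaffected by station B.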

Resampling Flexibility: The modified function passes resample a frequency chosen from the time_period argument, so the same function computes daily, monthly, or yearly averages instead of being fixed to one resampling period.
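One possible way to make the period-to-frequency step explicit is a small lookup table. This is only a sketch: `FREQUENCIES` and `resample_frequency` are hypothetical names that do not appear in the PR, and the aliases are the ones the PR passes to `resample`. Compared with taking the first letter of `time_period`, a table rejects unsupported values instead of passing an arbitrary letter on to pandas.

```python
# Hypothetical lookup from the time_period argument to a pandas frequency alias.
# The aliases are the ones the PR uses; pandas 2.2+ prefers "ME" and "YE".
FREQUENCIES = {"day": "D", "month": "M", "year": "Y"}


def resample_frequency(time_period: str) -> str:
    """Return the frequency alias for a time period, or raise ValueError."""
    if time_period not in FREQUENCIES:
        valid = ", ".join(sorted(FREQUENCIES))
        raise ValueError(f"time_period must be one of: {valid}; got {time_period!r}")
    return FREQUENCIES[time_period]


print(resample_frequency("month"))
```

Supporting another period would then mean adding one entry to the table rather than another branch in the function.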

The function is also renamed from selectByDate to select_by, since it can now select data by day, month, or year and optionally group the result.

Tanvi-Jain01 added 10 commits June 30, 2023 08:59

enhanced code of scatterPlot(refer issue sustainability-lab#43)

03f403a

timplot: modifying plots using plotly and

a2ea7c7

Adding visualization using plotly

d7d2e6b

modifying the code of googleMaps

fd7c45c

modifying googlemaps sustainability-lab#38

5b48d31

Commit message: timeplot using plotly and subplot error solved

043b730

googleMaps code enhanced and errors solved

273342d

code extended with group and time_period

4087590

applied typehints and camelcase

b53edd1

added typehints and snake_case

9ef916e
