Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


python - Resample data to add missing hour values

I'm working with a df that looks like this:

                        trans_id    amount  month   day     hour
2018-08-18 12:59:59+00:00   1         46    8       18       12
2018-08-26 01:56:55+00:00   2         20    8       26       1

I intend to get the average 'amount' at each hour. I use the following code to do that:

df2 = df.groupby(['month', 'day', 'day_name', 'hour'], as_index=False)['amount'].sum()

That gives me the total amount for each (month, day, day_name, hour) combination, which is fine. But when I count the hours for each day, not every day has 24 as expected. I assume this is because no transactions exist for some (month, day, day_name, hour) combinations.

My question is: how do I get all 24 hours for each day, regardless of whether they have records or not?

Thanks



1 Answer


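Use Series.unstack with DataFrame.stack. Unstacking moves the hour level into columns, so every day gets a cell for every hour that occurs anywhere in the data, and fill_value=0 fills the cells that have no transactions. Stacking then turns the hours back into rows. Note that an hour that never occurs on any day will still be missing, because it never becomes a column: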

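df2 = (df.groupby(['month', 'day', 'day_name', 'hour'])['amount']
         .sum()
         # hours become columns; missing day/hour cells are filled with 0
         .unstack(fill_value=0)
         # hours go back to rows, now one row per day and hour
         .stack()
         # name the value column, otherwise it ends up labelled 0
         .reset_index(name='amount'))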
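
A minimal, self-contained sketch of the approach above. The sample rows are made up for illustration, and the `reindex` step is an addition that is not part of the original answer: it is one way to guarantee all 24 hours even when some hour never occurs anywhere in the data.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question (values are made up).
idx = pd.to_datetime([
    "2018-08-18 12:59:59+00:00",
    "2018-08-18 12:10:00+00:00",
    "2018-08-26 01:56:55+00:00",
])
df = pd.DataFrame({"trans_id": [1, 2, 3], "amount": [46, 4, 20]}, index=idx)
df["month"] = df.index.month
df["day"] = df.index.day
df["day_name"] = df.index.day_name()
df["hour"] = df.index.hour

# Assumption: we want hours 0..23 for every day that has at least one record.
all_hours = pd.Index(range(24), name="hour")

hourly = (
    df.groupby(["month", "day", "day_name", "hour"])["amount"]
      .sum()
      .unstack(fill_value=0)                      # hours become columns, gaps -> 0
      .reindex(columns=all_hours, fill_value=0)   # add hours absent from the whole dataset
      .stack()                                    # back to one row per day and hour
      .reset_index(name="amount")
)

print(hourly.groupby(["month", "day"])["hour"].count())
```

Days with no transactions at all still do not appear in the result, because they never form a row after unstacking. If those are needed as well, resampling the datetime index by hour may be a better fit.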
