In this post, I summarize a generic way to compute the number of seconds, minutes, hours, or days between two points in time in Python, using functions from the pandas and numpy libraries. Thanks to the R package reticulate, I can run Python code in R and write this post comfortably in R Markdown.

Preparing the environment

Loading required R libraries

ipak <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  try(sapply(pkg, require, character.only = TRUE), silent = TRUE)
}
packages <- c("reticulate")
ipak(packages)
## Warning: package 'reticulate' was built under R version 4.1.2
## reticulate 
##       TRUE

Set up the R-Python environment

library(reticulate)
# use_python(Sys.which("python"))
# use_condaenv("py3.7", required = TRUE)
use_condaenv("r-reticulate")

Install required Python libraries

py_install("pandas") # install pandas if it is not available
py_install("numpy")

Import required Python libraries

import pandas as pd
import numpy as np

Timestamps in pandas

In pandas, we can access the current time easily via

timestamp_now_UTC = pd.Timestamp.now(tz="UTC")

We can also construct an arbitrary timestamp by specifying the year, month, day, hour, minute, and second:

timestamp_UTC = pd.Timestamp(year=2020, month=8, day=1, hour=1, minute=1, second=25, tz="UTC")
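As a small aside, the same timestamp can also be built from a date-time string. The sketch below is only an illustration; the variable names `from_fields` and `from_string` are made up for this example and do not appear elsewhere in the post.

```python
import pandas as pd

# Build the timestamp from keyword arguments, as in the text above.
from_fields = pd.Timestamp(year=2020, month=8, day=1, hour=1, minute=1, second=25, tz="UTC")

# Build the same instant from a date-time string.
from_string = pd.Timestamp("2020-08-01 01:01:25", tz="UTC")

print(from_fields == from_string)  # True
print(from_string.isoformat())     # 2020-08-01T01:01:25+00:00
```

Both forms yield a timezone-aware timestamp, so either one can be subtracted from another UTC timestamp.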

Subtracting one timestamp from the other gives the difference between them:

# find out the difference
timestamp_diff = timestamp_now_UTC - timestamp_UTC
print(timestamp_diff)
## 557 days 05:28:08.942866
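The printed value is a pandas Timedelta. To make its parts easier to see, here is a minimal sketch that uses two fixed timestamps instead of the current time, so the result is reproducible. The end time `2022-02-09 06:29:33` is an assumption chosen for illustration (it is roughly when the output above was produced, with the fractional seconds dropped).

```python
import pandas as pd

# Fixed start and end times (illustrative values, both in UTC).
start = pd.Timestamp("2020-08-01 01:01:25", tz="UTC")
end = pd.Timestamp("2022-02-09 06:29:33", tz="UTC")

diff = end - start
print(diff)          # 557 days 05:28:08

# A Timedelta stores whole days and the leftover seconds separately.
print(diff.days)     # 557
print(diff.seconds)  # 19688, i.e. 5 h 28 min 8 s expressed in seconds

# The components attribute splits the leftover part into clock fields.
print(diff.components.hours, diff.components.minutes, diff.components.seconds)  # 5 28 8
```

Note that `days` and `seconds` are parts of the difference, not the whole difference expressed in that unit.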

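The result is a pandas Timedelta, which prints as days plus a clock time rather than as a single number. pandas can convert it on its own (for example with the total_seconds() method of a Timedelta), but numpy offers a generic route in which the target unit (days, hours, minutes, or seconds) is just a unit code. The next section uses that approach.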

Convert time difference into numbers using numpy

If we want the difference in whole hours (the fractional part is truncated):

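UNIT = "h"  # numpy unit code for hours
# Cast the timedelta to whole hours, then to a plain integer.
# np.int64 is used instead of np.int32 so that long time spans cannot overflow.
timestamp_diff_h = timestamp_diff.to_numpy().astype("timedelta64[" + UNIT + "]").astype(np.int64)
print(timestamp_diff_h)
# Float variant (still a whole number of hours, because the unit cast truncates first):
# timestamp_diff_h_float = timestamp_diff.to_numpy().astype("timedelta64[" + UNIT + "]").astype(np.float64)
# print(timestamp_diff_h_float)
## 13373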

If we want the difference in whole seconds:

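UNIT = "s"  # numpy unit code for seconds
# Cast the timedelta to whole seconds, then to a plain integer.
# np.int32 would overflow for spans longer than about 68 years, so np.int64 is used.
timestamp_diff_s = timestamp_diff.to_numpy().astype("timedelta64[" + UNIT + "]").astype(np.int64)
print(timestamp_diff_s)
# Float variant (still a whole number of seconds, because the unit cast truncates first):
# timestamp_diff_s_float = timestamp_diff.to_numpy().astype("timedelta64[" + UNIT + "]").astype(np.float64)
# print(timestamp_diff_s_float)
## 48144488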
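Since only the unit code changes between the two cases above, the conversion can be wrapped in one small function. The following is a minimal sketch, not a definitive implementation: `timedelta_to_units` is a hypothetical helper name, and the fixed Timedelta is an assumed stand-in for the difference printed earlier, with the fractional seconds dropped. Besides the truncated integer, it also returns a fractional value obtained by dividing by one unit of `np.timedelta64`.

```python
import numpy as np
import pandas as pd


def timedelta_to_units(diff, unit):
    """Hypothetical helper: express a pandas Timedelta in the given numpy unit.

    Returns (whole, exact): the truncated integer count and the fractional count.
    """
    raw = diff.to_numpy()  # numpy.timedelta64 scalar
    # Same cast as in the post: to the target unit first, then to an integer.
    whole = int(raw.astype(f"timedelta64[{unit}]").astype(np.int64))
    # Dividing by one unit keeps the fractional part.
    exact = float(raw / np.timedelta64(1, unit))
    return whole, exact


# Fixed difference (illustrative value, non-negative).
diff = pd.Timedelta(days=557, hours=5, minutes=28, seconds=8)

for unit in ["D", "h", "m", "s"]:
    whole, exact = timedelta_to_units(diff, unit)
    print(unit, whole, exact)
```

For this difference the integer results are 557 days, 13373 hours, 802408 minutes, and 48144488 seconds, matching the hour and second values shown above. The sketch has only been reasoned through for non-negative differences.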