How to Concatenate Multiple CSV Files to Xarray and Define Coordinates?
I have multiple CSV files with the same rows and columns; the data they contain varies with the date. Each CSV file is associated with a different date, listed in its na
Solution 1:
Recall that although it introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, xarray is inspired by and borrows heavily from pandas. So, to answer the question, you can proceed as follows.
from glob import glob
import os
import numpy as np
import pandas as pd

# Get the list of all the csv files in data_path (the directory holding them)
csv_flist = glob(os.path.join(data_path, "*.csv"))
df_list = []
for _file in csv_flist:
    # get the file name from the data path (works with any path separator)
    file_name = os.path.basename(_file)
    # extract the date from a file name, e.g. "data.2018-06-01.csv"
    date = file_name.split(".")[1]
    # read the data in _file
    df = pd.read_csv(_file)
    # add a column date knowing that all the data in df are recorded at the same date
    df["date"] = np.repeat(date, df.shape[0])
    # reset date column to a correct date format
    df["date"] = df["date"].astype("datetime64[ns]")
    # append df to df_list
    df_list.append(df)
Let's check, e.g., the first df in df_list:
print(df_list[0])
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
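The per-file step above can also be wrapped in a small helper. The following is only a sketch: the function name `read_dated_csv` and the sample file contents are hypothetical, and it assumes file names follow the `data.YYYY-MM-DD.csv` pattern mentioned in the answer.

```python
from pathlib import Path
import tempfile

import pandas as pd


def read_dated_csv(path):
    """Read one csv and tag every row with the date found in its file name."""
    path = Path(path)
    # "data.2018-06-01.csv" -> ["data", "2018-06-01", "csv"]
    date_text = path.name.split(".")[1]
    df = pd.read_csv(path)
    # Raises ValueError if the file name does not hold a valid date
    df["date"] = pd.to_datetime(date_text, format="%Y-%m-%d")
    return df


# Demonstration on a throwaway file (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.2018-06-01.csv"
    sample.write_text("status,user_id,weight\nhealthy,1,70\nhealthy,2,90\n")
    df = read_dated_csv(sample)

print(df)
```

Passing an explicit `format` makes a misnamed file fail loudly instead of being parsed as an unexpected date.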
Concatenate all the dfs along axis=0:
df_all = pd.concat(df_list, ignore_index=True).sort_index()
print(df_all)
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
2  healthy        1      70 2018-06-01
3  healthy        2      90 2018-06-01
Set the index of df_all to a MultiIndex of two levels, with levels[0] = "date" and levels[1] = "user_id".
data = df_all.set_index(["date", "user_id"]).sort_index()
print(data)
                     status  weight
date       user_id
2018-06-01 1        healthy      70
           2        healthy      90
2019-06-01 1        healthy      72
           2          obese     103
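Converting to xarray works cleanly only when each (date, user_id) pair occurs once, so it can be worth checking the MultiIndex first. The sketch below uses made-up data with one deliberately repeated pair; keeping the first occurrence is just one possible policy, not something the original answer prescribes.

```python
import pandas as pd

# Hypothetical data: the pair (2018-06-01, user 2) appears twice
df_all = pd.DataFrame(
    {
        "status": ["healthy", "obese", "healthy", "healthy", "healthy"],
        "user_id": [1, 2, 1, 2, 2],
        "weight": [72, 103, 70, 90, 91],
        "date": pd.to_datetime(
            ["2019-06-01", "2019-06-01", "2018-06-01", "2018-06-01", "2018-06-01"]
        ),
    }
)

data = df_all.set_index(["date", "user_id"]).sort_index()

if not data.index.is_unique:
    # Show every row whose (date, user_id) pair is repeated
    print(data[data.index.duplicated(keep=False)])
    # One possible policy: keep the first row of each repeated pair
    data = data[~data.index.duplicated(keep="first")]

print(data.index.is_unique)
```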
Subsequently, you can convert the resulting pandas.DataFrame into an xarray.Dataset using .to_xarray() as follows.
xds = data.to_xarray()
print(xds)
<xarray.Dataset>
Dimensions:  (date: 2, user_id: 2)
Coordinates:
  * date     (date) datetime64[ns] 2018-06-01 2019-06-01
  * user_id  (user_id) int64 1 2
Data variables:
    status   (date, user_id) object 'healthy' 'healthy' 'healthy' 'obese'
    weight   (date, user_id) int64 70 90 72 103
The two index levels, date and user_id, become the coordinates of the Dataset, which answers the question.
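Once date and user_id are coordinates, values can be looked up by label. This is a minimal sketch that rebuilds the example data inline (the numbers mirror the printed output above) and assumes xarray is installed alongside pandas.

```python
import pandas as pd

# Rebuild the small example so the snippet is self-contained
df_all = pd.DataFrame(
    {
        "status": ["healthy", "obese", "healthy", "healthy"],
        "user_id": [1, 2, 1, 2],
        "weight": [72, 103, 70, 90],
        "date": pd.to_datetime(
            ["2019-06-01", "2019-06-01", "2018-06-01", "2018-06-01"]
        ),
    }
)
xds = df_all.set_index(["date", "user_id"]).sort_index().to_xarray()

# Weights of all users on one date (the date dimension is dropped)
weight_2018 = xds["weight"].sel(date=pd.Timestamp("2018-06-01"))
print(weight_2018.values)

# Weight history of one user across all dates
weight_user1 = xds["weight"].sel(user_id=1)
print(weight_user1.values)
```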
Solution 2:
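Try this:
import glob
import pandas as pd

path = r'ur file'  # placeholder: directory containing the csv files
all_file = glob.glob(path + "/*.csv")
li = []
for filename in all_file:
    # read each csv, using its first row as the header
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
# stack all the frames row-wise into a single DataFrame
frame = pd.concat(li, axis=0, ignore_index=True)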
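Note that this snippet stops at a plain DataFrame and does not record which file each row came from. One way to keep the date, sketched here under assumptions, is to pass a dict to pd.concat so that its keys become an outer index level. The in-memory frames stand in for pd.read_csv results, and the name `frames_by_date` is hypothetical.

```python
import pandas as pd

# Stand-ins for pd.read_csv(...) results, keyed by the date taken from each
# file name (hypothetical data mirroring the example in Solution 1)
frames_by_date = {
    "2019-06-01": pd.DataFrame(
        {"status": ["healthy", "obese"], "user_id": [1, 2], "weight": [72, 103]}
    ),
    "2018-06-01": pd.DataFrame(
        {"status": ["healthy", "healthy"], "user_id": [1, 2], "weight": [70, 90]}
    ),
}

# The dict keys become the outer index level, user_id the inner one
frame = pd.concat(
    {
        pd.Timestamp(date): df.set_index("user_id")
        for date, df in frames_by_date.items()
    },
    names=["date", "user_id"],
).sort_index()

print(frame)
```

With the (date, user_id) MultiIndex in place, calling `frame.to_xarray()` would turn both levels into coordinates, as in Solution 1.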