How To Concatenate Multiple Csv To Xarray And Define Coordinates?
I have multiple CSV files with the same rows and columns; the data they contain varies with the date. Each CSV file is associated with a different date, listed in its na
Solution 1:
Recall that although it introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, xarray is inspired by and borrows heavily from pandas. So, to answer the question you can proceed as follows.
from glob import glob
import os
import pandas as pd

# data_path: directory that holds the csv files (placeholder, set it to your own path)
data_path = "."
# Get the list of all the csv files in data_path
csv_flist = glob(os.path.join(data_path, "*.csv"))
df_list = []
for _file in csv_flist:
    # get the file name from the full path
    file_name = os.path.basename(_file)
    # extract the date from the file name, e.g. "data.2018-06-01.csv"
    date = file_name.split(".")[1]
    # read the data in _file
    df = pd.read_csv(_file)
    # add a date column; every row in df was recorded on the same date
    df["date"] = pd.to_datetime(date)
    # append df to df_list
    df_list.append(df)
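The date extraction in the loop relies on the file name having exactly the form "data.2018-06-01.csv". As a minimal sketch, assuming the date always appears somewhere in the name as YYYY-MM-DD, a small helper can make that step fail loudly on unexpected names. The function name date_from_filename and the example path are hypothetical and not part of the original answer.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches an ISO-style date such as 2018-06-01 anywhere in a string
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def date_from_filename(path):
    """Return the first YYYY-MM-DD date found in the file name as a datetime.date."""
    name = Path(path).name
    match = DATE_RE.search(name)
    if match is None:
        raise ValueError(f"no date found in file name: {name!r}")
    # strptime also rejects impossible dates such as 2018-13-40
    return datetime.strptime(match.group(), "%Y-%m-%d").date()


print(date_from_filename("some/dir/data.2018-06-01.csv"))
```

The returned value can be passed to pd.to_datetime in the same way as the string used in the loop.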
Let's check e.g. the first df in df_list
print(df_list[0])
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
Concatenate all the dfs along axis=0
df_all = pd.concat(df_list, ignore_index=True).sort_index()
print(df_all)
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
2  healthy        1      70 2018-06-01
3  healthy        2      90 2018-06-01
Set the index of df_all to a MultiIndex of two levels, with levels[0] = "date" and levels[1] = "user_id".
data = df_all.set_index(["date", "user_id"]).sort_index()
print(data)
                     status  weight
date       user_id
2018-06-01 1        healthy      70
           2        healthy      90
2019-06-01 1        healthy      72
           2          obese     103
Subsequently, you can convert the resulting pandas.DataFrame into an xarray.Dataset using .to_xarray() as follows.
xds = data.to_xarray()
print(xds)
<xarray.Dataset>
Dimensions:  (date: 2, user_id: 2)
Coordinates:
  * date     (date) datetime64[ns] 2018-06-01 2019-06-01
  * user_id  (user_id) int64 1 2
Data variables:
    status   (date, user_id) object 'healthy' 'healthy' 'healthy' 'obese'
    weight   (date, user_id) int64 70 90 72 103
The two index levels, date and user_id, are now the coordinates of the Dataset, which answers the question.
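Once date and user_id are coordinates, values can be looked up by label rather than by position. The following is a minimal sketch, assuming xarray is installed and reusing the toy values from the printed output above; the DataFrame is built in memory here only so that the example is self-contained.

```python
import pandas as pd

# Same toy values as in the printed output above, built in memory
df_all = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-06-01", "2019-06-01", "2018-06-01", "2018-06-01"]
        ),
        "user_id": [1, 2, 1, 2],
        "status": ["healthy", "obese", "healthy", "healthy"],
        "weight": [72, 103, 70, 90],
    }
)
xds = df_all.set_index(["date", "user_id"]).sort_index().to_xarray()

# Label-based lookup of a single value on both coordinates
w = xds["weight"].sel(date=pd.Timestamp("2018-06-01"), user_id=2).item()
print(w)

# All users for one date; the date dimension is dropped from the result
statuses = xds["status"].sel(date=pd.Timestamp("2019-06-01")).values.tolist()
print(statuses)
```

Passing a pd.Timestamp to .sel keeps the lookup an exact match on the date coordinate.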
Solution 2:
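Try this:
import glob
import pandas as pd

path = r'ur file'  # placeholder: directory containing the csv files
all_file = glob.glob(path + "/*.csv")
li = []
for filename in all_file:
    # read each csv file, using its first row as the header
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
# stack all the frames row-wise with a fresh integer index
frame = pd.concat(li, axis=0, ignore_index=True)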
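Note that concatenating with ignore_index=True discards any information about which file each row came from, so the date cannot be recovered afterwards. One possible variation, sketched here with hypothetical in-memory CSV text standing in for the real files, is to pass the dates to pd.concat through its keys argument so that they become an outer index level.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two csv files, keyed by their date
csv_text_by_date = {
    "2018-06-01": "status,user_id,weight\nhealthy,1,70\nhealthy,2,90\n",
    "2019-06-01": "status,user_id,weight\nhealthy,1,72\nobese,2,103\n",
}

dates = [pd.Timestamp(d) for d in csv_text_by_date]
frames = [
    pd.read_csv(io.StringIO(text), index_col="user_id")
    for text in csv_text_by_date.values()
]

# keys= adds an outer index level holding the date of each frame,
# names= labels the two resulting index levels
frame = pd.concat(frames, keys=dates, names=["date", "user_id"])
print(frame)
```

From here, frame.to_xarray() as in Solution 1 should yield a Dataset with date and user_id as coordinates.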