
How To Concatenate Multiple Csv To Xarray And Define Coordinates?

I have multiple csv files with the same rows and columns; the data they contain varies depending on the date. Each csv file is associated with a different date, listed in its name.

Solution 1:

Recall that although xarray adds labels (dimensions, coordinates and attributes) on top of raw NumPy-like arrays, it is inspired by and borrows heavily from pandas. So you can do most of the work in pandas and convert to xarray at the end, as follows.

import os
from glob import glob
import numpy as np
import pandas as pd

# Directory that holds the csv files (replace with your own path)
data_path = "path/to/csv/folder"

# Get the list of all the csv files in data_path
csv_flist = glob(os.path.join(data_path, "*.csv"))

df_list = []
for _file in csv_flist:
    # get the file name from the full path
    file_name = os.path.basename(_file)

    # extract the date from a file name, e.g. "data.2018-06-01.csv"
    date = file_name.split(".")[1]

    # read the data in _file
    df = pd.read_csv(_file)

    # add a date column, knowing that all the data in df are recorded on the same date
    df["date"] = np.repeat(date, df.shape[0])
    # convert the date column to a proper datetime type
    df["date"] = df.date.astype("datetime64[ns]")

    # append df to df_list
    df_list.append(df)
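To try the loop above without real data, here is a minimal sketch that wraps it in a function (load_dated_csvs is a hypothetical name) and runs it on two made-up files written to a temporary directory. It assumes file names of the form prefix.YYYY-MM-DD.csv, sorts the glob result so the files are read in date order, and assigns pd.to_datetime(date) as a scalar, which pandas broadcasts to every row, instead of np.repeat followed by astype.

```python
import os
import tempfile
from glob import glob

import pandas as pd


def load_dated_csvs(data_path):
    """Read every '<prefix>.<YYYY-MM-DD>.csv' file in data_path and tag its rows with that date.

    Hypothetical helper; assumes the date is the second dot-separated part of the file name.
    """
    df_list = []
    for _file in sorted(glob(os.path.join(data_path, "*.csv"))):
        file_name = os.path.basename(_file)
        date = file_name.split(".")[1]     # "data.2018-06-01.csv" -> "2018-06-01"
        df = pd.read_csv(_file)
        df["date"] = pd.to_datetime(date)  # scalar is broadcast to every row
        df_list.append(df)
    return df_list


# Demo with two small, made-up files written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"status": ["healthy", "healthy"], "user_id": [1, 2], "weight": [70, 90]}).to_csv(
        os.path.join(tmp, "data.2018-06-01.csv"), index=False
    )
    pd.DataFrame({"status": ["healthy", "obese"], "user_id": [1, 2], "weight": [72, 103]}).to_csv(
        os.path.join(tmp, "data.2019-06-01.csv"), index=False
    )
    df_list = load_dated_csvs(tmp)

print(df_list[0])
```

Sorting the file list is a design choice: glob returns files in an arbitrary, platform-dependent order, so sorting makes the order of df_list reproducible.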

Let's check, for example, the first df in df_list:

print(df_list[0])

    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01

Concatenate all the DataFrames along axis=0:

df_all = pd.concat(df_list, ignore_index=True).sort_index()
print(df_all)

    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
2  healthy        1      70 2018-06-01
3  healthy        2      90 2018-06-01

Set the index of df_all to a two-level MultiIndex, with "date" as level 0 and "user_id" as level 1:

data = df_all.set_index(["date", "user_id"]).sort_index()
print(data)

                     status  weight
date       user_id                 
2018-06-01 1        healthy      70
           2        healthy      90
2019-06-01 1        healthy      72
           2          obese     103
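Before converting, it can help to check that every (date, user_id) pair occurs only once, since to_xarray() raises a ValueError for a non-unique MultiIndex (this could happen, for instance, if the same csv file is picked up twice). A minimal pandas-only sketch with made-up data; find_duplicate_rows is a hypothetical helper, and keeping the first row assumes the duplicates are exact copies:

```python
import pandas as pd


def find_duplicate_rows(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return every row whose index entry occurs more than once."""
    return data[data.index.duplicated(keep=False)]


# Made-up data: user 1 appears twice on 2018-06-01
df_all = pd.DataFrame({
    "status": ["healthy", "healthy", "healthy", "obese"],
    "user_id": [1, 1, 2, 2],
    "weight": [70, 70, 90, 103],
    "date": pd.to_datetime(["2018-06-01", "2018-06-01", "2018-06-01", "2019-06-01"]),
})
data = df_all.set_index(["date", "user_id"]).sort_index()

print(data.index.is_unique)  # False
dups = find_duplicate_rows(data)
print(dups)

# One possible fix, assuming the duplicates are identical: keep the first row per pair
data_clean = data[~data.index.duplicated(keep="first")]
print(data_clean.index.is_unique)  # True
```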

Subsequently, you can convert the resulting pandas.DataFrame into an xarray.Dataset using .to_xarray() as follows.

xds = data.to_xarray()
print(xds)

<xarray.Dataset>
Dimensions:  (date: 2, user_id: 2)
Coordinates:
  * date     (date) datetime64[ns] 2018-06-01 2019-06-01
  * user_id  (user_id) int64 1 2
Data variables:
    status   (date, user_id) object 'healthy' 'healthy' 'healthy' 'obese'
    weight   (date, user_id) int64 70 90 72 103

This gives a Dataset whose coordinates are date and user_id, which is what the question asks for.
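If you would rather do the concatenation in xarray itself, a minimal alternative sketch is to turn each per-file table into its own Dataset and pass xr.concat a named pandas Index as dim, so the index values become the "date" coordinate. The tables dict below is made-up data standing in for the pd.read_csv results, keyed by the date parsed from each file name:

```python
import pandas as pd
import xarray as xr

# Made-up per-file tables (stand-ins for pd.read_csv results), keyed by date
tables = {
    "2018-06-01": pd.DataFrame({"status": ["healthy", "healthy"], "user_id": [1, 2], "weight": [70, 90]}),
    "2019-06-01": pd.DataFrame({"status": ["healthy", "obese"], "user_id": [1, 2], "weight": [72, 103]}),
}

# One Dataset per file, with user_id as its only dimension
datasets = [df.set_index("user_id").to_xarray() for df in tables.values()]

# Concatenate along a new "date" dimension; a named pandas Index supplies
# both the dimension name and the coordinate values
dates = pd.Index(pd.to_datetime(list(tables.keys())), name="date")
xds = xr.concat(datasets, dim=dates)

print(xds)
print(xds["weight"].sel(date="2019-06-01").values)
```

Because the new dimension is added first, the data variables end up with dims (date, user_id), matching the layout produced by the pandas route above.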

Solution 2:

Try this, which reads all the csv files in a directory into a single DataFrame:

import glob
import pandas as pd

path = r"path/to/csv/folder"  # replace with the folder that holds your csv files
all_file = glob.glob(path + "/*.csv")
li = []
for filename in all_file:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
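With ignore_index=True, frame no longer records which file (and hence which date) each row came from. One way to keep that information is to pass the dates as keys to pd.concat, which adds them as an outer index level; swapping the per-file row number for user_id then gives the same (date, user_id) MultiIndex used in Solution 1. A minimal sketch, assuming made-up frames in li and a hypothetical dates list in the same order as li:

```python
import pandas as pd

# Made-up frames standing in for the pd.read_csv results collected in `li`
li = [
    pd.DataFrame({"status": ["healthy", "healthy"], "user_id": [1, 2], "weight": [70, 90]}),
    pd.DataFrame({"status": ["healthy", "obese"], "user_id": [1, 2], "weight": [72, 103]}),
]
# Hypothetical dates, one per frame, in the same order as `li`
dates = pd.to_datetime(["2018-06-01", "2019-06-01"])

# keys= labels each frame's rows with its date as an outer index level
frame = pd.concat(li, keys=dates, names=["date", "row"])

# Replace the per-file row number with user_id to get a (date, user_id) MultiIndex
data = frame.droplevel("row").set_index("user_id", append=True)
print(data)
```

From here, data.to_xarray() can be used exactly as in Solution 1.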
