Fast Way To Transpose And Concat Csv Files In Python?

I am trying to transpose multiple files of the same format and concatenate them into one big CSV file. I wanted to use numpy for transposing, as it's a really fast way of doing it, but...

Solution 1:

There are various ways of handling headers in genfromtxt. The default is to treat them as part of the data:

In [6]: txt="""time,topic1,topic2,country
   ...: 2015-10-01,20,30,usa
   ...: 2015-10-02,25,35,usa"""

In [7]: data=np.genfromtxt(txt.splitlines(),delimiter=',',skip_header=0)

In [8]: data
Out[8]: 
array([[ nan,  nan,  nan,  nan],
       [ nan,  20.,  30.,  nan],
       [ nan,  25.,  35.,  nan]])

But since the default dtype is float, the strings all appear as nan.

You can treat them as headers - the result is a structured array. The headers now appear in the data.dtype.names list.

In [9]: data=np.genfromtxt(txt.splitlines(),delimiter=',',names=True)

In [10]: data
Out[10]: 
array([(nan, 20.0, 30.0, nan), (nan, 25.0, 35.0, nan)], 
      dtype=[('time', '<f8'), ('topic1', '<f8'), ('topic2', '<f8'), ('country', '<f8')])
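As a small illustration of the structured array (a sketch that reuses the sample text from above, nothing more), the header names and the individual columns can be read back by name:

```python
import numpy as np

txt = """time,topic1,topic2,country
2015-10-01,20,30,usa
2015-10-02,25,35,usa"""

# names=True consumes the first line as field names
data = np.genfromtxt(txt.splitlines(), delimiter=',', names=True)

names = data.dtype.names      # tuple of header strings
topic1 = data['topic1']       # one column, selected by field name

print(names)
print(topic1)
```

Note that the non-numeric columns (time, country) are still nan here, because the field dtype is still float.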

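With dtype=None, you let genfromtxt choose the dtype. Because the first line contains strings, every column is loaded as a string, here S10. (On Python 3 with a recent NumPy you may see a unicode dtype such as <U10 instead.)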

In [11]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None)

In [12]: data
Out[12]: 
array([['time', 'topic1', 'topic2', 'country'],
       ['2015-10-01', '20', '30', 'usa'],
       ['2015-10-02', '25', '35', 'usa']], 
      dtype='|S10')
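The exact string dtype can depend on the Python and NumPy versions (the S10 above is a byte string). A minimal sketch, assuming Python 3, that asks for text explicitly with dtype=str rather than letting genfromtxt guess:

```python
import numpy as np

txt = """time,topic1,topic2,country
2015-10-01,20,30,usa
2015-10-02,25,35,usa"""

# dtype=str loads every cell, header included, as a unicode string
data = np.genfromtxt(txt.splitlines(), delimiter=',', dtype=str)

print(data.dtype.kind)   # 'U' means unicode text
print(data.shape)        # header row plus two data rows, four columns
```

Everything stays a string, which is fine if the only goal is to transpose and write the cells back out unchanged.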

This matrix can be transposed, and printed or written to a csv file:

In [13]: data.T
Out[13]: 
array([['time', '2015-10-01', '2015-10-02'],
       ['topic1', '20', '25'],
       ['topic2', '30', '35'],
       ['country', 'usa', 'usa']], 
      dtype='|S10')

Since I'm using genfromtxt to load, I could use savetxt to save:

In [26]: with open('test.txt','w') as f:
   ....:     np.savetxt(f, data.T, delimiter=',', fmt='%12s')
   ....:     np.savetxt(f, data.T, delimiter=';', fmt='%10s') # simulate a 2nd array
   ....:     

In [27]: cat test.txt
        time,  2015-10-01,  2015-10-02
      topic1,          20,          25
      topic2,          30,          35
     country,         usa,         usa
      time;2015-10-01;2015-10-02
    topic1;        20;        25
    topic2;        30;        35
   country;       usa;       usa
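Putting the pieces together for several files, here is a rough sketch rather than a definitive implementation. The helper name transpose_and_concat and the file names a.csv, b.csv and combined.csv are made up for the demo. It assumes a recent NumPy on Python 3, where savetxt accepts a file opened in text mode, and uses fmt='%s' so the cells are written without padding:

```python
import os
import tempfile
import numpy as np

def transpose_and_concat(in_paths, out_path, delimiter=','):
    """Append the transpose of each input CSV to out_path (hypothetical helper)."""
    with open(out_path, 'w') as out:
        for path in in_paths:
            # dtype=str keeps header and data cells as text, so nothing turns into nan
            data = np.genfromtxt(path, delimiter=delimiter, dtype=str)
            # writing to the same open handle appends each transposed block
            np.savetxt(out, data.T, delimiter=delimiter, fmt='%s')

# Demo with two small files of the same format (made-up sample data)
tmpdir = tempfile.mkdtemp()
samples = {
    'a.csv': "time,topic1\n2015-10-01,20\n2015-10-02,25\n",
    'b.csv': "time,topic1\n2015-10-03,30\n2015-10-04,35\n",
}
paths = []
for name, text in samples.items():
    p = os.path.join(tmpdir, name)
    with open(p, 'w') as f:
        f.write(text)
    paths.append(p)

out_path = os.path.join(tmpdir, 'combined.csv')
transpose_and_concat(paths, out_path)

with open(out_path) as f:
    result_lines = f.read().splitlines()
print("\n".join(result_lines))
```

Each input file contributes one block of rows (one row per original column), in the order the paths are given.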