Pandas Read_csv Not Obeying A Regex Sep
Data: from io import StringIO import pandas as pd s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last 375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 0
Solution 1:
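Let's look at this SO post.
Use the regular expression r',(?=\S)' explained there. It matches a comma only when the next character is non-whitespace, so a comma followed by a space inside a text field is not treated as a separator, and neither is a comma at the very end of a line.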
from io import StringIO
import pandas as pd
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''
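# A regex separator needs the Python parsing engine; naming it explicitly avoids a ParserWarning.
df = pd.read_csv(StringIO(s), sep=r',(?=\S)', engine='python')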
Output:
          ID Level QID Text \
375280046 S D3M Which is your favorite? D5M0 option 1
          S D3M How often? (at home, at work, other) D3M0 Work
          M A78 Do you prefer a, b, or c? A78C a
376918925 M A78 Which ONE (select only one) A78E Milk

          ResponseID responseText date_key last
375280046 S 2012-08-08 00:00:00 ynot
          S 2010-03-31 00:00:00 okkk
          M 2010-03-31 00:00:00 abc
376918925 M 2004-02-02 00:00:00 launch Wed.,
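To see what the pattern does on its own, it can be applied directly with Python's re module, outside pandas. This is only a sketch: the helper name split_fields is made up for illustration, and it is not a claim about how read_csv works internally.

```python
import re

# Split only on commas that are immediately followed by a non-whitespace
# character, so ", " inside free text is left alone.
SEP = re.compile(r',(?=\S)')

def split_fields(line):
    """Hypothetical helper: split one CSV-like line with the lookahead pattern."""
    return SEP.split(line)

line = "375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk"
fields = split_fields(line)
print(len(fields))   # 8
print(fields[3])     # How often? (at home, at work, other)

# A trailing comma has no character after it, so the lookahead fails and
# no empty ninth field is produced.
last = split_fields("376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed.,")
print(len(last))     # 8
print(last[-1])      # launch Wed.,
```

Because the lookahead requires an actual character after the comma, this pattern gives the same field count whether or not trailing whitespace has been removed from the line.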
Solution 2:
read_csv appears to strip the trailing space from the end of the string before it looks for the separator, so the final comma is no longer followed by whitespace and gets treated as a separator. This can be worked around by extending the regex so that a comma immediately before the end of the string is also rejected:
pd.read_csv(StringIO(s), sep=r',(?!\s|\Z)', engine='python')
Out[347]:
          ID  Level  QID                                  Text  ResponseID  \
0  375280046      S  D3M               Which is your favorite?        D5M0
1  375280046      S  D3M  How often? (at home, at work, other)        D3M0
2  375280046      M  A78             Do you prefer a, b, or c?        A78C
3  376918925      M  A78           Which ONE (select only one)        A78E

   responseText             date_key          last
0      option 1  2012-08-08 00:00:00          ynot
1          Work  2010-03-31 00:00:00          okkk
2             a  2010-03-31 00:00:00           abc
3          Milk  2004-02-02 00:00:00  launch Wed.,
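The effect of the extra \Z can be checked with the re module alone. The sketch below assumes, as this answer suggests, that the line is stripped before it is split, so the trailing comma sits at the very end of the string. The variable names are illustrative only.

```python
import re

# The last data row after trailing whitespace has been stripped
# (an assumption based on the behaviour described in this answer).
raw = "376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., "
stripped = raw.strip()

# Negative lookahead only: the final comma is not followed by whitespace,
# so it counts as a separator and yields an empty ninth field.
without_end_check = re.split(r',(?!\s)', stripped)

# Adding \Z also rejects a comma sitting at the very end of the string.
with_end_check = re.split(r',(?!\s|\Z)', stripped)

print(len(without_end_check), repr(without_end_check[-1]))  # 9 ''
print(len(with_end_check), repr(with_end_check[-1]))        # 8 'launch Wed.,'
```

The extra empty field in the first split is what makes one row longer than the header, which would account for a misaligned result.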