How To Reconstruct A Conversation From Watson Speech-to-Text Output?
I have the JSON output from Watson's Speech-to-Text service that I have converted into a list and then into a Pandas data-frame. I'm trying to identify how to reconstruct the con
Solution 1:
Should be pretty straightforward to load the speaker labels into a Pandas Dataframe for a nice easy graphical view and then identifying the speaker shifts.
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
Output:
from speaker to 0 1 2
0 0.01 0 0.06 said 0.01 0.06
1 0.06 0 0.12 this 0.06 0.12
2 0.12 1 0.15 said 0.12 0.15
3 0.15 1 0.22 that 0.15 0.22
4 0.22 0 0.31 said 0.22 0.31
5 0.31 0 0.45 something 0.31 0.45
6 0.45 0 0.56 else 0.45 0.56
From there, you can ID only speaker shifts and collapse the dataframe with a quick loop
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
Transcript=pd.DataFrame(columns=['from','to','speaker','transcript'])
for counter in range(0,len(ChangeSpeaker)):
print(counter)
currentindex=ChangeSpeaker[counter]
try:
nextIndex=ChangeSpeaker[counter+1]-1
temp=speakers.loc[currentindex:nextIndex,:]
except:
temp=speakers.loc[currentindex:,:]
Transcript=Transcript.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
You want to take the start point from the first value (hence head) and then the end point from the last vlaue in the temporary dataframe. Additionally, to handle the last speaker case (where you 'd normally get an array out of bounds error, you use a try/catch.
Output:
from to speaker transcript
0 0.01 0.12 0 [said, this]
0 0.12 0.22 1 [said, that]
0 0.22 0.56 0 [said, something, else]
Full Code Here
import json
import pandas as pd
jsonconvo=json.loads("""{
"results": [
{
"alternatives": [
{
"timestamps": [
[
"said",
0.01,
0.06
],
[
"this",
0.06,
0.12
],
[
"said",
0.12,
0.15
],
[
"that",
0.15,
0.22
],
[
"said",
0.22,
0.31
],
[
"something",
0.31,
0.45
],
[
"else",
0.45,
0.56
]
],
"confidence": 0.85,
"transcript": "said this said that said something else "
}
],
"final": true
}
],
"result_index": 0,
"speaker_labels": [
{
"from": 0.01,
"to": 0.06,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.06,
"to": 0.12,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.12,
"to": 0.15,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.15,
"to": 0.22,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.22,
"to": 0.31,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.31,
"to": 0.45,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.45,
"to": 0.56,
"speaker": 0,
"confidence": 0.54,
"final": false
}
]
}""")
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
Transcript=pd.DataFrame(columns=['from','to','speaker','transcript'])
for counter in range(0,len(ChangeSpeaker)):
print(counter)
currentindex=ChangeSpeaker[counter]
try:
nextIndex=ChangeSpeaker[counter+1]-1
temp=speakers.loc[currentindex:nextIndex,:]
except:
temp=speakers.loc[currentindex:,:]
Transcript=Transcript.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
Post a Comment for "How To Reconstruct A Conversation From Watson Speech-to-Text Output?"