Java - CMUSphinx never recognizes any word from audio files
Sphinx doesn't seem to recognize or process audio files. It accepts the audio stream but returns an empty result (SpeechResult result). I don't think the audio file I'm using is the problem, because I've tried several and it doesn't work on any of them. Does anyone have an audio file they know works? And is there anything that stands out as the reason the stream doesn't produce a transcription?
    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.SpeechResult;
    import edu.cmu.sphinx.api.StreamSpeechRecognizer;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public static void main(String[] args) throws IOException {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

        //recognizer.startRecognition(new FileInputStream("E:/1video/hello-5.mp3"));
        File file = new File("E:/1video/bargain_not.wav");
        InputStream is = new FileInputStream(file);
        //is = AutomaticSpeechRecognition.class.getResourceAsStream("/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav");

        recognizer.startRecognition(is);
        SpeechResult result = null;
        // getResult() returns null once the stream is exhausted
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getResult());
            System.out.println(result.getHypothesis());
            System.out.println(result.getWords());
        }

        //result = recognizer.getResult();
        //System.out.println(result);
        //System.out.println(result.toString());
        //System.out.println(result.getWords());
        /*for (WordResult wordResult : result.getWords()) {
            System.out.println(wordResult);
        }*/

        recognizer.stopRecognition();
        is.close();
    }
Here's the output from running it. There don't seem to be any failures:
    09:31:13.430 info unitmanager ci unit: *+nsn+
    09:31:13.433 info unitmanager ci unit: *+spn+
    09:31:13.433 info unitmanager ci unit: aa
    09:31:13.434 info unitmanager ci unit: ae
    09:31:13.434 info unitmanager ci unit: ah
    09:31:13.434 info unitmanager ci unit: ao
    09:31:13.434 info unitmanager ci unit: aw
    09:31:13.434 info unitmanager ci unit: ay
    09:31:13.434 info unitmanager ci unit: b
    09:31:13.434 info unitmanager ci unit: ch
    09:31:13.434 info unitmanager ci unit: d
    09:31:13.434 info unitmanager ci unit: dh
    09:31:13.434 info unitmanager ci unit: eh
    09:31:13.435 info unitmanager ci unit: er
    09:31:13.435 info unitmanager ci unit: ey
    09:31:13.435 info unitmanager ci unit: f
    09:31:13.435 info unitmanager ci unit: g
    09:31:13.435 info unitmanager ci unit: hh
    09:31:13.435 info unitmanager ci unit: ih
    09:31:13.435 info unitmanager ci unit: iy
    09:31:13.435 info unitmanager ci unit: jh
    09:31:13.435 info unitmanager ci unit: k
    09:31:13.435 info unitmanager ci unit: l
    09:31:13.435 info unitmanager ci unit: m
    09:31:13.436 info unitmanager ci unit: n
    09:31:13.436 info unitmanager ci unit: ng
    09:31:13.436 info unitmanager ci unit: ow
    09:31:13.436 info unitmanager ci unit: oy
    09:31:13.436 info unitmanager ci unit: p
    09:31:13.436 info unitmanager ci unit: r
    09:31:13.436 info unitmanager ci unit: s
    09:31:13.436 info unitmanager ci unit: sh
    09:31:13.436 info unitmanager ci unit: t
    09:31:13.436 info unitmanager ci unit: th
    09:31:13.436 info unitmanager ci unit: uh
    09:31:13.437 info unitmanager ci unit: uw
    09:31:13.437 info unitmanager ci unit: v
    09:31:13.437 info unitmanager ci unit: w
    09:31:13.437 info unitmanager ci unit: y
    09:31:13.437 info unitmanager ci unit: z
    09:31:13.437 info unitmanager ci unit: zh
    09:31:14.014 info autocepstrum cepstrum component auto-configured as follows: autocepstrum {melfrequencyfilterbank, denoise, discretecosinetransform2, lifter}
    09:31:14.030 info dictionary loading dictionary from: jar:file:/c:/users/kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-snapshot/sphinx4-data-1.0-snapshot.jar!/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
    09:31:14.132 info dictionary loading filler dictionary from: jar:file:/c:/users/kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-snapshot/sphinx4-data-1.0-snapshot.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
    09:31:14.132 info acousticmodelloader loading tied-state acoustic model from: jar:file:/c:/users/kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-snapshot/sphinx4-data-1.0-snapshot.jar!/edu/cmu/sphinx/models/en-us/en-us
    09:31:14.133 info acousticmodelloader pool means entries: 16128
    09:31:14.133 info acousticmodelloader pool variances entries: 16128
    09:31:14.133 info acousticmodelloader pool transition_matrices entries: 42
    09:31:14.133 info acousticmodelloader pool senones entries: 5126
    09:31:14.133 info acousticmodelloader gaussian weights: mixture_weights. entries: 15378
    09:31:14.133 info acousticmodelloader pool senones entries: 5126
    09:31:14.133 info acousticmodelloader context independent unit entries: 42
    09:31:14.133 info acousticmodelloader hmm manager: 137095 hmms
    09:31:14.134 info acousticmodel compositesenonesequences: 0
    09:31:14.134 info largetrigrammodel loading n-gram language model from: jar:file:/c:/users/kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-snapshot/sphinx4-data-1.0-snapshot.jar!/edu/cmu/sphinx/models/en-us/en-us.lm.dmp
    09:31:14.807 info largetrigrammodel 1-grams: 19794
    09:31:14.807 info largetrigrammodel 2-grams: 1377200
    09:31:14.807 info largetrigrammodel 3-grams: 3178194
    09:31:15.582 info lextreelinguist max ci units 43
    09:31:15.583 info lextreelinguist unit table size 79507
    09:31:15.585 info speedtracker # ----------------------------- timers----------------------------------------
    09:31:15.585 info speedtracker # name count curtime mintime maxtime avgtime tottime
    09:31:15.586 info speedtracker load dictionary 1 0.1020s 0.1020s 0.1020s 0.1020s 0.1020s
    09:31:15.586 info speedtracker load lm 1 0.6730s 0.6730s 0.6730s 0.6730s 0.6730s
    09:31:15.586 info speedtracker compile 1 0.7760s 0.7760s 0.7760s 0.7760s 0.7760s
    09:31:15.586 info speedtracker load 1 1.5450s 1.5450s 1.5450s 1.5450s 1.5450s
    09:31:15.608 info speedtracker time audio: 1.94s proc: 0.01s speed: 0.00 x real time
    09:31:15.608 info speedtracker total time audio: 1.94s proc: 0.01s 0.00 x real time
    09:31:15.609 info memorytracker mem total: 454.75 mb free: 262.35 mb
    09:31:15.609 info memorytracker used: this: 192.40 mb avg: 192.40 mb max: 192.40 mb
    09:31:15.610 info largetrigrammodel lm cache size: 0 hits: 0 misses: 0
    <s> </s>
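A likely thing to rule out first is the input format. The default en-us model is built for 16 kHz, 16-bit, mono audio, and a stream in a different format would presumably be misread rather than rejected outright, which would fit an empty "<s> </s>" result. The following is a minimal sketch for printing what a file actually contains, using only the JDK's javax.sound.sampled API. The class name WavFormatCheck and the isSphinxFriendly helper are hypothetical, and requiring little-endian samples is an assumption based on how standard WAV files store 16-bit PCM.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavFormatCheck {

    /** True if the format is signed 16-bit little-endian PCM, mono, 16 kHz. */
    public static boolean isSphinxFriendly(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    public static void main(String[] args) throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Reads only the file header. A stock JDK has no MP3 reader,
        // so MP3 input throws UnsupportedAudioFileException here.
        AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(new File(args[0]));
        AudioFormat format = fileFormat.getFormat();
        System.out.println("Container: " + fileFormat.getType());
        System.out.println("Format:    " + format);
        System.out.println(isSphinxFriendly(format)
                ? "Looks OK for a 16 kHz model"
                : "Convert to 16 kHz, 16-bit, mono PCM first");
    }
}
```

Running it as java WavFormatCheck E:/1video/bargain_not.wav prints the sample rate, bit depth and channel count; if the last line asks for conversion, re-encoding the file is the first thing to try.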
As Nikolay Shmyrev said, the file must be a 16 kHz, 16-bit, mono MS WAV file. You can record such a file with Audacity.
Use File > Export, and make sure you pick "WAV (Microsoft) signed 16-bit PCM".
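If the audio is produced programmatically rather than recorded, a file with the same layout can also be written from Java. This sketch uses only javax.sound.sampled to write 16-bit samples as a 16 kHz mono little-endian WAV. The class name WriteSphinxWav, the output file name and the 440 Hz test tone are hypothetical, and the code does not resample, so the input samples must already be at 16 kHz.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class WriteSphinxWav {

    // 16 kHz, 16-bit, mono, signed, little-endian PCM
    public static final AudioFormat TARGET = new AudioFormat(16000f, 16, 1, true, false);

    /** Writes 16-bit samples (already at 16 kHz) to a mono WAV file; returns bytes written. */
    public static int write(short[] samples, File out) throws IOException {
        byte[] pcm = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            pcm[2 * i] = (byte) (samples[i] & 0xff);            // low byte first (little-endian)
            pcm[2 * i + 1] = (byte) ((samples[i] >> 8) & 0xff); // then high byte
        }
        // The length argument is in sample frames, which equals samples.length for mono
        try (AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(pcm), TARGET, samples.length)) {
            return AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // One second of a 440 Hz tone: a synthetic test signal, not speech
        short[] tone = new short[16000];
        for (int i = 0; i < tone.length; i++) {
            tone[i] = (short) (8000 * Math.sin(2 * Math.PI * 440 * i / 16000.0));
        }
        File out = new File("tone-16k-mono.wav");
        int n = write(tone, out);
        System.out.println("Wrote " + n + " bytes to " + out);
    }
}
```

For existing recordings at other sample rates, resampling in Audacity as described above is likely the simpler route.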