最終更新:2024-12-01 (日) 08:00:54 (7d)  

julius
Top / julius

speech recognition engine

メモ

  • -smpFreq

help (4.6)

  • Julius rev.4.6 - based on JuliusLib rev.4.6 (fast)
    
    Engine specification:
     -  Base setup   : fast
     -  Supported LM : DFA, N-gram, Word
     -  Extension    : LibSndFile
     -  Compiled by  : gcc -g -O2 -fPIC
    
    Options:
    
    --- Global Options -----------------------------------------------
    
     Feature Vector Input:
        [-input devname]       input source  (default = htkparam)
             htkparam/mfcfile  feature vectors in HTK parameter file format
             outprob           outprob vectors in HTK parameter file format
             vecnet            receive vectors from client (TCP/IP)
        [-filelist file]    filename of input file list
    
     Speech Input:
        (Can extract MFCC/FBANK/MELSPEC features from waveform)
        [-input devname]    input source  (default = htkparam)
             file/rawfile      waveform file (RAW(BE),WAV,AU,SND,NIST,ADPCM and more)
             mic               default microphone device
             alsa              use ALSA interface
             adinnet           adinnet client (TCP/IP)
             stdin             standard input
        [-filelist file]    filename of input file list
        [-adport portnum]   adinnet port number to listen         (5530)
        [-48]               enable 48kHz sampling with internal down sampler (OFF)
        [-zmean/-nozmean]   enable/disable DC offset removal      (OFF)
        [-lvscale]          input level scaling factor (1.0: OFF) (1.0)
        [-nostrip]          disable stripping off zero samples
        [-record dir]       record triggered speech data to dir
        [-rejectshort msec] reject an input shorter than specified
        [-rejectlong msec]  reject an input longer than specified
    
     Speech Detection: (default: on=mic/net off=files)
        [-cutsilence]       turn on (force) skipping long silence
        [-nocutsilence]     turn off (force) skipping long silence
        [-lv unsignedshort] input level threshold (0-32767)       (2000)
        [-zc zerocrossnum]  zerocross num threshold per sec.      (60)
        [-headmargin msec]  header margin length in msec.         (300)
        [-tailmargin msec]  tail margin length in msec.           (400)
        [-chunksize sample] unit length for processing            (1000)
        [-fvad]             FVAD sw (-1=off, 0-3=on / degree      (-1)
        [-fvad_param i f]   FVAD parameter (dur/thres)            (5 0.50)
    
     GMM utterance verification:
        -gmm filename       GMM definition file
        -gmmnum num         GMM Gaussian pruning num              (10)
        -gmmreject string   comma-separated list of noise model name to reject
    
     On-the-fly Decoding: (default: on=mic/net off=files)
        [-realtime]         turn on, input streamed with MAP-CMN
        [-norealtime]       turn off, input buffered with sentence CMN
    
     Others:
        [-C jconffile]      load options from jconf file
        [-quiet]            reduce output to only word string
        [-demo]             equal to "-quiet -progout"
        [-debug]            (for debug) dump numerous log
        [-callbackdebug]    (for debug) output message per callback
        [-check (wchmm|trellis)] (for debug) check internal structure
        [-check triphone]   triphone mapping check
        [-outprobout file]  Output state probabilities to file
        [-setting]          print engine configuration and exit
        [-help]             print this message and exit
    
    --- Instance Declarations ----------------------------------------
    
        [-AM]               start a new acoustic model instance
        [-LM]               start a new language model instance
        [-SR]               start a new recognizer (search) instance
        [-AM_GMM]           start an AM feature instance for GMM
        [-GLOBAL]           start a global section
        [-nosectioncheck]   disable option location check
    
    --- Acoustic Model Options (-AM) ---------------------------------
    
     Acoustic analysis:
        [-htkconf file]     load parameters from the HTK Config file
        [-smpFreq freq]     sample period (Hz)                    (16000)
        [-smpPeriod period] sample period (100ns)                 (625)
        [-fsize sample]     window size (sample)                  (400)
        [-fshift sample]    frame shift (sample)                  (160)
        [-preemph]          pre-emphasis coef.                    (0.97)
        [-fbank]            number of filterbank channels         (24)
        [-ceplif]           cepstral liftering coef.              (22)
        [-rawe] [-norawe]   toggle using raw energy               (no)
        [-enormal] [-noenormal] toggle normalizing log energy     (no)
        [-escale]           scaling log energy for enormal        (1.0)
        [-silfloor]         energy silence floor in dB            (50.0)
        [-delwin frame]     delta windows length (frame)          (2)
        [-accwin frame]     accel windows length (frame)          (2)
        [-hifreq freq]      freq. of upper band limit, off if <0  (-1)
        [-lofreq freq]      freq. of lower band limit, off if <0  (-1)
        [-sscalc]           do spectral subtraction (file input only)
        [-sscalclen msec]   length of head silence for SS (msec)  (300)
        [-ssload filename]  load constant noise spectrum from file for SS
        [-ssalpha value]    alpha coef. for SS                    (2.000000)
        [-ssfloor value]    spectral floor for SS                 (0.500000)
        [-zmeanframe/-nozmeanframe] frame-wise DC removal like HTK(OFF)
        [-usepower/-nousepower] use power in fbank analysis       (OFF)
        [-cmnload file]     load initial CMN/CVN param from file on startup
        [-cmnsave file]     save CMN/CVN param to file after each input
        [-cmnstatic]        no MAP, use static CMN/CVN (use with -cmnload)
        [-cvnstatic]        use static CVN only (use with -cmnload)
        [-cmnnoupdate]      not update initial param while recog. (use with -cmnload)
        [-cmnmapweight]     weight value of initial cm for MAP-CMN (100.00)
        [-cvn]              cepstral variance normalisation       (on)
        [-vtln alpha lowcut hicut] enable VTLN (1.0 to disable)   (1.000000)
    
     Acoustic Model:
        -h hmmdefsfile      HMM definition file name
        [-hlist HMMlistfile] HMMlist filename (must for triphone model)
        [-dnnconf file]     DNN configuration file
        [-iwcd1 methodname] switch IWCD triphone handling on 1st pass
                 best N     use N best score (default of n-gram, N=3)
                 max        use maximum score
                 avg        use average score (default of dfa)
        [-force_ccd]        force to handle IWCD
        [-no_ccd]           don't handle IWCD
        [-notypecheck]      don't check input parameter type
        [-spmodel HMMname]  name of short pause model             ("sp")
        [-multipath]        switch decoding for multi-path HMM    (auto)
    
     Acoustic Model Computation Method:
        [-gprune methodname] select Gaussian pruning method:
                 safe          safe pruning
                 heuristic     heuristic pruning
                 beam          beam pruning (default for TM/PTM)
                 none          no pruning (default for non tmix models)
        [-tmix gaussnum]    Gaussian num threshold per mixture for pruning (2)
        [-gshmm hmmdefs]    monophone hmmdefs for GS
        [-gsnum N]          N-best state will be selected        (24)
    
    --- Language Model Options (-LM) ---------------------------------
    
     N-gram:
        -d file.bingram     n-gram file in Julius binary format
        -nlr file.arpa      forward n-gram file in ARPA format
        -nrl file.arpa      backward n-gram file in ARPA format
        [-lmp float float]  weight and penalty (tri: 8.0 -2.0 mono: 5.0 -1)
        [-lmp2 float float]       for 2nd pass (tri: 8.0 -2.0 mono: 6.0 0)
        [-transp float]     penalty for transparent word (+0.0)
    
     DFA Grammar:
        -dfa file.dfa       DFA grammar file
        -gram file[,file2...] (list of) grammar prefix(es)
        -gramlist filename  filename of grammar list
        [-penalty1 float]   word insertion penalty (1st pass)     (0.0)
        [-penalty2 float]   word insertion penalty (2nd pass)     (0.0)
    
     Word Dictionary for N-gram and DFA:
        -v dictfile         dictionary file name
        [-silhead wordname] (n-gram) beginning-of-sentence word   (<s>)
        [-siltail wordname] (n-gram) end-of-sentence word         (</s>)
        [-mapunk wordname]  (n-gram) map unknown words to this    (<unk>)
        [-forcedict]        ignore error entry and keep running
        [-iwspword]         (n-gram) add short-pause word for inter-word CD sp
        [-iwspentry entry]  (n-gram) word entry for "-iwspword" (<UNK> [sp] sp sp)
        [-adddict dictfile] (n-gram) load extra dictionary
        [-addentry entry]   (n-gram) load extra word entry
    
     Isolated Word Recognition:
        -w file[,file2...]  (list of) wordlist file name(s)
        -wlist filename     file that contains list of wordlists
        -wsil head tail sp  name of silence/pause model
                              head - BOS silence model name       (silB)
                              tail - EOS silence model name       (silE)
                               sp  - their name as context or "NULL" (NULL)
    
    --- Recognizer / Search Options (-SR) ----------------------------
    
     Search Parameters for the First Pass:
        [-b beamwidth]      beam width (by state num)             (guessed)
                            (0: full search, -1: force guess)
        [-bs score_width]   beam width (by score offset)          (disabled)
                            (-1: disable)
        [-sepnum wordnum]   (n-gram) # of hi-freq word isolated from tree (150)
        [-1pass]            do 1st pass only, omit 2nd pass
        [-inactive]         recognition process not active on startup
    
     Search Parameters for the Second Pass:
        [-b2 hyponum]       word envelope beam width (by hypo num) (30)
        [-n N]              # of sentence to find                 (1)
        [-output N]         # of sentence to output               (1)
        [-sb score]         score beam threshold (by score)       (80.0)
        [-s hyponum]        global stack size of hypotheses       (500)
        [-m hyponum]        hypotheses overflow threshold num     (2000)
        [-lookuprange N]    frame lookup range in word expansion  (5)
        [-looktrellis]      (dfa) expand only backtrellis words
        [-[no]multigramout] (dfa) output per-grammar results
        [-oldtree]          (dfa) use old build_wchmm()
        [-oldiwcd]          (dfa) use full lcdset
        [-iwsp]             insert sp for all word end (multipath)(off)
        [-iwsppenalty]      trans. penalty for iwsp (multipath)   (-1.0)
    
     Short-pause Segmentation:
        [-spsegment]        enable short-pause segmentation
        [-spdur]            length threshold of sp frames         (10)
        [-pausemodels str]  comma-delimited list of pause models for segment
    
     Graph Output with graph-oriented search:
        [-lattice]          enable word graph (lattice) output
        [-confnet]          enable confusion network output
        [-nolattice]][-noconfnet] disable lattice / confnet output
        [-graphrange N]     merge same words in graph (0)
                            -1: not merge, leave same loc. with diff. score
                             0: merge same words at same location
                            >0: merge same words around the margin
        [-graphcut num]     graph cut depth at postprocess (-1: disable)(80)
        [-graphboundloop num] max. num of boundary adjustment loop (20)
        [-graphsearchdelay] inhibit search termination until 1st sent. found
        [-nographsearchdelay] disable it (default)
    
     Forced Alignment:
        [-walign]           optionally output word alignments
        [-palign]           optionally output phoneme alignments
        [-salign]           optionally output state alignments
    
     Minimum Bayes Risk Decoding:
        [-mbr]              enable rescoring sentence on MBR(WER)
        [-mbr_wwer]         enable rescoring sentence on MBR(WWER)
        [-nombr]            disable rescoring sentence on MBR
        [-mbr_weight float float] score and loss func. weight on MBR (0.1 1.0)
    
     Confidence Score:
        [-cmalpha value]    CM smoothing factor                    (0.050000)
    
     Message Output:
        [-fallback1pass]    use 1st pass result when search failed
        [-progout]          progressive output in 1st pass
        [-proginterval]     interval of progout in msec           (300)
    
    -------------------------------------------------
    
     Additional options for application:
        [--help]    display this help
        [-help]     display this help
        [-outfile]  save result in separate .out file
        [-nolog]    not output any log
        [-logfile arg]      output log to file
        [-noxmlescape]      disable XML escape
        [-separatescore]    output AM and LM scores separately
        [-kanji arg]        convert character set for output
        [-nocharconv]       disable charconv
        [-charconv arg arg] convert character set for output
        [-outcode arg]      select info to output to the module: WLPSCwlps
        [-module (arg)]     run as a server module
        [-record arg]       record input waveform to file in dir

Julius 4.2.2? (Ubuntu)

  • accept_check?
  • adinrec?
  • adintool?
  • dfa_determinize?
  • dfa_minimize?
  • jclient?
  • jcontrol?
  • julius
  • julius-generate?
  • julius-generate-ngram?
  • mkbingram?
  • mkbinhmm?
  • mkbinhmmlist?
  • mkdfa?
  • mkfa?
  • mkgshmm?
  • mkss?
  • nextword?
  • yomi2voca?