Original Publish Date: 21 May 2015
AGD: Audio Generation Device
The basic idea is that the AG Device connects to the physical handset and can listen, record and process as well as generate audio through the connected handset.
- The Cyara call engine sends known commands and requests to the AG Device by playing tones of differing frequencies;
- While idle, the AG Device plays an alive tone every 10 seconds to signal to the call engine that it is idle and running;
- Identical audio reference files are stored on both the call engine and, remotely, on the AG Device. This allows us to use the most accurate analysis method, known as full reference (FR), because the original reference file is available to compare against the captured degraded utterance.
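As a rough illustration of tone-based signalling, the sketch below synthesises a pure sine tone as a list of samples. The 8 kHz sample rate, the half-second duration, the amplitude and the function name sine_tone are all assumptions made for the example; the article does not state how the call engine actually generates its tones.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed narrowband telephony rate, not stated in this article


def sine_tone(freq_hz, duration_s=0.5, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Return float samples in [-1, 1] for a pure tone at freq_hz."""
    sample_count = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(sample_count)
    ]


# Example: half a second of a 1400 Hz tone (the frequency is just an example value)
tone = sine_tone(1400)
```

A real deployment would also need to write these samples to the call audio path, which is outside the scope of this sketch.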
The AGD locally stores up to 30MB worth of log files showing basic processing information, and also stores up to 100MB of the captured utterance audio (approximately the last 700 utterances) including the MOS result in the filename. These can be extracted for analysis using the Admin Tool while onsite.
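The storage figures above can be sanity-checked with a little arithmetic. The sketch below assumes decimal megabytes and an 8 kHz, 16-bit, mono PCM capture format; neither assumption is confirmed by this article, so treat the resulting duration as indicative only.

```python
CAPTURE_BUDGET_BYTES = 100 * 1000 * 1000  # 100 MB, read as decimal megabytes (assumption)
UTTERANCES_KEPT = 700                     # "approximately the last 700 utterances"

bytes_per_utterance = CAPTURE_BUDGET_BYTES / UTTERANCES_KEPT

# Assumed capture format: 8 kHz sample rate, 2 bytes per sample, mono
BYTES_PER_SECOND = 8000 * 2

seconds_per_utterance = bytes_per_utterance / BYTES_PER_SECOND

print(f"{bytes_per_utterance / 1000:.0f} kB per utterance, "
      f"about {seconds_per_utterance:.1f} s of audio")
# prints: 143 kB per utterance, about 8.9 s of audio
```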
When the call engine requests that the reference file be played, the AGD plays its stored reference file to the call engine. The call engine captures the received degraded utterance and compares its quality against its own stored reference file, returning a MOS result. This indicates the degradation from the agent location to the customer (Customer Experience).
When the call engine commands the AG Device to prepare to receive a degraded audio sample, the call engine then plays its stored reference file down to the agent handset. The AGD captures the degraded audio and compares the quality of this utterance against its own stored reference file, returning a MOS result. This indicates the degradation from the customer to the agent location (Agent Experience).
After processing the utterance and calculating the MOS result, the AGD will speak the result back to the call engine for capture. It will also play back the captured degraded audio for the call engine to store for high level debugging remotely.
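The device-side behaviour described above can be pictured as a small state machine. The following is a toy model only, not Cyara's implementation: the class name AgdSketch, the state names and the action strings are hypothetical, and ignoring out-of-sequence commands is an assumption.

```python
class AgdSketch:
    """Toy model of the AGD command handling described above (not Cyara's code)."""

    def __init__(self):
        self.state = "IDLE"

    def handle(self, command):
        """Return the list of actions the device takes for a command."""
        if command == "PLAY-REFERENCE" and self.state == "IDLE":
            # Customer Experience: the call engine scores what it hears.
            return ["play reference.wav"]
        if command == "PREPARE-FOR-REFERENCE" and self.state == "IDLE":
            # Agent Experience: the device gets ready to record.
            self.state = "CAPTURING"
            return ["start capturing degraded audio"]
        if command == "END-OF-REFERENCE" and self.state == "CAPTURING":
            self.state = "IDLE"
            return [
                "stop capturing",
                "compute MOS against reference.wav",
                "speak MOS result",
                "replay captured audio",
            ]
        # Assumption: commands that do not fit the current state are ignored.
        return []


agd = AgdSketch()
for command in ["PREPARE-FOR-REFERENCE", "END-OF-REFERENCE"]:
    print(command, "->", agd.handle(command))
```

Modelling the device this way makes it easy to see that END-OF-REFERENCE only has meaning after a PREPARE-FOR-REFERENCE command has put the device into its capturing state.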
Figure 1: AGD call flow example
AGD Main Features
- Persistent volume to avoid tampering
- 6 alternate reference files (male and female)
- Bi-directional MOS results, Agent Experience and Customer Experience
- Resilient operation such as auto-boot and startup on power connected
- Compatible* with Polaris Soundshield acoustic shock protection devices. Due to inconsistencies across devices, support for the Soundshield device is on a case-by-case basis.
- Dedicated supported handset model with auto answer enabled
- Cyara-developed custom cable, specific to the handset in use, to connect the AGD to the handset
- Direct power to the Android device (not USB power, as it can create noise and incorrect MOS results)
- A solution to direct calls to the handset, either directly or via intelligent control.
- Power-off the Phone
- Connect the AGD with the provided cable into the Headset socket
- Power-on the Phone
- Make sure that the phone is set to auto-answer
- Make sure that the headset indicator is on
The importance of volume
The volume settings on both the physical handset and the AG Device are important for accurate MOS results. If the volume is incorrect on the AG Device, the Customer Experience MOS result calculated by the call engine will be impacted. If the volume on the physical handset is not set correctly, the Agent Experience MOS result calculated by the AGD will be impacted.
Below are the values we have found during testing to give the most accurate results.
Figure 2: Standard Phone Volume Settings
It is likely other handsets are also compatible, especially within the same series. This list is only the models that have been confirmed.
* Polaris Soundshield is only supported on a case by case basis due to inconsistencies across the devices.
Understanding AGD Remote Commands:
AGD has hardcoded commands, which are essentially certain frequencies linked with a specific action. The list of these frequencies is below:
- REQUEST-GET-VOLUME : 1400Hz : AGD will respond with the current volume level of the AGD
- REQUEST-VOLUME-UP : 1300Hz : AGD will increase the persistent volume by 1 step
- REQUEST-VOLUME-DOWN : 1350Hz : AGD will decrease the persistent volume by 1 step
- PREPARE-FOR-REFERENCE : 1000Hz : AGD will prepare to capture then process against the default reference audio reference.wav
- PREPARE-FOR-REFERENCE-A : 1050Hz : AGD will prepare to capture then process against the first alternate reference audio reference-alt-a.wav
- PREPARE-FOR-REFERENCE-B : 1075Hz : AGD will prepare to capture then process against the second alternate reference audio reference-alt-b.wav
- PREPARE-FOR-REFERENCE-C : 1150Hz : AGD will prepare to capture then process against the third alternate reference audio reference-alt-c.wav
- PREPARE-FOR-REFERENCE-D : 1200Hz : AGD will prepare to capture then process against the fourth alternate reference audio reference-alt-d.wav
- PREPARE-FOR-REFERENCE-E : 2045Hz : AGD will prepare to capture then process against the fifth alternate reference audio reference-alt-e.wav
- END-OF-REFERENCE : 1500Hz : AGD will stop capturing the degraded audio and process the captured utterance. AGD will respond with MOS result and replay the captured degraded audio
- PLAY-REFERENCE : 1750Hz : AGD will play the default reference audio reference.wav to be captured and processed for degradation by the call engine.
- PLAY-REFERENCE-A : 1800Hz : AGD will play the first alternate reference audio reference-alt-a.wav to be captured and processed for degradation by the call engine.
- PLAY-REFERENCE-B : 1850Hz : AGD will play the second alternate reference audio reference-alt-b.wav to be captured and processed for degradation by the call engine.
- PLAY-REFERENCE-C : 1900Hz : AGD will play the third alternate reference audio reference-alt-c.wav to be captured and processed for degradation by the call engine.
- PLAY-REFERENCE-D : 1950Hz : AGD will play the fourth alternate reference audio reference-alt-d.wav to be captured and processed for degradation by the call engine.
- PLAY-REFERENCE-E : 2000Hz : AGD will play the fifth alternate reference audio reference-alt-e.wav to be captured and processed for degradation by the call engine.
- REQUEST-REBOOT-SOFT : 2100Hz : For administration use only, the AGD will attempt to close off running processes and reboot.
- REQUEST-REBOOT-HARD : 2150Hz : For administration use only, the AGD will try to immediately reboot.
Please note that the Polaris Soundshield Shriek Rejection software narrows the window of tones we can use, so fewer commands are understood when a Soundshield is in the audio path. The SOUNDSHIELD folder under TONES in the sample test cases contains only the tones expected to make it through the Polaris DSP layer.
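To illustrate how a tone can be mapped back to a command, here is a minimal sketch using the Goertzel algorithm, a standard way to measure signal energy at a single frequency. It is an assumption that a detector of this kind is suitable here; the article does not describe how the AGD actually recognises tones. Only a subset of the command table above is included, and the 8 kHz sample rate and function names are hypothetical.

```python
import math

# Subset of the command table above (frequency in Hz -> command name).
COMMANDS = {
    1000: "PREPARE-FOR-REFERENCE",
    1400: "REQUEST-GET-VOLUME",
    1500: "END-OF-REFERENCE",
    1750: "PLAY-REFERENCE",
}


def goertzel_power(samples, freq_hz, sample_rate):
    """Return the signal power at freq_hz using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_command(samples, sample_rate=8000):
    """Return the command whose frequency carries the most energy."""
    best = max(COMMANDS, key=lambda f: goertzel_power(samples, f, sample_rate))
    return COMMANDS[best]


# Example: synthesise 0.2 s of a 1500 Hz tone and decode it.
rate = 8000
tone = [math.sin(2 * math.pi * 1500 * i / rate) for i in range(int(0.2 * rate))]
print(decode_command(tone, rate))  # END-OF-REFERENCE
```

This sketch always returns the strongest candidate, so a practical detector would also need a minimum-energy threshold to reject silence and speech.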
High Level Flows:
Figure 3: Customer Experience Scoring Diagram
Figure 4: Agent Experience Scoring Diagram
PESQ (Perceptual Evaluation of Speech Quality) is a family of standards comprising a test methodology for automated assessment of speech quality as experienced by a user of a telephony system. It is standardised as ITU-T recommendation P.862 (02/01). Today, PESQ is an industry standard applied worldwide for objective voice quality testing, used by phone manufacturers, network equipment vendors and telecom operators. Its usage requires a license.
Genealogy of related standards
ITU-T’s family of full reference objective voice quality measurements started in 1997 with P.861 (PSQM), which was superseded by P.862 (PESQ) in 2001. P.862 was later complemented with the recommendations P.862.1 (mapping of PESQ scores to a MOS scale), P.862.2 (wideband measurements) and P.862.3 (application guide). P.863 (POLQA) has been in force since 2011. Two additional implementer’s guides for P.863 were consented by ITU-T Study Group 12 in November 2011. In addition to the full reference methods listed above, ITU-T’s objective voice quality measurement standards also include P.563 (a no-reference algorithm).
A "full reference" (FR) algorithm has access to and makes use of the original reference signal for a comparison (i.e. a difference analysis). It can compare each sample of the reference signal (talker side) to each corresponding sample of the degraded signal (listener side). FR measurements deliver the highest accuracy and repeatability but can only be applied for dedicated tests in live networks (e.g. drive test tools for mobile network benchmarks).
PESQ is a full-reference algorithm and analyzes the speech signal sample-by-sample after a temporal alignment of corresponding excerpts of reference and test signal. PESQ can be applied to provide an end-to-end (E2E) quality assessment for a network, or characterize individual network components.
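The align-then-compare idea behind a full-reference measurement can be sketched in a few lines. The example below is not PESQ, which applies a perceptual model rather than a plain error measure; it only shows temporal alignment by cross-correlation followed by a sample-by-sample comparison. The function names and the tiny test signals are made up for illustration.

```python
def best_lag(reference, degraded, max_lag):
    """Return the delay (in samples) that maximises the cross-correlation."""
    def correlation(lag):
        return sum(
            r * degraded[i + lag]
            for i, r in enumerate(reference)
            if i + lag < len(degraded)
        )
    return max(range(max_lag + 1), key=correlation)


def mean_squared_error(reference, degraded, lag):
    """Compare aligned samples one by one and average the squared difference."""
    pairs = [
        (r, degraded[i + lag])
        for i, r in enumerate(reference)
        if i + lag < len(degraded)
    ]
    return sum((r - d) ** 2 for r, d in pairs) / len(pairs)


# Example: the degraded signal is delayed by 3 samples and attenuated to 80%.
reference = [0, 1, 0, -1, 0, 2, 0, -2]
degraded = [0, 0, 0] + [0.8 * x for x in reference]

lag = best_lag(reference, degraded, max_lag=5)
print(lag)  # 3
print(round(mean_squared_error(reference, degraded, lag), 3))  # 0.05
```

Without the alignment step the comparison would pair up unrelated samples, which is why a full-reference method aligns the two signals before measuring the difference.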
PESQ results principally model mean opinion scores (MOS) that cover a scale from 1 (bad) to 5 (excellent). A mapping function to MOS-LQO is outlined under P.862.1.
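For reference, the P.862.1 mapping from a raw PESQ score to MOS-LQO is a logistic function. The sketch below uses the coefficients published in ITU-T P.862.1; the function name is our own, and the snippet only illustrates the mapping, not the licensed PESQ algorithm itself.

```python
import math


def pesq_to_mos_lqo(raw_pesq):
    """Map a raw P.862 score (about -0.5 to 4.5) to MOS-LQO per P.862.1."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


for raw in (-0.5, 2.0, 4.5):
    print(raw, round(pesq_to_mos_lqo(raw), 2))
```

Because of this mapping, the highest MOS-LQO obtainable from P.862 is roughly 4.55 rather than 5.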