Category Archives: Enhanced Voice

Surveys and Mediocracy

Strong Auth Drives Conversational Access

When I’m wearing my analyst hat, I’m constantly asked if “this is the year for…” Is it the year for VoiceXML? The year for speech recognition? The year for speaker verification/voice biometrics? The year for VoIP? For the past year, I’ve answered every question the same way: “2007 should be a big year,” because the robustness of the technology, combined with the maturity of the vendors in the Conversational Access Technologies (CAT) arena, lends itself to the adoption of all of these technologies.

I still believe that 2007 is the year when we turn that corner, hit the end of the runway and take off, cross the chasm and meet up with every other business cliché that describes what happens when the latent need for solutions breaks through the fear of being an early adopter and sales start to ramp up. However, next year’s growth will not be due to the technology or the vendors, or even cost avoidance. It will be based on meeting federal mandates such as the FFIEC guidance.

The first generation of Conversational Access Technologies was found in the financial industry, which brought us the first widespread use of IVRs for handling self-service for credit cards. It’s the financial industry that will also drive adoption of the next generation of CAT.

The trigger, as mentioned in previous reports and advisories authored by (and for) Opus Research, is the FFIEC guidance. The guidance stated that in 2006, financial institutions needed to implement multi-factor authentication for the web. In 2007, this extends into telephony channels as well.

Early implementers of multi-factor security at banks primarily went down one of two paths: One-Time-Password generating tokens and Shared Secrets.

One-Time-Password generating tokens were an obvious choice for many banks, as they have been used internally for years to restrict access to secure platforms. Solutions such as RSA’s SecurID and Verisign’s VIP generate a new numeric PIN every 60 seconds. A user logs into a website with their UserID and password, then enters the generated PIN to get access. It’s a very straightforward solution, though it has been considered expensive, as each user needs their own token to display the one-time password. RSA is considered the market leader in hardware-based OTP technology.
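To make the "new PIN every 60 seconds" idea concrete, here is a minimal sketch of a time-based one-time PIN. It follows the publicly documented HMAC approach from RFC 6238, not the proprietary algorithm inside any vendor's token; the function names and the 60-second default step are assumptions chosen for illustration.

```python
import hashlib
import hmac
import struct


def time_based_pin(secret: bytes, unix_time: int, step: int = 60, digits: int = 6) -> str:
    """Derive a numeric PIN from a shared secret and the current time window."""
    # Token and server agree on the window number, not on the exact second.
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte slice.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_pin(secret: bytes, submitted: str, unix_time: int, step: int = 60) -> bool:
    """Server-side check: recompute the PIN for the current window and compare."""
    expected = time_based_pin(secret, unix_time, step, digits=len(submitted))
    return hmac.compare_digest(expected, submitted)
```

The token and the authentication server each hold the same secret, so the server can recompute the PIN for the current window rather than storing it. A production system would usually also accept the adjacent window to tolerate clock drift, which this sketch leaves out.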

Shared “Secret” Information makes up the other predominant solution for handling verification. There are three major categories of shared secrets:

Self-Supplied Secrets: in this case, the system asks you, at the point of registration, to answer a number of questions (What city were you born in? What is your favorite color?) and at login, you are asked to answer one or more of these questions.
Historical Data: in this case, the system uses historical information, ranging from “what was the amount of your last deposit” to “when did you pay off your car loan” to “what was your address in January 2001,” gleaned from a number of public and internal databases. You don’t pre-answer any questions.
Photo Preferences: also pushed to market by RSA as a result of its PassMark acquisition, this method has you pick a preferred photo out of a selection of up to thousands of photos, and at login you have to select that photo again.

The failure of shared “secret” information is that it is rarely secret, and more important, the more this “secret” information is used, the less secure it becomes.

For example, when it comes to self-supplied secrets, the answers to the most common questions are easily found on the web or in publicly accessible databases: birthdate, mother’s maiden name, pet name, etc. How did Paris Hilton’s Sidekick get “hacked”? Someone figured out that she used her dog’s name. More important is the fact that most websites and services ask for the same information: birthdate, street where you grew up, mother’s maiden name, favorite pet – which means that your information is more and more out in the open. Historical data is also challenging – when I went online to request a copy of my credit report, it took me five minutes to figure out whether I had ever had a student loan from a specific bank, as the lender has changed multiple times through consolidations and one bank selling the loan to another.

However, the largest challenge with shared “secret” information is that it is really only applicable to the web. Securing a phone transaction with a picture is ineffective, and letting a caller speak freeform text to answer a historical or shared-secret question isn’t technologically feasible. The only option would be to present a multiple-choice question for the user to answer, but best practices and common sense rule out any security method where a potential answer is given at the time of the challenge.

This is why 2007 becomes the year of the CAT.

The mandate for banks to implement multi-factor authentication for the web left the field wide open for vendors to propose “creative” solutions to achieve FFIEC compliance. However, once voice is thrown into the mix, the list drops dramatically. On the phone, there are only two ways for a caller to supply input: touchtone and speech. This leaves two methods for strong authentication: speaker verification and one-time PINs.

Now, though I am the CTO of a voice biometrics firm, from a functionality perspective both solutions solve the problem. Asking a user to input a one-time numeric PIN generated by a hardware token and asking a user to leave a voiceprint to gain entry both satisfy the requirements for multi-factor authentication.

More importantly, both solutions can be easily implemented for web and voice, assuming that the bank has a strategy for implementing a well thought out CAT infrastructure.

In a CAT environment, implementing one-time numeric PINs for voice follows the same pattern as the web. The voice application asks the user for the OTP, passes it to the appropriate authentication system and gets a response back indicating whether the user passed or failed the authentication request. Since the OTP system is already integrated with a web process (typically through a web service/SOAP call), the voice application can make the same call (simplified with the use of a VoiceXML 2.1 request) and parse the same response to grant access.
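A rough sketch of that shared call path is below. Everything in it is hypothetical: `OtpService` stands in for whatever validation web service the bank already exposes, and the two handlers only illustrate that the web form and the touchtone prompt can hand the same PIN to the same backend.

```python
from dataclasses import dataclass


@dataclass
class AuthResult:
    user_id: str
    passed: bool


class OtpService:
    """Stand-in for the bank's existing OTP validation service (hypothetical)."""

    def __init__(self, expected_pins):
        # In a real deployment this would be a remote call, not a dictionary.
        self._expected = dict(expected_pins)

    def verify(self, user_id: str, pin: str) -> AuthResult:
        return AuthResult(user_id, self._expected.get(user_id) == pin)


def web_login(service: OtpService, user_id: str, form_pin: str) -> bool:
    """Web channel: the PIN arrives as a form field."""
    return service.verify(user_id, form_pin.strip()).passed


def voice_login(service: OtpService, user_id: str, dtmf_digits: str) -> bool:
    """Voice channel: the PIN arrives as touchtone input, often ending in '#'."""
    pin = "".join(ch for ch in dtmf_digits if ch.isdigit())
    return service.verify(user_id, pin).passed
```

Only the input handling differs per channel; the pass/fail decision lives in one place, which is the point of standardizing on a common interface.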

Implementing voice biometrics for a web process is more challenging, but still easily handled. In the typical process, as shown by vendors ranging from Authentify to VxV Solutions (my company), a web user, after starting the login process, is instructed to call a phone number and authenticate his or her voice, either receiving back a one-time PIN (also called a Soft-OTP) or being redirected back to the web application after passing the biometric claim.
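The Soft-OTP variant of that flow might look something like the sketch below. It is an illustration only, not any vendor's implementation: the class name, the numeric `voice_score` and the 0.8 threshold are all assumptions, and the biometric match itself is treated as a black box that produces a score.

```python
import secrets


class SoftOtpIssuer:
    """Issues a single-use PIN once a caller's voice has been verified (hypothetical)."""

    def __init__(self):
        self._pending = {}  # user_id -> PIN waiting to be redeemed on the web

    def issue_after_voice_match(self, user_id, voice_score, threshold=0.8):
        # The score would come from the voice biometric engine; the threshold is assumed.
        if voice_score < threshold:
            return None
        pin = f"{secrets.randbelow(10 ** 6):06d}"
        self._pending[user_id] = pin
        return pin

    def redeem(self, user_id, pin):
        # pop() makes the PIN single-use: any attempt, right or wrong, consumes it.
        expected = self._pending.pop(user_id, None)
        return expected is not None and secrets.compare_digest(expected, pin)
```

The phone call produces the PIN and the web session consumes it, so the web application never needs to know how the voice match was performed.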

In both cases, the key is that the bank can now standardize its processes for handling both web and voice transactions. However, standardizing the processes doesn’t necessarily mean standardizing the method. It is expected that banks, and enterprises in general, will support multiple authentication methods based on the user’s needs and status. For example, shipping an OTP token with the bank’s name engraved on the back may cost upwards of $30 per user, but for key clients, the cost may be justified by the fact that it is a very fast way to log in. Conversely, a voice biometric solution is typically much cheaper, though less convenient for web users, as it requires the user to make a phone call to enable their web session.

What is expected is the growth of a new range of multi-factor brokerage services, such as Ping Identity’s PingLogin solution, designed to let a user select the preferred method of providing multiple factors. Consider a preferred bank customer. He (or she) may have an OTP token provided by the bank and a fingerprint scanner at home. The bank may have also enrolled a voiceprint. When the user logs into the website from work, he could use a voiceprint or OTP – when calling in, he could use the same voiceprint or OTP, but when logging in from home, all three methods could be used. Fingerprint, in this case, would most likely be the fastest and least obtrusive.

Instead of integrating each of these solutions into the voice and web applications, and requiring separate dedicated logic, the authentication broker would simply determine which methods are available, which can be used based on the mode, and then allow the user to select the method of his or her choice.
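As a rough illustration of the broker's decision, here is a minimal sketch built around the preferred-customer example. The method names, the context labels and the table itself are assumptions made up for this example, not part of any product.

```python
# Which authentication methods work in which access context (assumed values).
METHOD_CONTEXTS = {
    "otp_token": {"web_work", "web_home", "phone"},
    "voiceprint": {"web_work", "web_home", "phone"},
    "fingerprint": {"web_home"},  # needs the scanner the customer keeps at home
}


def available_methods(enrolled, context):
    """Methods the user has enrolled that are usable in the current context."""
    return sorted(m for m in enrolled if context in METHOD_CONTEXTS.get(m, set()))


def choose_method(enrolled, context, preferred):
    """Honor the user's preference when possible, otherwise fall back."""
    options = available_methods(enrolled, context)
    if not options:
        raise LookupError(f"no enrolled method works for context {context!r}")
    return preferred if preferred in options else options[0]
```

For a customer enrolled in all three methods, `available_methods(enrolled, "phone")` would offer the OTP token and voiceprint, while `"web_home"` would offer all three. The voice and web applications only ask the broker what to present; they carry no per-method logic of their own.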

Again, the benefit of this type of broker is greatly increased by the implementation of CAT. Common SOAP interfaces and easy integration into voice and web applications allow for this flexible choice of multi-factor authentication methods.

If 2007 is the year that CAT turns the corner, or crosses the chasm, or whatever we’re calling it these days – I’m looking towards 2008 to be the year of federated security. You can’t have all of these banks investing in strong, multi-factor authentication without someone finding a way to monetize the implementations – and leveraging these internal identity databases and authentication methods points these FFIEC-compliant banks towards becoming independent, trusted Identity Providers (IdP). The currently blog-centric OpenID movement shows the beginnings of a decentralized security model where a user could use an identity at their bank to get into their healthcare account, or into their cable system to get their latest bill. Adding trusted Identity Providers helps move the focus of OpenID from blogs to transactional accounts such as banking and finance.

The Killer Smartphone App


The rumor mill is burning the midnight oil regarding the eternally impending release of Apple’s smartphone offering – the iPhone (or iChat AV Mobile, depending on which site you consider canon for scuttlebutt such as this). This possibly revolutionary device is thought to be the ultimate merger of media, data and telephony – offering full iPod music playback and synchronization with iCal, Address Book, Mail and .mac. Either it’ll support Cingular, or T-Mobile, or all of the mobile carriers and support 3G data networks… again, depending on the rumor source you trust.

But in response to all of the chatter about the media and synchronization capabilities, there is a true killer app that, if implemented, would make the iPhone the hands-down best smartphone in existence.

What I’m looking for (and most of the smartphone community in general) is a smartphone, with the emphasis on phone. As in – I primarily want to be able to make and receive calls.

Eleven months ago, I picked up an o2 Mini S on eBay. In the US, it’s called the Cingular 8125, T-Mobile MDA or i-Mate k-jam. This device was touted by the telephone rags as the best fusion of PDA and phone: a small enough form factor that it’s comfortable to carry, a full day’s worth of charge, a slide-out QWERTY keyboard and a touchscreen. For the first few months, I was generally happy with the device. I could sync it to my office PC and home Mac, the Bluetooth worked fine, and the text messaging capabilities were pretty darned solid. The music and video playback was smooth and the camera, for a multifunction device, was better than average.

However, what I started to quickly notice is that the phone services, well, they weren’t just sub-par, they sucked.

The problem isn’t just endemic to this one device, but to Windows Mobile devices in general. The problem is that the telephony functions are constantly fighting for CPU cycles with the general PDA functions. The result can manifest itself in many ways – not being able to get resources to make the phone ring, or not having the resources available to accept me pressing the “answer” button to receive an incoming call. This week alone, I’ve missed three calls simply because the phone didn’t respond when I pressed answer, and it isn’t just me. This problem also impacts most of the touchscreen Windows Mobile 5 devices.

The biggest problem for me, however, is the voice stability. As my work is primarily in speech technologies, I can easily tell how well a telephone device encodes an audio stream. Now, the HTC mobile devices (the manufacturer for most of the non-Samsung/Motorola Windows Mobile devices) have already crippled the phones by implementing a very poor microphone – a pinhole style that, even under the best conditions (as tested with a simple audio recording application) creates poor quality recordings. This is further complicated because of a core Windows Mobile issue.

It seems that Windows Mobile has made the audio encoder (codec) just another software process that has to fight with the other applications for CPU cycles. When the CPU is occupied with other tasks, the encoding is crippled to the point that simple speech recognition can be dramatically impacted. Again, this was simple enough to test by calling into one of my systems without any other apps running, then with email, then with email and a video. Overclocking the CPU helps, but not consistently and not enough.

This problem isn’t limited to my WM5 device; it impacts every smartphone out there. Count the number of Blackberry users who also carry standard cell phones because the phone experience on these devices isn’t very good. Treo users seem to be in the best boat, but it’s still less than great.

It seems that if you want a successful smartphone, it needs to be just that – a smart phone. Apple has the marketing and mindshare, but can they actually create the compelling device?

Karmic Justice

Occasional thoughts on voice technologies, identity, strong authentication and those delicious drinks called cocktails from Avery Glasser
