In April 2007, UNICEF's Director of the Division of Communication, Dr. Sharad Sapra, created a small team to explore using mobile devices, PDAs, and low-cost personal computers in communication work. He posed the challenge: "How do we connect children who have no access to computers or the internet to children who do? How do we allow for some of the same faculties and functions that young people in highly connected environments are used to, when all that is available is a mobile phone or a radio?"
After initial research, our team decided that mobile phones equipped with only SMS and voice were best suited for UNICEF's development work. UNICEF is on the ground in over 150 countries to help children survive and thrive, from early childhood through adolescence. Much of UNICEF's work is with vulnerable populations who have limited access to most ICT. However, basic mobile phone technology has reached many of the populations we are working with, and mobile phone penetration is predicted to reach 90% of the world's population by 2010. We believe that technologies blending powerful web applications with simple mobile phone interfaces could reach the broadest possible audience.
In the last year our team has been exploring the integration of mobile devices and web applications for large international development organizations. The global scope of UNICEF's activities creates massive logistical challenges as well as constant internationalization and localization issues. It quickly became apparent that blending telecommunication technologies with web applications could provide a set of powerful tools for UNICEF in many aspects of our work beyond youth communication. Our explorations in the last year have revealed two primary areas of application development: administrative tools, and tools for what we call "Communication for Behavioral Change".
Due to immediate interest from our Emergency Operation Center, our initial prototypes were tailored for providing administrative support for emergency response. Our responses require coordination between our headquarters in New York and Geneva, supply division in Copenhagen and emergency response teams on the ground.
We also developed use cases for monitoring and evaluating the efficacy of programs with respect to the Millennium Development Goals. Real-time information could be collected in the field from anyone with a mobile phone, and information could be quickly compared between countries and regions. If SMS messages could report data back to field offices, field workers would not all need custom data-entry hardware.
After we demonstrated the potential of multimodal systems, interest grew in using similar systems for "Communication for Behavioral Change." We investigated use cases for teaching literacy using text messaging, behavioral change programs for the prevention of HIV and AIDS in Africa, and anonymous rape counseling. Text messaging provided a unique, immediate way of communicating with youth, bringing us closer to bridging the digital divide for youth communication and to realizing our mandate from Dr. Sapra.
Interest in our projects, along with rapidly increasing interest of NGOs worldwide, has energized UNICEF's efforts for promoting novel uses of mobile phones in the developing world. With several implementations in place and requests from more UNICEF country offices, we seek to greatly expand our involvement in this emerging field over the next few years. By demonstrating successful executions of these approaches, we believe that UNICEF, as a leading organization in the UN system, can provide tools and inspiration for other international organizations, local NGOs, and national governments to promote these technologies and their subsequent benefits.
Our team has encountered a wide range of challenges while working on the systems and use cases introduced above. We describe many of them in the sections below, divided into Environmental Challenges and Technical Challenges. For the purposes of this paper we briefly outline some of the environmental challenges and give a much broader overview of the technical challenges.
The most significant challenges of deploying our prototypes are the differences in infrastructure among regions and countries. The regulatory structures, variety of telecommunication providers, pricing scales, and service penetration vary in each country. When SMS costs are considered with respect to average income in the same area, even "cheap" SMS fees become problematic for most people, especially when multiple messages are needed to perform a specific function.
Local variations in infrastructure and cultural norms are manifested in very different usage patterns of ICT from community to community. Further variation in literacy and exposure to technology necessitates radically different approaches to create successful programs. Custom requirements raise expenses and impede development of a single, configurable product that might scale for broad deployment.
In order to provide the most robust experience, we have investigated integration of SMS, IVR (interactive voice response), TTS (text-to-speech), and audio messages with our applications. The unique constraints and advantages of each of these tools necessitate a multimodal combination of technologies.
Throughout the development of an SMS-based data-entry application, we found that several assumptions of browser-centric software development do not translate easily to SMS-based interactions. Navigation and browsing are very difficult within the "call-and-response" dialogue of sending and receiving SMS, where neither hierarchies nor search results can be effectively conveyed. Users don't have the ability to scan a long set of results or see a full list of navigation options because of the limited number of characters each message can contain. Designing for the constraints of message length must also consider internationalization of the application, as these limits vary among character sets. For example, an SMS using Roman characters is capped at either 140 or 160 characters (depending on encoding), whereas messages using Chinese, Korean, Japanese, or Cyrillic characters are limited to 70 characters.
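As a rough illustration of how these limits depend on the character set, the following sketch checks whether a message fits in a single SMS. It assumes the two common text encodings, the GSM 7-bit default alphabet (160 characters) and UCS-2 (70 characters); the function names are hypothetical, and multi-part (concatenated) messages are not modeled.

```python
# Characters of the GSM 7-bit default alphabet (one unit each).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters, each sent as an escape plus a character (two units).
GSM7_EXTENDED = "^{}\\[~]|€\f"


def sms_units(text):
    """Return (encoding, units_used, single_message_limit) for a message."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        used = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        return ("GSM-7", used, 160)
    # Any other character forces UCS-2, counted in 16-bit code units.
    used = len(text.encode("utf-16-be")) // 2
    return ("UCS-2", used, 70)


def fits_in_one_sms(text):
    """True if the text can be sent as a single, unsegmented SMS."""
    _, used, limit = sms_units(text)
    return used <= limit
```

A single non-Roman character anywhere in a message switches the whole message to the 70-character limit, so localized prompts need to be budgeted per language rather than translated from a 160-character original.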
The standard process of following hyperlinks (and relying on the "Back" button) is not appropriate for SMS. The high volume of messages required for discovery and browsing can be prohibitively expensive for users. Additionally, since wireless networks cannot be relied upon for timely delivery of messages, browsing a hierarchy is not practical. Rather than browsing and discovery, SMS interactions are best suited for specific, known actions, more akin to a computer terminal's command-line interface than to the World Wide Web. As with a command line, one must learn the proper codewords or commands to send, along with an accompanying syntax or grammar that is easily understood by the receiving application. Unlike a command line, however, SMS provides no instant feedback, so typos and misspelled commands are a major usability issue. Employing an active-command interaction paradigm also assumes that users have a conceptual literacy of the software in addition to being literate in a traditional sense, neither of which can be expected in many populations we are trying to connect.
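One way to soften the typo problem is to match the first word of an incoming message against the known commands approximately rather than exactly. The sketch below is a minimal illustration, not a description of our deployed system: the command keywords, the similarity cutoff, and the `parse_sms` function are all hypothetical, and the matching uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical command keywords; a real deployment would define its own.
COMMANDS = ("REPORT", "STATUS", "HELP")


def parse_sms(body):
    """Split an SMS into (command, arguments), tolerating small typos.

    Unrecognized or empty messages fall back to HELP so that the user
    receives usage instructions instead of silence.
    """
    tokens = body.strip().split()
    if not tokens:
        return ("HELP", [])
    keyword = tokens[0].upper()
    if keyword not in COMMANDS:
        # Pick the closest known command, if any is similar enough.
        close = difflib.get_close_matches(keyword, COMMANDS, n=1, cutoff=0.7)
        if not close:
            return ("HELP", [])
        keyword = close[0]
    return (keyword, tokens[1:])
```

With this approach a message such as "REPROT malaria 12" is treated as a REPORT command, which saves the user the cost of an error reply and a resend. The cutoff trades forgiveness against the risk of guessing the wrong command, so it would need tuning against real messages.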
The process of maintaining application state and user identity across interactions, which web development takes for granted with login credentials and browser cookies, is another challenge of SMS interaction design. Web applications are usually not explicitly concerned with maintaining user states, as requests are simply carried out with respect to the identity and state information provided by the browser's cookie once a user logs in to the application. Requiring a user to log in via SMS would substantially increase the cost and barriers of use, necessitating the expense of several message exchanges and potential delays from slow message delivery. However, without pairing application identity (username, data, etc.) to user interactions, persons sharing the same phone number are unable to maintain a unique identity within the application. Further, SMS communication is a "best effort" system (without guarantee of message delivery), so the assumption of a sequential, asynchronous interaction is not reliable. Wireless carriers vary in their implementation of "effort," and reliable delivery becomes more problematic if the sender and recipient are on different networks.
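A minimal sketch of one possible compromise is shown below: conversation state is keyed by the sender's phone number, with an optional short user code included in the message to distinguish people who share a handset, and sessions expire after a period of silence. The class name, the timeout value, and the `user_code` convention are assumptions made for illustration only.

```python
import time

SESSION_TTL = 15 * 60  # seconds of inactivity before a session resets (assumed value)


class SessionStore:
    """In-memory conversation state keyed by (phone number, optional user code)."""

    def __init__(self, ttl=SESSION_TTL, clock=time.time):
        self._ttl = ttl
        self._clock = clock
        self._sessions = {}

    def get(self, phone, user_code=None):
        """Return the live session for this sender, creating or resetting it as needed."""
        key = (phone, user_code)
        now = self._clock()
        session = self._sessions.get(key)
        if session is None or now - session["last_seen"] > self._ttl:
            # No session yet, or the previous one has gone stale: start over.
            session = {"state": "start", "data": {}, "last_seen": now}
            self._sessions[key] = session
        session["last_seen"] = now
        return session
```

Keying on the phone number avoids a login exchange entirely, at the price of trusting the number as identity. Because delivery order is not guaranteed, state held this way should only be used for short, forgiving dialogues.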
The constraints of audio-based interactions present distinct challenges for translating assumptions of browser-centric application design to other modalities. For our purposes, audio-based interactions may include IVR, TTS, and voice recordings. Unfortunately, there is a lack of free, open source, and high-quality TTS packages, which dramatically limits the usability and scope of projects that rely on TTS technologies. This problem includes both internationalization (lack of TTS for a language) and localization (lack of TTS for a language's local accents). Without high-quality TTS for every language and local accent, most populations remain unreachable by this type of voice communication.
Unlike SMS interactions, IVR menus allow for hierarchical navigation, yet audio navigation is not without limitations that constrain the implementation of browser-centric information designs. First, each level in a navigation hierarchy cannot contain more than ten items. Next, breadcrumbs and other typical hierarchical context cues rely on visual feedback. Without being aware of their current location within the information architecture, users cannot effectively navigate a complex hierarchy. Additionally, cognitive constraints of short-term memory must be considered, as we typically can only keep track of 7 ± 2 items at a time.
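These constraints can be made concrete with a small sketch that builds the spoken prompt for one menu level. It assumes that options are selected with the keypad digits, which is one reading of the ten-item limit, and it speaks the current path aloud as a substitute for visual breadcrumbs. The function name and prompt wording are hypothetical.

```python
MAX_OPTIONS = 10  # assumed: one option per keypad digit
DIGITS = "1234567890"


def build_prompt(path, options):
    """Build the text to be spoken for one IVR menu level.

    path    -- names of the menu levels leading to this one
    options -- labels of the choices offered at this level
    """
    if len(options) > MAX_OPTIONS:
        raise ValueError("an IVR menu level cannot offer more than ten options")
    parts = []
    if path:
        # Spoken breadcrumb: tell the caller where they are in the hierarchy.
        parts.append("You are in " + ", ".join(path) + ".")
    for digit, label in zip(DIGITS, options):
        parts.append(f"For {label}, press {digit}.")
    return " ".join(parts)
```

For example, `build_prompt(["Health", "Clinics"], ["hours", "location"])` yields "You are in Health, Clinics. For hours, press 1. For location, press 2." Given the short-term memory limit noted above, a practical design would likely keep each level well under the ten-option ceiling.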
With respect to developing our data-entry application, we explored an IVR implementation. While entry of numeric or Boolean (yes/no) information is possible, textual data entry seems impossible to implement. Without the visual feedback available during SMS composition, keypad entry is not ideal. Audio recordings of textual data present the additional problem of transcribing speech.
The unique constraints and challenges of SMS and voice interactions necessitate the development of multimodal applications. Rather than designing applications that are modality-agnostic (accessed with SMS or voice or browser), robust interactions with complex applications require a synthesis of modalities (accessed with SMS and voice and browser), where some functionality may be specific to a modality. Instead of thinking of multimodal applications as modality-independent, we must approach these applications as reciprocal interactions that span modalities.
Regarding our data-collection application, having avenues for both voice and SMS data entry allows more types of data to be submitted. The web interface of the application is also important, but it addresses different functionality. The browser experience provides data review and graphing, which are features that field workers may not require as regularly as data transmission. In this situation, the different modalities for interaction reflect different levels of use and functionality. Similarly, an emergency communication tool can provide a web interface for 'mission control' coordinators and SMS or voice interactions for those on the ground. In these cases, the greatest value comes from several levels of functionality with many modalities rather than a single function with any device.
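The division of labor described above can be expressed as a simple capability table, in which each modality is mapped to the actions it supports rather than every action being offered everywhere. The sketch below is illustrative only; the action names and helper functions are hypothetical, and the table mirrors just the data-collection example in the text.

```python
# Which actions each modality offers (hypothetical names, following the
# data-collection example: entry by SMS or voice, review and graphing by web).
CAPABILITIES = {
    "sms": {"submit_data"},
    "voice": {"submit_data"},
    "web": {"review_data", "graph_data"},
}


def can_perform(modality, action):
    """True if the given modality supports the given action."""
    return action in CAPABILITIES.get(modality, set())


def modalities_for(action):
    """List the modalities through which an action can be performed."""
    return sorted(m for m, actions in CAPABILITIES.items() if action in actions)
```

Making the mapping explicit lets an application reply sensibly when a request arrives through the wrong channel, for instance by telling an SMS user that graphs are available only through the web interface.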
While a multimodal approach allows for much more complex applications to be developed, many of the constraints persist in these systems. Additionally, as complexity increases, usability issues and training needs increase. Standards for interaction between modalities could allow for more powerful applications and robust solutions for the real-world limitations found in developing countries.