Saturday, April 23, 2005

the not quite human other

Turing mentioned that the suitability of the Imitation Game as a substitute for the question “Can machines think?” was debatable, but he never really returned to that issue. Do you think it is a good substitute? Why or why not? Why do you think Turing proposed this substitute?

well, i think turing rephrased the question into something more concrete and measurable because the word 'think' conjures up plenty of humanistic connotations like emotion, conscience, and a soul. i cringe at 'dictionary look-ups', but for the sake of argument, we'll whip out dictionary.com's view of think, v. intr.: to exercise the power of reason, as by conceiving ideas, drawing inferences, and using judgment. how can a computer conceive ideas? is reason or judgment a sort of inherent, generative, cognitive ability, or has a decision-making algorithm merely been programmed in? therefore, instead of opening a whole philosophical can of worms, turing devised a test in which one could measure the appearance of, or resemblance to, being human.

it is interesting that only recently was the turing test actually performed as explicitly stated (with the test being whether the computer, pitted against a real woman, could fool an observer into thinking it was the woman just as easily as a real man posing as a woman could). i think that for his argument, the substitute works, because successfully appearing to think like a human basically means that the machine can make judgment calls and display them effectively, just as a person conveys external communicative signals for their internal thoughts. even though we interact with our human brethren constantly, we don't know for sure what's going on inside their brains except for what they choose to disclose. maybe we're all robots and we've all been programmed to learn and display intelligence to others accordingly. a rich range of emotion and personality requires an extremely well-programmed machine, and that is what the turing test tries to prove: that the machine is as well-programmed as a human being.

We can frame these questions in terms of signaling: "thinking" is the quality we wish to determine about the other, but it is invisible. We must instead rely on observable signals as indicators of this quality. Turing is proposing successful playing of the Imitation Game as the signal - is this a reliable signal of intelligence? What makes it reliable (or not)?

imagine if a computer, compared against a human competitor, could fool observers into thinking it was the real man just as easily. it may use signals such as language usage, knowledge, timing, and personality. however, a successful test merely indicates that the computer could convince someone that it's a man, not that it is a man itself. you could see this as two people simultaneously trying to convince you that one is telling the truth and the other is lying, or vice versa. using only text-based communication, which one is more educated? richer? more skillful? the signals for something like that would be similar to those of IM in the current age... most anything could be made up, but indicators like language fluency, sustained knowledge on a given topic, and clever literary twists would be more reliable since the costs of signalling them dishonestly are higher.

is consistently sending high-cost signals a reliable indicator of intelligence in this case for the computer? i think so, if we define 'intelligence' to mean 'the capacity to acquire and apply knowledge.' only a computer that has been programmed effectively enough to pass as a man encapsulates the costly time and effort of the human engineers and designers, as well as the costs of physical memory and execution efficiency. an entire lifetime's worth of experience and a full-resolution personality which flavors the computer's communication can be costly to both acquire and integrate.

i think the key lies in developing a thoroughly comprehensive test that truly challenges the computer yet is relatively easy for a human. right now, the scrambled-image text-reading test that graces many websites (to prevent spambots or spiders from clicking automatically through registration forms or confirmations) exploits the poor visual reading ability of computers at a task that comes naturally to humans. that may not be a valid test in the future if computer visual comprehension improves. in any case, the turing test (or intelligence test) must truly stretch the computer, and not the human, to its cognitive limits.

one possibility is testing theory of mind, which wikipedia defines as the ability to understand that others have beliefs, desires and intentions that are different from one's own. the capacity for understanding others and a collective consciousness form the basis of human social interaction. an example would be the sally-anne test, illustrated here.

Weizenbaum created ELIZA in part to show that simple communication was not a reliable signal of thought. He modeled it on a Rogerian psychologist: how did this framework help people communicate with the program? How did it affect their perception of its underlying intelligence? As you look at the various contemporary chatbots, think about and describe how the model of what type of being they are affects one's interpretation of their inner state.

unfortunately, weizenbaum's ELIZA was received in a way completely opposite to what he intended to show. instead of illustrating 'look, this computer that appears to be a somewhat competent conversationalist is only a program with certain simple, defined rules, which i shall expose here', people were amazed by ELIZA and treated her dialogue and responses as if a real person were talking. since ELIZA's rules incorporated many of the social rules that convey understanding and conversation, like listening, referencing what the other person said, and occasionally inserting a relevant bit of personal knowledge, her responses seemed understandably realistic.
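to make the 'simple defined rules' point concrete, here's a toy sketch (in python, my own illustration rather than weizenbaum's actual decomposition/reassembly grammar) of the kind of keyword-and-reflection rule ELIZA leans on. the patterns and replies are invented; the point is how little machinery it takes to sound like a rogerian listener:

```python
import random
import re

# pronoun swaps so a phrase can be echoed back at the speaker
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "i", "your": "my"}

# (pattern, canned reply templates) -- a tiny, made-up rule set
RULES = [
    (r"i need (.*)", ["why do you need {0}?", "would it really help you to get {0}?"]),
    (r"i feel (.*)", ["tell me more about feeling {0}.", "how long have you felt {0}?"]),
    (r"my (.*)",     ["tell me more about your {0}.", "why do you mention your {0}?"]),
    (r"(.*)",        ["please go on.", "how does that make you feel?"]),  # catch-all keeps the dance going
]

def reflect(fragment):
    """swap first and second person so the phrase reads naturally when echoed."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(utterance):
    utterance = utterance.lower().rstrip(".!?")
    for pattern, templates in RULES:
        match = re.match(pattern, utterance)
        if match:
            return random.choice(templates).format(*(reflect(g) for g in match.groups()))

print(respond("i feel nobody ever listens to me"))
# e.g. -> "tell me more about feeling nobody ever listens to you."
```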

it's amusing but also concerning that people suggested ELIZA for psychotherapy, as an always-accessible friend, or for other socio-psychological uses. the fact that people intellectually knew the other side was a computer still took a backseat to the actual conversation, which could talk to you ever-so-convincingly. as long as ELIZA spoke and responded coherently and sensibly, she would be viewed as an intelligent conversation counterpart. the balanced, seamless dance of back-and-forth dialogue would have to be maintained to continue the illusion.

after testing out several current bots (some more reliable/predictable/useful/realistic than others), i have to confess that although at times they are impressive, only a few slips in language interpretation or repetitions of 'what? i don't understand' are needed to revert you back to the knowledge that you're talking to a computer program.

i chatted with eliza for a bit (the 'doctor' on emacs) and soon realized that she was very much a psychotherapist in her ways of speaking: always reflecting questions back to me, making sure i was pleased or clear about how i felt, and occasionally referencing my family and childhood. after getting a sense of what she was like, i could spew personal anecdotes, hidden pet peeves, and detailed narratives without fear that ELIZA on the other end would be bored, offended, or judgmental. however, the self-spewage basically becomes a monologue, and then it's really much like talking to a mirror. which is fine, if that's what someone needs... but ELIZA herself has little to no personality (except the rhetorical kind), and therefore she can be interpreted as the listener of infinite patience.

i tried ellaZ, supposedly the winner of 'most humanlike bot' at the 2002 loebner prize, but i couldn't talk with her for more than a few minutes. she kept reverting back to her core topics of knowledge (i didn't know anyone could be so obsessed with slot-cars), and when i asked her 'what is your job?', she displayed the wikipedia page for 'job'. how weird is that? but then i realized she was really good as a humanoid sort of search engine, and went with that. i guess i saw her as a human in a way, as i tried to be patient with her peculiar responses, and then became confused as she kept shifting the conversation away from the topic at hand. only after discovering her strengths could the dialogue make more sense.

then i spoke with ZolaOnAOL, a chat IM bot. she wasn't bad as a bot, since her programmers infused her with a little bit of static personality, such as opera being her preferred kind of music (la traviata her favorite), 'lost in space' being her favorite movie, loving tv shows about robots, and being a star trek fanatic. however, zola only carries a single memory state (there's no accumulated memory of the conversation, beyond knowing whether i am speaking to her for the first time or am a returning chatter). for example,

liuliuliu1: do you drink?
ZolaOnAOL: I have been programmed with a list of 15,000 known beverages.
liuliuliu1: which beverages?
ZolaOnAOL: The small one, I think.

quite observably, the response doesn't really match up, and she tends to have the memory of a goldfish. also, topics she apparently has knowledge of don't really hold up under even surface-level inquiry:

liuliuliu1: do you drink water?
ZolaOnAOL: I drink water in moderation. My favorite drink is mango juice.
liuliuliu1: what is mango juice?
ZolaOnAOL: I know a little bit about mango juice, but I could always learn more. I'll have to study mango juice. I'm interested in anything that isn't harmful to people.

however, she's really helpful with info-based things like weather, movies, and synonyms, which are really easy to code up if people interact with zola in the prescribed way with the correct parameters. zola may have wittier comebacks than eliza, but she's a terrible conversationalist on everyday things, and therefore her model resembles more of a gopher or avatar'd looker-upper who likes completing tasks rather than casually chatting about this or that.
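to make the goldfish-memory point concrete, here's a toy sketch (entirely my own invention, not zola's actual engine) of why 'which beverages?' falls flat: a keyword bot with no memory of the previous turn can't resolve the follow-up, while a single line of stored context keeps the thread alive.

```python
# a stateless keyword bot: every turn is answered in a vacuum
def stateless_reply(msg):
    msg = msg.lower()
    if "drink" in msg:
        return "I have been programmed with a list of known beverages."
    if "which" in msg:
        return "The small one, I think."   # canned fallback; no idea what 'which' refers to
    return "Tell me more."

# the same bot plus one piece of conversational state: the last topic mentioned
class TopicBot:
    def __init__(self):
        self.last_topic = None

    def reply(self, msg):
        msg = msg.lower()
        if "drink" in msg:
            self.last_topic = "beverages"
            return "I have been programmed with a list of known beverages."
        if "which" in msg and self.last_topic:
            return "You asked about " + self.last_topic + "; my favorite is mango juice."
        return "Tell me more."

print(stateless_reply("do you drink?"), "|", stateless_reply("which beverages?"))
bot = TopicBot()
print(bot.reply("do you drink?"), "|", bot.reply("which beverages?"))
```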

In Being Real I discuss briefly the possibility of agents that use voice, video, etc. to communicate. How would such extended communication channels affect the reliability of the signal as an indicator of intelligence? If you are interested in exploring this question more deeply, a good starting point is Stevan Harnad's paper "Other Bodies, Other Minds: A Machine Incarnation of an Old Philosophical Problem".

well, some of the bots i played with (ALICE, ellaZ) used face avatars to represent the bot, sometimes speaking with voice or at least trying to feign expression. ALICE was a vector illustration, whereas ellaZ used static images of a photographed human model. they provided little more than human-like visual packaging atop quasi-humanlike text responses.

if the conversational signals displayed by the bot are realistic and natural, extended communication channels such as voice and video wouldn't provide much additional reliability for intelligence. if you read a novel, its character seems just as alive as (or even more alive than) the same character in the movie, represented by a flesh-and-blood actor (or visualized through computer graphics, whatever the case may be). a well-developed character illustrated within a story uses dialogue passages and responses to the environment as signals that this entity could be a real person; such thoughts on our part are natural, as we find ourselves experiencing empathy or familiarity with a fictional character.

the voice can externally express the type of person (sex, hoarseness, volume) as well as spoken emotional inflection, but these sorts of signals can be easily tweaked (through filters or impeccable acting skills). using video to show the face, mediated or otherwise, can also become an unreliable signal, as on-the-fly editing, puppetry, and mediated falsification can affect perception. plus, these sorts of signals do not necessarily change the content of the communication (i.e. the actual dialogue), but frame it within a personally categorized context. i do not believe that this framing affects whether or not the bot is viewed as 'intelligent'.

Sunday, April 17, 2005

design for signaling

The poker paper directly addresses the issue of how changing the interface changes the relationship between signal and quality. Write a paragraph or two discussing these issues: In this domain (playing poker), what are the qualities that the players want to know about each other? What do they want to reveal? To hide? What are the cues and signals (in face to face poker, avatar online poker, our online poker) that indicate these qualities? Are they reliable? Why?

in poker, the game relies heavily on players' strategies to maximize their chances of winning the round's earnings while minimizing any possible loss. a player's external moves, such as confident betting or a neutral face, may be carefully calculated to bluff a weak hand or to intimidate others into folding. this naturally leads to skillful reading of the opponents' actions and expressions to make the best move within context.

delving into the deep art of poker psychology, some qualities that each player wishes to know about the others are how (truly) good or bad their dealt hand is; whether a player's expression is truthful (revealing the quality of the hand) or a bluff (serving as a deception mechanism); and the long-term reputation of their poker-playing style. if a player could gain knowledge of how relatively good or bad the other hands were, they could fold within a window of minimum loss or bet with greater confidence: increased knowledge reduces the palpable risk of play. an experienced player might be able to tell when an opponent leaks a barely perceptible emotion from behind a poker face with a gaze, a breath, or any number of other nuanced clues; therefore, a trained poker face becomes essential for effective bluffing. information on player reputation can be invaluable, especially if one can anticipate or predict another's behavior and strategize accordingly.

therefore, poker players want to reveal whatever they feel is the most strategic signal to send. most likely they wish to bluff their hand and keep betting to intimidate opponents. confidence or neutrality might be the ideal, so that nothing telltale escapes their person to call attention to their bluff. as for hiding, they want to keep the quality of their hand top-secret so that opponents cannot truly tell what they are competing against. for any dealt hand, concealment of one's handheld fate is paramount.

in face-to-face poker, the subtlety and richness of human interaction are present. several barely perceptible signals of variable reliability intermingle as players face each other around the game table. as repeatedly stressed, the poker face becomes the ideal, since no one wants to reveal anything that might shed light on their plight. therefore, the skillful poker face, if worn consistently, carries no quality (except that the player is either experienced or has great control over his expression, which is probably altogether advantageous) and conceals all. however, if a player isn't completely successful with his poker face, or allows micro-expressions of varying emotions to appear, the signal may be more reliable. if, for a split second, someone's eyes fall dejectedly, that might be a reliable signal that he has a less-than-ideal hand. i don't know exactly the gameplan of skilled poker players, but calculating an intentional 'micro-expression' meant to throw opponents off may be a possible strategy; in that case, flickers of expression from beneath a poker face may indeed be faked, and therefore unreliable. conversation among players can also serve as a signal: a quaver in the voice might point toward nervousness or bluffing, or speed, tone, and dialogue might be utilized in aggressive or mystifying ways. depending on the interpretation of one's speech, the signals can be either involuntary (highly reliable) or schemed (unreliable). in play, the most tangible signal is the betting amount. a higher bet signals a valuable hand which, depending on the player's reputation (conservative versus aggressive), may vary in its reliability. for example, a high bet from a player known to be conservative remains a more reliable signal of a winning hand.

avatar and text-only online poker share most traits, so i'll discuss the common features together. since live, breathing humans aren't readily sensed through the online medium, typed chat dialogue among the players serves as a major signalling source during play. what, how much, and how often a player types can signify one's experience as a player, concentration on the game, and intent to appease or mislead the opponents. if one chats with high frequency, he can be viewed as distracting or attentive; if one chats with low frequency, he can be viewed as aloof or concentrating. a silent player might be interpreted as wearing an online-style poker face, or as a poor sportsman. online, players' handles or usernames can be very useful if used consistently + continuously throughout gameplay and linked to player histories. hence, a username becomes tied to one's reputation as a poker player, and this profile may be used by opponents for proper strategizing. however, the utility and reliability of player reputations depend on the game system rewarding long-term play accounts and recording accurate, informative histories. and lastly, public bet amounts in each round are signals indicating how much money each player is willing to risk. since the online realm conceals many of the subtle and valuable signals that face-to-face interaction provides, a higher bet can be a more reliable signal that someone has a strong hand: the player risks more despite reading fewer bits of crucial knowledge. however, one might argue that higher bets online do not necessarily translate into an intelligently calculated risk, because transactions of virtual, invisible money can be perceived as less 'real' than a pile of tangible poker chips. additionally, with the knowledge that a higher bet can be seen as a more reliable signal of a good hand online, a player can bluff more aggressively.

one signal that avatar systems can provide (over merely text-based ones) is the set of choices a player makes to sculpt his or her personal avatar. as mentioned in the paper, visual stereotypes abound. despite the intellectual logic that an avatar is merely a representation and not an actual person, people draw multiple conscious and unconscious associations with a particular image. if players can establish their own avatar appearance while creating an account (a la the SIMS), their handle and avatar representation are intrinsically linked and may be useful references during repeated interactions with opponents. however, this visual reference can only be reliable if an avatar's appearance remains consistently stable throughout an account's history. as far as reliability goes, an avatar can only signal what the actual player intends for the competition's eyes; everything is controlled, calculated.

Evaluate the other technologies (chat circles, comic chat, fuzzmail, comTouch, and two of your own choosing). Think about them in comparison to face to face communication. What can be seen/heard/felt of the sender - i.e. what are the sensory constraints on signaling? How does this affect the reliability of the message? Is there a particular type of message that the medium is especially well (or badly) suited for sending? How ambiguous are the signals - do you expect the sender and receiver to mean the same thing? Are there particular costs associated with the medium? Are they simply added costs or do they contribute to reliability? What modifications would you want to make to these interfaces to make them more or less reliable?

chat circles transforms the purely textual environment of chat into a more spatial and visual one. when a user types, their identifying circle (of an individually chosen color) swells with activity, and those within hearing-range proximity see the text appear on the screen. the content of conversation only appears within a window of spatial distance; to 'listen in' to a dialogue, one must drag one's circle near the appropriate cluster of chatters. however, circles across the map are universally shown swelling with activity and ebbing with idleness. at a glance, only the signals of activity are apparent; the next level of movement and proximity offers more substantial content. this is similar to walking into a crowded room and spotting certain clumps of people, gravitating toward a fascinatingly animated conversation or moving to a peacefully secluded corner. heard conversation depends on speech within earshot.
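here's a rough sketch of the hearing-range idea as i understand it: you only see the text of circles within some radius of your own. the radius, coordinates, and data shapes are all made up for illustration, not the actual chat circles implementation.

```python
import math

HEARING_RANGE = 120.0   # arbitrary units; the real system would tune this

def distance(a, b):
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])

def audible_messages(me, others):
    """return the utterances of everyone close enough for 'me' to hear."""
    return [(o["name"], o["text"]) for o in others if distance(me, o) <= HEARING_RANGE]

me = {"name": "liu", "x": 0, "y": 0}
room = [
    {"name": "ann", "x": 40, "y": 30, "text": "did anyone read the poker paper?"},
    {"name": "bob", "x": 400, "y": 10, "text": "(too far away to be heard)"},
]
print(audible_messages(me, room))   # only ann's line shows up
```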

chat circles improves upon normal IM in the sense that a speaker has a better idea of who's within listening distance and who's actually listening (in terms of activity). in a regular chat room, one's input is broadcast over the entire space, and the reception of the text remains ambiguous until a direct reaction or reply arrives. with a visual display of 'who's listening,' a user in chat circles can tailor messages to a specific set of listeners. the face-to-face analogy: it's like speaking candidly and intimately in a small group rather than everyone in the same room having a megaphone at all times.

i envision that chat circles can be particularly good for spaces in which the users are familiar with each other; observing a particular cluster of known characters can spark intrigue with the relative heterogeneity or homogeneity of the mix. if i enter a room in which i'm unfamiliar with most of the users, i may not know how to prioritize or judge which conversation to approach; i'd be wasting time moving my circle from cluster to cluster in search of something to pique my interest. chat circles may be bad for long narratives (since the interface only supports short snippets), for a very dense room, or for a forum in which everyone's words are significant. in a crowded room, it might be difficult to move about or isolate a particular hearing range, causing information overload or immobility. in a forum, you would want everyone to have a microphone at their disposal to broadcast what they have to say.

the largest cost in chat circles, for me, is proactively controlling the movement of one's circle across the map. it may be socially awkward for me to leave a cluster, since it's visibly clear that i want to get away to join another group. also, moving about to test out each cluster's conversation becomes tedious when many clusters exist but are out of hearing range. it would be useful to have a phrase or two leak into hearing range from external conversations, which might catch my attention or draw me to that conversation. these teasers would be an additional signal beyond swelling circles and the number of speakers. the cost of not being able to say something heard by everybody in the space increases the reliability of signals sent to an identifiable clustered audience. the inability to transfer relatively personal information in a dense, noisy space levies a cost, but this pushes a private conversation toward a more appropriate venue: another, less dense room or individual chat windows.

* * *

comic chat provides a visual, comicbook-styled transcription of a chat conversation. gestures, expressions, and layout are automatically designed through a variety of constraints and triggers. the text content itself doesn't seem to be altered, except for formatting changes: line breaks, comic font, and translation into all capital letters. it wasn't clear how or why particular drawn characters were assigned to certain users, or what the significance of the background scene was, but each user assumes a consistent character throughout the chat, and it should be clear through normal comicbook rules what follows in the sequence of dialogue.

there may be some elements of unreliability: the program could place a certain expression to match a certain phrase, yet because of ambiguity, sarcasm, or differing context, the system could produce a mismatch. however, the paper notes that the user can override any system settings and can even customize the view options on their client. the drawn characters serve as avatars, in that they represent, but do not necessarily resemble, the typists behind them. the visual cues, as well as the scenic settings and the pan/camera views, vary in reliability depending on how accurately the comic strip design system matches them with the inputted text. there are other difficulties with the program, such as a maximum number of characters within one frame, and the visual appearance and disappearance of characters depending on speaking frequency. if we don't see frank for several frames, does that mean he's quiet, inattentive, exited and re-entered, or merely next in the long queue of simultaneous remarks?

comic chat seems best suited for chats with few people, probably two to four, to avoid the musical-chairs-like rotation of characters in view, to avoid confusion of character identities, and to minimize disruption of visual flow. casual, familiar dialogues would be appropriate for the imaginative, line-art graphic style. something like a work-related or task-specific chat might not be terribly well served by comic chat, nor would forums with lots of people to distinguish and display. any sort of avatar will elicit others' assumptions and reactions, and lengthy, jumbled, multi-character comic narratives would be difficult to read and to interpret.

perhaps after regular usage, comic chat may be an acceptable visualization of the dialogue, but there might be occurrences in which a character's pose does not accurately reflect the user's meaning and casts an unintentional signal. the cost of bettering the situation is for a user to review, edit, and monitor the actions of his or her own character and control them according to their own intent. this takes time and effort, and may hamper the rapid pace of text-only messaging. instead of avatars, the characters might be identified by mediated faces or actual photographs; with drawn characters, however, the anonymity of users can be preserved, and users can be more creative with their imaginative virtual image.

the reliability could be improved by ensuring that every user wears a distinct, consistent character throughout the chatspace over time (intrinsically linked to an established handle). this way, a visual cue + the username can form a single online identity. it might be interesting if users could write or buy their own 'expression plug-ins' or custom character gestures to make their character act uniquely; e.g. sam can twist her tongue while jill can roll her eyes.

* * *

i thoroughly enjoyed fuzzmail; interminably so, even. simple, yet poetic.

it introduces the dimension of the sender's actions through time, which is completely absent from most received email. a normal email represents only the last flash-frozen moment of an extended composition process, a process which fuzzmail attempts to encapsulate for the recipient's consumption. through fuzzmail, the receiver is able to recreate the experience of the sender's typing + thought process + restructuring + 1st (2nd, 3rd...) drafts + typographical artistry + pace and pauses. it's like a recorder, capturing everything from the moment the sender places her cursor into the blank text box to the final click on the send button. the entire signal is visual.
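as a bare-bones sketch of that recorder idea (my own guess at the mechanics, not the actual fuzzmail format): store timestamped edit events while composing, then replay them with the original pauses for the recipient.

```python
import time

def record(events, action, payload=""):
    """append a timestamped edit event ('type' or 'backspace') to the log."""
    events.append((time.time(), action, payload))

def replay(events, speed=1.0):
    """play the composition back, preserving (scaled) pauses between events."""
    text = ""
    prev_t = events[0][0]
    for t, action, payload in events:
        time.sleep((t - prev_t) / speed)
        prev_t = t
        if action == "type":
            text += payload
        elif action == "backspace":
            text = text[:-1]
        print("\r" + text + "   ", end="", flush=True)
    print()

events = []
record(events, "type", "dear ")
record(events, "type", "sir")
record(events, "backspace"); record(events, "backspace"); record(events, "backspace")
record(events, "type", "friend,")
replay(events, speed=2.0)   # ends with "dear friend," but shows the detour through "sir"
```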

to be sure, with regular usage, fuzzmail could be utilized as a poetic, choreographed medium: the sender can plot the timing and type changes and do some practice runs before sending off the real thing. but there are some elements of higher reliability, such as identifying copy + paste inputs (which would suspiciously plaster a large amount of text instantaneously) or typing speed (which is adjustable for the viewer, but there must be a calibrated 'real-time' speed setting in there). i can see increased reliability for the recipient in ascertaining the identity of the sender if the sender had a distinguishable 'secret type' or hidden message. for example, i could establish an insider signal with my best friend, in which he'd type 'plethora' (his favorite word), backspace over it, and then continue with the message. you could also identify someone through familiar habits of written composition, e.g. writing a message from the bottom up, or abbreviating a lot and filling the words in later. fuzzmail could capture this internal secret identification move; static email could not. you could hide a lot within the boundaries between beginning and end; you could stash an illicit love letter within an innocuous business message (secretive function), or encode something through the ratio of words written to words erased (secretive form).

fuzzmail is great for humanizing the message. elements such as typos, varied speed, and revisions reveal a lot of the personality (or the brilliant choreography) of the sender. i imagine it'd be ideal for intimate, familiar relationships, both offline and online. it's very much a ghostly presence, simulating the effect of an invisible person pecking away at a keyboard. however, the more knowledge you have about the person, the richer and more meaningful the little quirks become. if you receive a fuzzmail from an unknown sender, the dance of letters and deletions may be entertaining but is read on a more superficial level. fuzzmail would be terrible for formal messages (who wants to know that you type slowly or can't spell without an electronic dictionary?) or long, dry, typewriter-like missives. it's definitely more suited to casual, familiar correspondence.

of course, many of the quirks that may seem innocuous to the sender might be misconstrued by the receiver. first thoughts that didn't escape under the safety of the backspace key in time might have been better off not in visual form at all. the composition process over time might confuse, offend, or even bore the receiver. the reading time for the receiver might be extended or excessive, depending on the verbosity or indecisiveness of the sender. a perfectionist sender might need to write several drafts and perfect the typing so that the words flow and pace as intended; one fatal mistake would send it back to square one. however, the costs are well worth it. although a fuzzmail may disclose a person's penchant for misspelling or wishy-washy construction, it gives a more human and honest view of the sender and clearly traces out the thought process through time. and even though it might take more time to read a fuzzmail than a normal email, the organic qualities of the message enrich the experience. although it may be interesting to incorporate real-time sounds or images, i wouldn't want to spoil the simplicity. the raw straightforwardness of fuzzmail is what leads to its success.

one improvement in reliability would be to verify the sender's email address. as the current interface stands, any address may be typed as the sender, and easily faked or accidentally misspelled. that would take an extra step or two, a couple more mailbox checks and clicks, but would help prevent false sender identities.

* * *

i'm more familiar with comTouch, and admire the addition of touch-to-vibration tactility to cellphone conversations. when constructed atop the phone foundation, the normal voice transfer of a telephone is augmented by user-controlled vibrations along the fingers; either party can sense both sound and haptic touch from the other person. if comTouch exists alone, or with a minimal parallel audio stream (such as the spies-are-listening scenario), then a wordless language of vibration serves as the communication. familiarity with a voice can be a reliable signal of user identification, and callerID is an everyday tool for this task. since comTouch is designed to be completely user-controlled (a vibration is felt only if the other party presses the button), there is a direct correlation between the actions. however, the button could be pressed accidentally, the phone could be dropped, someone could have a trigger finger, the mechanism could break, the vibration frequency could seem excessive or too rare, or someone's fingers could be too weak to press. any of these could trip up an ideal, clear signalling setup. someone may send a succession of signals because they're excited, but the receiver may misconstrue the series of vibrations as aggression, anxiety, or distress.
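a hypothetical sketch of the press-to-vibrate mapping (the function and parameters here are invented, not comTouch's actual hardware interface), just to show how direct the correlation between action and sensation is meant to be:

```python
def pressure_to_vibration(pressure, max_pressure=1.0, levels=8):
    """map finger pressure (0..max_pressure) onto a discrete vibration intensity level."""
    pressure = max(0.0, min(pressure, max_pressure))
    return round((pressure / max_pressure) * (levels - 1))

# sender side: sample the button pressure, send the level alongside the voice channel.
# receiver side: drive the vibration motor at the received level.
for sample in (0.0, 0.2, 0.5, 0.95):
    print(sample, "->", pressure_to_vibration(sample))
```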

a multi-user party line of more than two people would be a terribly confusing venue for comTouch. it's sort of the inverse of playing clue with more than one other player... with a multitude of people in a conversation, since a received vibration carries no other sort of identifier, you can't tell who sent the vibration. on the sender's side, you can't easily specify who you want to send a vibration to. comTouch might also be non-ideal for non-acquaintance parties who have differing vibrational styles; the signals could be seen as too pushbuttony or too reticent, depending on the people. there isn't a direct mapping from comTouch to the real world, whereas with speech most people learn to communicate with socially accepted volume, tone, and grammar, and arrange themselves spatially to allow optimal personal and public space. however, comTouch seems great for familiar or intimate conversation partners (since the vibrations can stress hidden inferences or narrative tone in the same way that italics do) who may need that extra element of physical tangibility within the communication. also, if both parties are familiar with a wordless language through the vibrations (like morse code), you could carry on a silent, synchronous conversation without external knowledge.

the costs of comTouch include too many / too few vibrations perceived on either end, unintentional signals (oops, didn't mean to press it), or missed signals (someone put down the phone and missed an incoming, urgent vibration). since there is no history or visual memory of the synchronous communication, attention to tangibility and to voice is required at a high level throughout the conversation. the cost of paying attention to four things--talking, listening, pushing, and feeling--increases the reliability, since you're more confident that the other party will be able to interpret signals with integrity and without interruption.

something to try with comTouch would be to vary the vibrations not only in intensity but also in temperature, speed, amplitude, noise, texture, density, and quantity of surface area. a larger 'vocabulary' of vibrations could enrich the tangible stream and reduce ambiguous or conflicting signals. right now most cellphones have vibrating modes (for 'silent' rings and so forth), but you can't make the other person's phone vibrate on demand. it would be interesting to see if subtle changes in whole-phone vibration could signal environmental changes (a slow numbing could indicate someone dipped below ground, weakening the signal), extreme emotions ('i'm so mad at you!' could be accompanied by a fierce mechanical roar), or comfort (a tangible purr). for those with hands-free headsets, an earpiece version of comTouch might be an amusing and worthwhile exploration.

* * *

i enjoy handwritten correspondence for quite a number of reasons: the richness of the medium carries and delivers multiple reliable and meaningful signals. i'll use as an example a personal letter on stationery paper, wrapped in an envelope, and delivered via postal mail. the letter transmits both visual and tactile signals to the recipient, from the written text, drawn pictures, and decoration of the stationery to the texture of the paper. often, the letter carries an olfactory signal (of the sender's personal scent, perfume, smoke, or pets) and occasional scars of the environment, like drink spills, accidental tears, or smudged ink. sometimes small objects are enclosed, like stickers, photographs, or clipped articles, and the letter itself can be adorned with cut-and-pasted images or a lipstick trace. reading actual pen-on-paper handwriting can be as rich an experience as the more dynamic fuzzmail, since the evocative flow of ink and the spatial organization on the stationery reveal telltale qualities of the writer through their real-time recording of an individual's thoughts.

the more unique and handcrafted the letter is (doodles in the margins or an impulsive enclosure), the higher the reliability becomes. however, most would agree that even a simple, personalized handwritten note (such as a thank-you or a get-well card) carries relatively high reliability, purely because of the identifiable nature of handwriting + signature, plus the direct address. even though it may be easier or more efficient to automate thank-you cards to guests for wedding presents, short handwritten notes convey the high cost of sincerity and graciousness. since the handwritten medium encapsulates a high level of personality and time, efforts abound in simulating the effect within a low-cost framework, e.g. corporate form letters with a scanned 'signature' of the CEO at the bottom, or documents printed with calligraphy or script fonts. however, one can easily tell if something is genuinely and sincerely handwritten and delivered. handwritten correspondence may not be the best for someone who has terribly illegible penmanship, someone who is not physically able to write, or for efficient, nonpersonal messages (e.g. hey, anyone know the schedule of the 337 train for today? thanks.)

the recipient of a static letter may not be privy to the rich, storied, compiled history of said letter (like the fuzzmail's dynamics), but variations in the letter (different ink colors, styles, or spacings) add a separate dimension to an already highly personal medium. handwritten correspondence is incredibly costly because of the time and effort spent in longhand composition as well as the time and effort of transit and delivery. however, the costs are well worth the tangible evidence of human thought and craft. reliability can stem from the handwriting, the postmark, and personally intimate details. as for modifying a particularly longstanding and fundamental communication interface for reliability, you could bring back wax seals or stamps with ornate, unique, unreproducible designs, or authenticate yourself through personalized stamps (like http://photo.stamps.com/).

* * *

txt msgs on cellphones are a quick, relatively unobtrusive, context-rich form of communication. short comments or inquiries are tapped furiously on numeric keypads, instantly sent, and viewed on the phone's screen. the recipient receives a message which is constrained by size (usually 100 to 200 characters), formatting (default size, font, and color), and character space, from a sender who knows the exact phone number. the received signal reflects the actual message, a sent timestamp, and an ID displaying the sender's phone number or email address. the clumsy input interface and constrained size can impose garbled or ambiguous shorthand, and the message is at the mercy of the cellphone providers' service, but reliability is high for sender identification (thanks to callerID) and actual content.
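as a toy model of what actually arrives with a txt msg (the field names are my own; 160 characters is the usual single-SMS limit, though the exact figure varies by phone and provider): the size-limited text, a timestamp, and the sender's number via callerID.

```python
from dataclasses import dataclass
from datetime import datetime

MAX_CHARS = 160   # typical single-SMS limit; treat as illustrative

@dataclass
class TextMessage:
    sender_number: str   # the callerID-style identifier the receiver sees
    body: str
    sent_at: datetime

    def __post_init__(self):
        if len(self.body) > MAX_CHARS:
            self.body = self.body[:MAX_CHARS]   # anything longer gets cut (or split) by the provider

msg = TextMessage("+16175551234", "running l8, c u at 1369 in 10", datetime.now())
print(msg.sender_number, msg.sent_at.strftime("%H:%M"), "|", msg.body)
```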

txt msg is indispensable for instant silent communication; i can tap a message during a lecture or exchange comments without disrupting the room. seeing someone tapping buttons in a social situation doesn't seem to provoke the same knee-jerk repulsion as someone yapping on their cellphone. for typographical and poetic reasons, txt msgs seem more appropriate than calling for sending a pick-me-up, a reminder, or something that's easier to convey in visual rather than audio form. however, the shorthand nature of it can be disastrous for serious communication, where misconstrued signals can be fatal, or for long, detailed, media-rich missives.

abbreviations may be the most ambiguous signal: a receiver who gets a vry shrtnd msg or one with 'creative' spellings (due to the mistake-prone 1-2-3 tap input) might know this is just the familiar style of the sender, or it may indicate a longer message that needed to squeeze in more letters, a person in a hurry, a person who is lazy and doesn't correct mistakes, or someone who is taking the situation way too casually. others who are used to it might never blink an eye. (amusing story at http://news.bbc.co.uk/cbbcnews/hi/world/newsid_2813000/2813955.stm, where a girl wrote an essay for school in txt spk). however, if both parties can understand and decipher the message, the signal should increase in reliability. txt msg develops its own particular language and cultural style.

the costs of txt msg include the time and effort of input, reading and responding to messages with timely appropriateness, and maximizing efficiency without losing clarity in content. however, the easy identification of the sender through callerID, the universal understanding of the txt msg size and input constraints, and the simple crossover to richer mediums (via the phone that's already there) downplay the costs of ambiguous signals. you could make txt msg more reliable by integrating more formatting options (such as plaintext-to-html formatting), but that would counteract the spontaneity and speed of delivery. an additional vibrational or tactile signal (similar to comTouch) accompanying the msg might be helpful, since physical input can be rather intuitive and it can set the tone for interpreting the message accordingly.

Sunday, April 10, 2005

the face

Fridlund (pg 109) says that "Signals do not evolve to provide information detrimental to the signaler. Displayers must not signal automatically but only when it is beneficial to do so." Do you agree? How does this fit with the definition of signaling we have been using thus far? How does this fit with involuntary expressions of inner state (such as blushing or crying)?

interpreted as social signals, facial expressions are seen as given off by the sender as intentional signifiers of particular qualities. for a signal to be reliable over time, there must be measurable benefits to truthful signalling and costs to untruthful signalling; deception should be costly and difficult to pull off. however, involuntary physiological measures such as muscle contractions or genetic formations that are intrinsically correlated with an expressive quality aren't signals, because of that direct connection; they are more like cues or direct reactions. if the orbicularis oculi muscle contracts when observed, and that muscle is proven to be a direct response to genuine happiness and nothing else, then an observed contraction of this muscle unequivocally indicates happiness. fridlund's statement asserts that facial expressions are changeable and controllable in some way (consciously or subconsciously?), if they are formed or timed in ways that most benefit the sender with respect to how the signal is received. in this view, expressions are never automatic or inevitable, since a 'true' emotion that escaped would be revealed at a cost to the sender. fridlund implies that costly signals of the face would never 'slip out' automatically or be revealed beyond the person's control, thereby ruining the signal's beneficial utility.

i tried to think of 'involuntary' expressions such as blushing, crying, laughing, and showing fear, and wondered to myself whether these sorts of actions occur when alone, or with no particular intended audience, and whether people have the ability to stifle such expressions when necessary. maybe i'm strange, but there are occasions in which i've smiled or laughed out loud when reading a good book alone, which calls into question the signalling significance of that action. however, as argued by fridlund, i might be imagining that i'm around a friend who might enjoy the same passage, or simply be reassuring my own self, as another entity, that the humor was digested. i would imagine blushing rarely occurs in solitude, since it most often embodies a shy or self-conscious reaction to speaking among or being in the view of others (unless the blush is a symptom of something emotion-neutral, such as rosacea or a psychological disorder). i believe it's been shown that people tend to laugh more around others, hence the canned laughter that populates sitcom television. as far as showing fear, there's always a stimulus (watching a gory movie, darting in front of an oncoming car, an embedded insecurity), which might serve as the 'receiver' of any signal emission. ticklishness is strange in that it incites people to laugh, no matter what emotion they may have been feeling at that moment, but perhaps a good, biologically-driven laugh primes the body and the mind to be open to happiness.

are these involuntary actions truly uncontrollable in circumstances where it would be costly to express them? from personal experience, i've burst into tears during a particularly harsh music lesson, while waiting for someone who never showed up, and while conducting formal tete-a-tetes. i've also had unfortunate social blunders in which i couldn't contain a swell of giggles that erupted in the middle of a sober, formal speech ceremony, and i had to physically exit the room before i caused any serious damage (similar to how elaine in seinfeld's 'pez dispenser' episode couldn't control her spontaneous laughter during an ongoing piano recital). i'm sure that with much breathing, psychological inner voicing, and lots of public speaking practice one could attempt to minimize blushing, but the signal still feels inherent (you may mimic smiles when happy because others tend to do so, but no one teaches you how or when to blush).

so, as to agreeing with fridlund that facial expressions are inherently controllable and revealed only in beneficial circumstances, or disagreeing because some facial expressions are involuntarily exposed despite their cost to the sender, i'd lean toward the fridlund argument. however, it's difficult (for me, at least) to ascertain whether revealing a facial expression is costly or beneficial. blushing may be embarrassing, but it may be a beneficial signal that says, 'please forgive me, speaking to people engenders some difficulty on my part,' which informs the audience of the circumstance. crying may be costly in that it shows weakness or frailty, but its benefit might be to coax sympathy or merciful surrender. laughing in the middle of a solemn ceremony might be inappropriate but merely infuses some liveliness into an artificial realm of sobriety. i don't know if there are expressions that are proven to be truly involuntary and more costly than they are beneficial. are there? if not, then this further strengthens fridlund's stance on the evolution of facial signalling.

Ekman proposes that "all facial expressions of emotion are involuntary". Is there any way of reconciling this view with Fridlund's? Do they each use the words "emotion" and "expression" in the same way? In Ekman's view, are facial expressions signals? Are they in Fridlund's? Are they reliable signals?

if we view ekman's facial expressions of emotion as expressions that portray a well-established sign of such emotion, these expressions must be impossible to replicate exactly without the genuine emotion behind them (such as duchenne smiles, which are impossible without actual happiness present), and one does not have command over the revealing of one's true emotions. ekman calls this a 'leakage of felt emotion,' where micro expressions give the inside feelings away through the face's appearance.

fridlund defines facial expressions as signals, primarily used for the benefit of the signaller in the context of the situation and the audience. the expressions aren't present unless they provide some sort of net benefit; otherwise, there is no motivation to display them. as far as 'emotion' and 'expression' are defined, ekman sees emotion as an 'authentic self' or human feeling (from happiness to anger to enjoyment to disgust) and expression as 'micro expressions' and body language, whereas fridlund sees expressions as relatively independent of emotions and instead views them as 'negotiation tools of social encounters.' they disagree on the fundamental link between expression and emotion: ekman draws a strong correlation, whereas fridlund draws no such conclusion. however, both concur that the interpretation of expressions is highly dependent on context, and cannot be judged in a social vacuum.

in terms of signalling, ekman links emotional leakage to micro expressions and body language, yet interprets broadly observed faces as filtered through a socially conventional self. most everyday interactions deal with the filtered self, with trained experts more privy to the signs of inner emotion that reveal themselves through imperceptible tics, movements, and behaviors. the filtered facial expressions signal what the user intends for the audience to see while reaping benefit, yet the glimpses of truth through the micro expressions would be indicators (but not signals, since the actions are involuntary) of real emotion. this runs completely counter to fridlund's thesis, in which all facial expressions are social signals and absolutely nothing is accidentally 'leaked': if a costly truthful emotion were leaked beyond the person's control, there would be no reason for such a costly signal to have persisted through evolution.

are facial expressions reliable? in ekman's case, it depends on how educated the audience is in reading facial and bodily movements. if you have a completely normal, unspecialized audience, the facial expressions may not be entirely reliable, since the public face is filtered through a sophisticated layer of social convention. however, if the audience knows how to read the involuntary acts, they might be able to read the hidden emotions that leak to the surface without the knowledge or control of the communicating person. since these micro expressions would be intrinsically tied to a particular emotion, these signals (or cues) would be highly reliable indicators of emotion. they'd only work while present, though; the absence of the signals wouldn't necessarily imply the absence of a particular emotion. as an example of such an involuntary action, a crying person would be read as passionate, sad, or moved.

viewed through fridlund's theory, facial expressions would be inconsistent in their reliability, since visible facial expressions wouldn't be related to emotion in any direct, unmediated way. expressions serve as signals for behavioral and social benefits, not as emotional displays. as far as reliably describing a person's innate emotional state, drawing straight conclusions about personal qualities would be shaky; however, facial expressions would be a reliable signal for reading a person's intentions in manipulating a social relationship or environment. if a person cries, it is a reliable signal of their desire for a reaction, be it increased attention, pity, sympathy, or physical contact.

What does Zebrowitz mean by "overgeneralization effects"? What is an example of a physically based overgeneralization, a culturally based one and a personal one? Can you re-frame her discussion about different cues (signals) of traits (qualities) that are seen in faces in terms of signaling - are these signals assessment signals? What are their costs? Go to a public place and observe 4 different people you do not know. Write down what your impression is of each of them. How much is your impression drawn from their face, their clothing, their actions, etc? Concentrating on the face, what sense of the person do you derive from it? Can you articulate why? Do you think any of the "overgeneralization" processes that Zebrowitz describes played a role in your interpretation? What about other categorization processes?

an overgeneralization effect can be succinctly described as labeling, quick categorization, or bias. with one glimpse, one observes a host of various inputs and then judges the person based on these instantaneous signals: hence, the impact of the first impression. there are many examples of physically-based ones, including symmetry of the face (asymmetry may be overgeneralized as poor physical or mental health), frown or smile lines (permanent artifacts of repeated frowning or smiling over time, which generalize to pessimist versus optimist), or animal associations (someone with close-set eyes might be seen as cunning and foxlike, or someone with a turned-up nose might be seen as greedy and piggish). a culturally-based overgeneralization example might be a woman whose cosmetics look quite outdated and applied in an old-fashioned way; she could be seen as unfashionable or dowdy. a man who appears to smile too much, including during more sober moments, may be seen as smirking or oblivious. a personal overgeneralization may happen if you judge someone one way or another because of previous experiences with a similar-looking or similar-acting person ('ugh, that guy looks just like my ex'). this is the 'case of mistaken identity' phenomenon. everyone's individual circumstances and history frame a significant amount of face-reading (as we discovered earlier when reading and interpreting signals in the real world.)

some of the physically-based overgeneralization signals could arguably be assessment ones, such as wrinkles that record the memory of many repeated facial expressions, or physical symptoms of actual disorders (such as down's syndrome or hepatitis). using the linkage path of 'biology directly causes physical feature', features share hormonal or genetic sources. this may be reliable, but the cost is that the nurture/environmental component of the person is still not revealed. someone may have been born a certain color or shape, but that is beyond the control of the actual person (though it may throw light on their parents or family). the assessment becomes weaker with 'physical and social environment causes physical feature' and even more so with 'psychological traits cause physical feature' (where there's even a common mimicry or deception channel running in parallel). someone may look upset from their eyebrows, but they might just be reacting to an itch or a painful thought. someone may look tan and healthy, but they've remained indoors and just went to the tanning salon. the costs for the signaller become lower and lower the more easily the signals can be replicated with common sense or common technology (plastic surgery, cosmetics, conscientiously smiling more), and therefore the cost of verifying such signals increases, requiring more probing for the truth.

[i'm going to write about this after getting coffee tomorrow at 1369. will blog post-haste.]

Faces are used to recognize people, to assess their character and gauge their emotional state. In a mediated environment, we may be able to design interfaces so that none, some or all of these functions are possible. What are the costs and benefits of each of these functions, to both the signaler (the face) and the receiver (the viewer)? How might you design a face that purported to show character and emotion, but not identity? Can you show identity without the markers of character? When do you think seeing someone's face is important in a mediated environment? Why? In what form? What about videophones - do you think they will eventually replace or supplement the audio-only phone or is there a deeper reason why they have never been successful?

recognition:
for the signaler, the revealing of real-life identity online (being 'named') has the cost of maintaining a constant, consistent persona. there's no capability for escapism, the freedom that comes from reinventing yourself online to strangers and friends alike. being recognizable also carries over negative traits that one might be identified with in the real world (e.g. people might recognize you as the weird, unattractive barista at the corner cafe), and whatever you do online can suffer consequences when you return to the real world. there exists no separation between an online and offline identity. however, the benefits include having one's real-life positive traits carry through ("trust me, you can clearly identify me as a loyal friend"), and showing one's true self creates consistency through all interactions, off- and on-line, which might be handy now that there's so much overlap between face-to-face and electronically mediated communication with those in close proximity to us.

for the viewer, the cost is that one cannot fantasize about the sender's identity, especially if it's someone quite familiar. if the sender is particularly known as someone negative ('they're so disrespectful'), the viewer who sees him online will suffer similar negative reactions and resist or curtail interaction. the viewer benefits much more, though: they can have more security in the interaction (the delivery of content from the viewer can be tailored specifically and accurately for the listener), and one can rely on a consistent identity + relationship, since every action online or in reality can be sourced accordingly.

assessment of character:
i'm assuming this assessment is something like a placard profile that lists adjectives, personality traits, and tendencies that have been established and defined by a user's history. if a signaler has a negative reputation, the cost to him is that he cannot begin again with a clean slate and cannot easily shake off the established character label. also, even if the assessment were true ('party-loving', 'nurturing') and not necessarily negative, the traits might not be desirable to show to certain other parties, such as across casual vs. formal or family vs. work boundaries. however, the benefits to the signaler include a more clearly defined expression, since the context of personality provides a basis for the delivery of communication. words coming from a shy, reticent person might mean something completely different from the same words coming from a politically-active extremist. nuance can thrive with the knowledge that more information about the sender is available.

as for the viewer, costs include general preconception + prejudice about the sender. too much information can unfairly bias the interaction in an over- or under-stated way. a label can be helpful as a hint, but there are so many slanted connotations for every personality trait that nothing can be interpreted at merely face value. however, with this additional knowledge, a viewer can strategize the encounter by customizing one's delivery through the filter of audience and relationship status; thus the context helps tailor the interaction for efficiency and acceptance.

gauge emotional state:
if the signaler is able to somehow express his emotional state, this may be costly if detrimental or unwanted things leak through: involuntary micro expressions might give away inner emotions a la ekman, the displayed emotional state might not be the intended signal, or it may be misleading. the emotional state should be clearly displayed to all parties, or else an unexpected reaction might ensue from a misaligned concordance. the benefit to the signaler is a truer representation, so the communication can be read through the context of expression. short, terse words make sense coming from someone who is stressed and anxious, whereas they might seem impersonal or aloof when read from a neutral standpoint.

the viewer with knowledge of emotional state may err toward over- or under-reacting to a display. if the gauge reads 'a little fearful', that might mean anything from dreading a next-day exam to a spider dangling from the ceiling at that moment to worrying about a secret being spilled. it might be too much information; in the real world, even physical faces do not necessarily correlate to a specific emotion. we don't even have an emotional state gauge offline! also, there's the possibility that the emotional gauge is incorrect, in which case the whole interaction can be thrown off. the whole dialogue hinges on complex, subtle exchanges of courtesy and emotion; there's a give and take. however, the benefit of knowing emotion is, of course, additional context, which can be used as a behavioral predictor for further words or actions.

as for displays that would reveal character + emotion but not recognizable identity, i thought of a webcam-like application that would use a heavily high-contrast monochrome filter, so that everyone's face would be colorless and comicbook-like. since real facial expressions serve as the input, the displayed expressions would correspond accordingly; the distortion would mask identity while liberating nuanced faces. another idea is to show straight-up video of one facial feature, such as the eyes or the mouth, instead of the entire face at once. you can read a lot through small movements in such features. you might still need to translate to monochrome, but recognition would be challenging without key pieces of tie-in information.

as for showing identity without markers of character, i'm a little confused, though a clearly displayed truthful static photograph could at once reveal identity without the extra information of an animated face.

seeing a face, replete with expression and life and nuance, can be important when dealing with highly selective + specific interactions online, such as business transactions, talking to a close member of the family, or choosing to shield yourself from certain kinds of people. you would want a recognizable face if identity is crucial ('this person and this person only should hear it this way'), and an emotional face if it's an introductory sort of interaction and each party wants to size up the other quickly and accurately. i'm thinking that it might be useful on ebay or in other quasi-anonymous communities to remind everyone that there are actual people on the other end of a username. this might allow for more humane, realistic expectations.

spew on videophones:
have to be stationary in front of the machine -- portability?
have to 'look' presentable, not make any errant faces or dress sloppily
need to show continual attention, but to a machine
cannot viably multitask
skipping video is annoying
have to maintain gaze
can't glance at their shoes or background, no distractors