Welcome to the sixth and final session of VR/AR Fundamentals for my NYU Shanghai class, where I’m visiting faculty. The class is in the Interactive Media Arts group, modeled on NYU’s venerable Interactive Telecommunications Program, and is consequently very hands-on. But for this class, I’ve subjected the students to my sessions and to experiencing lots of VR directly before beginning any production. It worked well last semester. You’ll see some results and hear their attitudes in the Epilogue.
First, the final session. More than the other sessions, I’ve had the fortune (and occasional nightmare) to have several direct experiences here, so please excuse me in advance for being a little more personal.
What is Live?
I asked the students. “Real time,” answered one. I asked, “Can real time mean a second? Ten minutes? One hundredth of a second?” There was general agreement that it depends. Real time in conversation, pizza delivery, and playing a first-person shooter game all require different timeframes.
I proposed two other distinctions from live: “Fresh” and “Canned.” Fresh is when something is still timely. Using the news cycle as a metric (or bread), we can define fresh as roughly within a day. Canned is archived.
Live, Fresh & Canned Media. Three important distinctions.
So, when did “live” begin?
Live before the Telegraph
Beyond shouting, horns, drums, and semaphores, distanced, or remote, live had a beginning: electricity. The first remote live anything was the telegraph. Before it, our concept of live today is hard to imagine.
Could Abraham possibly have imagined what his family was doing back in Canaan in real time? The same holds for Jesus, Buddha, Mohammed, and Confucius.
Centuries later, could Columbus in the New World even think about what Queen Isabella was doing at any given moment? Time zones didn’t even exist yet. Or could Napoleon in Egypt imagine what Josephine in Paris was having for breakfast?
When Jules Verne wrote “Around the World in Eighty Days” in 1873, even the title was a gotcha, like a book today titled “To the Moon in an Hour.” What seemed science fiction was based on the newest technologies of the day, including the opening of the final section of railway in India the year before, and the opening of the Suez Canal together with the completion of the first American Transcontinental Railroad, both three years earlier.
Live, as we know it today, is entirely based on the instantaneousness of electricity.
For decades, virtually all politically potent live was in broadcasting. Incredible as it may seem today, radio and television broadcasting were both initially entirely live. Telephones were live but one-on-one, and one-to-a-few conference calls were relatively rare. Remember, these were the days before Twitter. Live was mostly centralized.
Webcams & Live Web Video
As soon as the nascent Internet could handle it, webcams appeared.
The “world’s first webcam” was aimed at the coffee machine in the University of Cambridge Computer Lab, set up in 1991 to save people from walking down the hall to find the coffee pot empty. Its resolution was 128 by 128 pixels and it updated once per second.
By the late 1990s, live webcams were a thing. Each consisted of a stationary video camera tethered to a computer and the Internet, had no human operator behind it, and most of the time, nothing of any interest was happening in front of it. Popular webcams then included Jennicam (a young single woman with several cameras in her home, including the bedroom), the Times Square cam (part of a larger network called EarthCam), and Africam (remote locations where animals sometimes appear).
The fascination with these webcams was less about watching a story unfold and more about simply seeing what, if anything, was happening.
During this early web video period, I initiated a project at Paul Allen’s Interval Research Corporation to address a unique search problem for live video: time. Suppose you see a rhino at the Africam watering hole: how can you propagate this information quickly? And how can someone looking for interesting webcam activity find what’s live now?
The project led to a 2001 Interval spinoff called Kundi.com. The website offered a “Hot Now Button” that an Africam viewer could press upon seeing the rhino at the watering hole, and a “Hot Now List” that anyone could visit and search, and which updated every ten seconds, something unheard of on the Web at the time. Kundi is Swahili for flocking, swarming, and herding.
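The mechanics behind a “Hot Now List” can be sketched in a few lines: viewers press a button on a feed, and the list ranks feeds by presses within a recent sliding window, so stale excitement drops off on its own. This is a hypothetical sketch, not Kundi’s actual implementation; the class, method, and feed names are all invented for illustration.

```python
import time
from collections import defaultdict


class HotNowList:
    """Hypothetical sketch of a Kundi-style "Hot Now List": viewers press
    a button on a live feed, and feeds are ranked by recent presses only."""

    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.presses = defaultdict(list)  # feed name -> press timestamps

    def press(self, feed, now=None):
        # A viewer presses the "Hot Now Button" for this feed.
        self.presses[feed].append(now if now is not None else time.time())

    def hot_list(self, now=None):
        # Rank feeds by presses inside the sliding window, hottest first.
        now = now if now is not None else time.time()
        counts = {
            feed: sum(1 for t in ts if now - t <= self.window)
            for feed, ts in self.presses.items()
        }
        return sorted((f for f in counts if counts[f] > 0),
                      key=lambda f: -counts[f])


hot = HotNowList(window_seconds=600)
hot.press("africam-waterhole", now=1000)   # rhino spotted
hot.press("africam-waterhole", now=1200)   # second viewer agrees
hot.press("times-square", now=1200)
print(hot.hot_list(now=1300))  # ['africam-waterhole', 'times-square']
print(hot.hot_list(now=2000))  # [] -- all presses have aged out
```

In a real deployment the list would be served from a shared backend and the page would poll it (every ten seconds, in Kundi’s case); the ranking logic itself stays this simple.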
A more recent example of such a live webcam came when April the Giraffe gave birth in 2017. Her webcast pregnancy went on for weeks. When she gave birth, 1.2 million viewers watched live, and by the time the YouTube live feed was initially shut off afterwards, it had logged over 232 million views.
Of course, the news that April was giving birth propagated in real time on Twitter and other social media.
Incredibly though, there’s still no infrastructure for a comprehensive list of “what’s live now:” nowhere can you search “squirrel” and see any live video streams tagged “squirrel.” Not fresh. Not canned. Live. The vision, to me at least, is clear (I think it comes up #1 on most Google searches for live global video). We don’t know for sure, but it’s a good guess that right now, several million live video feeds are openly accessible.
There’s a dark side. Even when setting up Kundi years ago, we discussed what to do about murders, suicides, and other tragedies that may be webcast live. Our conclusion then was that we needed an army of volunteers 24/7 to monitor, particularly looking for “spikes” in viewer numbers, and believed that instant shutdowns may even prevent some tragedies. Last year, after a bumpy start, Facebook Live announced that it would add 3,000 workers to monitor for inappropriate behavior.
There’s another dark side. Do a Google search for camera zapper and what comes up #1? Me. Shortly after September 11, 2001, I became obsessed with whether it was possible to stop the gaze of a camera. The answer is yes, with a common laser pointer, at distances of hundreds of meters, if (and a big if) you know where the camera is and can aim at it accurately. I posted everything I learned online, the New York Times did a feature story on it, my website received over 600,000 hits in two days, and I received several purported death threats.
What I learned was that cameras are a very sensitive issue. Watching how things have played out since, Silicon Valley’s perceptions on this are grossly miscalibrated with those of the general public.
The conclusion of my posted research, ironically, was that anyone who wants to hide a camera, can hide a camera.
Another element of liveness is remote control, a phrase that couldn’t even exist before liveness. Here are two emblematic early works.
The Telegarden, by Ken Goldberg and Joseph Santarromana, was a remote garden as an art installation. Anyone could remotely plant seeds and water selected areas via an industrial robotic arm, and monitor the results via live camera. Here’s a succinct, must-see one-minute video posted in 2011.
Pachube (pronounced “patch bay”) was launched by Usman Haque in 2007 as a “generalized realtime data brokerage platform,” an early Internet of Things project. Anyone with an input device and anyone with an output device could register and they would appear on a world map.
I could have a light bulb registered and plugged into my networked computer and you could have an on-off switch registered and plugged into your networked computer, then you could turn on and off my light and I would see it, a simple and powerful idea. Pachube was acquired by what is now Xively in 2011.
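The light bulb and switch example boils down to a tiny publish/subscribe broker: input devices publish values to named feeds, and output devices subscribe to those feeds. This is a minimal in-memory sketch of the idea, not Pachube’s actual API; all names here are invented for illustration.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory stand-in for a Pachube-style data brokerage:
    input devices publish to named feeds, output devices subscribe."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # feed name -> callbacks

    def register_output(self, feed, callback):
        # An output device (e.g. my light bulb) registers interest in a feed.
        self.subscribers[feed].append(callback)

    def publish(self, feed, value):
        # An input device (e.g. your switch) pushes a new value to its feed.
        for callback in self.subscribers[feed]:
            callback(value)


broker = Broker()
bulb_states = []                                     # stand-in light bulb
broker.register_output("your-switch", bulb_states.append)
broker.publish("your-switch", "on")                  # you flip the switch
broker.publish("your-switch", "off")
print(bulb_states)  # ['on', 'off']
```

The real platform added what this sketch omits: device registration, a world map of feeds, history, and delivery over the network. But the core brokerage idea — decoupling who produces live bits from who consumes them — is exactly this small.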
In 2009, Linz, Austria was a designated EU Culture Capital city and I was invited by Ars Electronica, based there, to help produce their contribution. We had a €2.1 million budget and space in the Hauptplatz to build a pavilion for the summer.
We initiated an open global competition called Live Bits: Art Exploring Realtime Connectedness offering up to twenty €10,000 commissions. We were particularly interested in unconventional live bits more than simply live video. We received 295 entries from 42 countries, from which we selected 15.
One project, “Blowing Air from Beijing to Linz,” combined live video with a remote controlled burst-fan that allowed the Beijing performers to, after goading their Linz audience with live video of steaming hot Chinese dishes, blow a “puff of smell” of the food via pre-made “smell bars.”
Other projects included a vibration platform triggered by real time traffic at the entrance of the Gotthard Tunnel in Switzerland; micro-blogging from industrial workers in Pitești, Romania; smiles from and to Bhutan; live audio from a town square in Gaza; and this:
“WIA-WIA — Water in Africa-Water in Austria” was proposed by Melissa Fatoumata Touré, a young woman from a village in Mali who had received a college degree in engineering. The entire village got their water from a single pump in the town square, and her uncle had recently opened an Internet Cafe nearby. WIA-WIA proposed instrumenting the pump with a live sensor which would control a functioning toilet in our Linz pavilion. If not enough water was pumped in Africa to allow a flush in Austria, the user would have to put coins in a slot to compensate.
WIA-WIA was realized and operational, and halfway through the summer-long run, it was revealed to be a hoax created by Berlin-based artist Niklas Roy. Admittedly, we were easy marks (the submitted image was obviously Photoshopped). Roy was invited to speak and WIA-WIA continued its run, and in the end earned a total of €1,470, which Ars Electronica donated to Caritas International to support a water project in the Congo.
How to Democratize Webcams?
I asked the students. As we’ve seen above, live cameras and live bits offer unique opportunities and unique challenges, particularly in deeply remote places. So, what’s in it for them, the remote locations and their inhabitants?
Think VR webcams. Live VR webcams.
You’re National Geographic and you want to place a live VR webcam in a village commons in Papua New Guinea. You’re UNESCO and you want to place live VR webcams in sites on their Endangered List. Or you’re CNN and you want to deploy live VR webcams to Haiti during another earthquake. What’s in it for the PNG villagers, or Timbuktu Tuaregs, or Haitian earthquake victims, and why wouldn’t the expensive VR webcam simply disappear?
Can a live VR webcam have a two-way component? For example, can one or more “one-on-one stations” be near or under a VR webcam, allowing local individuals to interact remotely with others? Are there alternative two-way components? Or is the only real solution money, where locals are paid to keep things running?
Travel and tourism, one of the world’s largest industries, is especially ripe for VR and AR (and I have my own perspective). Live can be a magical component. But how? We don’t yet know.
Social VR and AR entails people interacting with other people live, and by its nature, is two-way and symmetrical. It’s more like telephone and email than like radio and television, as noted above.
Early Live Social
Two different projects stand out as emblematic of “live social,” in terms of the priority for symmetry among participants.
LA-based Electronic Café explored what they referred to as “aesthetic research in telecommunications” beginning in 1975. Their earliest tool was a “slow scan” box, which interfaced a video camera to a telephone line to send and receive live (or “live”) images: typically one still black and white image per minute. ECafé produced numerous art events connecting two remote groups using two phone lines, one for slow scan and the other for live audio.
In 1980, the ECafé managed to get very costly satellite time for three days of live video and audio between two sidewalk sites, one in New York City and the other in Los Angeles. The project, called “Hole in Space,” had several defining features. First, it was symmetrical media. Second, it was “head to toe and life sized,” what we’ve referred to as orthoscopic, an important property of VR and AR, in session 1. Finally, it was unannounced by design. Initial encounters were spontaneous with participants asking “where are you?” while pre-arranged meetings were common as word spread. Video of “Hole in Space” is part of the New York Museum of Modern Art permanent collection.
The second project is a first-hand experience, though I wasn’t actively involved. One of the earliest projects at Interval Research in the mid-1990s was to instrument several private offices, each with two very high quality microphones and two very high quality speakers sitting on the desks, all connected to a computer-controlled switcher. Users could switch from office to office or choose to opt out, for example during private phone calls. The effect of highest-quality ambient sound — no story, no event, no beginning, no end — was mesmerizing and difficult to put in words. Hearing nothing but breathing, typing, sipping coffee, and fidgeting conveyed, well, liveness.
Conventional telephones, with their limited bandwidth, are still pretty good for teleconferencing. Some of the newer apps like Apple Facetime Audio offer noticeably higher audio quality. But compared to audio, teleconferencing video offers much greater challenges.
These challenges are mostly covered in session 1 (Audiovisual Resolution & Fidelity) and session 2 (Audiovisual Spatiality & Immersion). For example, teleconferenced people should appear orthoscopically correct, as in “Hole in Space” above, rather than tiny as we see on mobile and computer screens. And since realworld conferencing takes place between people near each other, teleconferencing requires stereoscopic imagery. But who wants to teleconference with someone wearing 3D glasses or worse, a VR or AR headset? The best solution will require autostereoscopic displays.
Then there’s eye contact, perhaps the biggest difference between realworld conferencing and teleconferencing. As we all know from the real world, there’s a huge difference between near eye contact, such as Skyping from our laptop using the camera above, and actual eye contact. Being off even a little is disturbing. The traditional solution is to build a big box the size of the 2D screen and place a half-silvered mirror inside at a 45 degree angle, then place the camera perpendicular to it so that the camera and display are on-axis. It works, I’ve tried it, but it requires a cube the size of your display.
A novel but yet-to-be demonstrated alternative approach is to integrate camera sensing elements next to the image display elements (pixels) inside the display itself. Apple has a patent on this.
Such an integrated sensing display would produce perfect eye contact, but with an issue: Would you use a display that could be looking at you with no indication and no way to block? At least little cameras above can be masked with tape or a thumb when desired.
MUDs, MOOs, MMORPGs, & Second Life
MUDs (Multi-User Dungeons), MOOs (MUDs, Object-Oriented), and MMORPGs (Massively Multiplayer Online Role-Playing Games) were all early attempts at live, symmetrical, multi-player role-playing games, evolving from text-based worlds for small groups to 3D graphics-based worlds for very large communities. For example, World of Warcraft had 10 million subscribers in 2014.
Something changed with Second Life, launched in 2003 by Linden Lab. In addition to higher resolution 3D graphics (though still “cartoon-like” avatars), the goal disappeared.
Second Life, according to Linden Lab, is “not a game”: there is “no manufactured conflict, no set objective.” It’s simply an open-ended online virtual world where people can connect, hang out, and share experiences together.
Facebook & Social VR
The Oculus / Facebook VR “birth legend” is well-documented.
The shared vision is indeed Second Life-like, so it’s no surprise that The Social Network and The VR Company would join forces and call their initiative Social VR. And Facebook’s goal, stated last October by Mark Zuckerberg, is:
“We want to get a billion people in virtual reality.”
And this is where we end our sessions.
We will get a billion people in virtual reality, and augmented reality, by Facebook and others. The technologies are all pointing toward its inevitability. And access, in the end, must be global and unprivileged.
Yet, there are so many big unanswered questions.
I asked my students to write down what excited them most about VR and AR and what concerned them most about VR and AR, and received 58 replies. They roughly organize like this.
These are my words after reading all of the replies. Having New Experiences included the pre-made while Making included prototyping and DIY. Abuse included cyber-bullying, terrorism, crime, legal, and privacy. Anti-Social included people-related concerns while Disconnection included loss of distinguishability between VR and the Real World. Health included medical and psychological. Inequity included democratization and accessibility.
The top-level noteworthy observation is that everyone was both excited and concerned.
We were surprised to see VR / AR being both Social and Anti-Social.
In ensuing discussion, both concerns about health and excitement about virtual travel came up more than the lists reflect.
Other than some short hype at the beginning and a somewhat clunky but also short transition to the next place at the end, we found almost nothing offensive in Mark Zuckerberg and Rachel Franklin’s words during their Facebook Spaces broadcast from hurricane-devastated Puerto Rico. They spoke about using Facebook Safety Check and Community Help, about fundraisers, about sending Facebook employees in to help with connectivity, about working with the Red Cross on AI-assisted population maps to connect relief workers, and about donating over $1.5M additionally as well. If it were a radio interview, we don’t believe Facebook would have gotten slammed by the press and public.
Why did VR become an anti-empathy machine here?
The students were spot-on about social and anti-social, and about new experiences and disconnection. Clearly, we still have a lot to learn.
In understanding VR and AR fundamentals, and in experiencing lots and lots of VR titles, the students were encouraged to explore the unexplored. One of the ripest areas of exploration is the space between linear VR cinema and interactive VR games. Students were assigned to make short VR sequences with real people in constructed environments, with the people doing synchronized movement and sound, interactively. The goal was to explore how to represent people intimately and interactively in VR.
We shot with a 24 lens stereo-panoramic VR camera courtesy of Jaunt VR and managed to move the material from VR video into a game engine, Unreal, ultimately with mixed results.
The students were divided into four groups and had two shots at their projects. Here are some stills from the VR:
And a “Sunday project:” 6 students singing 47 instruments of a short version of Beethoven’s “Ode to Joy,” with a VR interpretation of the layout of the Dublin Philharmonic Hall.
One lesson we all glimpsed was the differences between learning pre-existing tools (good for virtuosity), hacking pre-existing tools (good for novelty), and inventing new tools (good for game changing). Somehow this intersects with playing by the rules versus making new ones. Finding the right balances, especially for VR and AR at this pivotal moment, is a huge challenge with a high payoff.