6.S063 Spring 2017 Final Project
by Cattalyya Nuengsigkapian
An app that lets everybody create and share their own bedtime stories with their favorite characters and toys at the same time.
This week I listed the project areas I'm interested in, such as AR, computer vision, photography, etc. I researched related projects and research papers to gather more ideas and considered how feasible each would be to build in a single semester.
I practiced drawing in Illustrator and included the results in my presentation.
This week I presented my top 5 ideas to the class: Top 5 idea slide
I made the final decision on my project idea, researched existing sensors, and figured out how to satisfy all the requirements.
After I decided to work on AR Bedtime Story, I began researching iOS augmented reality development and chose among Swift, Objective-C, React Native, and Unity. I'm familiar with React Native, but it didn't have much tooling for AR, so I initially used Swift with its ARKit library, hoping to have more control over the device than with Unity. After playing around with it for several days, I concluded that to deliver all my features within a one-month scope, I needed tools that facilitate image recognition, which the Vuforia platform in Unity provides.
This week, I learned about Vuforia (its image targets, markers, etc.) and started building my iOS AR app with Unity. I produced an AR iOS app that can scan example image targets and show 3D figures. I also designed the laser-cut parts that will be the figure basements (image frames), turning ordinary printed paper into toys. The criteria for the figure basements are: 1. durability (waterproof, tear-resistant), 2. low cost, 3. scalability (reusable), 4. safety.
For the electronics part, I researched how to make a light-following robot and planned to build a bristlebot. I also built a simple circuit to read a light-dependent resistor (LDR) from the microcontroller and displayed its analog values.
I worked on the laser cutting and fixed my drawing so that all the pieces fit tightly together. On the software side, I tried using custom image targets and integrated with Vuforia Cloud Image Recognition. [Dynamic image target] I implemented a UI in the app that allows the user to click a button to change the model on a specific image target. [Story selector] I used Cloud Reco to scan an image representing a story title, such as Cinderella or Beauty and the Beast, and assign all related figures to the image targets. On Wednesday, I presented my half-way project during the midterm presentation.
The beta version currently has a bug: when the user scans a story title image such as Beauty and the Beast, the figures on the image targets aren't updated immediately. To refresh the figures, the user has to move an image target out of the camera frame and bring it back in to re-trigger image target detection. I found and fixed the bug by registering all the image targets as listeners to story changes on the story title target.
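A minimal Python sketch of that listener fix (the actual app is Unity/C#, and all names here are hypothetical): when the story changes, every registered image target is notified and swaps its figure immediately, instead of waiting for the next detection event.

```python
class StoryTitleTarget:
    """Holds the current story and notifies registered image targets on change."""
    def __init__(self):
        self._listeners = []
        self.story = None

    def register(self, target):
        self._listeners.append(target)

    def set_story(self, story):
        self.story = story
        # Notify every image target so figures refresh without re-detection.
        for target in self._listeners:
            target.on_story_changed(story)

class ImageTarget:
    def __init__(self, name):
        self.name = name
        self.figure = None

    def on_story_changed(self, story):
        # Swap to this story's figure for this target (lookup is illustrative).
        self.figure = f"{story}:{self.name}"

title = StoryTitleTarget()
targets = [ImageTarget("t1"), ImageTarget("t2")]
for t in targets:
    title.register(t)
title.set_story("Cinderella")
```

With this wiring, scanning a new story title updates every figure in the same frame, so the user never has to move a target out of view.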
Moreover, I also thought about how to solve the scalability issue when users have a hard time finding their desired figures among possibly thousands of figures in the future. For this, I plan to use speech recognition to help assign models to different image targets.
About the electronics part, I decided to make a smart basement (image frame) rather than a light-following robot, since I believe I can create a more interactive user experience with it. I planned to use a soft circular potentiometer to detect the direction of the user's touch drag. This dragging gesture would be used to select or scale the model. Note that these features were later replaced by changing the animation and clothes, which are more interesting, since model selection can already be done with a speech command.
After searching for video screen recording tools, I couldn't find one that supports voice recording at the same time as screen recording. I integrated the Everyplay assets into my app to support screen recording and sharing. Everyplay also supports Facecam, which records the user's face from the front camera along with their voice, but this feature doesn't work with our AR app since it triggers a bug that freezes the AR screen. Without Facecam, Everyplay doesn't support voice recording alongside screen recording, but this is acceptable since Everyplay supports video editing, so users can record their voice easily, and my extra plan to add subtitles can help with post-recording editing.
For bedtime story sharing, Everyplay supports social media sharing such as Facebook and Twitter. I also created an AR Bedtime Story space on Everyplay for my users to share their stories with other users.
I integrated Google Cloud Speech Recognition and displayed subtitles on screen. Speech recognition supports two options: 1. auto-detected speech, 2. manually clicking to start and stop recognition. The first option is very convenient and fits our goal of subtitles, but I also kept the second option, which can work better in noisy environments and for voice commands.
I made use of auto voice detection by extracting keywords from speech and automatically assigning the corresponding model to an unoccupied image target. This auto model assignment can be enabled by clicking the "clear model" button to clear the targets.
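The auto-assignment logic can be sketched like this (a Python illustration with a made-up keyword catalog, not the app's actual data): each keyword found in the transcript claims the first unoccupied target.

```python
# Known models and their trigger keywords (illustrative data, not the real catalog).
KEYWORDS = {"cinderella": "CinderellaModel", "beast": "BeastModel", "belle": "BelleModel"}

def auto_assign(transcript, targets):
    """Assign each model mentioned in the transcript to the first unoccupied target.

    `targets` maps target name -> model name, or None when unoccupied.
    """
    for word in transcript.lower().split():
        model = KEYWORDS.get(word.strip(".,!?"))
        if model is None or model in targets.values():
            continue  # not a keyword, or the model is already placed
        free = next((t for t, m in targets.items() if m is None), None)
        if free is not None:
            targets[free] = model
    return targets

slots = {"target1": None, "target2": None}
auto_assign("Once upon a time Cinderella met the Beast", slots)
```

Clearing a target (the "clear model" button) just sets its slot back to None, which makes it eligible for the next spoken keyword.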
I also made speech recognition support commands to assign and clear a model on a specified target, using speech like "Command assign model_name to image_target" or "Command clear image_target", as demonstrated in the Demo Video.
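The two command phrases above can be parsed with a small grammar; this Python sketch is a hypothetical reconstruction of that parser, not the app's actual code.

```python
def parse_command(transcript):
    """Parse voice commands of the two forms used in the app:
    "Command assign <model> to <target>" and "Command clear <target>".

    Returns an (action, model, target) tuple, or None for ordinary speech
    (which should flow to the subtitle/auto-assignment path instead).
    """
    words = transcript.lower().split()
    if not words or words[0] != "command":
        return None  # not a command utterance
    if len(words) == 5 and words[1] == "assign" and words[3] == "to":
        return ("assign", words[2], words[4])
    if len(words) == 3 and words[1] == "clear":
        return ("clear", None, words[2])
    return None  # malformed command: ignore rather than guess
```

Prefixing every command with the word "Command" keeps normal storytelling speech from accidentally moving figures around.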
[Electronics] After I got my soft potentiometer, I built the circuit and read its values, as I did earlier with the LDR.
I tried to integrate a switch and external power into my circuit, but I connected it incorrectly and blew up the MicroPython board, so I had to ask the TA for a new one.
Without the switch, I tried sending touch position data over WiFi. I wrote a main.py script for the MicroPython board that connects the microcontroller to my iPhone hotspot and sends data to ThingSpeak. However, ThingSpeak didn't let me send multiple data points in a short period of time: I had to wait around 30s after the latest data point before sending another.
Since real-time behavior is important to let the user interact with the model immediately, I decided to implement my own server with NodeJS and use plain HTTP requests to send and fetch data. Since my server only supports my single client, rather than the millions of clients on ThingSpeak, my requests are very fast and can be repeated every millisecond.
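The server's job is tiny: accept a POST with the latest drag from the microcontroller, and hand it to the app on the next GET. Here is a Python sketch of that same logic (the real server is NodeJS; endpoint, port, and payload format are assumptions for illustration).

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Latest drag sequence posted by the microcontroller (None = nothing new).
latest = None

class DragHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        global latest
        length = int(self.headers["Content-Length"])
        latest = json.loads(self.rfile.read(length))  # store the drag sequence
        self.send_response(200)
        self.end_headers()

    def do_GET(self):
        global latest
        body = json.dumps(latest).encode()
        latest = None                                 # consume: each drag is read once
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                     # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 8765), DragHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Microcontroller side: POST one drag; app side: GET and consume it.
data = json.dumps([120, 130, 140]).encode()
urllib.request.urlopen(urllib.request.Request(
    "http://127.0.0.1:8765/touch", data=data, method="POST"))
fetched = json.loads(urllib.request.urlopen("http://127.0.0.1:8765/touch").read())
server.shutdown()
```

Because there is exactly one producer and one consumer, a single "latest drag" slot is enough; no queue or rate limit is needed.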
The data sent is the sequence of touch positions for each drag, from touch to release. By grouping them into a touch sequence rather than sending single touch positions, I avoided many useless requests that would take up network resources.
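The grouping idea can be sketched as follows (a Python illustration; the sensor threshold and value scale are assumptions, not the real main.py): samples are buffered while the finger is on the strip and flushed as one sequence on release, so each drag costs one request instead of one per sample.

```python
def group_touches(samples, touch_threshold=10):
    """Group a stream of raw potentiometer readings into drag sequences.

    `samples` is a list of readings; values below `touch_threshold` are taken
    to mean "no finger on the strip" (an assumption for illustration).
    Returns a list of sequences, one per drag from touch to release.
    """
    sequences, current = [], []
    for value in samples:
        if value >= touch_threshold:
            current.append(value)          # finger down: buffer the position
        elif current:
            sequences.append(current)      # finger lifted: flush one drag
            current = []
    if current:                            # stream ended mid-drag
        sequences.append(current)
    return sequences

drags = group_touches([0, 120, 130, 140, 0, 0, 300, 290, 0])
```

Each element of `drags` would then be sent as a single HTTP request to the server.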
[Lasercut] [Electronics] After I got a new MicroPython board, I designed the laser-cut parts for the smart basement, then soldered and assembled everything together.
[Wifi Data] The iOS app issues a GET request every 2s to check whether there is a new drag, and obtains its speed and direction. I chose 2s to avoid draining the battery and using too much network bandwidth while maintaining a real-time feel: a drag usually takes about 1.5s, so the remaining 0.5s plus the network lag of my own server is very small.
[Server][UI] I reduced the computational burden on the iOS app by having the server translate the touch data sent from the microcontroller into dragging direction and speed. The touch gestures are: 1. dragging clockwise to change the animation of the 3D model, 2. dragging counter-clockwise to change the clothes (called "version" in the code), 3. a static touch to change the model. The server computes speed and direction for 1 and 2, and the average touch position for 3. Although I didn't end up fully exploiting all of this data, such as the speed, I think it will be useful for scaling the project to support more gesture inputs for finer control over the model in the future.
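The three-way classification can be sketched like this (a Python illustration of the server-side logic; the real server is NodeJS, and the convention that increasing readings mean clockwise motion is an assumption).

```python
def classify_gesture(positions, static_range=5):
    """Classify one drag sequence from the circular soft potentiometer.

    `positions` are successive readings along the strip; increasing values
    are taken to mean clockwise motion (an assumption for illustration).
    Returns (gesture, value): direction and per-sample speed for drags,
    or the average position for a static touch.
    """
    span = max(positions) - min(positions)
    if span <= static_range:
        # Static touch: report the average position (used to change the model).
        return ("static", sum(positions) / len(positions))
    # Net displacement per sample approximates the dragging speed.
    speed = abs(positions[-1] - positions[0]) / (len(positions) - 1)
    direction = "clockwise" if positions[-1] > positions[0] else "counter-clockwise"
    return (direction, speed)
```

The app then only has to map "clockwise" to the next animation, "counter-clockwise" to the next clothes version, and "static" to a model change, with no raw-sample processing on the phone.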
[Speech] I further added special effects triggered by speech keywords: for example, "raining" or "rain" produces rain and a raining sound, and "snow" or "snowing" produces a snow animation in the scene.
Final Presentation Slide