A Detailed Explanation of the Technology Behind Ultra-realistic Movements: Motion Capture

Column: Industry news | Time: 2023-08-31 | Publisher: 智能量

Gollum in *The Lord of the Rings*, the plush teddy bear in *Ted*, the tribal princess in *Avatar*... The vivid performances of these classic virtual characters have deeply moved audiences. Behind the scenes, the key to bringing them to life is one technique: motion capture.

Motion capture, abbreviated as Mocap, is the technology of recording and processing the movements of people or other objects. In film production, multiple cameras record the movements of real actors, and these movements are then reconstructed and rendered onto the corresponding virtual characters.

Motion capture produces data that a computer can understand and process directly, covering dimensional measurement, the positions of objects in physical space, and their orientations. Trackers are attached to key parts of a moving object, and the motion capture system records the positions of these trackers; after computer processing, this yields three-dimensional spatial coordinates. Once the computer has this data, it can be used in fields such as animation production, gait analysis, biomechanics, and ergonomics.

The Background of Motion Capture Technology

Motion capture is generally traced back to the rotoscope, invented by Max Fleischer in 1915. The technique emerged in the production of animated films: artists traced each frame of live-action footage projected for them, so that animated characters would move as realistically as the filmed performers.

This process was slow and tedious. For animators, a memorable step forward came in 1983, when the Massachusetts Institute of Technology (MIT) developed a graphical marionette.

This system used an early optical motion capture system called "Op-Eye", which relied on a set of light-emitting diodes, and it could generate animation scripts from a performer's movements (Sturman, 1999). In effect, the marionette was the first "motion capture suit". It carried only a small number of sensors, which could roughly locate the key skeletal points of the human body.

This technology laid the foundation for the rapid development that followed, set the direction for later motion capture techniques, and started a trend that leads to the motion capture technology we use today.

The Basic Principles of Motion Capture Technology

A motion capture system is the specialized equipment used to perform motion capture. Different systems are based on different principles, and their components vary accordingly. In general, a motion capture system has two major parts: hardware and software. The hardware usually includes signal-transmitting and signal-receiving sensors, signal transmission devices, and data processing devices; the software usually includes functional modules for system setup, spatial positioning and calibration, motion capture, and data processing.

The signal-transmitting sensors are usually placed on key parts of the moving object, such as the joints of the human body. The continuously emitted signals are picked up by the positioning sensors and passed through the transmission device to the data processing workstation. There, the software performs motion calculations to obtain coherent three-dimensional motion data, including the spatial coordinates of the moving target and the six-degree-of-freedom motion parameters of the human joints, and generates three-dimensional skeletal motion data that can drive skeletal animation. This is the general workflow of a motion capture system.

The Composition of Motion Capture Technology

Sensor

A sensor is a tracking device fixed to a specific part of a moving object. It provides the motion capture system with information about the position of that part as the object moves. The number of trackers is generally chosen according to the level of detail the capture requires.

Signal Capture Device

This device captures the position signals, and its form depends on the type of motion capture system: in a mechanical system it is a circuit board that captures electrical signals, while in an optical system it is a high-resolution infrared camera.

Data Transmission Device

Motion capture systems, especially those that must work in real time, need to transfer large amounts of motion data quickly and accurately from the signal capture devices to the computer for processing. The data transmission device performs this task.
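
To give a rough sense of scale, the sketch below estimates the bandwidth needed just to stream reconstructed 3D marker coordinates. It is a back-of-the-envelope calculation under stated assumptions: each marker is sent as three 4-byte floats with no compression or protocol overhead, and the session size (40 markers at 240 frames per second) is a hypothetical example, not the specification of any product. `marker_data_rate` is an illustrative helper name.

```python
def marker_data_rate(num_markers: int, fps: int, coords: int = 3, bytes_per_coord: int = 4) -> int:
    """Bytes per second needed to stream 3D marker coordinates.

    Assumes each marker is sent as `coords` values of `bytes_per_coord`
    bytes each, with no compression or packet overhead (an assumption).
    """
    return num_markers * coords * bytes_per_coord * fps


# Hypothetical session: 40 markers captured at 240 frames per second.
rate = marker_data_rate(num_markers=40, fps=240)
print(f"{rate} bytes/s = {rate * 8 / 1e6:.2f} Mbit/s")  # 115200 bytes/s = 0.92 Mbit/s
```

The figure scales linearly with marker count and frame rate, and raw sensor or camera data before reconstruction is typically larger still.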

Data Processing Device

The captured data must be corrected and processed, then combined with a 3D model to produce computer animation. This is done with data processing software or hardware; either way, the work relies on high-speed computing so that the 3D model moves realistically and naturally. In one production, for example, Tom Hanks wore a black, tight-fitting suit covered with 150 sensors so that the computer could capture the movements of his eyelids, lips, and eyebrows, and every expression and movement of his body.
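
One common correction step is filling short gaps where a marker was briefly occluded. The sketch below assumes missing samples are stored as `None` and that straight-line interpolation between the nearest valid samples is acceptable; practical pipelines often use splines or rigid-body constraints instead. `fill_gaps` is a hypothetical helper, not part of any particular package.

```python
from typing import List, Optional


def fill_gaps(samples: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior runs of None in one coordinate channel.

    Leading or trailing gaps stay None, because there is no valid sample
    on both sides to interpolate between.
    """
    filled = list(samples)
    valid = [i for i, v in enumerate(samples) if v is not None]
    # Walk consecutive pairs of valid samples and fill everything between them.
    for start, end in zip(valid, valid[1:]):
        span = end - start
        for i in range(start + 1, end):
            t = (i - start) / span
            filled[i] = samples[start] + t * (samples[end] - samples[start])
    return filled


# x-coordinate of one marker, occluded for two frames
print(fill_gaps([0.0, 1.0, None, None, 4.0]))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Each coordinate (x, y, z) of each marker would be filled independently; long gaps are usually better left for manual repair than bridged by a straight line.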

The Types of Motion Capture Technology

There are many types of motion capture systems. By technical principle, they are generally divided into five major categories: mechanical, acoustic, electromagnetic, inertial, and optical. Optical systems can be further divided into marker-based and markerless types according to the kind of target feature they track. So-called thermal motion capture systems have recently appeared on the market; in essence they are markerless optical systems whose imaging sensors operate mainly in the near-infrared or infrared band.

Mechanical type

A mechanical motion capture system tracks and measures movement trajectories with mechanical devices. A typical system consists of multiple joints and rigid connecting rods, with angle sensors installed in the rotatable joints to measure changes in joint rotation. As the device moves, the position and trajectory of the end of the rod in space are calculated from the measured angle changes and the rod lengths. X-1st is a representative product of this type. Its advantages are low cost, high precision, and a high sampling frequency. Its biggest drawback is that it gets in the way of performance: the rods and sensor cables severely constrain the performer's movements, continuous movements in particular, which makes realistic reproduction of dynamic motion difficult.
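
The calculation described above, obtaining the end point from joint angles and rod lengths, is forward kinematics. A minimal planar sketch follows, assuming a two-rod chain whose joint angles (in radians) are each measured relative to the previous rod; real rigs of this kind work in three dimensions with many more joints, and the rod lengths below are made-up values. `end_point` is an illustrative name.

```python
import math
from typing import List, Tuple


def end_point(link_lengths: List[float], joint_angles: List[float]) -> Tuple[float, float]:
    """Planar forward kinematics for a chain of rigid rods.

    Each joint angle (radians) is relative to the previous rod, so the
    absolute heading of a rod is the running sum of the angles.
    """
    x = y = heading = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


# Upper arm 0.30 m, forearm 0.25 m; shoulder at 90 degrees, elbow bent -90 degrees.
x, y = end_point([0.30, 0.25], [math.radians(90), math.radians(-90)])
print(f"({x:.3f}, {y:.3f})")  # (0.250, 0.300)
```

Evaluating `end_point` once per sample of the angle sensors yields the trajectory of the end point over time.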

Acoustic type

An acoustic motion capture system generally consists of a transmitting device, a receiving system, and a processing system. The transmitting device is usually an ultrasonic generator, and the receiving system is usually an array of three or more ultrasonic probes. The distance from the transmitter to each receiving sensor is determined by measuring the travel time or phase difference of the sound wave, and the position and orientation of the ultrasonic generator relative to the receivers are calculated from the distances to three receiving sensors arranged in a triangle. Its greatest advantage is low cost; its disadvantages are poor accuracy, poor real-time performance, and strong sensitivity to noise and multiple reflections.
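
The ranging and positioning steps can be sketched as time-of-flight conversion followed by trilateration. Assumptions for this sketch: sound travels at about 343 m/s (dry air near 20 degrees C), the three receivers lie in one plane at positions chosen to simplify the algebra (one at the origin, one on the x-axis, one in the xy-plane), and the transmitter is on the positive-z side of that plane, since three distances alone leave a mirror-image ambiguity. The function names and coordinates are illustrative.

```python
import math
from typing import Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)


def tof_to_distance(seconds: float) -> float:
    """Convert a one-way time of flight into a distance in metres."""
    return SPEED_OF_SOUND * seconds


def trilaterate(d: float, i: float, j: float,
                r1: float, r2: float, r3: float) -> Tuple[float, float, float]:
    """Locate a point from its distances to three non-collinear receivers.

    Receivers sit at P1=(0,0,0), P2=(d,0,0) and P3=(i,j,0) with j != 0.
    Of the two mirror-image solutions, the one with z >= 0 is returned.
    """
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - (i / j) * x
    z_sq = r1**2 - x**2 - y**2
    z = math.sqrt(max(z_sq, 0.0))  # clamp small negative values caused by noise
    return x, y, z


# Round trip with a hypothetical transmitter at (0.5, 0.4, 1.2) m.
receivers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 1.0, 0.0)]
true_pos = (0.5, 0.4, 1.2)
times = [math.dist(true_pos, p) / SPEED_OF_SOUND for p in receivers]
r1, r2, r3 = (tof_to_distance(t) for t in times)
print(trilaterate(1.0, 0.5, 1.0, r1, r2, r3))  # approximately (0.5, 0.4, 1.2)
```

Because every range error propagates into the position, the noise and reflection problems mentioned above directly limit the accuracy of this approach.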

Electromagnetic type

An electromagnetic motion capture system generally consists of a transmitting source, receiving sensors, and a data processing unit. The transmitting source generates an electromagnetic field whose distribution in space and time follows known patterns. Receiving sensors are placed at key positions on the performer's body and move through the field with the performer. They send the received signals to the processing unit by cable or wirelessly, and from these signals the spatial position and orientation of each sensor are calculated. Polhemus and Ascension are representative manufacturers of this type of product. Its main strengths are simple operation, good robustness, and real-time performance. Its weaknesses are, first, sensitivity to metal objects, which distort the field and significantly degrade accuracy; second, a relatively low sampling rate, which makes fast movements hard to capture; and third, for cabled sensors, wiring that restricts the performer and hinders complex movements.

Inertial type

An inertial motion capture system consists of attitude sensors, signal receivers, and a data processing system. The attitude sensors are fixed to the main segments of the limbs and send attitude signals wirelessly, for example over Bluetooth, to the data processing system for motion calculation. Each attitude sensor integrates components such as accelerometers, micro gyroscopes, and magnetometers to obtain the orientation of its body segment. Combined with the bone lengths and the hierarchical connections of the skeleton, these orientations are used to calculate the spatial positions of the joints. Representative products include Xsens and 3D Suit.

The main advantages of this type of product are portability, simple operation, and almost no restriction on the performance space, which makes it convenient for outdoor use.

However, the underlying principle also brings clear disadvantages. First, the sensors cannot determine absolute position. Positions obtained by integrating the attitude information of each limb segment drift to varying degrees, so spatial positioning becomes inaccurate. Second, the method relies on assumptions of single-leg support and ground contact, so the system cannot compute position while both feet are off the ground. In addition, the weight of the sensors and any cable connections constrain the performance to some extent. Equipment cost also rises in proportion to the number of performers captured, since each one needs a full set of sensors, and some sensors are affected by ferromagnetic materials in the surrounding environment, which reduces accuracy.
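
Why integration causes drift can be seen in a small numerical experiment. The sketch below assumes a sensor that is actually at rest but whose accelerometer reports a constant bias of 0.01 m/s² (a made-up figure), sampled at an assumed 100 Hz; double-integrating that bias with simple Euler steps produces apparent movement that grows roughly with the square of time. `integrate_position` is an illustrative name.

```python
from typing import Sequence


def integrate_position(accels: Sequence[float], dt: float) -> float:
    """Double-integrate acceleration samples (m/s^2) into a position (m).

    Uses simple Euler steps, starting at rest at the origin.
    """
    velocity = 0.0
    position = 0.0
    for a in accels:
        velocity += a * dt         # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position


DT = 0.01    # 100 Hz sample interval, assumed
BIAS = 0.01  # constant accelerometer bias in m/s^2, hypothetical

# The sensor is really at rest, so every reported acceleration is pure bias.
for seconds in (1, 10, 60):
    n = round(seconds / DT)
    drift = integrate_position([BIAS] * n, DT)
    print(f"after {seconds:>2} s: {drift:.3f} m of apparent movement")
```

The apparent displacement grows from about 5 mm after one second to roughly half a metre after ten seconds and about 18 m after a minute. This is why inertial systems depend on extra constraints, such as the ground-contact assumption described above, to keep positions from wandering.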

Optical type

The optical motion capture system is based on the principles of computer vision [2][3]. Multiple high-speed cameras monitor and track target feature points from different angles. In principle, any point in space that is seen by two cameras at the same moment can be located in 3D at that moment, and when the cameras shoot continuously at a sufficiently high frame rate, the point's motion trajectory can be recovered from the image sequence.
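
The two-camera principle can be made concrete with the midpoint method: each camera that sees a point defines a ray from its optical center, and the point is estimated as the midpoint of the shortest segment joining the two rays (with real noise they rarely intersect exactly). This is a minimal sketch assuming the camera centers and ray directions are already known from calibration; production systems typically combine more cameras in a least-squares solution. The function names are illustrative.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _along(origin: Vec, direction: Vec, t: float) -> Vec:
    """Point origin + t * direction."""
    return tuple(o + t * d for o, d in zip(origin, direction))


def triangulate_midpoint(c1: Vec, d1: Vec, c2: Vec, d2: Vec) -> Vec:
    """Midpoint of the closest points between rays c1 + s*d1 and c2 + t*d2."""
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    # Ray parameters that minimise the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = _along(c1, d1, s), _along(c2, d2, t)
    return tuple((x + y) / 2 for x, y in zip(p1, p2))


# Two hypothetical cameras 2 m apart, both looking at a marker at (1, 0, 5).
print(triangulate_midpoint((0, 0, 0), (1, 0, 5), (2, 0, 0), (-1, 0, 5)))  # (1.0, 0.0, 5.0)
```

Repeating this for every frame gives the marker's trajectory; when the rays become nearly parallel (cameras too close together relative to the distance), the estimate degrades.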

The acquisition sensors in these systems are usually optical cameras; the systems differ in what they use as targets. One type adds no markers to the object and instead detects joint information extracted from two-dimensional image features or three-dimensional shape features; such systems are collectively called markerless optical motion capture systems. The other type attaches marker points to the object as targets; these are called marker-based optical motion capture systems.

1. Markerless optical type

There are roughly three principles behind markerless optical motion capture.

The first is motion capture based on ordinary video images. The two-dimensional image coordinates of the joints are extracted by detecting the human shape in each image, and the joints' three-dimensional coordinates are then computed by multi-camera 3D measurement. Because ordinary images contain complex and redundant information, this approach usually has poor robustness, low speed, and unsatisfactory real-time performance. Moreover, the joints lack a quantitative reference, which leads to relatively large calculation errors. This type of technology is currently mostly at the laboratory research stage.

The second is motion capture based on infrared camera images, in which illumination from an active heat source separates the foreground (the performer) from the background; this is the so-called thermal motion capture. The principle is similar to the first approach, but separating foreground from background greatly speeds up human shape detection and improves the robustness and speed of 3D reconstruction. However, because the heat source shines from a fixed direction, it restricts the direction the performer can face during capture, making full 360-degree capture difficult; movements such as turning around or bending forward and back are not supported. It also does not overcome the large calculation errors caused by the lack of clear joint reference information.

The third is motion capture based on three-dimensional depth information. The system uses structured-light coded projection to obtain real-time depth information for objects in its field of view, detects the human shape from the 3D shape, and extracts the joint trajectories. A representative product is Microsoft's Kinect sensor [5], which offers good robustness in action recognition, a high sampling rate, and a very low price. Many enthusiasts have tried using Kinect for motion capture, with unsatisfactory results, because Kinect is designed as an action recognition sensor rather than a precise capture device. It also suffers from large errors in joint positions and accumulated deformation along the bone hierarchy.

Overall, the common problem of markerless motion capture is low accuracy. Owing to inherent limitations of the principle, some motion degrees of freedom (such as the spin of bones about their own axes) are not computed, which causes problems such as deformed movements.
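
Turning a depth image into 3D points, the step that precedes shape-based body detection, follows the pinhole camera model. A minimal sketch, assuming depth in metres, zero meaning "no reading", and intrinsics `fx`, `fy` (focal lengths in pixels) and `cx`, `cy` (principal point); the numbers below are illustrative, not the calibration of any real sensor, and the function names are hypothetical.

```python
from typing import List, Tuple


def backproject(u: int, v: int, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pixel (u, v) with depth Z -> camera-frame point (X, Y, Z), pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


def depth_to_points(depth_map: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Convert a depth map (rows of metres, 0 = no reading) into a point cloud."""
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels where the sensor returned no depth
                points.append(backproject(u, v, z, fx, fy, cx, cy))
    return points


# Tiny 2x3 depth map with one missing reading; hypothetical intrinsics.
cloud = depth_to_points([[2.0, 2.0, 0.0], [2.0, 2.5, 2.0]],
                        fx=500.0, fy=500.0, cx=1.0, cy=0.5)
print(len(cloud))  # 5
```

Body detection and joint extraction would then operate on this point cloud; errors in depth translate directly into errors in the recovered joint positions.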

2. Marker-based optical type

A marker-based optical motion capture system generally consists of optical marker points (Markers), motion capture cameras, signal transmission devices, and a data processing workstation. The term "optical motion capture system" usually refers to this type. Markers are attached to key parts of the moving object, such as the joints of the human body, and multiple motion capture cameras detect them in real time from different angles. The data are transmitted to the workstation in real time, the spatial coordinates of each Marker are calculated precisely by triangulation, and the six-degree-of-freedom motion of the bones is then derived using biomechanical principles.
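
Deriving a bone's six degrees of freedom (three for position, three for orientation) from Marker coordinates can be illustrated for one rigid segment that carries three non-collinear Markers. The sketch below builds an orthonormal frame with cross products; it is one common construction, not necessarily the one any particular system uses, and the Marker layout, axis conventions, and function names are assumptions for illustration.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    n = (a[0] ** 2 + a[1] ** 2 + a[2] ** 2) ** 0.5
    if n == 0:
        raise ValueError("markers are collinear or coincident")
    return (a[0] / n, a[1] / n, a[2] / n)


def segment_pose(origin: Vec, m2: Vec, m3: Vec) -> Tuple[Vec, Tuple[Vec, Vec, Vec]]:
    """Return (position, axes) of a rigid segment from three markers.

    x-axis: origin -> m2; z-axis: normal to the plane of the three markers;
    y-axis completes a right-handed frame. Each axis is expressed in lab
    coordinates, so together they form the rows of a rotation matrix.
    """
    x = normalize(sub(m2, origin))
    z = normalize(cross(x, sub(m3, origin)))
    y = cross(z, x)  # already unit length because z and x are orthonormal
    return origin, (x, y, z)


# Hypothetical segment rotated 90 degrees about the vertical (z) axis:
# its x-axis now points along lab +y and its y-axis along lab -x.
position, axes = segment_pose((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0))
print(position, axes)
```

Three orientation angles can then be extracted from the rotation matrix; computing such a frame per bone per frame is what turns Marker trajectories into skeletal motion.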

Depending on how the Markers produce light, these systems can be further divided into active and passive optical motion capture systems.

(1) Active optical type

In an active optical motion capture system, the Markers are LEDs attached to the major joints of the human body. The LEDs are connected by cables and powered by a power supply strapped to the performer's body.

Its main advantage is that high-brightness LEDs serve as the optical markers, which allows outdoor motion capture to a certain extent. Pulse signals switch each LED between bright and dim, and the resulting time-domain codes are used to identify the individual LEDs. This gives robust identification and a high tracking accuracy rate.
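
The timing cost of identifying Markers by time-division can be estimated with simple arithmetic. The sketch below assumes each action frame is assembled from one camera exposure per Marker, taken one after another; real active systems may use different coding schemes, and the camera rate, Marker count, and limb speed are hypothetical. `time_division_budget` is an illustrative name.

```python
def time_division_budget(camera_fps: float, num_markers: int, marker_speed: float):
    """Effective frame rate and worst-case smear for time-division Marker IDs.

    Assumes each action frame consists of one exposure per Marker, taken
    one after another, so Markers in the same frame are seen at different
    times. Returns (effective_fps, skew_seconds, position_error_m).
    """
    effective_fps = camera_fps / num_markers
    # Time between the first and last Marker exposure inside one action frame.
    skew = (num_markers - 1) / camera_fps
    # Distance a Marker moving at marker_speed (m/s) covers during that skew.
    error = marker_speed * skew
    return effective_fps, skew, error


fps, skew, err = time_division_budget(camera_fps=960, num_markers=32, marker_speed=5.0)
print(f"{fps:.0f} action frames/s, {skew * 1000:.1f} ms skew, {err * 100:.1f} cm error")
# 30 action frames/s, 32.3 ms skew, 16.1 cm error
```

Under these assumptions a 960 Hz camera yields only 30 complete action frames per second, and a limb moving at 5 m/s shifts by about 16 cm between the first and last exposure of a frame; both numbers worsen as Markers are added.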

The disadvantages are as follows. First, identifying LEDs by time-sequence coding means the camera images different Markers at different moments in order to read their IDs. In effect, each Marker is exposed separately, one after another, within the same action frame. This breaks the simultaneity of Marker detection, distorts the captured motion, and hampers the capture of fast movements. Second, because much of the camera's frame rate is spent distinguishing Markers within each frame, the effective action sampling rate is relatively low, which again hampers the capture and analysis of fast movements. Third, an LED Marker has a limited viewing angle (an emission angle of about 120 degrees). Two cameras are usually integrated into one capture unit for close-range acquisition, and this narrow-baseline design limits the precision of 3D visual measurement. Occlusion during movement still causes frequent data loss; reducing it requires roughly doubling the number of capture units to cover blind spots, with a corresponding increase in equipment cost. Fourth, time-sequence coding strictly limits the total number of Markers the system can support. To maintain an adequate sampling rate, no more than two people should generally be captured at the same time. Moreover, the more Markers there are, the longer the time span over which the Markers of a single frame are exposed, and the more severe the motion distortion.

(2) Passive optical type

A passive optical motion capture system, also called a reflective optical motion capture system, usually uses highly retroreflective spheres as Markers, attached to the main joints of the human body. LED light emitted from each motion capture camera is reflected by the spheres back to the camera, where it is used to detect and locate the Markers. Its main advantages are mature technology, high precision, a high sampling rate, accurate capture, and quick, flexible setup and use. Markers can be added and arranged freely at very low cost, and the system suits a wide range of applications.

Its main disadvantages are as follows. First, it is sensitive to sunlight in the capture volume: light spots cast by sunlight on the floor may be mistaken for Markers and cause interference, so the system generally needs to operate indoors. Second, Marker identification is error-prone. Because reflective Markers carry no unique ID, occlusion during movement easily causes tracking errors and mixes up Marker IDs. This usually degrades the real-time animation preview on set, with misaligned movements, and the data must be repaired manually in post-processing, which greatly increases the workload. Newer systems, however, include intelligent capture algorithms with strong automatic Marker identification and error correction. These largely meet the needs of real-time animation preview on set, greatly reduce manual intervention, and further improve the practicality of the system.
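
Why unlabeled reflective Markers get mixed up can be seen from the simplest tracking rule: give each new detection the ID of the nearest Marker in the previous frame. A minimal greedy sketch, assuming 3D positions and a hypothetical maximum frame-to-frame jump; practical systems add constraints such as motion prediction and skeleton models. `relabel` and the Marker names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def relabel(previous: Dict[str, Point], detections: List[Point],
            max_jump: float) -> Dict[str, Point]:
    """Greedily assign Marker IDs to unlabeled detections in a new frame.

    All (ID, detection) pairs are sorted by distance and accepted closest
    first; each ID and each detection is used at most once, and pairs
    farther apart than max_jump are rejected. IDs left unmatched (for
    example, occluded Markers) are simply absent from the result.
    """
    pairs = sorted(
        (math.dist(p, q), marker_id, k)
        for marker_id, p in previous.items()
        for k, q in enumerate(detections)
    )
    labeled: Dict[str, Point] = {}
    used = set()
    for distance, marker_id, k in pairs:
        if distance > max_jump:
            break  # all remaining pairs are even farther apart
        if marker_id in labeled or k in used:
            continue
        labeled[marker_id] = detections[k]
        used.add(k)
    return labeled


prev = {"knee_L": (0.0, 0.50, 0.0), "ankle_L": (0.0, 0.10, 0.0)}
labels = relabel(prev, [(0.01, 0.12, 0.0), (0.02, 0.49, 0.0)], max_jump=0.05)
print(labels["knee_L"])  # (0.02, 0.49, 0.0)
```

If two Markers pass within `max_jump` of each other, or one disappears while another appears nearby, this rule can silently swap IDs, which is exactly the confusion that forces manual repair in post-processing.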
Advantages and Disadvantages of Motion Capture Technology

Advantages

For optical systems, the main advantages are that the performer has a large range of movement, is not restricted by cables or mechanical devices, and finds the system easy to use. The sampling rate is relatively high and meets the needs of most sports measurements. Markers are inexpensive and easy to add. In practical terms, the technology makes it easy to create all kinds of striking special effects for films and games.

Disadvantages

The system is expensive. Although it captures movement in real time, post-processing (including Marker identification, tracking, and calculation of spatial coordinates) takes a long time. The system is sensitive to lighting and reflections at the venue, and calibration is rather cumbersome. With complex movements in particular, Markers on different body parts are easily confused or occluded, producing incorrect results that often require manual correction in post-processing. Because of limitations like these, almost all optical tracking systems still rely on post-processing software to analyze, process, and organize the captured data before it can be applied to an animated character model.

Main Application Fields of Motion Capture Technology

Animation Production

Applying motion capture to animation production greatly raises its quality and efficiency, reduces costs, makes the production process more intuitive, and makes the results more vivid.

Virtual Reality System

To let people interact with a virtual environment, a system must determine the positions and orientations of the participant's head, hands, body, and so on, track and measure the participant's movements accurately, and detect them in real time so that the data can be fed back to the display and control systems. These tasks are essential to any virtual reality system, and they are exactly what motion capture technology addresses.

Robot remote control

The robot sends information about a hazardous environment to the operator, who responds with movements of their own. The motion capture system captures these movements and transmits them to the robot in real time, and the robot performs the same movements. Compared with traditional remote control, this enables more intuitive, precise, complex, flexible, and rapid motion control, greatly improving the robot's ability to handle complex situations. Since fully autonomous robot control is not yet mature, this technology is particularly significant.

Interactive games

Motion capture technology can be utilized to capture various actions of gamers, which are then used to drive the movements of characters in the game environment. This provides gamers with a brand-new sense of participation and enhances the realism and interactivity of the game.

Sports training

Motion capture technology can record athletes' movements for quantitative analysis. Combined with principles of human physiology and physics, this analysis can be used to study ways to improve technique, moving sports training away from sole reliance on experience toward a theoretical, data-driven approach. The movements of less successful athletes can also be captured and compared with those of top performers to help guide their training.
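
A typical quantitative comparison starts from joint angles computed from Marker positions. A minimal sketch, assuming hip, knee, and ankle coordinates are available for each frame; the sample angle series are invented, and root-mean-square difference is just one simple way to compare an athlete with a reference. The function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b (degrees) between segments b->a and b->c, e.g. hip-knee-ankle."""
    u = [p - q for p, q in zip(a, b)]
    v = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def rms_difference(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Root-mean-square difference between two equally long angle series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# A straight leg gives 180 degrees; a right-angle bend gives 90 degrees.
print(joint_angle((0, 1, 0), (0, 0, 0), (0, -1, 0)))  # 180.0
print(joint_angle((0, 1, 0), (0, 0, 0), (1, 0, 0)))   # 90.0

athlete = [170.0, 150.0, 120.0]    # hypothetical knee angles per frame
reference = [172.0, 145.0, 118.0]
print(round(rms_difference(athlete, reference), 2))  # 3.32
```

In practice the two series would first be time-normalized (for example, to the same phase of a stride) before being compared frame by frame.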

In addition, motion capture technology also has great potential in fields such as ergonomics research, simulation training, and biomechanics research. It can be predicted that with the development of the technology itself and the improvement of the technical level in related application fields, motion capture technology will be applied more and more widely.