Each respondent who completes the first-wave questionnaire in Taiwan is assigned a unique seven-digit serial number. This serial number stays with the respondent through all waves. Researchers can use it to merge the data of the same respondents across waves, or to merge the data of main respondents and child respondents from the same household.

Numbering of Respondent’s Serial Number in Taiwan Survey

A serial number assigned for a respondent in the Taiwan survey is composed of seven digits.

The first three digits indicate the geographical code of the region where the main respondent resided at the time of the first-wave survey. The respondent may or may not reside in the same region in subsequent waves.

The fourth digit of the serial number indicates the type of the respondent. If the digit is 0, it indicates that the respondent is from the first two groups of main respondents, with their first-wave data collected in 1999 or 2000. These respondents can be further distinguished with the help of the second digit: if the second digit is 0, then the respondent is from the main sample who were first interviewed in 1999; if the second digit is larger than 0, then the respondent is from the main sample first interviewed in 2000. If the fourth digit is 1, it means the respondent belongs to the group of the main sample who were first interviewed in 2003; if the digit is 2 or 3, it means the respondent was first interviewed in 2009; if the digit is 4, it means the respondent was first interviewed in 2016.

The fifth and sixth digits are a sequence number without special meaning.

The first six digits together form the family serial number of the respondent. Respondents with the same family serial number are from the same family. The seventh digit distinguishes different members of the same family. If the respondent belongs to the main sample, the seventh digit is 0; if the digit is larger than 0, the respondent belongs to the child sample. For a child respondent, the seventh digit does not represent the respondent’s birth order in his or her family, but rather the order in which the respondent was included as a child respondent from that family.

The serial numbers of some respondents have fewer than seven digits. This is because the first one or two digits of the serial number are null values.
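The digit rules above can be sketched as a small Python helper. This is only an illustration, not an official PSFD tool: the function name and the output keys are hypothetical, and the code assumes that a serial number with fewer than seven digits can be restored by padding zeros on the left.

```python
def parse_taiwan_serial(serial):
    """Split a Taiwan serial number into the parts described above."""
    # Assumption: a serial number shorter than seven digits has lost its
    # leading digits, so it is left-padded with zeros to seven characters.
    s = str(serial).strip().zfill(7)
    if len(s) != 7 or not s.isdigit():
        raise ValueError(f"not a valid seven-digit serial number: {serial!r}")

    # Fourth digit: group of the main sample; the second digit separates
    # the 1999 and 2000 groups when the fourth digit is 0.
    type_digit = int(s[3])
    if type_digit == 0:
        main_sample_first_year = 1999 if s[1] == "0" else 2000
    else:
        # Digits not described in the text map to None.
        main_sample_first_year = {1: 2003, 2: 2009, 3: 2009, 4: 2016}.get(type_digit)

    member = int(s[6])  # seventh digit: 0 = main sample, >0 = child sample
    return {
        "serial": s,
        "region_code": s[:3],   # region at the time of the first wave
        "family_id": s[:6],     # family serial number
        "main_sample_first_year": main_sample_first_year,
        "sample": "main" if member == 0 else "child",
        "child_order": member if member > 0 else None,
    }


# Made-up serial number, for illustration only.
info = parse_taiwan_serial("1230451")
print(info["family_id"], info["sample"], info["child_order"])
```

Because a child respondent shares the first six digits with the main respondent of the same family, the year returned here refers to the main-sample group of the family, not to the year the child was first interviewed.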

Numbering of Respondent’s Serial Number in China Survey

The serial number assigned to a respondent in the China survey is composed of five digits. The serial number of a given respondent remains unchanged across survey waves. Researchers can use the serial number to merge panel data for the same respondents.

Data Merging: Merging Multiple Waves of Data for the Same Respondents

In a panel study, a respondent can have multiple observations collected in different survey waves. Panel data thus comprise observations of multiple respondents measured over multiple time periods. A researcher who plans to combine the survey data of the same respondents from different time periods can use the serial number as the merge key. The merged panel data can be in wide format or long format, both of which are explained below.

(A) Wide-format Panel Data

In the “wide format,” a respondent’s variables from different time periods are placed in the same row, so each row contains the information for one distinct respondent.

Data users should note that the variables in each wave of data are named after the numbering of the corresponding questions in the questionnaire. As a result, two waves of survey data might contain different variables (measures) with the same variable name. When merging data across waves, we suggest that researchers first select the variables they need and then rename them. The survey year should be appended to the end of each variable name to distinguish data collected in different waves.
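The select, rename, and merge steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: each wave is held as a list of dictionaries, the identifier variable is called `serial`, and the variable name `a01` and all values are made up; none of these names come from the PSFD data files.

```python
def add_year_suffix(rows, year, id_var="serial", keep=None):
    """Return {serial: {"<var>_<year>": value}} for one wave of data."""
    out = {}
    for row in rows:
        out[row[id_var]] = {
            f"{name}_{year}": value
            for name, value in row.items()
            # Keep only the selected variables; never rename the merge key.
            if name != id_var and (keep is None or name in keep)
        }
    return out


def merge_wide(waves, id_var="serial", keep=None):
    """waves maps survey year -> list of rows; returns one row per respondent."""
    merged = {}
    for year, rows in sorted(waves.items()):
        for serial, renamed in add_year_suffix(rows, year, id_var, keep).items():
            merged.setdefault(serial, {id_var: serial}).update(renamed)
    return list(merged.values())


# Made-up data: "a01" is a different question in each wave.
waves = {
    1999: [{"serial": "0010010", "a01": 1}, {"serial": "0010020", "a01": 2}],
    2000: [{"serial": "0010010", "a01": 35}],
}
wide = merge_wide(waves, keep={"a01"})
```

In this sketch a respondent who is absent from a wave simply has no key for that wave's variables; a statistical package would typically show these as missing values.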

Even though the PSFD team has tried its best to keep the response options of the same question unchanged across waves, minor adjustments are unavoidable. To handle such adjustments, researchers should check for changes in the options and harmonize the classification of the selected categorical variables to avoid possible errors.
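One way to harmonize options is to recode each wave into a common classification before merging. The sketch below assumes hypothetical codings for a single question; the variable name `marital`, the codes, and the labels are invented for illustration and do not describe any actual PSFD item.

```python
def harmonize(rows, var, mapping):
    """Return copies of rows with `var` recoded through `mapping`."""
    recoded = []
    for row in rows:
        code = row[var]
        if code not in mapping:
            # Fail loudly so that an option added in a later wave is noticed.
            raise ValueError(f"unmapped code {code!r} for variable {var!r}")
        recoded.append({**row, var: mapping[code]})
    return recoded


# Hypothetical example: the later wave splits one option into two,
# so both waves are mapped onto the coarser common scheme.
map_wave_a = {1: "married", 2: "not married"}
map_wave_b = {1: "married", 2: "not married", 3: "not married"}

wave_a = harmonize([{"serial": "0010010", "marital": 2}], "marital", map_wave_a)
wave_b = harmonize([{"serial": "0010010", "marital": 3}], "marital", map_wave_b)
```

Raising an error on an unmapped code is a design choice of this sketch; silently passing such codes through would hide exactly the option changes that need checking.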

(B) Long-format Panel Data

In the “long format,” each row contains the information from one wave for one respondent, so each respondent has one or more rows of data.

To obtain long-format data, researchers should first harmonize the names and options of the selected variables, then append the data of the various years. Before appending, a new variable indicating the survey year should be created so that the source wave of each row remains identifiable.
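The appending step can be sketched in plain Python as follows, assuming the variable names and options have already been harmonized. The identifier name `serial`, the year variable `year`, and the variable `income` are hypothetical, and the values are made up.

```python
def append_long(waves, id_var="serial", year_var="year"):
    """waves maps survey year -> list of rows with harmonized variable names."""
    long_rows = []
    for year, rows in waves.items():
        for row in rows:
            # Tag every row with its survey year before stacking the waves.
            long_rows.append({**row, year_var: year})
    # Sort so that each respondent's rows sit together in time order.
    long_rows.sort(key=lambda r: (r[id_var], r[year_var]))
    return long_rows


# Made-up data for two waves.
waves = {
    2000: [{"serial": "0010010", "income": 5}],
    1999: [{"serial": "0010010", "income": 4}, {"serial": "0010020", "income": 3}],
}
long_data = append_long(waves)
```

Here the first respondent contributes two rows (1999 and 2000) and the second respondent one row, which matches the "one or more rows per respondent" layout of the long format.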

Data Merging: Merging Main Respondents’ and Child Respondents’ Data

For the main respondents and the child respondents from the same family, their data can be identified using the family serial number (the first six digits of the respondent’s serial number).

The data-processing strategies for merging the main respondents’ and child respondents’ data are similar to the approach used in merging panel data, so the detailed procedures are not repeated here.
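As a brief illustration for the Taiwan survey, the family serial number can serve as the link between the two samples. The sketch below is not an official procedure: the identifier name `serial`, the variable `edu`, and the `_main` and `_child` suffixes are hypothetical, and serial numbers shorter than seven digits are assumed to be restorable by left-padding with zeros.

```python
def family_id(serial):
    """First six digits of a seven-digit Taiwan serial number."""
    # Assumption: shorter serial numbers are left-padded with zeros.
    return str(serial).strip().zfill(7)[:6]


def link_children_to_main(main_rows, child_rows, id_var="serial"):
    """Return one row per child respondent, joined to the family's main respondent."""
    main_by_family = {family_id(r[id_var]): r for r in main_rows}
    linked = []
    for child in child_rows:
        fid = family_id(child[id_var])
        parent = main_by_family.get(fid)
        if parent is None:
            continue  # no main respondent from this family in the data
        row = {"family_id": fid}
        # Suffix the variables so main and child measures do not collide.
        row.update({f"{k}_main": v for k, v in parent.items()})
        row.update({f"{k}_child": v for k, v in child.items()})
        linked.append(row)
    return linked


# Made-up data: two children from one family, one child without a match.
main = [{"serial": "1230450", "edu": 3}]
children = [
    {"serial": "1230451", "edu": 5},
    {"serial": "1230452", "edu": 4},
    {"serial": "9990011", "edu": 1},
]
pairs = link_children_to_main(main, children)
```

Dropping unmatched children is one possible choice; keeping them with empty main-respondent fields would be equally reasonable depending on the analysis.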