Conversations with Young Lives' Data Managers: Part Three

Ahead of the publication of our new report on data management, we sat down to speak with some of Young Lives’ Data Managers, past and present, about their experiences of the role.

This is the third and final conversation in the series, with Anne Yates Solon (International Data Manager 2007-2018), Monica Lizama (Peru’s Data Manager), Tien Nguyen (Vietnam’s Data Manager), Shyam Sunder (India’s Data Manager) and Hazel Ashurst (Data Coordinator, Oxford 2011-2013).

Catch up here with the first and second parts in this series.


Before the severity of the COVID-19 pandemic hit, Young Lives was setting up for our first round of ‘Young Lives at Work’, following our young people as they enter their early to mid-twenties. Rounds 6 and 7 have since been preceded by our phone survey studying the challenges the young people face due to the coronavirus and the ensuing lockdown.

These interviews took place before we had arranged our phone survey.


Looking forward to the Young Lives Round 6 and Round 7 surveys, what do you see as the main challenges? What are the opportunities?

Anne Yates Solon: I think one of the main considerations will be attrition. We haven’t been able to undertake a typical tracking round because of how the funding and timing worked out. I’d be slightly concerned that our attrition figures will be higher, especially as our cohorts age and are leaving their household settings. That’s my main concern.

Tien Nguyen: Yes, it will be challenging to find the children. Now they’re older and move to find work. They go to different places and don’t stay in their home village. We have to spend a lot of time to try and find them.

Shyam Sunder: The main challenge for Round 6 is that our sample respondents have become adults and getting information from them and finding time to meet with them will be a challenging task at the field level. A lot of travel will also have to be made to contact respondents.

Monica Lizama: I think a challenge will be that many young people have completed their studies and then they move. They’re looking for new jobs or starting their own families. Their life is changing as they grow up and in general, they are going new places. Another challenge is that many of them don’t have much time. Now they’re people with jobs and families, often in urban areas; very few of them stay in the rural areas. They migrate to cities, or the capital, and in cities the jobs are sometimes from 8am to 9pm. So, they get home and they’re very tired, and the enumerator needs to work with their schedule by interviewing very early in the morning, very late at night or on the weekend. Sometimes it takes a lot of effort to convince participants to continue with Young Lives under these circumstances. In our tracking phone calls, some young people say “oh, another round? I have to answer the survey again?”, but most are actually content to continue with the study.

Hazel Ashurst: In terms of data management, we won’t need to change much for the next survey. I think the tablets and Surveybe [1] were successful. It is labour intensive, processing the data, but that’s unavoidable.


What was your experience with field management and piloting?

Anne: We piloted in the field. Early on I used to go to all the countries for the pilot process. There were two types of piloting that happened after the implementation of Computer Assisted Personal Interviewing (CAPI): firstly, piloting of the content of the questionnaires, to make sure we were asking the correct and appropriate questions; and secondly, piloting of the CAPI program, to check skips, translations etc.

Then there were debriefs, and we would go over the changes and approve the addition or dropping of questions, and then adjust the CAPI program accordingly. Also, before piloting, we would meet with the fieldworkers to go over the questionnaire section by section and the fieldworkers would get adamant about questions, in terms of what we could or couldn’t ask.

It was important to take their opinions on board, since those were the people in the field chatting to the participants. It always gave me faith in our field teams, because they were so good at feeding back context specific information.

Tien: I went to the field and sometimes attended the interviews to make sure the fieldworkers followed the instructions correctly.


What was the process of archiving like? Did you ever have to re-archive data?

Anne: We were obligated to publicly archive data. You can clean data forever. Data will never be 100% clean. We had to determine a cut-off point for when it was clean enough. Once we reached that point, all data was then anonymized and all of the documentation around samples, questionnaire designs, variables, any reference documents related to the data, was then archived with the UK Data Archive. They would review the data, highlighting any queries, and we’d then figure out if we wanted to drop variables etc. Then it would go live.

We absolutely had to re-archive. It goes back to the mistake I mentioned earlier, realising, for example, when a person wasn’t deceased. I would keep track of what data was submitted when, and I would adjust the data for that round. But I would always hold off… for example, when I was ready to archive Round 3, I would re-archive any data from Round 1 and 2 that I needed to archive... I made sure I did it alongside the main archiving.

Hazel: Archiving was done mainly by Anne. She went to a local archive in Essex and would send anonymized data. She would have to sign off the data set to ensure it was closed, clean, and ready for archive. She would then include a data dictionary.

Anne also worked hard on the panel data set, which was a subset.


Can you explain the concept of data linking? E.g. Instruments and constructs that are linked across the rounds—different questions at different ages.

Anne: I remember with the implementation of the school survey, they were going to Young Lives' schools. Historically we would ask the name of the school in the household survey, but that would have been a string variable [2] that I would have removed. But when we received the school survey data, we coded the schools and those then needed to be linked up to the Young Lives’ kids that went to those schools. So, we had to go back in and code all the school names in the data! Looking back, I would say for a longitudinal process you should consider coding multiple aspects because you don’t know what’s going to be asked in the future.

That’s been another good challenge for Young Lives. Things changed as we went on, such as adding a school survey we hadn’t planned for. If it had been planned for, you might have set things up differently at the beginning.

Over a project of x amount of years, things are going to change that are going to make you have to retrofit previous work. I had to go back and get all the names of all the schools, and all the names of all the kids. Then I had to go back and try to figure it all out…people would enter the data differently. They wouldn’t all spell the school names correctly. So, I had to build a penultimate list of schools. Then I had to go to the country manager, and they had to check to make sure “St. Teresa’s” wasn’t the same as “St. Mary Teresa’s.” Then we would code those; we had to write a code that would find those words in the dataset and code them, but then those words weren’t always spelled correctly.
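The name-coding step Anne describes can be illustrated with a minimal Python sketch. It is not the code Young Lives used: the master list, the numeric codes, the `normalise` and `code_school` helpers and the 0.85 similarity cutoff are all assumptions made for illustration. The idea is to normalise each typed school name, look for the closest entry in an agreed master list, and leave anything without a close match uncoded so that a person can review it.

```python
import difflib

# Hypothetical master list of school names and codes (illustration only).
SCHOOL_CODES = {
    "St. Teresa's": 101,
    "St. Mary Teresa's": 102,
    "Hill View Primary": 103,
}


def normalise(name):
    """Lower-case, drop punctuation and collapse spaces before comparing."""
    kept = [ch for ch in name.lower() if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


def code_school(raw_name, cutoff=0.85):
    """Return (code, master_name) for the closest master entry.

    Returns (None, None) when nothing is similar enough, so the record
    can be flagged for manual checking instead of being coded wrongly.
    """
    lookup = {normalise(master): master for master in SCHOOL_CODES}
    hits = difflib.get_close_matches(
        normalise(raw_name), list(lookup), n=1, cutoff=cutoff
    )
    if not hits:
        return None, None
    master = lookup[hits[0]]
    return SCHOOL_CODES[master], master


print(code_school("St Teresas"))         # (101, "St. Teresa's")
print(code_school("Hil View Primery"))   # misspelling, still matched
print(code_school("Riverside Academy"))  # (None, None)
```

A stricter cutoff sends more names to manual review; a looser one risks merging schools that are genuinely different, which is exactly the “St. Teresa’s” versus “St. Mary Teresa’s” check that went to the country manager.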


What have you learned /what has the organisation learned over the years?

Anne: I’ve learned a lot, especially when it comes to a longitudinal process. There are certain things you can do when you’re setting up a longitudinal research project in Round 1 that we know to do now, because we’ve had to go back and correct it. Things become apparent across time. You can be sitting in Round 3 thinking “why didn’t we do that in Round 1? It seems so obvious!” For instance, if we asked the same question across several rounds, we didn’t have a system for the variable name staying the same across six rounds, with a round identifier. So, I think from Round 3 onwards we have a round identifier such as “R3, R4, R5” built into the variable names, which should have been done from Round 1.
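To show why a stable variable name plus a round identifier helps, here is a small Python sketch. The `_r3`-style suffix, the `enrol` variable, the child ID and the `to_long` helper are hypothetical and do not reproduce the actual Young Lives naming scheme. With a consistent pattern, one rule can split every variable into its stem and its round.

```python
import re

# Hypothetical wide-format record: one row per child, with the round
# identifier built into each variable name.
record = {"childid": "PE010001", "enrol_r3": 1, "enrol_r4": 1, "enrol_r5": 0}

# Assumed naming pattern: <stem>_r<round number>.
ROUND_SUFFIX = re.compile(r"^(?P<stem>.+)_r(?P<round>\d+)$")


def to_long(row, id_var="childid"):
    """Turn round-suffixed variables into one dict per survey round."""
    by_round = {}
    for name, value in row.items():
        match = ROUND_SUFFIX.match(name)
        if match is None:
            continue  # the ID and any unsuffixed variables are skipped
        rnd = int(match.group("round"))
        entry = by_round.setdefault(rnd, {id_var: row[id_var], "round": rnd})
        entry[match.group("stem")] = value
    return [by_round[rnd] for rnd in sorted(by_round)]


for line in to_long(record):
    print(line)
```

If the same question had been stored under a different name in each round, each variable would need its own hand-written mapping before the rounds could be compared.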

Monica: There are few studies in Peru like Niños del Milenio (Young Lives Peru) that are longitudinal. We have all learned lessons along the way. I know a lot more about the field now than when I started. Each village or family may be distinct, but we want the survey to be distributed in the same way, so that we can get the most accurate data possible. We train our enumerators before they go into the field. However, when people go to the field they will encounter many different situations. I get many calls from the field with interviewers asking me how to respond to various situations. I try to make sure my responses are uniform, so that everyone has the same information and the survey is conducted uniformly. I am not the field coordinator, my colleague Sofia is, but I am very involved in this process. I know the enumerators and I participate in the trainings when we begin rounds of data collection. In addition, training in CAPI is the most important thing.


In your experience, what are the factors that are important to successful long-term international research collaborations?

Tien: I think for me, the people involved in the project are important. We must be able to work together. It’s better to work together when we know each other very well. If we change the staff, we have to take time to get to know them, and how to work with them, and they have to take time to learn about the project.

Anne: The countries where we were able to keep the data manager consistent and onboard were the easiest to work with. The data was the cleanest. It’s important, if you can, to maintain consistent staff and invest in their capacity. We kept consistent staff, and made sure they were happy and trained, and were supported with everything they needed to get their job done well.

Monica: Communication is key. We are always in communication with each other and know each other well after years of working together. For me what was very important was in 2012, after Round 3, the data managers in all the study countries went to Oxford for training about CAPI and it helped us get to know each other personally. We always stayed in touch after that. We helped each other with any challenges that came up, especially in terms of programming CAPI. We relied on each other. Of course, some people left, but many others stayed, and we have developed successful working relationships based on personal connections and communicating a lot.

Hazel: We had excellent relationships with the data managers in all four countries, which was really helpful. Before, during and after data collection, we were in touch with them. I also met data managers in person when they came to Oxford for training sessions in CAPI. Each data manager had different skills and personalities, but they were all great to work with.

The Research Assistants (RAs) were also all really good and hardworking, and they drove the process! Close collaboration between RAs and data management processes was really valuable. We just felt that we were all working together on a valuable project. We were all on the same team. Not only did we work across countries, but we also worked across an interesting point in time in terms of technology. I remember my time at Young Lives fondly, as everyone was so nice, and we had such good relationships with everyone.



[1] Surveybe is data collection and management software that uses computer-assisted personal interviewing (CAPI).

[2] A string variable is a variable that contains not just numbers, but other characters such as letters and punctuation.

Young Lives School Surveys 2016–17: The Development of Non-Cognitive Instruments in Ethiopia, India and Vietnam

Technical notes

This technical note summarises the procedures involved in the selection, adaptation and administration of the non-cognitive scales administered in the 2016-17 Young Lives school surveys in Ethiopia, India and Vietnam. It includes a discussion of the rationale for the inclusion of these scales, along with details of the process of developing, piloting and selecting the measures used in the surveys. 

Equating Test Scores for Receptive Vocabulary Across Rounds and Cohorts in Ethiopia, India & Vietnam

Survey design and sampling
Technical notes

In longitudinal studies such as Young Lives, getting comparable measures of children’s cognitive abilities over time is essential for identifying individual, household, and school-level factors that affect children’s development. Few longitudinal studies that follow birth/age cohorts include comparable cognitive measures across waves, and those studies that are available are mainly from developed countries. Young Lives provides a unique opportunity to explore the development of value added or growth curve modelling analysis aimed at identifying variables at different levels and across time and space, associated with children’s learning outcomes, in developing countries.

This Technical Note discusses the construction of cognitive scores that are comparable across rounds and age cohorts for Young Lives in Ethiopia, India (the states of Andhra Pradesh and Telangana) and Vietnam.

Young Lives gathers information from children and their families through individual and household questionnaires, including different cognitive and achievement tests. The Peabody Picture Vocabulary Test (PPVT) is the one test that is common across rounds and cohorts. Therefore, this test was selected to build cognitive measures comparable across Rounds 2, 3 and 4, and age cohorts (the Younger Cohort, born in 2001/02, and Older Cohort, born in 1994/95) employing Item Response Theory (IRT) to achieve standardised cognitive measures. Scores were estimated using a three-parameter model which considers the item’s difficulty, discrimination, and pseudo-guessing as parameters to estimate the individual’s ability. The second step was to perform a Differential Item Functioning analysis (DIF) by cohort and survey round in order to identify possible item bias and correct it. The last step consisted of equating the scores of common items (anchor items) as a means of obtaining comparable PPVT scores across rounds and cohorts without cohort and round biases.
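As a rough illustration of the three-parameter model mentioned above, the Python sketch below computes the probability of a correct answer from an individual’s ability and an item’s discrimination, difficulty and pseudo-guessing parameters. It uses the standard three-parameter logistic form. The function name and example values are assumptions, the optional scaling constant used in some formulations is left out, and this is not the estimation code behind the published scores.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: individual's ability
    a: item discrimination
    b: item difficulty
    c: pseudo-guessing parameter (lower bound on the probability)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Illustrative item: ability equal to difficulty, guessing floor of 0.25.
print(p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.25))  # 0.625
```

When ability equals the item’s difficulty, the probability sits halfway between the guessing floor and 1. For very low ability it approaches the guessing floor rather than zero.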

New Ethiopian Centre for Child Research launched

Submitted on 29 May 2017

Young Lives partner, The Ethiopian Development Research Institute (EDRI) has established the Ethiopian Centre for Child Research (ECCR) in partnership with UNICEF Ethiopia and Addis Ababa University. The ECCR is inspired by the collaborative work of EDRI and Young Lives in Ethiopia as well as the multi-agency Child Research and Practice Forum (CRPF).

The Design of the 2016-17 Young Lives School Survey in Ethiopia

School effectiveness
School systems (incl private schooling)
Survey design and sampling
Technical notes

Young Lives school surveys gather detailed information about children, their households, their teachers and their schools. School surveys seek to develop understanding of the contribution of educational experience in relation to the causes and consequences of childhood poverty.

The first Ethiopia school survey, conducted in 2009/10, tracked Young Lives’ ‘Younger Cohort’ children into schools and classrooms to understand their educational experiences, attainment and achievement levels (Young Lives 2012).

A second school survey, in 2012/13, was structured to collect data relating to all Young Lives children and their peers studying in Grades 4 and 5 in every school within Young Lives’ 20 sites (in Amhara, Oromia, SNNP, Tigray and Addis Ababa) and in an additional 10 sites in Afar and Somali. This research design extended the survey’s reach, in order to generate rich evidence about school and classroom effectiveness and the drivers of learning (Young Lives 2014).

The third Ethiopia school survey, being delivered in 2016/17 and the focus of this design note, will follow the research design adopted in 2012/13. Young Lives will visit the same sites and, within these, the same schools and will maintain our interest in school effectiveness, the levels, changes and drivers of learning. The team will survey students in Grades 7 and 8: the final grades of primary schooling and a crucial juncture before students proceed to general secondary education.

Priority areas for upper primary and lower secondary education policy have been identified through consultation with the Government of Ethiopia’s Ministry of Education and with national and international education stakeholders. These guide our main research questions:

  • At what level are students performing in core curricular and transferable domains (Mathematics and Functional English) and are levels indicative of preparedness for further education and training?
  • How much progress are children making in one academic year and what are the drivers of learning trajectories over time, including how these relate to equity (e.g. are gaps growing or shrinking)?
  • What is the role of key dimensions of education quality in shaping educational outcomes over time and, in particular, which teacher practices are associated with improved learning outcomes?
  • What are the relationships between language of instruction (intended and applied), participation, learning levels and preparedness for further education and training in secondary grades?

This design note outlines the context and policy background, the research design, and the policy implications of the third Ethiopia school survey.

Young Lives panel data released

Submitted on 13 June 2016

A panel dataset of constructed variables from Young Lives is now available to access from the UK Data Service. The dataset has been compiled to facilitate analysis of the household and child survey data across the four rounds of data collected to date. The files are combined sub-sets of selected variables from the Young Lives survey carried out in 2002 (Round 1), 2006 (Round 2), 2009 (Round 3), and 2013 (Round 4), when the Younger Cohort children were aged 1, 5, 8, and 12 years and the Older Cohort children were aged 8, 12, 15, and 19 years.

Young Lives Rounds 1 to 4 Constructed Files

Technical notes

This Technical Note accompanies the Constructed Files of Young Lives data which have been deposited with the UK Data Service to facilitate analysis of the household and child survey data across the four rounds of data collected to date. The constructed files are combined sub-sets of selected variables from Rounds 1 to 4 of the Young Lives survey, carried out in 2002 (Round 1), 2006 (Round 2), 2009 (Round 3), and 2013 (Round 4), when the Younger Cohort children were aged 1, 5, 8, and 12 years and the Older Cohort children were aged 8, 12, 15, and 19 years.

The files contain about 200 original and constructed variables, most of them comparable across the four rounds, presented in a panel format and classified in four broad groups: panel information, general characteristics, household characteristics, and child characteristics. This document is organised around the same groups.


The Reliability and Validity of Achievement Tests in the Young Lives Ethiopia School Survey Round 2

Juan Leon
Technical notes

This technical note gives details of the reliability and validity of the assessments used in the second school survey carried out by Young Lives in Ethiopia for the purpose of the construction of test scores on a common scale within each language for maths and reading comprehension. This document gives details of the three-parameter model used to build the achievement scores in both content areas. We tested graphically for item fit and item bias (by gender and wave). Our results indicate that most of the items used have good item fit and show no evidence of bias by wave or gender. Finally, we did an external validity analysis correlating the IRT scores (maths and reading comprehension) with individual and family characteristics, and the results showed that correlations were statistically significant with the expected signs.