NEPScribe Beta

Version: v0.2.0

Features

Dataset Transformation (SC3-SC6): Dynamically create a Stata or R script for person-year data preparation. The script merges multiple NEPS SUF data files and transforms them into a person-year format, with one row for each wave of each respondent.
- Choose between using the spellfiles or the biography file as the baseline for data preparation.
- Select variables from most datasets and easily include them in the script.
- Add sample code for complex data preparation tasks, such as further training, highest educational degree, or children.
- Obtain a script that handles most of the complex restructuring and merging of the data.
- However, careful review of the script and additional data preparation remain necessary.

Dataset Exploration (SC1-SC8): Browse the metadata available in NEPS SUF data to get an overview of datasets and variables.
- Search for keywords in specific or all datasets.
- Compare items and variables across starting cohorts.
- Check which metadata is available for which variables.
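
The keyword search described above can be pictured with a minimal Python sketch. It does not show how the app is implemented. The `METADATA` dictionary, the dataset names, the variable names, and the function `search_variables` are all hypothetical and serve only as an illustration.

```python
# Hypothetical metadata: dataset name -> {variable name: variable label}.
# Real NEPS dataset and variable names differ; these are placeholders.
METADATA = {
    "dataset_a": {
        "var_income": "Net household income",
        "var_age": "Age at interview",
    },
    "dataset_b": {
        "var_wage": "Gross income from employment",
    },
}


def search_variables(metadata, keyword, datasets=None):
    """Return (dataset, variable, label) tuples whose name or label contains keyword.

    If datasets is None, all datasets are searched; otherwise only the listed ones.
    The match is case-insensitive.
    """
    needle = keyword.lower()
    selected = datasets if datasets is not None else list(metadata)
    hits = []
    for dataset in selected:
        for variable, label in metadata.get(dataset, {}).items():
            if needle in variable.lower() or needle in label.lower():
                hits.append((dataset, variable, label))
    return hits


# Search all datasets, then restrict the search to one dataset.
all_hits = search_variables(METADATA, "income")
one_dataset = search_variables(METADATA, "income", datasets=["dataset_b"])
```

Searching both the variable name and the label mirrors the idea of a keyword search over metadata; which metadata fields the app actually searches is not specified here.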

Note

The app is based on NEPS semantic structure files, which are identical to NEPS SUF data files but have had all observations removed and are therefore publicly available.

You can adjust the sidebar width, using the control in the sidebar itself, to read long variable names on smaller screens.

If you find any issues or bugs in the app or in the generated scripts, please report them to alexander.helbig@wzb.eu or open an issue on the app's GitHub page (see the Help tab in the navbar).

Transform Data

What is this?

This feature will allow you to create dynamic scripts that will transform NEPS SUF data into a person-year format, where each row in the dataset corresponds to one year in a respondent's life and has a unique life-course status (e.g. employment or vocational training).
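
As a rough illustration of the person-year idea, the following Python sketch expands spell records into one row per person and year. It is not the generated Stata or R script. The `Spell` class, its field names, and the example values are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    """One episode in a respondent's life course (hypothetical structure)."""
    person_id: int
    status: str
    start_year: int
    end_year: int  # inclusive


def expand_to_person_years(spells):
    """Return one row (dict) per person and calendar year covered by each spell."""
    rows = []
    for spell in spells:
        for year in range(spell.start_year, spell.end_year + 1):
            rows.append(
                {"person_id": spell.person_id, "year": year, "status": spell.status}
            )
    # Sort so that each respondent's years appear in chronological order.
    rows.sort(key=lambda row: (row["person_id"], row["year"]))
    return rows


# Two consecutive, non-overlapping spells for one respondent.
spells = [
    Spell(person_id=1, status="vocational training", start_year=2008, end_year=2010),
    Spell(person_id=1, status="employment", start_year=2011, end_year=2012),
]
person_years = expand_to_person_years(spells)
```

In this sketch, overlapping spells would produce several rows for the same person and year. Choosing a single status for such rows is the purpose of the spell prioritization setting.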

This data structure is essential for various research projects, but implementing it can be challenging without prior experience.

With the basic settings, the script creates a person-year dataset that contains only very basic spell-related information.

You can customize numerous settings to align with the requirements of your research project and enrich the resulting dataset with variables from all datasets.

Important Note:
This feature applies only to Starting Cohorts 3 to 6. However, it has so far only been extensively tested for SC6. Scripts for SC3, SC4 and SC5 may need substantial adjustments.

What else to consider?

The generated script is intended as a template and should not be used without review. We highly recommend thoroughly understanding each data preparation step outlined in the script. There may be alternative or more effective methods for preparing a person-year dataset that better align with your specific analytical needs.

Additionally, the generated person-year dataset will likely require further variable preparation.

Generally, it is advisable to review the script line by line to identify and correct any potential errors.

How to use?

Format

First, you will need to choose between two format options in the sidebar:

Harmonized Spell Format: This option prepares data based on the edited and cleaned biography file to represent life-course trajectories. This is the recommended format.
Original Subspell Format: This option uses the originally recorded subspell episodes. Please note that the data cleaning and smoothing operations performed in the biography file are not accounted for in this preparation.

Starting cohort

Now, you need to choose a starting cohort.

Script Type

Next, you can select whether you want an R or Stata script.

Settings

Additionally, you may choose options to handle missing values, switch to English labels or add parallel spells information.

Add exemplary data preparation

On top of that, you might add exemplary data preparation of variables from modules that cannot be directly merged with the person-year dataset.

Variable labels language

You may also switch the language of the variable labels in the script.

Spell Prioritization

Furthermore, you can alter the spell prioritization order using the second tab in the top right corner.

Additional Variables

In the third tab, you can add variables from various NEPS datasets to the script. Some datasets (e.g. those related to further training) are excluded, however, because including variables from these datasets requires additional data preparation steps.
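
Conceptually, adding variables from another dataset amounts to a merge onto the person-year rows. The Python sketch below shows a left join under stated assumptions. The key names `person_id` and `wave`, the variable `health`, and the function `merge_variables` are hypothetical, and the generated script may merge on different identifiers.

```python
def merge_variables(person_rows, extra_rows, keys=("person_id", "wave")):
    """Left-join extra_rows onto person_rows by the given key columns.

    Every row of person_rows is kept. Rows without a match in extra_rows
    receive None for the added columns.
    """
    lookup = {tuple(row[k] for k in keys): row for row in extra_rows}
    extra_columns = sorted(
        {column for row in extra_rows for column in row if column not in keys}
    )
    merged = []
    for row in person_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        combined = dict(row)
        for column in extra_columns:
            combined[column] = match.get(column)
        merged.append(combined)
    return merged


person_rows = [
    {"person_id": 1, "wave": 1, "status": "employment"},
    {"person_id": 1, "wave": 2, "status": "employment"},
]
# Hypothetical extra variable that was only measured in wave 1.
extra_rows = [{"person_id": 1, "wave": 1, "health": 4}]
merged = merge_variables(person_rows, extra_rows)
```

A left join keeps the person-year structure intact: the number of rows does not change, and variables that were not collected in a given wave simply stay empty.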

Preview and Download

Lastly, you can preview and download the script incorporating all your adjusted settings.

Optional: Provide Datapath

If you wish, you can provide the local file path to the NEPS SUF data (in .dta format) so it is immediately available in your script. Alternatively, you can insert the path manually.

To construct a person-year dataset where each row corresponds to one wave for each individual, a spell prioritization process is useful for identifying the principal spell in cases where multiple spells occur simultaneously. The following hierarchy of spell types dictates which episodes take precedence in this process, with the items at the top representing the highest priority and those at the bottom indicating the lowest priority.
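
The prioritization logic described above can be sketched in Python as follows. The order in `PRIORITY`, the status names, and both function names are hypothetical. The actual hierarchy is the one shown in the app, and it can be rearranged there.

```python
# Hypothetical priority order, highest priority first.
PRIORITY = ["employment", "vocational training", "school", "unemployment", "gap"]


def principal_spell(statuses, priority=PRIORITY):
    """Return the status that appears earliest in the priority list."""
    rank = {status: index for index, status in enumerate(priority)}
    # Statuses missing from the list rank below all listed ones.
    return min(statuses, key=lambda status: rank.get(status, len(priority)))


def prioritize(rows, priority=PRIORITY):
    """Collapse (person_id, wave, status) rows to one row per person and wave."""
    grouped = {}
    for person_id, wave, status in rows:
        grouped.setdefault((person_id, wave), []).append(status)
    return [
        (person_id, wave, principal_spell(statuses, priority))
        for (person_id, wave), statuses in sorted(grouped.items())
    ]


# Respondent 1 has two simultaneous spells in wave 1; respondent 2 as well.
rows = [
    (1, 1, "unemployment"),
    (1, 1, "employment"),
    (1, 2, "school"),
    (2, 1, "gap"),
    (2, 1, "vocational training"),
]
result = prioritize(rows)
```

Changing the order of the list changes which spell is kept when several spells overlap, which is why the chosen order should match the research question.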

Step 1: Select Dataset

Step 2: Select Variables

Step 3: Confirm selected variables

Inspect selected Variables or reset everything (Button)

Explore Datasets

Variable Table
