Description
Abstract: This replication package provides the code used to generate the figures and results in the paper, which links Canadian online job postings from Indeed to firm-level data from Advan Research using natural language processing (NLP) techniques.

The code is organized in two parts:
1. **Data Construction Scripts** (included for transparency and documentation; they require access to confidential data and cannot be executed without the necessary data agreements)
- **Company name matching** using tf-idf and cosine similarity to match inconsistently declared company names in the online job postings to names in the Advan Research Points-of-Interest (POI) dataset.
- **Occupational classification** of job titles into the Canadian National Occupational Classification (NOC) using a pre-trained classifier.
- **Aggregation** of the data to construct the figures in the paper.

2. **Public Replication Scripts** (fully runnable with the included grouped data)
- **Nowcasting of official vacancies** using pseudo real-time information from online job postings and the Job Vacancies and Wage Survey (JVWS).
- **Analysis of digital vs. non-digital job dynamics** in tech vs. non-tech firms during and after the COVID-19 pandemic.

Due to licensing restrictions, raw data from Indeed and Advan are not included in this archive. However, we provide code to replicate the data processing pipeline (when access is granted) and make available aggregated outputs sufficient to reproduce all figures and tables in the paper.
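For readers unfamiliar with the company name matching step described above, the following is a minimal, illustrative sketch of tf-idf weighting combined with cosine similarity. It is not the package's own code: the character trigram tokenization, the smoothed idf formula, the 0.5 acceptance threshold, the function names, and the example firm names are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(name, n=3):
    """Lowercase a name, pad it with spaces, and return its character n-grams."""
    padded = f" {name.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(names, n=3):
    """Compute smoothed inverse document frequencies over the reference names."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(char_ngrams(name, n)))
    total = len(names)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(name, idf, n=3):
    """Return an L2-normalised sparse tf-idf vector (a dict) for one name.

    N-grams that never occur in the reference names are dropped.
    """
    counts = Counter(g for g in char_ngrams(name, n) if g in idf)
    weights = {g: tf * idf[g] for g, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {g: w / norm for g, w in weights.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())


def best_match(query, candidates, threshold=0.5, n=3):
    """Return (best candidate, score), or (None, score) if below the threshold."""
    idf = fit_idf(candidates, n)
    q = tfidf_vector(query, idf, n)
    scored = [(cosine(q, tfidf_vector(c, idf, n)), c) for c in candidates]
    score, name = max(scored)
    return (name, score) if score >= threshold else (None, score)


# Hypothetical reference names standing in for a POI name list.
poi_names = ["Maple Leaf Bakery", "Northern Lights Hardware", "Prairie Drug Mart"]
match, score = best_match("MAPLE LEAF BAKERY LTD.", poi_names)
```

Character n-grams are used here because they tolerate differences in case, punctuation, and legal suffixes such as "Ltd."; a production pipeline would typically rely on a sparse-matrix library rather than Python dictionaries.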
Replication data for a conference paper published in AEA Papers and Proceedings. When citing this dataset, please also cite the associated article. A sample Publication Citation is provided below.