MexicoCensus StatVar configuration update #1311

Open
wants to merge 5 commits into
base: master
Changes from 3 commits
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# MexicoCensus_AA2

- source: https://data.humdata.org/dataset/cod-ps-mex/

- how to download data: Run the download script (mexico_download.py).
The script reads the source URLs from the configuration file (config.py), creates an "input_files" folder if it does not already exist, and writes the files to be processed there. Future URLs can be added to the same config.py file.

- type of place: Demographics, Administrative Area 1 and Administrative Area 2 levels.

- statvars: Demographics and Subnational.

- years: 2021 to 2024.

- place_resolution: Places are resolved to wikidataId separately, in the place_resolver sheet.
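
As a rough illustration of the download step described above, the sketch below mirrors the sheet filter used by mexico_download.py: only sheets whose lowercased name contains one of the keywords are written out as `<sheet_name>.csv`. The helper name `sheets_to_convert` and the sample sheet names are hypothetical, chosen only to resemble the `mex_admpop_adm*_*.csv` patterns used in this README; the real workbooks may name their sheets differently.

```python
# Sketch only: mirrors the keyword filter in mexico_download.py.
KEYWORDS = ["adm1", "adm2"]  # same keywords as the download script


def sheets_to_convert(sheet_names, keywords=KEYWORDS):
    """Return {sheet_name: csv_filename} for sheets matching any keyword."""
    return {
        name: f"{name}.csv"
        for name in sheet_names
        if any(keyword in name.lower() for keyword in keywords)
    }


# Hypothetical sheet names, for illustration only.
sample = [
    "mex_admpop_adm0_2021",
    "mex_admpop_adm1_2021",
    "mex_admpop_adm2_2021",
]
selected = sheets_to_convert(sample)
print(selected)
```

With these sample names the adm0 sheet is not selected, so an adm0 CSV would only be produced if "adm0" were added to the keyword list.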

### How to run:

`python3 stat_var_processor.py --existing_statvar_mcf=stat_vars.mcf --input_data='<input_file>.csv' --pv_map='data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/<filename of pvmap.csv>' --places_resolved_csv='data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/<filename of places_resolved_csv.csv>' --config_file='data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/<filename of metadata.csv>' --output_path='data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/<output_folder_name>/<filename>'`


#### Example
#### Download:
`python3 mexico_download.py`
Note: Files are downloaded into the "input_files" folder.
#### Processing
For Administrative Area 0 (AA0):
`python3 stat_var_processor.py --input_data=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/input_files/mex_admpop_adm0_*.csv --existing_statvar_mcf=stat_vars.mcf --pv_map=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_pvmap_adm0.csv --config_file=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_metadata.csv --places_resolved_csv=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_places.csv --output_path='/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/output_files/mexico_output_aa0'`

For Administrative Area 1 (AA1):
`python3 stat_var_processor.py --input_data=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/input_files/mex_admpop_adm1_*.csv --existing_statvar_mcf=stat_vars.mcf --pv_map=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_pvmap_aa1.csv --config_file=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_metadata.csv --places_resolved_csv=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_places.csv --output_path='/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/output_files/mexico_output_aa1'`

For Administrative Area 2 (AA2):
`python3 stat_var_processor.py --input_data=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/input_files/mex_admpop_adm2_*.csv --existing_statvar_mcf=stat_vars.mcf --pv_map=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_pvmap_aa2.csv --config_file=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_metadata.csv --places_resolved_csv=/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/mexico_places.csv --output_path='/data/statvar_imports/mexico_subnational_population_statistics/mexico_census_aa2/output_files/mexico_output_aa2'`
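
The three processing commands differ only in the input pattern, the pv map, and the output prefix, so they can be generated from one template. The following is a minimal sketch, not part of the import: `build_command`, `LEVELS`, and `BASE` are hypothetical names, the file names are copied from the commands above, and the configuration flag is spelled `--config_file` as in the "How to run" template. It only builds and prints the commands; nothing is executed.

```python
import shlex

# Base directory as written in the example commands (adjust to your checkout).
BASE = ("/data/statvar_imports/mexico_subnational_population_statistics/"
        "mexico_census_aa2")

# (input level, pv map file, output prefix) for each administrative level.
LEVELS = [
    ("adm0", "mexico_pvmap_adm0.csv", "mexico_output_aa0"),
    ("adm1", "mexico_pvmap_aa1.csv", "mexico_output_aa1"),
    ("adm2", "mexico_pvmap_aa2.csv", "mexico_output_aa2"),
]


def build_command(level, pv_map, output_prefix, base=BASE):
    """Return the argument list for one stat_var_processor.py run."""
    return [
        "python3",
        "stat_var_processor.py",
        f"--input_data={base}/input_files/mex_admpop_{level}_*.csv",
        "--existing_statvar_mcf=stat_vars.mcf",
        f"--pv_map={base}/{pv_map}",
        f"--config_file={base}/mexico_metadata.csv",
        f"--places_resolved_csv={base}/mexico_places.csv",
        f"--output_path={base}/output_files/{output_prefix}",
    ]


commands = [build_command(*level) for level in LEVELS]
for command in commands:
    # shlex.join quotes the wildcard so the pattern is shown unexpanded.
    print(shlex.join(command))
```

Keeping the per-level differences in one table makes a path mismatch between levels easier to spot than in three hand-written commands.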

@@ -0,0 +1,4 @@
Mexico_Census_URL = [
"https://data.humdata.org/dataset/05d82fd1-c2a8-402a-86f5-745837553eed/resource/12898001-e1e3-4ae6-8bb8-946319d12b65/download/mex_admpop_2021_v2.xlsx",
"https://data.humdata.org/dataset/05d82fd1-c2a8-402a-86f5-745837553eed/resource/b07fc6f2-5637-4cee-9a80-a38154f3403f/download/mex_admpop_2024.xlsx"
]
@@ -0,0 +1,30 @@
{
  "import_specifications": [
    {
      "import_name": "MexicoCensus_AA2",
      "curator_emails": ["support@datacommons.org"],
      "provenance_url": "https://data.humdata.org/dataset/cod-ps-mex",
      "provenance_description": "Mexican Demographics data from The Humanitarian Data Exchange(HDX) at Municipal Level.",
      "scripts": [
        "mexico_download.py",
        "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/mex_admpop_adm0_*.csv --pv_map=mexico_pvmap_adm0.csv --config_file=mexico_metadata.csv --places_resolved_csv=mexico_places.csv --output_path=output_files/mexico_output_aa0",
        "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/mex_admpop_adm1_*.csv --pv_map=mexico_pvmap_aa1.csv --config_file=mexico_metadata.csv --places_resolved_csv=mexico_places.csv --output_path=output_files/mexico_output_aa1",
        "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/mex_admpop_adm2_*.csv --pv_map=mexico_pvmap_aa2.csv --config_file=mexico_metadata.csv --places_resolved_csv=mexico_places.csv --output_path=output_files/mexico_output_aa2"
      ],
      "source_files": [
        "input_files/mex_admpop_adm0_*.csv"
      ],
      "import_inputs": [
        {
          "template_mcf": "output_files/mexico_output_aa0.tmcf",
          "cleaned_csv": "output_files/mexico_output_aa0.csv"
        },
        {
          "template_mcf": "output_files/mexico_output_aa1.tmcf",
          "cleaned_csv": "output_files/mexico_output_aa1.csv"
        },
        {
          "template_mcf": "output_files/mexico_output_aa2.tmcf",
          "cleaned_csv": "output_files/mexico_output_aa2.csv"
        }
      ],
      "cron_schedule": "0 07 * * 3"
    }
  ]
}
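
One quick consistency check on a manifest like this is that every `import_inputs` entry points at a file derived from one of the `--output_path` prefixes in `scripts`. The sketch below is an illustration rather than part of the import tooling: `output_prefixes` and `unmatched_inputs` are hypothetical helpers, the embedded JSON is an abbreviated copy of the manifest, and it assumes outputs are named `<output_path>.csv` and `<output_path>.tmcf`, as the manifest entries suggest.

```python
import json
import re

# Abbreviated copy of the manifest; scripts are shortened to the relevant flag.
MANIFEST = json.loads("""
{
  "import_specifications": [{
    "scripts": [
      "mexico_download.py",
      "stat_var_processor.py --output_path=output_files/mexico_output_aa0",
      "stat_var_processor.py --output_path=output_files/mexico_output_aa1",
      "stat_var_processor.py --output_path=output_files/mexico_output_aa2"
    ],
    "import_inputs": [
      {"template_mcf": "output_files/mexico_output_aa0.tmcf",
       "cleaned_csv": "output_files/mexico_output_aa0.csv"},
      {"template_mcf": "output_files/mexico_output_aa1.tmcf",
       "cleaned_csv": "output_files/mexico_output_aa1.csv"},
      {"template_mcf": "output_files/mexico_output_aa2.tmcf",
       "cleaned_csv": "output_files/mexico_output_aa2.csv"}
    ]
  }]
}
""")


def output_prefixes(scripts):
    """Collect every --output_path value found in the script command lines."""
    prefixes = set()
    for script in scripts:
        prefixes.update(re.findall(r"--output_path=(\S+)", script))
    return prefixes


def unmatched_inputs(spec):
    """Return import_inputs files whose stem is not a declared output prefix."""
    prefixes = output_prefixes(spec["scripts"])
    missing = []
    for entry in spec["import_inputs"]:
        for key in ("template_mcf", "cleaned_csv"):
            stem = entry[key].rsplit(".", 1)[0]  # drop the file extension
            if stem not in prefixes:
                missing.append(entry[key])
    return missing


spec = MANIFEST["import_specifications"][0]
print(unmatched_inputs(spec))
```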

@@ -0,0 +1,64 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import io
import os
from pathlib import Path

from absl import logging
import pandas as pd
import requests
from retry import retry

import config

# Read URLs from the config file.
Mexico_Census_URL = config.Mexico_Census_URL

# Ensure the output directory exists.
OUTPUT_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                          "input_files")
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)


# Retry wrapper for handling request failures.
@retry(tries=3, delay=5, backoff=2)
def retry_method(url, headers=None):
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response


# Download the Mexico_Census_AA2 data and convert matching sheets to CSV.
def download_and_convert_excel_to_csv():
    logging.info("Starting download and conversion of Excel files...")
    KEYWORDS = ["adm1", "adm2"]
    try:
        for url in Mexico_Census_URL:
            response = retry_method(url)
            excel_file = pd.ExcelFile(io.BytesIO(response.content))
            for sheet_name in excel_file.sheet_names:
                try:
                    if any(keyword in sheet_name.lower()
                           for keyword in KEYWORDS):
                        df = excel_file.parse(sheet_name)
                        if "ISO3" in df.columns:
                            df = df.drop(columns=["ISO3"])
                        csv_filename = os.path.join(OUTPUT_DIR,
                                                    f"{sheet_name}.csv")
                        df.to_csv(csv_filename, index=False, encoding='utf-8')
                        logging.info(
                            f"Sheet '{sheet_name}' converted to: {csv_filename}")
                except Exception as e:
                    logging.error(
                        f"Error processing sheet '{sheet_name}' : {e}")

    except requests.exceptions.RequestException as e:
        logging.fatal(f"Failed to download Mexico Census data file: {e}")
        return None


if __name__ == "__main__":
    download_and_convert_excel_to_csv()
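
To make the retry settings concrete: with `tries=3, delay=5, backoff=2`, the `retry` decorator makes at most three attempts, waiting 5 seconds after the first failure and 10 seconds after the second before the final error is raised. The sketch below reproduces that wait schedule in plain Python; `backoff_delays` is a hypothetical helper written for illustration, not a function of the `retry` package.

```python
def backoff_delays(tries=3, delay=5, backoff=2):
    """Return the waits (in seconds) between consecutive attempts."""
    delays = []
    current = delay
    # There is one wait between each pair of consecutive attempts,
    # so `tries` attempts imply `tries - 1` waits.
    for _ in range(tries - 1):
        delays.append(current)
        current *= backoff  # each wait is `backoff` times the previous one
    return delays


schedule = backoff_delays()
print(schedule)
```

Each individual request is additionally subject to the `timeout=10` passed to `requests.get`.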
@@ -0,0 +1,11 @@
parameter,value
url,https://data.humdata.org/dataset/05d82fd1-c2a8-402a-86f5-745837553eed/resource/18fab7ed-b244-4245-95b6-9734afa98714/download/mex_admpop_adm2_2021_v2.csv
description,Total Population of Mexico
#place_type,AdministrativeArea2
#places_within,country/MEX
start_date,2021
end_date,2024
release_frequency,1Year
process,
comments,
output_columns,"observationAbout, observationDate, value, variableMeasured,unit, scalingFactor"
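
The file above is a two-column `parameter,value` table in which rows starting with `#` are commented out. As a hedged illustration of that layout (not a description of how stat_var_processor.py actually parses it), the following sketch loads such a table into a dictionary; `load_metadata` is a hypothetical helper and the sample text is an excerpt of the rows above.

```python
import csv
import io

# Excerpt of the metadata rows, including one commented-out parameter.
SAMPLE = '''parameter,value
description,Total Population of Mexico
#place_type,AdministrativeArea2
start_date,2021
end_date,2024
output_columns,"observationAbout, observationDate, value, variableMeasured,unit, scalingFactor"
'''


def load_metadata(text):
    """Parse parameter,value rows into a dict, skipping '#' rows."""
    settings = {}
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the "parameter,value" header row
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # blank line or commented-out parameter
        value = row[1].strip() if len(row) > 1 else ""
        settings[row[0].strip()] = value
    return settings


metadata = load_metadata(SAMPLE)
# The column list is a single quoted cell; split it and trim the spacing.
columns = [name.strip() for name in metadata["output_columns"].split(",")]
print(columns)
```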