Job description
Join the Lumen Astronomy Institute in Tucson, AZ, as a forward-thinking Astronomy Data Scientist. You will transform complex telescope and survey datasets into actionable scientific insights, build scalable data pipelines, and collaborate with astronomers to push the boundaries of our understanding of the night sky.
We seek a curious, results-driven professional with a passion for open science and rigorous reproducibility. This full-time role combines data engineering, scientific analysis, and cross-disciplinary collaboration to accelerate discovery.
You will turn raw data into robust analyses, develop machine learning models for source extraction and classification, and help the team communicate findings to researchers and the public alike.
Responsibilities
- Design and implement end-to-end data processing pipelines for astronomical imaging data from ground- and space-based observatories.
- Develop and apply machine learning algorithms for source extraction, classification, and anomaly detection in large sky surveys.
- Collaborate with astronomers to translate scientific questions into reproducible analyses and share results in publications and reviews.
- Maintain and optimize HPC and cloud-based workflows, ensuring data provenance, quality control, and scalable storage.
- Contribute to open science initiatives by documenting code, workflows, and datasets with version control and transparent reporting.
- Present findings to scientific teams and engage in outreach activities to communicate discoveries to the public.
- Mentor junior team members and contribute to grant writing and project planning.
Qualifications
- PhD or MS in astronomy, astrophysics, data science, or a closely related field; strong scientific background with published work is preferred.
- 3+ years of professional experience in astronomical data analysis and scientific computing.
- Proficiency with Python (NumPy, SciPy, pandas), Astropy, and data visualization libraries; experience with CASA and DS9 is a plus.
- Strong SQL and database skills, with experience in large-scale data management and query optimization.
- Experience with Linux environments, HPC clusters, workflow tools (Snakemake, Nextflow), and version control (Git).
- Excellent written and verbal communication skills; ability to explain complex results to both scientific and non-technical audiences.