Scientific Data Engineer

Dana-Farber Cancer Institute

Boston, MA

Job posting number: #7298690 (Ref:df42851)

Posted: January 16, 2025

Job Description

Located in Boston and the surrounding communities, Dana-Farber Cancer Institute brings together world-renowned clinicians, innovative researchers and dedicated professionals, allies in the common mission of conquering cancer, HIV/AIDS and related diseases. Combining extremely talented people with the best technologies in a genuinely positive environment, we provide compassionate and comprehensive care to patients of all ages; we conduct research that advances treatment; we educate tomorrow's physician/researchers; we reach out to underserved members of our community; and we work with amazing partners, including other Harvard Medical School-affiliated hospitals.

POSITION SUMMARY

The Targeted Protein Degradation (TPD) proteomics core at Dana-Farber Cancer Institute is a research core focused on the development and application of state-of-the-art mass spectrometry-based discovery pipelines for targeted protein degradation and molecular glues. The TPD proteomics core is closely affiliated with the Fischer lab and the chemical biology program and serves Dana-Farber researchers and discovery partnerships.

The Scientific Data Engineer works closely with the proteomics group to apply computational algorithms to integrate and analyze proteomics data with other large scale biological profiling data sources. This position involves the development of an integrated data framework, data visualization tools and an interactive web-based user interface for efficiently navigating large amounts of cellular proteomics profiles under various perturbations including chemical perturbation to unveil insights regarding biomolecular interactions.

PRIMARY DUTIES AND RESPONSIBILITIES

    • Deploys and maintains commercial and open-source software for efficient data processing of high throughput chemo-proteomics experiments
    • Collaborates with cross-functional teams to design and implement high throughput proteomics data analysis algorithms
    • Creates customized analysis and visualization tools to streamline proteomics data integration with high throughput biochemical and cellular screenings
    • Develops and maintains databases for the storage and efficient retrieval of research data, ensuring the highest levels of data integrity.
    • Develops and maintains interactive web visualization interfaces for these databases.
    • Works with lab personnel to implement automated processing solutions for repeated manual tasks.
    • Rigorously follows best practices for software development
    • Engages in continuous learning to stay updated with the latest advancements in artificial intelligence for small molecule drug discovery.

KNOWLEDGE, SKILLS, AND ABILITIES REQUIRED

    • Demonstrated experience developing documented ETL pipelines
    • Knowledge of professional software engineering practices, including coding standards, code reviews, source control management, build processes, testing, and devops
    • Able to deploy data pipelines and infrastructure for on-premise and/or cloud-native infrastructure
    • Strong organizational skills with the ability to prioritize and manage various tasks and projects reliably and in a timely manner
    • Requires minimal direction from leadership and possesses the ability to adapt to new challenges as they arise
    • Excellent interpersonal skills and a passion for innovative solutions
    • Self-motivated with the ability to produce clear documentation and generate results with a clear and professional presentation
    • Detail-oriented with excellent communication skills
    • Able to identify, communicate, and advocate for best practices
    • Can work independently and efficiently to meet aggressive timelines where needed

  • Bachelor’s degree in data engineering, computer science, biomedical informatics, health services research, epidemiology, biostatistics, public health, or a subject area with a strong focus on the management and research use of clinical data. A Master’s degree may substitute for 2 years of experience.
  • 4 years of hands-on programming and data analysis experience is required, of which at least 2 years must involve hands-on experience in multi-omics data analysis and database management with expertise in common bioinformatics algorithms, Python, and shell scripting.
  • Solid understanding of database management, server-client program development, and web visualization technologies. Strong SQL skills on a database such as Snowflake, SQL Server, or Oracle.
  • Proficiency with Python, R, Scala, JavaScript, or another modern programming language used in data engineering.
  • Strong skills in setting up and maintaining relational databases.
  • Understanding of small molecule protein interactions in biological systems
  • Excellent teamwork and communication skills.

Dana-Farber Cancer Institute is an equal opportunity employer and affirms the right of every qualified applicant to receive consideration for employment without regard to race, color, religion, sex, gender identity or expression, national origin, sexual orientation, genetic information, disability, age, ancestry, military service, protected veteran status, or other characteristics protected by law.


Application Deadline: Open Until Filled
Employer Location: Dana-Farber Cancer Institute
Needham, Massachusetts
United States