Data Engineer - ETL/Data Warehousing (8-13 yrs)
Neemtree
posted 3+ weeks ago
Responsibilities:
- Define and lead the data architecture vision and strategy, ensuring it supports analytics, ML, and business operations at scale.
- Architect and manage cloud-native data platforms using Databricks and AWS, leveraging the lakehouse architecture to unify data engineering and ML workflows.
- Build and optimize large-scale batch and streaming pipelines using Apache Spark, Airflow, and AWS Glue, ensuring high availability and fault tolerance.
- Design and develop data marts, warehouses, and analytics-ready datasets tailored for BI, product, and data science teams.
- Implement robust ETL/ELT pipelines with a focus on reusability, modularity, and automated testing.
- Enforce and scale data governance practices, including data lineage, cataloging, access management, and compliance with security and privacy standards.
- Partner with ML Engineers and Data Scientists to build and deploy ML pipelines, leveraging Databricks MLflow, Feature Store, and MLOps practices.
- Provide architectural leadership across data modeling, data observability, pipeline monitoring, and CI/CD for data workflows.
- Evaluate emerging tools and frameworks, recommending technologies that align with platform scalability and cost-efficiency.
- Mentor data engineers and foster a culture of technical excellence, innovation, and ownership across data teams.
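The ETL/ELT bullet above calls for reusability, modularity, and automated testing; a minimal pure-Python sketch of what that can look like follows. The step names, field names, and sample records are hypothetical illustrations, not part of the role description:

```python
from typing import Callable, Iterable

# A pipeline step is just a function from records to records,
# so steps compose freely and can be unit-tested in isolation.
Record = dict
Step = Callable[[Iterable[Record]], list]

def drop_nulls(key: str) -> Step:
    """Reusable step: filter out records missing `key`."""
    def step(records):
        return [r for r in records if r.get(key) is not None]
    return step

def rename(old: str, new: str) -> Step:
    """Reusable step: rename field `old` to `new` where present."""
    def step(records):
        return [{**{k: v for k, v in r.items() if k != old},
                 **({new: r[old]} if old in r else {})} for r in records]
    return step

def pipeline(*steps: Step) -> Step:
    """Compose steps left-to-right into one callable pipeline."""
    def run(records):
        for s in steps:
            records = s(records)
        return records
    return run

# Automated test of the composed pipeline on hypothetical data.
etl = pipeline(drop_nulls("user_id"), rename("ts", "event_time"))
rows = [{"user_id": 1, "ts": "2024-01-01"}, {"user_id": None, "ts": "x"}]
out = etl(rows)
assert out == [{"user_id": 1, "event_time": "2024-01-01"}]
```

Because each step is a plain function, the same building blocks transfer to Spark or Glue jobs, and each one can be exercised by a unit test without standing up a cluster.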
Requirements:
- 8+ years of hands-on experience in data engineering, with at least 4 years in a lead or architect-level role.
- Deep expertise in Apache Spark, with proven experience developing large-scale distributed data processing pipelines.
- Strong experience with the Databricks platform and its ecosystem (e.g., Delta Lake, Unity Catalog, MLflow, job orchestration, workspaces, clusters, lakehouse architecture).
- Extensive experience with workflow orchestration using Apache Airflow.
- Proficiency in both SQL and NoSQL databases (e.g., Postgres, DynamoDB, MongoDB, Cassandra), with a deep understanding of schema design, query tuning, and data partitioning.
- Proven background in building data warehouse/data mart architectures using AWS services such as Redshift, Athena, Glue, Lambda, DMS, and S3.
- Strong programming and scripting ability in Python (preferred) or other AWS-compatible languages.
- Solid understanding of data modeling techniques, versioned datasets, and performance tuning strategies.
- Hands-on experience implementing data governance, lineage tracking, data cataloging, and compliance frameworks (GDPR, HIPAA, etc.).
- Experience with real-time data streaming using tools like Kafka, Kinesis, or Flink.
- Working knowledge of MLOps tooling and workflows, including automated model deployment, monitoring, and ML pipeline orchestration.
- Familiarity with MLflow, Feature Store, and Databricks-native ML tooling is a plus.
- Strong grasp of CI/CD for data and ML pipelines, automated testing, and infrastructure-as-code (Terraform, CDK, etc.).
- Excellent communication, leadership, and mentoring skills with a collaborative mindset and the ability to influence across functions.
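One requirement above, data partitioning, can be illustrated with a short sketch. The bucket count and key name below are arbitrary assumptions for the example, not specifics from this posting:

```python
import zlib

def partition_for(key: str, num_partitions: int = 8) -> int:
    """Assign a record key to a stable partition via CRC32 hashing.

    A stable hash (unlike Python's per-process salted hash()) keeps the
    same key in the same partition across runs, which is what lets a
    query engine co-locate related rows and prune partitions.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical records grouped by partition before a bulk write.
records = [{"user_id": f"u{i}"} for i in range(100)]
buckets: dict[int, list] = {}
for r in records:
    buckets.setdefault(partition_for(r["user_id"]), []).append(r)

# Every record lands in exactly one of the 8 partitions.
assert sum(len(b) for b in buckets.values()) == 100
assert all(0 <= p < 8 for p in buckets)
```

The same idea underlies hash-bucketed tables in Spark and key-based sharding in DynamoDB or Cassandra: a deterministic key-to-partition mapping so reads touch only the partitions that can contain the key.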
Functional Areas: Software/Testing/Networking