AI DevOps Engineer (2-5 yrs)
Dash Hire
posted 3+ weeks ago
About the Role:
We are looking for an experienced AI DevOps Engineer to join our innovative team. In this role, you will manage and optimize AI/ML workflows across cloud platforms, ensuring seamless deployment, versioning, and scaling of AI models. You will work with both structured and unstructured data, implement multi-tenancy strategies, and automate processes to improve the operational efficiency of AI solutions. The ideal candidate will have a strong background in cloud-based AI services, containerization, MLOps practices, and robust data management strategies.
Key Responsibilities:
- Work with a variety of structured and unstructured datasets across multiple storage formats, including relational databases (RDBMS), document databases, and graph databases (e.g., Neo4j, queried with Cypher). Optimize data flow and storage strategies to ensure seamless integration with AI models.
- Write and optimize GraphQL queries for efficient data access and manipulation across knowledge graphs and AI-driven modules. Design efficient, scalable data retrieval methods for complex graph structures.
- Manage AI/ML environments across leading cloud platforms such as Google Cloud Vertex AI, Amazon SageMaker, and Azure ML. Ensure seamless integration of cloud-based AI services with existing infrastructure.
- Automate the deployment, versioning, and operationalization of AI models across multiple environments (development, staging, production) on Fermi's unified platform.
- Implement CI/CD pipelines tailored to AI/ML workflows.
- Support full model lifecycle management, including downloading, compressing, transporting, reusing, and monitoring AI models across different environments. Ensure proper versioning and rollback strategies for model deployment.
- Handle the deployment of Generative AI (GenAI) models, including models stored in transformer (TRF) or compressed formats. Ensure that GenAI models are optimized for scalability and performance.
- Implement and manage multi-tenancy strategies for scalable AI model deployments, enabling efficient AI-driven services for diverse client needs and internal use cases. Ensure the isolation and security of models for different tenants.
- Implement Infrastructure as Code (IaC) practices using tools like Terraform, Ansible, or CloudFormation to automate environment provisioning and infrastructure management for AI workloads.
- Ensure the security of AI models and data across all stages of deployment. Implement enterprise-level data security, access controls, and model version management practices. Comply with industry standards and best practices for cloud security and regulatory requirements.
- Set up and configure monitoring systems to track model performance and health, including detecting model drift. Continuously improve the performance, reliability, and scalability of deployed models.
- Collaborate with AI researchers, data engineers, and software engineers to implement AI solutions effectively.
- Maintain detailed documentation of deployment pipelines, workflows, and processes to ensure scalability and knowledge sharing across teams.
Required Skills & Qualifications:
- 2+ years of professional experience in DevOps or AI/ML operations, with hands-on experience managing and deploying AI/ML solutions in cloud environments.
- Strong experience with cloud-based AI/ML services, specifically Google Cloud Vertex AI, Amazon SageMaker, and Azure ML.
- Expertise in containerization technologies such as Docker and Kubernetes for AI model hosting and deployment.
- Solid experience with MLOps best practices, including CI/CD for models, model registries, monitoring, and drift detection.
- Knowledge of GraphQL and proficiency in writing optimized queries for data manipulation across knowledge graphs and AI models.
- Experience with multi-tenancy strategies for scalable AI model deployments and serving AI models to diverse clients and internal use cases.
- Ability to work with structured and unstructured data from various data stores, including RDBMS, NoSQL, and graph databases (e.g., Neo4j).
- Familiarity with enterprise-level data security protocols, access controls, and model version management strategies.
- Proficiency in Infrastructure as Code (IaC) tools like Terraform, Ansible, or CloudFormation.
- Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, or cloud-native monitoring solutions.
- Experience in implementing deployment pipelines for GenAI applications with a focus on automation, scalability, and efficiency.
Functional Areas: Software/Testing/Networking