AI, ML, and DL are currently hot keywords in software development. The world has seen many successful products built on AI technologies, such as Google Translate and Amazon Alexa, and AI keeps making machines smarter. Still, the path from an idea to a successful product has many challenges if you want to build a great solution. I have spent some time working with AI projects and start-ups to build solutions based on algorithms and ML, where my aim was to propose and implement approaches that help the development team work smoothly. Today, I would like to describe the development process, architecture, CI/CD, and programming practices for quickly implementing multiple AI approaches with an Agile software development methodology.

Sections:
– Architecture
– Continuous Integration and Continuous Deployment
– Batch Processing, Parallel Processing
– Data-Driven Development and Test-Driven Development (to be continued)

Architecture

Figure: Overall system integration

An AI project includes multiple services across two domains, AI/ML/DL and engineering, and these services are developed independently and then integrated and verified automatically. Typically, the ML services are very different from the engineering services: they tackle challenging problems tied to technologies such as machine learning, deep learning, big data, and distributed computing. In this case, a microservices architecture is the first choice. It helps to separate the business problem into specific services, each of which can be solved with the domain knowledge of the Data Science team or the Engineering team. Microservices also bring further advantages when combined with Agile development; the microservices reference listed at the end of this post has more information. With an AI project, the focus is more on the question “How do we solve the business problem with AI technology?”.

Microservices may not be the best choice in every case, but they help with quick development and delivery under an Agile methodology.

Continuous Integration and Continuous Deployment

When a project includes multiple teams and multiple services, the challenges appear at integration and deployment. CI/CD is very popular in software development, but I received more specific requirements from the Data Science (DS) team. The big question from DS is: “We have several solutions to this problem. Could you help us by proposing a way to quickly evaluate and integrate them?”

For the Engineering team, a CI/CD pipeline is fairly generic. With an AI solution, you will meet some additional challenges:
– How do we run on distributed computing? We chose batch jobs.
– How do we save money on long-running jobs? We chose AWS spot instances.
– How do we use parallel jobs to improve performance? We run jobs in parallel and also build parallelism into the code structure (Python).
– How do we control data versions and model versions? We chose Data Version Control and AWS S3 to version the training/evaluation data and the models.
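The versioning idea in the last item can be illustrated without any external tool. The sketch below is not how Data Version Control or S3 work internally; it is a minimal illustration, under my own assumptions, of pinning data and model files to a content hash so that an evaluation result can be traced back to its exact inputs. The names `content_version` and `build_manifest`, and the toy file names, are hypothetical.

```python
import hashlib
import json


def content_version(data: bytes) -> str:
    """Return a short, deterministic version id derived from the bytes themselves."""
    return hashlib.sha256(data).hexdigest()[:12]


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Map each artifact name to the version id of its content."""
    return {name: content_version(blob) for name, blob in sorted(artifacts.items())}


# Toy in-memory artifacts standing in for real training data and model files.
artifacts = {
    "train.csv": b"id,label\n1,cat\n2,dog\n",
    "model.bin": b"\x00\x01\x02fake-weights",
}

manifest = build_manifest(artifacts)
print(json.dumps(manifest, indent=2))
```

Storing a manifest like this next to each evaluation report is one way to make a result reproducible: if any byte of the data or the model changes, its version id changes too.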

All of the solutions applied in my project were aimed at the challenges of AI technology, and they are interesting problems. A good abstraction in the code structure helps to quickly integrate and deliver multiple approaches.
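To make the point about abstraction concrete, here is a minimal sketch, assuming every candidate approach can be hidden behind the same `train`/`predict` interface. The `Approach` base class, the two toy approaches, and the `evaluate` helper are all hypothetical names for illustration, not the structure used in my project.

```python
from abc import ABC, abstractmethod

Sample = tuple[str, str]  # (text, label)


class Approach(ABC):
    """Common interface that every candidate solution implements."""

    name: str

    @abstractmethod
    def train(self, samples: list[Sample]) -> None: ...

    @abstractmethod
    def predict(self, text: str) -> str: ...


class MajorityLabel(Approach):
    """Baseline: always predict the most frequent training label."""

    name = "majority"

    def train(self, samples: list[Sample]) -> None:
        labels = [label for _, label in samples]
        self.label = max(set(labels), key=labels.count)

    def predict(self, text: str) -> str:
        return self.label


class KeywordMatch(Approach):
    """Predict the label of the first word that was seen during training."""

    name = "keyword"

    def train(self, samples: list[Sample]) -> None:
        self.word_to_label: dict[str, str] = {}
        for text, label in samples:
            for word in text.lower().split():
                self.word_to_label.setdefault(word, label)
        self.fallback = samples[0][1]

    def predict(self, text: str) -> str:
        for word in text.lower().split():
            if word in self.word_to_label:
                return self.word_to_label[word]
        return self.fallback


def evaluate(approach: Approach, train: list[Sample], test: list[Sample]) -> float:
    """Train one approach and return its accuracy on the test samples."""
    approach.train(train)
    correct = sum(approach.predict(text) == label for text, label in test)
    return correct / len(test)


train_set = [("good movie", "pos"), ("great film", "pos"), ("bad plot", "neg")]
test_set = [("good film", "pos"), ("bad movie", "neg")]

scores = {a.name: evaluate(a, train_set, test_set) for a in (MajorityLabel(), KeywordMatch())}
print(scores)
```

Because the evaluation code only depends on the interface, adding a new approach means adding one class; the pipeline that trains and compares approaches does not change.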

Figure: CI/CD pipeline

This pipeline can be implemented with any CI/CD framework, such as GitLab CI, Jenkins, AWS CodeBuild, and others. However, the framework should support custom distributed and parallel jobs, because the jobs in the pipeline need specific resources and those resources should auto-scale. For example, training jobs need more GPUs, while system evaluation needs more CPUs to run in parallel, so scalable resources are the most important factor in saving cost.
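The idea that each job declares the resources it needs can be sketched independently of any CI/CD framework. The following is a toy model under my own assumptions: the stage names, the resource numbers, and the pool names `gpu-pool` and `cpu-pool` are hypothetical and do not correspond to any real framework's configuration.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Stage:
    name: str
    needs: tuple[str, ...] = ()  # stages that must finish first
    gpus: int = 0
    cpus: int = 1


# Hypothetical pipeline: training asks for a GPU, evaluation for many CPUs.
STAGES = [
    Stage("build"),
    Stage("train", needs=("build",), gpus=1, cpus=4),
    Stage("evaluate", needs=("train",), cpus=16),
    Stage("deploy", needs=("evaluate",)),
]


def execution_order(stages: list[Stage]) -> list[str]:
    """Order stage names so that every stage comes after the stages it needs."""
    graph = {stage.name: set(stage.needs) for stage in stages}
    return list(TopologicalSorter(graph).static_order())


def runner_pool(stage: Stage) -> str:
    """Pick the (hypothetical) autoscaling pool that matches the stage's resources."""
    return "gpu-pool" if stage.gpus > 0 else "cpu-pool"


order = execution_order(STAGES)
for stage in sorted(STAGES, key=lambda s: order.index(s.name)):
    print(f"{stage.name:<9} -> {runner_pool(stage)} (gpus={stage.gpus}, cpus={stage.cpus})")
```

Declaring resources per stage is what lets the expensive GPU machines exist only while the training stage runs.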

A CI/CD pipeline that includes training and system evaluation supports fast experiments and fast results: an implementation can be integrated quickly and with confidence, and its quality can be kept under control.
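One simple way to control quality in such a pipeline is a gate that compares a candidate's evaluation metrics with a baseline before integration. This is only a sketch under assumptions of mine: the metric names, the numbers, and the `tolerance` parameter are made up for illustration, and every metric is assumed to be "higher is better".

```python
def passes_quality_gate(
    candidate: dict[str, float],
    baseline: dict[str, float],
    tolerance: float = 0.01,
) -> bool:
    """Return True if no baseline metric drops by more than `tolerance`.

    A metric missing from the candidate counts as a failure.
    """
    return all(
        candidate.get(metric, float("-inf")) >= value - tolerance
        for metric, value in baseline.items()
    )


# Illustrative numbers only, not results from a real project.
baseline = {"accuracy": 0.90, "f1": 0.85}
improved = {"accuracy": 0.91, "f1": 0.845}
regressed = {"accuracy": 0.93, "f1": 0.80}

print(passes_quality_gate(improved, baseline))
print(passes_quality_gate(regressed, baseline))
```

In a pipeline, a failing gate would stop the deployment stage, so a new approach is only integrated when it does not regress on the agreed metrics.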

Batch Processing, Parallel Processing

  • Batch Processing: allows users to submit a series of programs (jobs) that are executed to completion without further user input or manual intervention. It is commonly used to split a large workload into smaller jobs, such as report building or big data processing. Batch jobs can run in parallel on distributed computing, and they are implemented at the architecture level.
  • Parallel Processing: the processing of program instructions by dividing them among multiple processors, using the CPU power of a computer to run a program in less time. It is implemented at the programming level.
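The two ideas combine naturally on a single machine: split the work into batches, then hand the batches to a pool of workers. The sketch below uses Python's standard `concurrent.futures` module; the function names and the sum-of-squares workload are placeholders chosen for illustration.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def split_into_batches(items: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_batch(batch: Sequence[int]) -> int:
    """Stand-in for real work on one batch: the sum of squares."""
    return sum(x * x for x in batch)


def run_in_parallel(
    items: Sequence[int],
    batch_size: int,
    executor_factory: Callable[[], Executor] = ProcessPoolExecutor,
) -> int:
    """Process every batch on a worker pool and combine the partial results.

    ProcessPoolExecutor spreads CPU-bound work over several processes;
    ThreadPoolExecutor can be passed instead for I/O-bound batches.
    """
    batches = list(split_into_batches(items, batch_size))
    with executor_factory() as pool:
        return sum(pool.map(process_batch, batches))
```

A call such as `run_in_parallel(list(range(1_000_000)), batch_size=50_000)` should be placed under `if __name__ == "__main__":` when the process pool is used, because some platforms start worker processes by re-importing the main module.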

Batch and parallel processing are both popular ways to use the power of computers to improve a program's performance, so we can choose whichever fits the requirements and the application. Batch processing is stronger than parallel processing for distributed computing: parallel processing is optimal on a single machine, while batch processing is used for larger workloads spread over distributed computing.

You can look at Apache Spark, Hadoop, or Amazon EMR (Elastic MapReduce) to learn more about batch processing and optimization. With CI/CD, the runner machines can be hosted in several ways depending on the specific jobs. I like to use GitLab CI with custom runners, because I can assign specific jobs to specific machines and separate the major workloads. For more information, see the documentation on GitLab Runner auto-scaling with AWS Fargate.
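Assigning specific jobs to specific machines is usually done with tags: a job lists the tags it requires, and only runners that carry all of them may pick it up. The sketch below models that matching rule in plain Python. It is an illustration only, not GitLab's implementation, and the runner names, job names, and tags are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    name: str
    tags: frozenset[str]


@dataclass(frozen=True)
class Job:
    name: str
    tags: frozenset[str]


def eligible_runners(job: Job, runners: list[Runner]) -> list[str]:
    """Return the names of runners that carry every tag the job asks for."""
    return [runner.name for runner in runners if job.tags <= runner.tags]


# Hypothetical fleet: one GPU machine for training, one CPU fleet for evaluation.
runners = [
    Runner("gpu-box", frozenset({"gpu", "docker"})),
    Runner("cpu-fleet", frozenset({"cpu", "docker"})),
]
jobs = [
    Job("train", frozenset({"gpu"})),
    Job("evaluate", frozenset({"cpu"})),
    Job("lint", frozenset({"docker"})),
]

for job in jobs:
    print(job.name, "->", eligible_runners(job, runners))
```

With this rule, the heavy training job can only land on the GPU machine, while lightweight jobs can run wherever capacity is free.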

Thanks for reading.
Please leave a comment if you have any ideas or suggestions. We hope to learn more from you.

References:
– Microservice architecture document: https://microservices.io/
– AWS Microservice deployment with Fargate: https://d1.awsstatic.com/whitepapers/microservices-on-aws.pdf
