1. Create and maintain an optimal data pipeline architecture; assemble large, complex datasets that meet functional and non-functional business requirements.
2. Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, etc.
3. Create data tools for analytics and data science team members that help them build and optimize our product into an innovative industry leader.
4. Work with data and analytics experts to strive for greater functionality in our data systems.
5. Support cross-functional, cross-business-unit (BU) data integration tasks.
1. Relevant computer science knowledge, or a degree in Computer Science, IT, or a similar field; a Master's degree is a plus.
2. At least 2 years of experience with Shell and Python 3.8+, covering one or two complete project life cycles.
3. Experience building the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of sources using SQL, GCP "big data" technologies, and workflow orchestration tools, e.g. Airflow, Dagster, Argo Workflows.
4. Experience with stream processing systems, e.g. Pub/Sub, Kafka, Spark Streaming, Apache Flink, Cloud Dataflow, Apache Beam.
5. Experience with RDBMS and NoSQL databases and related data storage and processing services, e.g. PostgreSQL, Cloud SQL, BigQuery, GCS, Dataproc, Cassandra, Scylla, Elasticsearch, Druid, Redis.
6. Experience with backend API development and deployment, e.g. FastAPI, Django, Flask, Sanic.
7. Experience with Kubernetes and Docker, e.g. building Docker images.
8. Experience deploying and debugging software on GNU/Linux systems.
1. Basic knowledge of machine learning algorithms.
2. Basic knowledge of data lake and data warehouse design.
3. Basic knowledge of data modeling for OLTP and OLAP systems.
4. Experience with a complete project lifecycle.
5. Experience with the following programming languages: Go, Java, Scala.
6. Experience with first-line or second-line system operations and maintenance.