AWS DATA ENGINEER
TITLE: AWS DATA ENGINEER (SR)
LOCATION: 1000 Brussels, Belgium
DURATION: 11 months
In collaboration with HFB and VEA, the VEB offers TERRA, an application that collects data from all buildings and infrastructure used by the public sector in Flanders.
Research shows that insight into energy consumption alone can already lead to a 3% decrease in consumption. TERRA's primary aim is therefore to give organizations and support services a report on how their consumption evolves and to offer tools for benchmarking.
However, more effort is needed to achieve the climate targets. TERRA is therefore growing into a central policy-support platform that visualises savings measures, subsidies and funds, monitoring tools and the 2030 goals, and in this way facilitates energy master plans and OEPC processes.
TERRA is crucial to the VEB's objective of unburdening the public sector and guaranteeing the efficient use of investment resources.
Finally, TERRA must also become an open data portal for energy-related data.
In accordance with the objectives of the Flemish government, semantic and organizational standards are provided that enable the reuse of data by internal and external parties.
The TERRA platform runs on Amazon Web Services and uses various PaaS and IaaS solutions. The data lake / data warehouse, which is used for all kinds of reporting and analysis purposes, makes use of the following components:
– Data pipelines and EMR for loading and processing data
– S3 for storing files
– Amazon Redshift (DWH)
– Tableau as reporting environment
– Athena for querying flat files in the data lake
– SQL Server as backend database of the TERRA web application
– Lambda for running backend services
– DMS for synchronization of data, e.g. from SQL Server to Redshift
For example, consumption data is supplied on a daily basis, and an ETL process written in Python and Spark merges the new consumption data with the existing data in the Redshift DWH.
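As a rough illustration of the merge step described above, the sketch below upserts a daily batch into the existing records. It is plain Python rather than Spark, so it only shows the logic, and the field names (meter_id, date, kwh) are hypothetical and not taken from the TERRA data model.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Key = Tuple[str, str]


def merge_consumption(existing: List[Record], new_batch: List[Record]) -> List[Record]:
    """Upsert new_batch into existing, keyed on (meter_id, date).

    Rows in new_batch replace existing rows with the same key;
    rows with a key not seen before are added.
    """
    # Index the existing rows by their (hypothetical) business key.
    merged: Dict[Key, Record] = {(r["meter_id"], r["date"]): r for r in existing}
    for row in new_batch:
        merged[(row["meter_id"], row["date"])] = row
    # Sort by key so the result is deterministic.
    return [merged[key] for key in sorted(merged)]
```

In a Spark job the same idea would typically be expressed as a join or union on the key columns, followed by a load into the warehouse.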
In addition, links are made with numerous other data sources to enrich the data warehouse.
Git is used for version control of the code, and the release process (dev, stg, prod) runs on MS DevOps. The latter is also used to track sprints, user stories, tasks, …
Main responsibilities:
– You manage the existing stack of data flows and pipelines in the AWS cloud environment, which uses S3, EMR, Spark, Redshift, Lambda and Python, among others.
– You further develop complex data models to generate additional analytical insight.
– You write high-quality code to further develop the data platform, so that the platform remains scalable and easy to maintain.
– You integrate new data sources from various origins into the existing solution.
– You analyze existing processes and advise where performance gains and cost savings can be achieved.
– You implement data quality controls and alert stakeholders so that the necessary actions can be taken.
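A data quality control of the kind mentioned in the last point could, for instance, flag missing or negative readings before they are loaded. The following is a minimal sketch only: the rules and the kwh field name are hypothetical and do not describe the checks actually used on the platform.

```python
from typing import Dict, List, Optional


def find_quality_issues(rows: List[Dict[str, Optional[float]]]) -> List[str]:
    """Return one human-readable message per row that fails a check."""
    issues: List[str] = []
    for index, row in enumerate(rows):
        kwh = row.get("kwh")
        if kwh is None:
            # Covers both an absent field and an explicit null value.
            issues.append(f"row {index}: missing kwh")
        elif kwh < 0:
            issues.append(f"row {index}: negative kwh ({kwh})")
    return issues
```

The returned messages could then be forwarded to stakeholders, for example from a scheduled job.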
All projects within the VEB are developed according to the Agile Scrum methodology. As a data engineer, you work closely with the other members of the multidisciplinary team.
You report to the project manager of the team.
Specific skills (technical competences)
Experience with AWS: Senior
Experience with Spark: Senior
Experience with Python: Senior
Experience with data pipelines: Senior
Experience with SQL Server: Medior
Experience with Git: Senior
Experience with Redshift: Senior
Experience with DMS: Medior
Experience with EMR: Senior