TLDR: SQL feels most natural for joining and aggregating data sets; Python is better suited for filtering and transforming data due to its greater flexibility and extensibility.
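To make the TLDR concrete, here is a small sketch using Python's built-in sqlite3 module. The join and aggregation are written in SQL, while a record-level cleanup and a filter are written as ordinary Python. The tables, columns, sample rows and the `normalize_phone` helper are hypothetical, made up for illustration.

```python
import re
import sqlite3

# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice', '+49 (30) 123-456'), (2, 'Bob', '030 / 987 654');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0);
    """
)

# Joining and aggregating: declarative SQL reads naturally here.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()


# Transforming: arbitrary per-record logic is just a (testable) Python function.
def normalize_phone(raw: str) -> str:
    """Keep digits only, preserving a leading '+'."""
    digits = re.sub(r"\D", "", raw)
    return ("+" + digits) if raw.strip().startswith("+") else digits


cleaned = [
    (name, normalize_phone(phone))
    for name, phone in conn.execute("SELECT name, phone FROM customers ORDER BY id")
]

# Filtering: any Python predicate works, e.g. keep international numbers only.
international = [(name, phone) for name, phone in cleaned if phone.startswith("+")]

print(totals)         # [('Alice', 25.5), ('Bob', 12.0)]
print(cleaned)        # [('Alice', '+4930123456'), ('Bob', '030987654')]
print(international)  # [('Alice', '+4930123456')]
```

The same phone cleanup could be attempted with nested REPLACE calls in SQL, but as a Python function it is easier to extend and unit-test.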
DataCater (https://datacater.io) | Remote (Germany or CET +- 2h) | Full Time | Cloud infrastructure engineers, Frontend engineers, Software engineers
At DataCater, we help businesses efficiently connect their growing number of data silos. To this end, we are building a self-service platform for streaming data pipelines, based on the open-source technology Apache Kafka. DataCater enables companies to benefit from the power of event streaming without having to handle its complexity. Our mission is to provide essential tooling so that data teams can unlock the full value of their data, faster. DataCater was founded in 2020 and serves customers from industries such as finance, e-commerce, media, and software.
Sounds interesting? Please send an e-mail to our CTO Hakan ([email protected]), introduce yourself in a few words, describe what excites you most about the role, and attach your CV.
I suggest thinking in terms of events instead of streams.
You can extract data change events (e.g., INSERTs) from a data source, transform them with a streaming application (e.g., built with Kafka Streams), and load the transformed events into a data sink.
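The extract, transform, load flow above can be sketched in plain Python. This is not Kafka Streams (a Java library) and uses no Kafka client; it is a minimal, dependency-free illustration assuming a hypothetical change-event shape with `op` and `after` fields. The functions `extract`, `transform`, `load` and `run_pipeline` are illustrative stand-ins for a source connector, a streaming application and a sink connector.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical change-event shape, loosely modeled on CDC records:
# {"op": "INSERT" | "UPDATE" | "DELETE", "after": {column: value, ...}}
Event = Dict[str, Any]


def extract(source: Iterable[Event]) -> Iterator[Event]:
    """Stand-in for a CDC source connector: yields change events one at a time."""
    yield from source


def transform(event: Event) -> Optional[Event]:
    """Per-event transformation: drop DELETEs and mask e-mail addresses."""
    if event["op"] == "DELETE":
        return None  # filtered out
    row = dict(event["after"])  # copy, so the source event is left untouched
    if "email" in row:
        user, _, domain = str(row["email"]).partition("@")
        row["email"] = user[:1] + "***@" + domain
    return {"op": event["op"], "after": row}


def load(events: Iterable[Event], sink: List[Event]) -> None:
    """Stand-in for a sink connector: appends each transformed event."""
    for event in events:
        sink.append(event)


def run_pipeline(
    source: Iterable[Event],
    sink: List[Event],
    fn: Callable[[Event], Optional[Event]] = transform,
) -> None:
    # Generators keep the flow event-at-a-time: nothing waits for a full batch.
    transformed = (fn(event) for event in extract(source))
    load((event for event in transformed if event is not None), sink)


source = [
    {"op": "INSERT", "after": {"id": 1, "email": "alice@example.com"}},
    {"op": "DELETE", "after": {"id": 2}},
    {"op": "UPDATE", "after": {"id": 1, "email": "al@example.org"}},
]
sink: List[Event] = []
run_pipeline(source, sink)
print(sink)
```

Because each event is handled on its own, the transformation stays a pure function that can be tested without any streaming infrastructure.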
Thank you! :) I updated the article with a note on privileged containers.
Evaluating user code inside privileged containers is indeed a security nightmare. Fortunately, --privileged is not enabled by default, which is why I think that containers are quite secure by default.
Make sure you are at least using user namespaces, which by their nature drop the mknod capability (CAP_MKNOD), or, better yet, run in rootless mode.
I have filed several bugs that I know can result in container breakouts, but since I can't bring myself to publicly disclose vulnerabilities, I have no stick to make them reverse their "won't fix" decisions.
k8s doesn't support user namespaces, let alone user mount namespaces.
The point is that for k8s and Docker, any role that allows you to create pods/containers, or that can compromise such a role in any number of hops, should be considered to have root permissions.
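To check whether a workload actually runs inside a user namespace, a process can inspect `/proc/self/uid_map`. A minimal heuristic sketch, assuming Linux: in the initial user namespace the file contains the single identity mapping `0 0 4294967295`, while a rootless container typically maps UID 0 to an unprivileged host UID. The helper names are hypothetical.

```python
import os
from typing import List, Optional, Tuple

UID_MAP = "/proc/self/uid_map"


def parse_uid_map(text: str) -> List[Tuple[int, int, int]]:
    """Parse uid_map lines of the form: <ID inside> <ID outside> <range length>."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            entries.append((inside, outside, length))
    return entries


def in_user_namespace(text: str) -> bool:
    """Heuristic: the initial user namespace maps 2**32 - 1 IDs one-to-one from 0."""
    return parse_uid_map(text) != [(0, 0, 4294967295)]


def root_maps_to(text: str) -> Optional[int]:
    """Outside UID that UID 0 maps to, or None if UID 0 is not mapped."""
    for inside, outside, length in parse_uid_map(text):
        if inside <= 0 < inside + length:
            return outside - inside
    return None


if os.path.exists(UID_MAP):
    with open(UID_MAP) as f:
        data = f.read()
    print("user namespace:", in_user_namespace(data), "| UID 0 ->", root_maps_to(data))
```

If UID 0 inside the container maps to an ordinary host UID, a breakout lands as that unprivileged user rather than as host root.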
While I won't share any non-privileged breakouts, here is an example of how easy it is with the --privileged flag.
While I am not recommending it in general, AppArmor makes it fairly easy to develop CI-friendly restrictions, and I would strongly suggest using it to protect the directories and devices you don't use.
It is not perfect, but it can typically help prevent leaks caused by newly added features or configuration errors.
What provides the additional security is runc using seccomp to move the container process through a one-way transition into a "secure" state, together with dropped capabilities.
Hiding PIDs doesn't matter when any container can list /dev, or look through /sys and /proc to find device major and minor numbers, or modify kernel parameters or files that have mistakenly been given write access.
The recent CVE involving overwriting the runc executable is a real-world example of this.
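Whether seccomp and dropped capabilities are actually in effect can be checked from inside a container by reading `/proc/self/status`: its `CapEff` field is a hex bitmask of effective capabilities, and its `Seccomp` field reports the seccomp mode (0 disabled, 1 strict, 2 filter). The sketch below, assuming Linux, flags a hand-picked and non-exhaustive set of capabilities; that set and the helper names are illustrative choices, with bit numbers as defined in linux/capability.h.

```python
import os
from typing import Dict, List

# Illustrative subset of capability bit numbers (see capabilities(7)).
RISKY_CAPS = {
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    16: "CAP_SYS_MODULE",
    17: "CAP_SYS_RAWIO",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
    27: "CAP_MKNOD",
}
SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def parse_status(text: str) -> Dict[str, str]:
    """Turn 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def risky_effective_caps(status: Dict[str, str]) -> List[str]:
    """Names of the capabilities above whose bits are set in the CapEff mask."""
    mask = int(status["CapEff"], 16)
    return [name for bit, name in sorted(RISKY_CAPS.items()) if (mask >> bit) & 1]


def seccomp_mode(status: Dict[str, str]) -> str:
    """Seccomp mode of the process, or 'unknown' if the field is missing."""
    return SECCOMP_MODES.get(status.get("Seccomp", ""), "unknown")


STATUS = "/proc/self/status"
if os.path.exists(STATUS):
    with open(STATUS) as f:
        status = parse_status(f.read())
    print("seccomp:", seccomp_mode(status))
    print("risky effective caps:", risky_effective_caps(status) or "none")
```

Running this as the container's entrypoint in CI is one way to catch a pod spec that accidentally re-adds CAP_SYS_ADMIN or disables seccomp.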
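To see how much device information a container exposes, one can list the device nodes visible under `/dev` together with their major and minor numbers. A minimal audit sketch using only the standard library (`os.walk`, `os.lstat`, `os.major`, `os.minor`); `visible_device_nodes` is a hypothetical helper name, and its output is only a starting point for deciding which devices an AppArmor profile or the container runtime should hide.

```python
import os
import stat
from typing import List, Tuple


def visible_device_nodes(root: str = "/dev") -> List[Tuple[str, str, int, int]]:
    """Return sorted (path, kind, major, minor) tuples for device nodes under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is not accessible
            if stat.S_ISCHR(st.st_mode):
                kind = "char"
            elif stat.S_ISBLK(st.st_mode):
                kind = "block"
            else:
                continue  # regular files, symlinks, sockets, etc.
            found.append((path, kind, os.major(st.st_rdev), os.minor(st.st_rdev)))
    return sorted(found)


if __name__ == "__main__":
    for path, kind, major, minor in visible_device_nodes():
        print(f"{kind:5} {major:4}:{minor:<4} {path}")
```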
Namespaces are more about decoupling and avoiding pollution than security.
As with chroot, the shared kernel has a large attack surface, especially if you don't use all of the tools it provides.
As you are effectively running arbitrary user code, I would strongly suggest looking into runtime protection beyond what the container runtime itself provides.
It can be made reasonably safe, but overconfidence in containers being inherently secure will make you a target.
If you are on k8s, you should use anti-affinity or taints to make sure containers running external user code are not scheduled on the same nodes as other containers, or, better yet, use a dedicated k8s cluster for that purpose.
This matters especially if you have persistent storage, as user mount namespaces are new in the kernel and default mounts are typically implemented by granting the CAP_SYS_ADMIN capability (see capabilities(7)).
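As a sketch of the scheduling setup, assume the dedicated nodes have been tainted and labeled with a hypothetical `untrusted=true` key/value, for example via `kubectl taint nodes <node> untrusted=true:NoSchedule` and `kubectl label nodes <node> untrusted=true`. The Python below builds a Pod manifest as JSON: the toleration lets the pod onto the tainted nodes (which the taint keeps other pods off), and the required node affinity keeps it off every other node. This uses taints plus node affinity rather than pod anti-affinity; the names, image and security settings are illustrative, not a complete hardening profile.

```python
import json

# Hypothetical key/value used both as the node taint and as the node label.
UNTRUSTED_KEY = "untrusted"
UNTRUSTED_VALUE = "true"


def untrusted_pod_manifest(name: str, image: str) -> dict:
    """Build a Pod manifest that only schedules onto the dedicated, tainted nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"workload": "user-code"}},
        "spec": {
            # Tolerate the taint that keeps all other pods off the dedicated nodes.
            "tolerations": [{
                "key": UNTRUSTED_KEY, "operator": "Equal",
                "value": UNTRUSTED_VALUE, "effect": "NoSchedule",
            }],
            # Require the node label, so this pod cannot land on a shared node.
            "affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [{
                    "key": UNTRUSTED_KEY, "operator": "In", "values": [UNTRUSTED_VALUE],
                }]}],
            }}},
            "containers": [{
                "name": "user-code",
                "image": image,
                "securityContext": {
                    "privileged": False,
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }


print(json.dumps(untrusted_pod_manifest("user-job", "example.org/sandbox:latest"), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`.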