i agree that scalable infrastructure is needed to manage a production pipeline, as others have explained well.
i found this article a useful reminder, because sometimes a job doesn't require a full-grown infrastructure. i commonly get one-off requests that don't overlap with existing infrastructure and won't need any follow-up. in those cases a hadoop cluster, or heck, even loading into a pg db, would be wasted effort.
but i wouldn't want to manage our clickstream analytics pipeline with shell scripts and cron jobs.
is there any lightweight tooling out there that can schedule/run basic pipeline jobs in a shell environment?
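for a sense of scale, here is a minimal sketch of just the "run" half, assuming python 3.9+ (for the stdlib `graphlib` module). the step names and `echo` commands are hypothetical placeholders. each step is a shell command plus the set of steps it depends on, and the runner executes them in dependency order and stops at the first non-zero exit code.

```python
import subprocess
from graphlib import TopologicalSorter

# hypothetical pipeline: step name -> (shell command, set of upstream step names)
STEPS = {
    "fetch": ("echo fetching", set()),
    "clean": ("echo cleaning", {"fetch"}),
    "report": ("echo reporting", {"clean"}),
}


def run_pipeline(steps):
    """Run each step's shell command in dependency order; stop at the first failure."""
    # TopologicalSorter expects node -> predecessors, which is exactly the deps set
    graph = {name: deps for name, (_, deps) in steps.items()}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        cmd, _ = steps[name]
        # shell=True so each step can be an arbitrary shell one-liner
        result = subprocess.run(cmd, shell=True)
        if result.returncode != 0:
            raise RuntimeError(f"step {name!r} failed with exit code {result.returncode}")
        completed.append(name)
    return completed


if __name__ == "__main__":
    print(run_pipeline(STEPS))
```

the scheduling half would still need something external (e.g. a single cron entry that invokes the script). what this buys over loose scripts is one place that records ordering and fails fast. it has no retries, logging, or backfills, which is roughly where a real tool starts to earn its keep.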