nicornk's comments | Hacker News

I agree. Pixi solves all of those issues and is fully open source, including the packages from conda-forge.

Too bad there is now so much confusion between Anaconda (the distribution that requires a license) and the FOSS pieces from conda-forge. Try explaining that to your legacy IT or procurement department -.-


Fivetran tried to upstream write support but it was not accepted https://github.com/duckdb/duckdb-iceberg/pull/95


That sounds less like "not accepted" and more like "will implement, rewrite required". It was only a couple of months ago.


If you just want to enable SSH to EC2 instances (through SSM) using `ssh i-…`, you can add the following lines to your SSH config:

https://gist.github.com/nicornk/5d2c0cd02179f9b46cc7df459af0...

    Host i-*
        IdentityFile ~/.ssh/id_rsa
        TCPKeepAlive yes
        ServerAliveInterval 120
        User ec2-user
        ProxyCommand sh -c "aws ec2 start-instances --instance-ids %h ; aws ec2 wait instance-running --instance-ids %h ; aws ec2-instance-connect send-ssh-public-key --instance-id %h --instance-os-user %r --ssh-public-key 'file://~/.ssh/id_rsa.pub' --availability-zone $(aws ec2 describe-instances --instance-ids %h --query 'Reservations[0].Instances[0].Placement.AvailabilityZone') ; aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"

This will also allow VSCode remote development.


My variation is to use a custom script as `ProxyCommand` that resolves private route53 DNS names to instance ids, because remembering instance IDs is insane.
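A minimal sketch of that idea, assuming the DNS name's first label matches the instance's Name tag (the lookup is the part you'd adapt to however your Route53 names map to instances); save it as something like ~/.ssh/ssm-proxy.sh, make it executable, and point ProxyCommand at it:

    #!/usr/bin/env bash
    # Hypothetical ProxyCommand helper: resolve a friendly hostname to an
    # instance ID, then tunnel SSH over SSM. Adjust the describe-instances
    # filter to match how your DNS names map to instances.
    set -euo pipefail

    host="$1"        # e.g. db01.internal.example.com
    port="${2:-22}"

    instance_id=$(aws ec2 describe-instances \
      --filters "Name=tag:Name,Values=${host%%.*}" \
                "Name=instance-state-name,Values=running" \
      --query 'Reservations[0].Instances[0].InstanceId' \
      --output text)

    exec aws ssm start-session --target "$instance_id" \
      --document-name AWS-StartSSHSession \
      --parameters "portNumber=${port}"

And in the SSH config (domain here is a placeholder):

    Host *.internal.example.com
        User ec2-user
        ProxyCommand ~/.ssh/ssm-proxy.sh %h %p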


Mine is to run a Tailscale node on a tiny EC2 instance. It enables not only SSH but also direct access to database instances, S3 buckets that are blocked from public access, etc.


How are S3 buckets blocked from public access? I mean, I know there is literally a “Block public access” feature that keeps S3 buckets from being read or written by unauthenticated users. But as far as I know, without some really weird bucket ACLs, you can still access S3 buckets if you have the IAM credentials.

Before anyone well actually’s me. Yes I know you can also route S3 via AWS internal network with VPC Endpoints between AWS services.


In general, condition keys:

https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_p...

And https://docs.aws.amazon.com/service-authorization/latest/ref...

Specifically the VPCe one, as the other poster mentioned, but there are others, like IP limits.

Another way is an IdP that supports network or device rules. With Cloudflare Access or Okta, for instance, you can add policies that only let you authenticate if you meet device or network requirements, which achieves the same thing.


> Specifically the VPCe one, as the other poster mentioned, but there are others, like IP limits.

IPs don't cut it to prevent public access. I can create my own personal AWS account, with the private IP I want, and use the credentials from there. There's really just VPC endpoints AFAIK.


You essentially add a policy that limits the access to only come from your VPC endpoint.
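For example, a minimal sketch (bucket name and endpoint ID are placeholders) that denies anything not arriving through one specific VPC endpoint:

    aws s3api put-bucket-policy --bucket my-private-bucket --policy '{
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "DenyAccessOutsideVpce",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
          "arn:aws:s3:::my-private-bucket",
          "arn:aws:s3:::my-private-bucket/*"
        ],
        "Condition": {
          "StringNotEquals": { "aws:SourceVpce": "vpce-0123456789abcdef0" }
        }
      }]
    }'

In practice you'd usually carve out an exception for an admin role, otherwise you also lock yourself out of the bucket in the console.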


I run an EC2 instance with SSM enabled. I then use the AWS CLI to port forward into the 'private' database instance or whatever from my desktop. The nice thing about this is it's all native AWS stuff, no need for 3rd party packages, etc.
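For anyone curious, it looks roughly like this (instance ID, database host, and ports are placeholders):

    aws ssm start-session \
      --target i-0123456789abcdef0 \
      --document-name AWS-StartPortForwardingSessionToRemoteHost \
      --parameters '{"host":["mydb.xxxxxxxx.eu-central-1.rds.amazonaws.com"],"portNumber":["5432"],"localPortNumber":["5432"]}'

After that, connecting to localhost:5432 on the desktop reaches the database through the instance (the Session Manager plugin for the AWS CLI needs to be installed).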


The blog post echoes my experience that DuckDB just works (due to superior disk-spilling capabilities) and Polars OOMs a lot.
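If you ever need to nudge the spilling along, DuckDB lets you cap its memory and pick the spill directory explicitly; a quick sketch via the CLI (the limit, path, and file names are just examples):

    duckdb <<'SQL'
    SET memory_limit = '4GB';
    SET temp_directory = '/tmp/duckdb_spill';
    COPY (
      SELECT user_id, count(*) AS events
      FROM read_parquet('events/*.parquet')
      GROUP BY user_id
    ) TO 'counts.parquet' (FORMAT parquet);
    SQL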


My biggest issue with Polars (don’t shoot me) was the poor LLM support. 4o, at least, seems to frequently confuse it with Pandas syntax or logic.

This pushed me to finally investigate the DuckDB hype, and it’s hard to believe I can just write SQL and it just works.


The API has mostly stabilized, and at this point, other than some minor errors (“groupby” vs. “group_by”), LLMs seem to do pretty well with it. Personally, I’m glad they made the breaking changes they did for the long-term health of the library.


Problem is you're using 4o. Use 3.5 Sonnet, it's much better.


groupby vs. group_by is a big problem for Polars with 4o.


Snowflake usually unloads data to an internal stage bucket in the same region as your Snowflake account. If you use an S3 gateway endpoint, getting that data is free of egress charges.
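Setting one up is a single call (VPC, region, and route table IDs are placeholders):

    aws ec2 create-vpc-endpoint \
      --vpc-id vpc-0123456789abcdef0 \
      --vpc-endpoint-type Gateway \
      --service-name com.amazonaws.eu-central-1.s3 \
      --route-table-ids rtb-0123456789abcdef0

Gateway endpoints for S3 don't cost anything, so it's mostly a matter of remembering to create them.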


and this is unfortunately the case for Velox.


care to elaborate?


Yeah, finally a method to figure out the org ID of all those Snowflake internal stage buckets that Snowflake does not want to share for our VPC endpoint policies…


So how did you "further optimize [BuildKit] to build Docker images up to 40x faster"?


Hey, thanks for the question! Depot co-founder here.

We've optimized BuildKit for remote container builds with Depot. So we've added things like a different `--load` for pulling the image back that is optimized to only pull the layers back that have actually changed between the build and what is on the client. We've also done things like automatically supporting eStargz, adding the ability to `--push` and `--load` at the same time, and the ability to push to multiple registries in parallel.

We've removed saving/loading layer cache over the network. Instead, the BuildKit builder is ephemeral, and we orchestrate the cache across builds by persisting the layer cache to Ceph and reattaching it on the next build.

The largest speedup with Depot is that we build on native CPUs so that you can avoid emulation. We run native Intel and Arm builders with 16 CPUs and 32GB of memory inside of AWS. We also have the ability to run these builders in your own cloud account with a self-hosted data plane.

So the bulk of the speed comes from persisting the layer cache across builds with Ceph and from native CPUs. The optimized portions of BuildKit currently help mostly post-build. That said, we are working on some things in the middle of the build, related to BuildKit's DAG structure, that will also speed up the front of the build.
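From the CLI side, day-to-day usage looks roughly like this (image name and registry are placeholders, and the flags follow the usual `docker buildx build` conventions):

    # hypothetical image/registry names; building on native Intel + Arm
    # builders, with --push and --load used together as described above
    depot build \
      --platform linux/amd64,linux/arm64 \
      -t registry.example.com/app:latest \
      --push --load .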


> with Ceph

Seeing that reminded me of some healthy discussion in https://news.ycombinator.com/item?id=39235593 (SeaweedFS fast distributed storage system for blobs, objects, files and datalake) that may interest you. control-f for "Ceph" to see why it caught my eye


Hopefully Depot will reply, but from my perspective it is mostly laid out on their homepage. They are comparing against builds in other CI products that use network-backed disks and virtualized hardware and don’t keep a layer cache around. Depot provides fast hardware and disks and is good at making the layer cache available for subsequent builds.

You could likely get very similar performance by provisioning a single host with good hardware and simply leveraging the on-host cache.
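Or, if the builders on that host are ephemeral, a sketch of persisting the layer cache to a local directory with BuildKit's local cache exporter (path and tag are arbitrary):

    docker buildx build \
      --cache-from type=local,src=/var/cache/buildkit \
      --cache-to type=local,dest=/var/cache/buildkit,mode=max \
      -t app:latest .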


Pixi is great, highly recommended!


I like pixi, but I am not likely to make the switch. They don't support pyproject.toml and other standards. This disqualifies it from being a potential "recommended tool" by PyPA or whatever.


Have you compared it with Poetry or pip-tools? I'm thinking of trying pixi but still can't muster up the energy to do it, especially since for my use case Poetry and pip-tools cover most of it.


There are Aldis in Germany with self-checkout.

