Hacker News | zilchers's comments

Nothing beats the Dreamforce swag at the SF Goodwill locations.


These are the real questions


Apparently a pump and dump


Ya, reading this article makes me think this person has never actually used a taxi (at least not one outside of Manhattan, which has probably the best cab system in the world). This guy certainly hasn’t tried catching a cab at midnight in San Francisco, where they’ll reject you if you’re not going the right direction, insist their credit card machine is broken, and otherwise be insanely difficult to deal with. Every comment in this thread has started “I don’t like Uber, but…” and it’s absolutely true: Uber has problems, but the service and the app have been a game changer. And his taking exception with the drunk driving study is a bit infuriating; he insinuates the data is compromised but doesn’t offer any evidence. I can only speak anecdotally, but within my social circle Uber has absolutely decreased the likelihood of someone driving drunk, and if it contributed to even 1% fewer driving fatalities, that’s an enormous achievement.


The terminology around seed vs. A vs. B is so skewed right now, but a 50-person Series A startup, just estimating, burns about $5M per year. Hard to say exactly the raise, but personally I’d be uncomfortable with less than 3 years of runway, so let’s call it a $12M raise. $12M for 25% of the company (again, totally back of the napkin here) puts us at basically a $50M post-money valuation. $50M on $1M ARR seems way high to me; basically this all seems sort of shifted left on the headcount side (Series B should be ~50 people) and right on the revenue side.


I feel like 3 years of runway is actually quite a lot if you're gunning for extremely fast growth. I think you would have raised a Series B well within 3 years if you're growing as expected.


I wanted to sign up for this as an individual, but you need to be at least a 10 person company, so I couldn’t. No idea why they’d have that restriction


That's the hurdle I hit with it last time I looked.


I'd say the counterexample to this is PowerShell - if you've ever used it, it's quite verbose and can be less than pleasant to write. I think having both options is best: when you're getting started it's nice to have verbose, highly self-explanatory commands, and as you get more familiar with the environment, it's nice to move to something more concise.

    Import-Csv -Path $src | ForEach-Object {
        $file = $_.column8
        Write-Verbose "Writing to $file"
        Export-Csv -Path (Join-Path -Path $dstDir -ChildPath "$($file).csv") -InputObject $_ -Append -Encoding ASCII -NoTypeInformation
    }


At the interactive shell, the verbosity collapses:

    ipcsv $src|group column8|%{$_.group|epcsv "$dstdir\$($_.name).csv" -noty -e ascii -v}
The bad part is that when writing "good" PowerShell the verbosity is exponentially worse, and it turns into

    try
    {
        if (-not [string]::IsNullOrEmpty($_.Column8))
        {

            $fullPathName = Join-Path -Path $dstDir -ChildPath $_.Column8
            $pathTestResult = test-path -LiteralPath $fullPathName -ErrorAction Stop


            # This is a hashtable of the parameters to a cmdlet
            # the only purpose of this 'splatting' is that
            # powershell commands get too long
            # and can't be line-wrapped in any good way

            $exportParams = @{
                Encoding = 'ASCII'
                NoTypeInformation = $true
                Append = $true
                LiteralPath = $fullPathName
                Verbose = $true
            }

            $_ | Export-Csv @exportParams
        }
    }
    catch [whatever]
    {
        
    }
and on and on and on, ugh.


How do you think you compare to Cockroach? Besides the query semantics, I’m curious about the depth of the thought around cross region / cross continent and horizontal scaling concerns?


EdgeDB is based on Postgres. There are ways of scaling it, as Citus Data has shown, and there's a lot of ongoing work in PostgreSQL itself to improve scalability. We'll be using that as well as actively contributing to improve it further.


Does anyone know much about the tech on this? I assume when they say persistent they mean across VM restarts, but are they actually doing some sort of disk persistence too?


This is byte-addressable persistent memory. They look like DRAM DIMMs and they plug into DIMM slots. You access them using your memory controller and not your storage controller. People sometimes refer to them as non-volatile memory (NVM). Intel used to call it Apache Pass.

They're a nightmare to program because OSes do not have a good abstraction for them (at least not yet). Accessing them through the file-system seems sub-optimal (this is byte-addressable memory and not a block device). Accessing them through virtual memory is also pretty bad because they're much slower than DRAM.


Disclaimer: I work at Intel on PMDK (pmem.io)

Both Windows and Linux implement DAX, which, as @the8472 explained, allows bypassing page cache in memory mapped I/O. Additionally, DAX optionally allows you to flush your data directly from user-space instead of calling msync.

And that's the gist of NVM programming model [0], its entire point is to allow applications to avoid the now hugely excessive abstraction layer of traditional storage.

And I will freely admit that programming to raw memory mapped files can be difficult, but there is ongoing work on making it easier. An example of that is, excuse the shameless plug, Persistent Memory Development Kit [1], which makes writing new software for this new type of memory much simpler.

Performance of an NVDIMM is obviously hardware dependent, but the now widely accepted programming model works on the assumption that persistent memory is fast enough that it is reasonable to stall a CPU while an instruction is accessing it. I'm not sure what hardware evaluations you are basing your claims on, but let me assure you that the HW solution described in the blog post does not violate that assumption.

[0] - https://www.snia.org/tech_activities/standards/curr_standard...

[1] - http://pmem.io/


Do you know how well tools like Cap'n Proto and Protocol Buffers help for dealing with this kind of scenario? I'd imagine that some kind of low latency/cost serialization system would help significantly in using the device. Cap'n Proto I'd imagine would work nicely for reading data off, since it should be able to read and use the structure with no extra copying or decoding, but I have no idea how the situation with writing would play out.


That's an excellent question. The answer is that those types of libraries will work just fine for read-only workloads, since you cannot mutate a data structure once you have serialized and written it out to a file. The best part is, this will work without any (or very little) modification, as long as your application is suited for using mmap. All you have to do is use a persistent memory resident file on a DAX file system.

If you need dynamic mutable state however, as great as these libraries are, you will need a more complex solution with memory allocation and transactions.


The simplest way of using this is to not do any serialization at all: just store any information you want persisted in memory allocated from the region you mmapped to the Optane DIMMs instead of the DRAM DIMMs.


The main reason I've been thinking about tools like that is more because the persistent structure should probably work regardless of compiler settings/flags and code updates. Directly mmapping the structures, you're going to have to worry about how things are packed, and whether a new compiler optimization causes things to be laid out differently (something gets eliminated in one version and not the other).


I would say that this is still a mentality of thinking of something as "data on disk", where the data should be in an "ABI-stable" format.

Think of persistent-memory data as more like data resident in the memory of a runtime which can experience a "hot code upgrade", like the Erlang runtime.

In Erlang, when you hot-upgrade your running code, you usually do so through a managed system of "relups" (RELease UPdates), which are sort of a cross between an RDBMS migration, and a traditional installer-package full of newer versions of code and assets.

The Erlang runtime takes this package, unpacks it, and then runs a master relup script, which can be authored to do arbitrary things (including, if ultimately necessary, fully rebooting the node, throwing away all that in-memory state.) Mostly, though, a relup script calls into individual "appup" scripts for each Erlang application. Those applications then specify how their corresponding running processes are to be updated—which can sometimes be fraught (if e.g. the new release requires that you add new service-processes or remove old ones, migrating in-memory state into a new architecture), but usually just means calling a "code_change" callback on all the service-processes.

This "code_change" callback is the thing that's most like an RDBMS migration: it is called from the event-loop running in the old version of the code of the service-process, and passes in the old in-memory state; and when it returns, it's returning the new in-memory state, to resume the event loop in the new version of the code of the service-process.

This is basically how I'd picture dealing with code updates (including ones due to build-setting changes) in software that deals with pmem: you'd architect your code such that the library that touches the pmem can have multiple versions of it dynamically loaded (though not running) at the same time; and then you'd stage a migration from the old code's pmem state encoding, to the new version's, by

1. dlopen(3)ing the new version of the lib;

2. telling the old version of the lib to stop any ongoing work;

3. handing off the toplevel pmem state-handle that the old version of the lib was using, to a "migrate" function in the new version of the lib;

4. replacing the old version's pmem state-handle with a dummy one;

5. telling the old version of the lib to terminate (and so do the trivial cleanup to the world it sees through the dummy handle);

6. telling the new version of the lib to initialize, using the handle to the now-migrated-in-format pmem;

7. dlclose(3)ing the old version of the lib.

Basically, picture what something like Photoshop would have to do to enable you to upgrade its plugins without restarting it or closing your working document, and you'll have the right architecture.


So does that update process rewrite all 6 TB of data in the new format? Because I can imagine why people would rather not do that.


I mean, it depends on whether your pmem data is a bunch of copy-on-write persistent data structures like HAMTs; or maybe packed data structures like Vector<Foo>s where you can't easily rewrite one Foo to be a different size without rewriting the whole vector; etc.

If 1. the new format is just like the old format except for one little difference to one struct, and 2. structs point to other structs rather than containing them, then it's just a matter of calling your within-mmap(2)ed-arena malloc(3)-equivalent function to get a new chunk of the pmem arena of the right size for the new version of the struct, then rewriting the pointer in the other struct to point to it, and then calling your free(3)-equivalent on the old version of the struct.

If you change the structure of some fundamental primitive type like how strings are represented, then you're probably going to have to rewrite your whole pmem arena.

Though, also, you can just make your code deal with both old and new versions of the struct, and only migrate structs when they're getting modified anyway. (This is equivalent to the way you'd avoid having an RDBMS migration rewrite an entire table: instead you add a trigger that migrates a row on UPDATE, and ensure that your business layer can deal with both migrated and un-migrated versions of the row.)


> If you change the structure of some fundamental primitive type like how strings are represented, then you're probably going to have to rewrite your whole pmem arena.

That's part of the reason why I was thinking something like Cap'n Proto or Protocol Buffers might make sense for a lot of structures. You pay a bit of cost for writing but get to gracefully handle upgrades to the structure if you do it right. I'd imagine you want something higher level just above them to organize the records. But this is all a really new area of thinking, so I'm probably being a bit obtuse about it.


Probably something like FlatBuffers, with which you can skip the marshalling and unmarshalling. It was designed for games, where you would map files into memory.


The ultimate zero-cost serialization system is just mmapping the Optane and casting pointers, but if you want something less fragile you could probably use one of those libraries on top of mmap.


> They're a nightmare to program because OSes do not have a good abstraction for them (at least not yet).

With DAX[0] linux already has the ability to put a filesystem (currently ext4 and xfs) on NVDIMMS and then let userspace address them through mmap while skipping the page cache indirection. I.e. you're directly byte-addressing them through the memory controller via standard memory-mapped file abstractions. Direct block device mapping of nvdimms without filesystem is also possible.

[0] https://www.kernel.org/doc/Documentation/filesystems/dax.txt


The Persistent Memory Development Kit (PMDK) offers high-level abstractions over DAX. Looks like most mortals would use libpmemobj (or its C++ bindings).

http://pmem.io/pmdk/


It seems like Single-level store[1] would be a really good fit for this. I was going to make a crack about bringing back Multics but apparently IBM has an OS using this.

[1]https://en.wikipedia.org/wiki/Single-level_store


Is there a reason not to just use DRAM along with a battery to achieve the same persistence, but at DRAM speed?


Probably because systems that passively do their job tend to be preferred.

Also, a battery would only last so long. IIRC DRAM needs to be constantly refreshed, so it would be a trade-off between capacity and duration.

Optane seems[1] to be 20~30X slower than DRAM but 4~10X faster than server SSDs

[1]: https://superuser.com/a/1195674/187732


There are NVDIMMs that have DRAM and a matching quantity of NAND flash memory to save the contents to in the event of a power failure. They require an external capacitor module and are limited in data capacity by how much DRAM you can fit on the module. You can fit far more 3D XPoint memory on a module than DRAM, and it doesn't require the external capacitors to achieve persistence, and it should be significantly cheaper on a per-GB basis.


Just to reiterate the point... this is an instance with terabytes of near-memory-speed storage.

If persistent memory pans out as a technology it will completely upend the way we think about building software and the cost tradeoffs of hardware. (as much as or more so than the transition from spinning disks to ssds)


How do you define "pans out"? What performance and price differences between it, flash, and DRAM do you have in mind?

Because I'll keep reminding people that putting a DRAM cache in front of some flash can very closely approximate a large persistent memory. If people wanted to build software for that kind of system, they could do it today. The hardware is not the blocker.


Diablo Memory1 used flash DIMMs with DRAM DIMMs as cache. When we tried it, it worked OK for some workloads and poorly for others, but it was also buggy and then the company went out of business. So a large part of "pans out" is simply a production-quality implementation that you can buy.

Note that Optane DIMMs have been delayed by around two years at this point and we still don't know what they will cost.


Cache will always stay just cache unless it is the same size as the underlying persistent storage. You cannot read or write larger-than-cache chunks without performance degradation. You also have a start-up cache-warming problem.


Most workloads don't need the entire storage to be at maximum speed all the time. In other words, in most situations cache is plenty for speed purposes. And the mapping layer can hide the chunk sizes. But we still don't see people writing software based around persistence. Maybe a lot of people are simply stuck in their ways, or maybe the benefits aren't actually that big.

As for cache-warming, that's also a configuration issue. When you reboot, leave the 'cache' portion of DRAM alone. Then as soon as the service resumes, the cache is already hot. When you shut down a node for an extended period, consider spending five minutes writing the cache to disc. And the article is about cloud servers anyway, where a shutdown typically implies losing all local storage whether it's persistent or not.


The reason is the battery. These devices can be powered off and save state, like an SSD.


But in the context of the OP, presumably devices in a data center would never be powered down on purpose to save energy? In which case it seems that battery-backup DRAM would work just as well for this use case, and be both cheaper and faster.


Optane is in between DRAM and flash in terms of cost (and performance) - it's also denser, so you can fit much more storage on each DIMM slot.


DRAM can not be simply naively battery-backed; it needs active refreshing. And as memory controllers reside in CPUs these days, that would mean keeping the CPU powered up.


It uses Intel Optane. It's basically super fast SSD used as RAM.


I love GCP’s postmortems - they’re open, honest, insightful - and I wish I could get my company to OK releasing details like this when we have outages. It’s part of the reason I personally like GCP more than AWS (and certainly Azure; those guys don’t admit to shit regardless of how bad the outage is).

Edit: Wow, downvotes because I like transparency from my cloud hoster, super interesting...


You're getting downvotes because this was not a transparent and open report. It was vague and was more advertise-y than postmortem-y.

Not saying Amazon is perfect by any means either, but there's a lot of room for improvement. Good postmortems give everyone ideas on how to solidify their own processes and prevent other issues. This was just fluff.

