Show HN: Databunker – a GDPR compliant, secure storage for personal data (PII) (github.com/securitybunker)
106 points by stremovsky on April 4, 2021 | 34 comments


It seems like a nice idea, the different consent/withdraw/forget workflow. But it seems the core idea is that all user PII is stored in a central table or tables, and all other places use a reference. You can do this in a regular application: even the old approach kept only a userId elsewhere (which is not PII without the user table), with the PII centralized. I think the biggest challenge is getting to the point where your PII is not spread all over the database, with some of it denormalized for performance or whatever reason.

I guess the biggest advantage of this project is removing access to the PII by means of joins and such and automatically enforcing access to PII using a restricted API. I guess a premade API makes it much easier to ensure nobody ends up violating that access and integrate PII too closely with the application.
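The restricted-API idea described above can be sketched roughly as follows. This is a minimal in-memory illustration under assumptions, not Databunker's actual API: `PiiVault`, `create_user`, `get_field`, and `forget` are hypothetical names chosen for the example.

```python
import secrets


class PiiVault:
    """Hypothetical in-memory stand-in for a central PII store."""

    def __init__(self):
        self._records = {}  # token -> PII dict; never exposed directly

    def create_user(self, pii):
        token = secrets.token_hex(16)  # random token, carries no PII itself
        self._records[token] = dict(pii)
        return token

    def get_field(self, token, field, allowed_fields):
        # The caller must state which fields it is permitted to read.
        if field not in allowed_fields:
            raise PermissionError(f"access to {field!r} not permitted")
        return self._records[token][field]

    def forget(self, token):
        # Erasure happens in one place; other tables keep only the token.
        self._records.pop(token, None)


vault = PiiVault()
token = vault.create_user({"email": "alice@example.com", "phone": "555-0100"})

# Application tables store the token only, so no join can reach the PII.
orders = [{"order_id": 1, "user_token": token}]

email = vault.get_field(token, "email", allowed_fields={"email"})
vault.forget(token)  # later lookups for this token raise KeyError
```

The point of the sketch is that application code never sees the PII table, only a token and a narrow set of calls, so there is a single place to enforce access rules and a single place to erase.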


Thanks for the feedback!


Hi guys,

The project demo is back up. It is available at:

https://demo.databunker.org/

User account: Phone: 4444 Code: 4444

Admin access token: DEMO

More information is available at:

https://databunker.org/


I think there would be more value to this project as a standard, and a set of implementations of the standard in different libraries and frameworks.

The companies I work for are just going to re-implement this (poorly) in their own language and framework. They generally can't just pick up a single turn-key solution, because they already have 50 custom internal systems with records they need to manage.

If there were open source libraries that followed a standard for GDPR record management, they could pick up those libraries and plug the pieces they need together, according to the standard. That would remove a lot of bugs that come from trying to write all the code themselves, and make it easier to integrate different systems.


Nice project, although I have a question I would appreciate someone answering. How does the "right to be forgotten" work in the real world? The confusing part for me is that the data that identifies you is also required by the business, so how do you draw the line between what can be forgotten and what cannot? Let's say I use some service, then I violate that company's policies, then I exercise my "right to be forgotten", and after they delete my data I sign up again and repeat the entire thing? Second, how does that work with regard to bookkeeping and tax policies, where you are required to keep data about your clients?


The right to erasure (aka the right to be forgotten) is not universal and only applies in certain circumstances.

> Let's say I use some service, then I violate that company's policies, then I exercise my "right to be forgotten", and after they delete my data I sign up again and repeat the entire thing?

In this case a business (or 'data controller' in GDPR lingo) can use 'legitimate interest' as a lawful basis for processing the user's information. Of course the data you keep would have to be proportionate to what you're doing. For example, it would be hard to argue that you needed to keep the user's billing address history if your service used a simple email blacklist (this is the 'data minimisation' principle).

> how does that work with regard to bookkeeping and tax policies, where you are required to keep data about your clients?

As a rule of thumb, if you're using some personal data to comply with another piece of law then that usage is generally exempt from GDPR.

Source: https://ico.org.uk/for-organisations/guide-to-data-protectio...


That does get complicated in the real world. You might need to retain some data for potential future refunds, for example. But perhaps the application that does refunds also does the loyalty program, and the internals of the app aren't always separate enough that you can delete/obfuscate/whatever info from just the loyalty part.


> You might need to retain some data for potential future refunds, for example.

Then that would be a legitimate interest, and you could store that information for a period of time that is reasonable for processing refund requests.

But you would be barred from using that same information for a different purpose, e.g. the loyalty program.

GDPR article 25 requires systems to have privacy built in, so a system such as the one you describe, where separating these concerns is impossible, would probably itself be in violation of the regulation.


Thanks.


I am no expert on GDPR or security, but wouldn't a simple "PII to cryptographically secure hash" solution work for some of this? The PII would possibly need to be accessed piecemeal while the account is active, so hashing alone is not appropriate, but once the account is deleted you could store a hash of the user (or a partial hash, made from only truly unique info or info combos), since it cannot be reconstituted and contains no specific PII. You then store this hash in your "abusive person" list, or whatever, maybe link it to refund data if needed, and if a "forgotten" user needs to interact with the service they fill in their information, which is converted to the hash without being saved. Doable?


There are two issues with hashing that I can envisage:

1. Nothing a user has is truly hashable (an email can be replaced, there are people with the same name/DOB/place of birth, an address is not a permanent attribute...)

2. Hash keys can collide, so collisions would block different users (probably not an issue for small companies, but for FB with 2 billion users it is something worth considering).
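For a rough sense of scale on point 2, the chance of any accidental collision can be estimated with the standard birthday-bound approximation. The sketch below assumes an ideal hash with uniformly distributed outputs; the 256-bit and 64-bit sizes are illustrative choices (a full SHA-256 digest versus a hypothetical truncated one), and it says nothing about point 1.

```python
import math


def collision_probability(n_items, hash_bits):
    # Birthday approximation: p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
    # expm1 keeps precision when the exponent is extremely small.
    exponent = n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)


users = 2_000_000_000
p_full = collision_probability(users, 256)      # full 256-bit digest
p_truncated = collision_probability(users, 64)  # digest truncated to 64 bits
```

Under these assumptions the full 256-bit case comes out astronomically small, while truncating to 64 bits gives a probability on the order of ten percent, so the concern depends heavily on how many bits of the hash are actually kept.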


this doesn't cut it. someone could take a list of email addresses, hash them, and then reidentify the dataset. hashing buys you nothing from a gdpr/ccpa compliance perspective, storing the hash is seen as no different from storing the pii itself. it really only makes things harder because it becomes more difficult to find where all the pii is when someone submits a request for you to return or delete their data.
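The re-identification described above can be sketched in a few lines. This assumes an unsalted SHA-256 over a normalized email address; the addresses and the `sha256_hex` helper are made up for illustration.

```python
import hashlib


def sha256_hex(value):
    # Normalize, then hash; unsalted, so equal inputs give equal digests.
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


# "Anonymized" dataset: only hashes of emails are stored (hypothetical data).
stored_hashes = {sha256_hex("alice@example.com"), sha256_hex("bob@example.com")}

# Someone holding a list of candidate emails hashes each one and checks for a match.
candidates = ["carol@example.com", "alice@example.com", "dave@example.com"]
reidentified = [email for email in candidates if sha256_hex(email) in stored_hashes]
```

Any candidate that appears in the stored set is recovered in clear text, which is why a plain hash of an email behaves like a pseudonym rather than anonymous data.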


Do you have a source for this? I thought for the gdpr it was enough that data is not easily accessible. For example, it is not necessary to delete PII from backups unless they can be automatically restored (and are reasonably encrypted). Hashing PII thus falls under this category.


i don't have a source i can link here, this is guidance i have gotten from lawyers.


I think you should differentiate between (or use one of) "personal data" and "personally identifiable information" – which are different concepts.

A user-token – if consistent and mappable – would, for instance, be "personal data" at least for the service provider for a storage solution such as this.

Also, like other "self-sovereign identity" and data hubs, services such as this should be very clear that the only privacy-guarantee it can practically make to its users is regarding authorization of first-party access to data. Outside of that, no technical guarantees can be made (whether we're talking caching for legitimate reasons, or sharing/selling data to partners).


Really good idea. Excellent to capture the abstraction of a usertoken. Implementation looks like a good start. Good luck to you!


Thanks!


Nice project. But mind that some risks are built in by default: 1. Centralisation 2. Search 3. No automatic retention 4. Hard issues around managing online and offline backup/restore are not tackled 5. Option to use cloud storage/service 6. Too many details needed for a customer record.

Text on the site is often incorrect. ‘pseudonymization as a valid solution to store customer data as defined by GDPR.’ This is simply not true. GDPR measures are based on defined risks that differ per use case.


Thanks for the feedback. I would love to clarify all your questions. [email protected]


Great Project! Looks very promising!


Thanks!


Demo goes to 502 bad gateway ;\


Hi, The demo is fixed now.


Looks interesting, good luck!


Thanks!


> and you still need to consult with an attorney specializing in privacy.

Governments should be refunding solicitor costs to anyone needing GDPR advice. Otherwise this is just another way to add barriers. If you are on modest income you can forget about setting up a website in the EU.


The law is quite readable, and the various Data Protection Agencies (country-specific regulators) have provided more concrete guidance. If you're setting up a website that takes a restrained approach to personal data, you don't necessarily need an attorney.


Databunker makes basically any startup privacy-by-design compliant.


how does it address ip addresses in server logs?



> Otherwise this is just another way to add barriers.

Personally I think pretty much everything in GDPR is just sensible guidelines for how to handle personal data, and if you're not willing to do those things then you probably shouldn't be handling personal data in the first place. Being ignorant of good data practice is not an excuse.

> If you are on modest income you can forget about setting up a website in the EU.

This is just rubbish. GDPR only applies to personal info for a start so if you don't store personal info then you have nothing to worry about. Even if you do store personal info the vast majority of use cases are really straightforward and require a very minimal understanding of the law to be compliant.


Furthermore you can often ask them for help (or so I have heard).


> GDPR is just sensible guidelines for how to handle personal data

And yet it doesn't say "don't give it to me if you don't want me to have it."

> GDPR only applies to personal info for a start so if you don't store personal info then you have nothing to worry about.

So logging IPs is fine?


IP addresses are (probably) considered personal data under the GDPR.

https://www.fieldfisher.com/en/services/privacy-security-and...



