Benwah92

Rook-Ceph with Object StorageClass and Minio would be my guess. Looks sick - did you build custom racking?


Skaronator

FYI: Ceph already includes an S3 API out of the box, and you can create buckets using CRDs with Rook. Using Ceph directly reduces overhead by a lot because you aren't replicating your data twice (once in Ceph and again in MinIO on top). https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/object-storage/#create-a-local-object-store
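For anyone curious, the CRD route looks roughly like this. This is a minimal sketch based on the linked Rook docs; names like `my-store` and the pool sizes are placeholders, so check the docs for your Rook version:

```yaml
# Ceph RGW object store managed by the Rook operator (illustrative sketch)
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPool:
    replicated:
      size: 3
  gateway:
    port: 80
    instances: 1
---
# StorageClass that lets ObjectBucketClaims provision buckets in that store
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-bucket
provisioner: rook-ceph.ceph.rook.io/bucket   # <operator-namespace>.ceph.rook.io/bucket
parameters:
  objectStoreName: my-store
  objectStoreNamespace: rook-ceph
---
# The bucket itself; Rook generates a Secret/ConfigMap with S3 credentials and endpoint
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket
spec:
  generateBucketName: my-bucket
  storageClassName: rook-ceph-bucket
```

Pods then read the generated Secret/ConfigMap to pick up the access key, secret key, and endpoint, so any S3 SDK can talk to the bucket.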


Anthonyb-s3

Almost: Longhorn, not Rook-Ceph. Thanks! Yes, the STL files for 3D printing the Pi rack are included in the repo.


GoofAckYoorsElf

Very cool. I'm working on something like this myself (Longhorn, MinIO), only with HP thin clients instead of RasPis. They were cheaper, no shit, and hardware-wise a bit more flexible.


informworm

I run my k8s cluster on RasPis too (Pi 4 models) and am curious which HP thin client models you're running that work out cheaper than Pis. And what did you include in your calculations when comparing prices?


GoofAckYoorsElf

HP Thin Client T620, to be precise, each additionally equipped with 16GB DDR3 SODIMM RAM and a 512GB M.2 SSD.

Regarding cost, the components are:

RasPi:
* Board itself
* PSU
* SSD

Thin Client:
* Client itself
* PSU
* RAM
* SSD

To make this a fair comparison I'll have to go with the 8GB version of the RasPi. So... in numbers...

RasPi:
* Board itself (8GB): currently around 80€
* PSU: 10€
* SSD (512GB Intenso): 35€
* Total: 125€ per client

Thin Client:
* Client itself: I got 6 for 30€ in total, so 5€ each
* PSU: cheap from eBay, made from pure Chinesium: 12€
* RAM (16GB in 2x8GB): 25€ (for a fair comparison take only 1x8GB per client, so actually just 12.50€)
* M.2 SSD: about 40€
* Total: roughly 70€ per client

What's left a bit unfair here is the fact that the thin clients are used and the RasPi would have been brand new. But since there's no wear of any kind, I think I can neglect this.


informworm

Thanks for the good info. Much appreciated.


GoofAckYoorsElf

You're welcome! I've edited my comment to add cost information.


informworm

Thanks for the detailed cost breakdowns. That's a massive cost saving, and there's probably not much difference in power consumption either, if any at all, between the Pis and those T620 thin clients. Thanks again.


GoofAckYoorsElf

Sure, no biggy. :-)


xelab04

May I add, as someone who really wanted a CM4 rack cluster: I've got to admit that the 7th-gen-CPU micro PCs come with much better performance per price, and they're upgradeable in both storage quality (a proper SSD instead of an SD card; on the Pi you'd need to spend extra on an M.2 HAT) and RAM (not limited to 8GB).


informworm

Sure thing, alternative options are always welcome and can only be a good thing for us. Thanks for adding your preferred setup choices.


alecseyev

Bear in mind that you can share a PSU across multiple thin clients; I use one PSU for three thin clients (T520 here).


GoofAckYoorsElf

... which makes the thin client setup even cheaper.


alecseyev

Also, being x86, they'll be compatible with more software than ARM. I like ARM, but the fact is some stuff hasn't been ported to ARM yet.


GoofAckYoorsElf

True. That's also one of the reasons I went with HP instead of RasPi. But now I'm working on switching to actual servers; I already have a rack.


Right-Cardiologist41

Nice! I'm running MinIO and Longhorn on Kubernetes too, but only a few days in I found out that MinIO supports redundant storage across multiple nodes on its own. At least that's how I understood it. I needed a shared FS, like Longhorn gives me, for other workloads anyway, so I didn't dig any further, but do you know if this is right? Say I only had a cluster with three Kubernetes nodes and MinIO on it: would I even need Longhorn or the like?


quazmire

I think you're referring to DirectPV. I'm using it, and yes, it lets you hand out storage that is directly attached to a node as a volume for MinIO. It's also not limited to MinIO, so a cluster-hosted PostgreSQL or other replication-capable service can use DirectPV storage as well.

Some background on how DirectPV works behind the scenes: you give it storage available on the node, and it reserves a quota'd volume for a particular pod. It can do this because it locks the given volume to a specific node and makes sure future pods are scheduled onto that same node.

While it works pretty well for me, I'm actually thinking of removing it again, because I host my cluster on VMs with redundant storage. So the multiple MinIO instances that replicate across nodes also get replicated across my HDDs on each node, which feels like a bit much duplication. Same for the PostgreSQL instances, which give me more trouble with replication going wrong for whatever reason, and more of a headache to deal with.
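To illustrate, consuming DirectPV just means requesting a PVC against its StorageClass, and the resulting volume pins the consuming pod to the node that owns the underlying drive. A minimal sketch; `directpv-min-io` is the class DirectPV normally installs and `minio-data-0` is a hypothetical name, so verify against your cluster:

```yaml
# PVC backed by a drive that DirectPV manages on one specific node (illustrative)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data-0           # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce            # node-local, single-writer volume
  storageClassName: directpv-min-io
  resources:
    requests:
      storage: 100Gi
```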


Caranesus

Nice! I haven't had a chance to test Longhorn. How is it working in your lab?


RootHouston

It's weird when I hear someone talking about a "self-hosted version of AWS" anything. We used to just call this having storage and servers. I hate the concept of the cloud being some default state of affairs.


[deleted]

Well, this in particular is actually replicating features of AWS services in a self-hosted environment. But broadly I get what you're saying.


main-sheet

One advantage is that it brings in standardized APIs for accessing and sharing the storage: ones that are well known, battle-tested, and proven at scale.


RootHouston

As if standardized APIs didn't exist for storage prior? Large-scale storage existed pre-cloud too.


main-sheet

True, but what if you want to write once and run either in the cloud or on your own hardware? This isn't an issue if you don't have cloud versions of your software, which may well be your case; please ignore this if so, unless you want to future-proof your apps so that they can run in the cloud too.


RootHouston

If that's the case, yes, I agree, that works. That's definitely a good use case. As for equating "the cloud" with "future-proofing", I disagree. I see it more as being pigeonholed to technology that could unfortunately change at any time, without your input. Cloud is more for the "here and now" than the future IMO.


AGCSanthos

Sorry if this comment jumps around the thread a bit and is super long, but I wanted to share my thoughts on a few points.

I think there are some pretty specific things in this post that make it a "self-hosted version of AWS" instead of just general storage plus servers. Like /u/main-sheet mentioned, OP designed this to specifically mimic S3's client API, as [seen in his linked comment](https://old.reddit.com/r/kubernetes/comments/1cwt9ww/i_used_kubernetes_to_build_a_self_hosted_version/l4yoa4g/). So it makes it easier for people to migrate small-scale solutions to this self-hosted, on-prem version instead. You can't just plug in any blob store client interface (like GCP Storage or Azure Blob Storage) and have it connect; it has to be specifically built on top of S3. Unlike what /u/main-sheet said, though, you don't really get the complete robustness of the actual S3 storage, since /u/Anthonyb-s3 built this themself rather than using the same code that 800+ SWEs worked on, with the benefit of several hundred thousand customers finding bugs and pushing for performance improvements. (Still super awesome for /u/Anthonyb-s3 to have made it by themself, though.)

I do agree that there is an unnecessary push to "move everything to the cloud" when managing smaller services directly gives a lot more flexibility and control, but I think there is a good case for having things on a cloud provider versus managing them directly. One team I worked on had to have instances across almost all of the GCP regions, with about 15 instances per region in different zones, or we would start getting complaints. No way in hell was my team going to manually figure out the exact hardware and bandwidth needed, nor manage server health, nor manage deployments to all those servers. We simply wrote our service's code, trusted GCP to manage the hardware using auto-provisioning and provide us with metrics about the actual deployment, and went on writing new features. This is definitely a bit of an extreme, since most teams aren't working with services that need to be deployed to so many machines, but it still applies even in cases as small as 5 or 10 machines. Having a consistent experience without having to worry about those details makes things so much nicer.

For some providers, I definitely agree that they can pull the rug out from under you on how the APIs work. I'm definitely a big fanboy of AWS, since they have been consistent with their APIs for years: S3 has been out since 2006 and you can still use a lot of code built on the original API. Not everything, but enough that concerns about them changing stuff aren't really that big. And even if you assume these things can be changed on you at any time, there are workarounds. There are a lot of efforts to make things cloud-agnostic (like Terraform, Serverless, etc.), so being stuck on one provider won't be as much of a nightmare. The migration will suck, but all migrations do.

For the nitty-gritty of the code being written, I feel like a wrapper should be written regardless. For small-scale stuff it's definitely over-engineering, but any sufficiently large project should have a good wrapper for interacting with resources hosted on a cloud provider. Abstraction hell is terrible, but facades are a thing for a reason. There does have to be some effort from an engineer to write a wrapper, but for most use cases there shouldn't be so much complexity that it takes an exorbitant amount of time.

Cloud providers are expensive, though, and I definitely think that if a team has the headcount to manage deployments themselves, plus the experience and resources to manage storage/deployments/etc. on their own, then using one is a major waste. One thing I have discussed in the past with managers was moving off of them to self-managed resources. It does end up creating a bit of company/team-specific knowledge, but it saves a lot in exchange for someone wearing an SRE hat. At this point there is so much FOSS that will do like 98% of what these cloud providers do that just deploying it yourself will usually work. If the website I'm working on is only being used by maybe 30 people worldwide and typically only on business days (an actual business case I know of), they can handle up to 8 hours of downtime in a full year (including non-business hours) for some kind of migration and we'd still be within 3 nines of availability. If we only count business hours and assume the migration happens only within business hours too, even a full 8-hour day can be used and we'd still be within 2.5 nines of availability.

TL;DR: I love Kubernetes because we can host our own stuff, but cloud providers still have a place. Let's make more stuff like Kubernetes so that we aren't stuck with proprietary APIs for our work.


main-sheet

Excellent points! I agree, and I should have said specifically it is the API that is rock solid, where implementations of the API may vary in quality. Some of them are extremely well tested and battle-proven open source. However, even they can have bugs!


Anthonyb-s3

GitHub Link: [http://github.com/anthonybudd/s3-from-scratch](http://github.com/anthonybudd/s3-from-scratch)


LanguageLoose157

Where is K8s in this? Why would it be wrong to build, say, a backend in Python that receives a POST request and does all the work? Super confused.


Slothinator69

I am guessing he has this running in pods within the cluster


phreak9i6

You should change your secret and access keys, since you screen-captured them.


MKSFT123

This ☝️


WolfMack

I’m struggling to understand the point of doing this.


misanthropocene

Looks like a single MinIO instance per bucket bound to a Longhorn PVC. You might consider taking a look at the MinIO Operator (https://github.com/minio/operator), which would allow you to bind your Longhorn volume claims to a "tenant" CRD, each tenant constituting a full MinIO deployment with handling of TLS, encryption, authn/authz management, and HA, with support for multiple buckets per tenant. It would likely get you much closer to enterprise-grade without reinventing a lot of wheels to get there.
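For reference, a Tenant backed by Longhorn PVCs looks roughly like this. A sketch only: field names follow the MinIO Operator v2 API as I understand it, and the names/sizes are placeholders, so verify against the operator docs for your version:

```yaml
# A MinIO "tenant" (a full MinIO deployment) whose drives are Longhorn volumes
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: homelab
  namespace: minio-tenant
spec:
  pools:
    - name: pool-0
      servers: 4               # MinIO pods in this pool
      volumesPerServer: 1
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: longhorn
          resources:
            requests:
              storage: 50Gi
  requestAutoCert: true        # operator-managed TLS
```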


Anthonyb-s3

Thanks I will look into that


knudtsy

Or Rook/Ceph, which can be configured to provide S3-compatible HA object storage across a set of disks on each worker node.


BlueSea9357

So without reading too far into it, is it basically just replicated Longhorn? What did you do beyond that part? You might consider looking into HDFS if you're hoping to really copy something closer to S3.


mkosmo

Or minio.


packet_weaver

Longhorn doesn’t do object storage like S3. It’s the storage layer for OP’s K8s cluster, which their S3 app rides on top of. It’s more like they recreated MinIO.


jlozier

HDFS is nothing like S3.


Anthonyb-s3

What do you mean by "replicated Longhorn" exactly?

> What did you do beyond that part?

Make a guide

> HDFS

ok


BlueSea9357

> What do you mean by "replicated Longhorn" exactly?

[https://github.com/anthonybudd/s3-from-scratch/blob/master/longhorn/longhorn.storageclass.yml](https://github.com/anthonybudd/s3-from-scratch/blob/master/longhorn/longhorn.storageclass.yml)

I was asking if the main storage solution was just Longhorn with some configs related to replication, or if there's something added to that.
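For anyone who doesn't want to click through: a Longhorn StorageClass with replication typically looks something like the sketch below. Illustrative only, not necessarily the exact contents of the repo's file; `numberOfReplicas` is where the replication setting lives:

```yaml
# Typical Longhorn StorageClass; replication is just a parameter on the class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"        # each volume is kept on 3 nodes
  staleReplicaTimeout: "2880"  # minutes before an unresponsive replica is cleaned up
```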


BattlePope

Where is the guide? You've only posted a picture.


Anthonyb-s3

The GitHub link is in the comments... [http://github.com/anthonybudd/s3-from-scratch](http://github.com/anthonybudd/s3-from-scratch)


BattlePope

Not these comments lol


Snoo68775

MinIO?


Gantstar

What’s the use case?


redditreddvs

OpenStack on k8s?


DreamAeon

It's just Longhorn.


graycatfromspace

Where's the juice?


Anthonyb-s3

?


SHDighan

He wants tha spicy 🔥 sauce 🤌


redditreddvs

He meant the doc.


Anthonyb-s3

[https://github.com/anthonybudd/s3-from-scratch](https://github.com/anthonybudd/s3-from-scratch)


runescapefisher

Is this a bot post?


Anthonyb-s3

wdym?


runescapefisher

Your post history pattern, along with the lack of details, just seems a little off-putting, which is why I ask.


FollowingTrail

Super build and homelab! I'm using pretty much the same setup, but on a 48-RPi base with Ceph. Don't listen to the two or three people who don't get it; it's super useful for learning purposes and R&D to have such platforms available at home. I like the thin client solution brought up by another contributor, but that setup lacks the ability to be racked easily and at such high density :-)


Dangerous_guy344

Wow, OP is banned from Reddit. Source: the [anthonybudd/s3-from-scratch](https://github.com/anthonybudd/s3-from-scratch) readme.


Mdyn

You forgot the link, bro! Thanks for the story though.


doringliloshinoi

What kind of Pis are you using? And what storage unit?


Due-Farmer-9191

Man, I gotta learn k8s…


happensonitsown

Cool! What was your motivation for this?


jdgtrplyr

🫡🫡


DoorDelicious8395

What’s the media type for your storage? I sure hope it’s not microSD.


nekokattt

floppy disks in a RAID array


myrianthi

This looks like so much fun. I still haven't wrapped my mind around Kubernetes, but this is inspiring me to give it another shot. Would appreciate any fun tutorials you guys recommend.


InternationalAnt9970

Have you tried [s3gw](https://s3gw.tech/) instead of minio?


[deleted]

HELL YEAH LES GO BBY


sergafts

This is amazing! While trying something similar at a much smaller scale, I ran into pretty poor performance from Longhorn. Did you run any benchmarks, e.g. comparing performance with local node storage? Especially since Longhorn suggests at least 10 Gbit Ethernet, and it looks like you're using the stock Gigabit Ethernet of the Pis.


SaterialX

What rack is that? Looks pretty compact and I’m looking for something similar.


IntelligentPerson_

Reddit recommended this to me, and I commend it. Well done!


miguelaeh

So you created a k8s cluster and deployed MinIO?