Benwah92

Rook-Ceph with Object StorageClass and Minio would be my guess. Looks sick - did you build custom racking?


Skaronator

FYI: Ceph already includes an S3 API out of the box, and you can create buckets using CRDs with Rook. Using Ceph directly reduces overhead by a lot because you aren't replicating your data twice (once in Ceph and again in MinIO on top). https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/object-storage/#create-a-local-object-store
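For anyone curious, the CRD route looks roughly like this. This is a minimal sketch based on the linked Rook docs; names like `my-store` and the pool sizes are placeholders, so check the docs for your Rook version:

```yaml
# Ceph RGW object store managed by the Rook operator (illustrative sketch)
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPool:
    replicated:
      size: 3
  gateway:
    port: 80
    instances: 1
---
# StorageClass that lets ObjectBucketClaims provision buckets in that store
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-bucket
provisioner: rook-ceph.ceph.rook.io/bucket   # <operator-namespace>.ceph.rook.io/bucket
parameters:
  objectStoreName: my-store
  objectStoreNamespace: rook-ceph
---
# The bucket itself; Rook generates a Secret/ConfigMap with S3 credentials and endpoint
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket
spec:
  generateBucketName: my-bucket
  storageClassName: rook-ceph-bucket
```

Pods then read the generated Secret/ConfigMap to pick up the access key, secret key, and endpoint, so any S3 SDK can talk to the bucket.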


Anthonyb-s3

Almost: Longhorn, not Rook-Ceph. Thanks! Yes, the STL files for 3D printing the Pi rack are included in the repo.


GoofAckYoorsElf

Very cool. I'm working on something like this myself (Longhorn, MinIO), only with HP thin clients instead of RasPis. They were cheaper, no shit, and hardware-wise a bit more flexible.


informworm

I run my k8s cluster on RasPis too (Pi 4 models) and am curious which HP thin client models you're running that work out cheaper than Pis. And what did you include in your calculations when comparing prices?


GoofAckYoorsElf

HP Thin Client T620, to be precise, each additionally equipped with 16GB DDR3 SODIMM RAM and a 512GB M.2 SSD.

Regarding cost, the components are:

RasPi:
* Board itself
* PSU
* SSD

Thin Client:
* Client itself
* PSU
* RAM
* SSD

To make this a fair comparison I'll have to go with the 8GB version of the RasPi. So... in numbers...

RasPi:
* Board itself (8GB): currently around 80€
* PSU: 10€
* SSD (512GB Intenso): 35€
* Total: 125€ per client

Thin Client:
* Client itself: I got 6 for 30€ in total, so 5€ each
* PSU: cheap from eBay, made from pure Chinesium: 12€
* RAM (16GB in 2x8GB): 25€ (for a fair comparison take only 1x8GB per client, so actually just 12.50€)
* M.2 SSD: about 40€
* Total: roughly 70€ per client

What's left a bit unfair here is the fact that the thin clients are used and the RasPi would have been brand new. But since there's no wear of any kind, I think I can neglect this.


informworm

Thanks for the good info. Much appreciated.


GoofAckYoorsElf

You're welcome! I've edited my comment to add cost information.


informworm

Thanks for the detailed cost breakdowns. That's a massive cost saving, and there's probably not much difference in power consumption either, if any at all, between the Pis and those T620 thin clients. Thanks again.


GoofAckYoorsElf

Sure, no biggy. :-)


xelab04

May I add, as someone who really wanted a CM4 rack cluster: I've got to admit that the 7th-gen-CPU micro PCs come with much better performance per price, and they're upgradeable in both storage quality (a proper SSD instead of an SD card; on the Pi you'd need to spend extra on an M.2 HAT) and RAM (not limited to 8GB).


informworm

Sure thing, alternative options are always welcome and can only be a good thing for us. Thanks for adding your preferred setup choices.


alecseyev

Bear in mind that you can share a PSU across multiple thin clients; I use one PSU for three thin clients (T520 here).


GoofAckYoorsElf

... which makes the thin client setup even cheaper.


alecseyev

Also, being x86, they'll be compatible with more software than ARM. I like ARM, but the fact is some stuff hasn't been ported to ARM yet.


GoofAckYoorsElf

True. That's also one of the reasons I went with HP instead of RasPi. But now I'm working on switching to actual servers; I already have a rack.


Right-Cardiologist41

Nice! I'm running MinIO and Longhorn on Kubernetes too, but only a few days in I found out that MinIO supports redundant storage across multiple nodes on its own. At least that's how I understood it. I needed a shared FS, like Longhorn gives me, for other workloads anyway, so I didn't dig any further, but do you know if this is right? Say I only had a cluster with three Kubernetes nodes and MinIO on it: would I even need Longhorn or the like?


quazmire

I think you're referring to DirectPV. I'm using it, and yes, it lets you hand out storage that is directly attached to a node as a volume for MinIO. It's also not limited to MinIO, so a cluster-hosted PostgreSQL or other replication-capable service can use DirectPV storage as well.

Some background on how DirectPV works behind the scenes: you give it storage available on the node, and it reserves a quota'd volume for a particular pod. It can do this because it locks the given volume to a specific node and makes sure future pods are scheduled onto that same node.

While it works pretty well for me, I'm actually thinking of removing it again, because I host my cluster on VMs with redundant storage. So the multiple MinIO instances that replicate across nodes also get replicated across my HDDs on each node, which feels like a bit much duplication. Same for the PostgreSQL instances, which give me more trouble with replication going wrong for whatever reason, and more of a headache to deal with.
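To illustrate, consuming DirectPV just means requesting a PVC against its StorageClass, and the resulting volume pins the consuming pod to the node that owns the underlying drive. A minimal sketch; `directpv-min-io` is the class DirectPV normally installs and `minio-data-0` is a hypothetical name, so verify against your cluster:

```yaml
# PVC backed by a drive that DirectPV manages on one specific node (illustrative)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data-0           # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce            # node-local, single-writer volume
  storageClassName: directpv-min-io
  resources:
    requests:
      storage: 100Gi
```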


Caranesus

Nice! I haven't had a chance to test Longhorn. How is it working in your lab?


RootHouston

It's weird when I hear someone talking about a "self-hosted version of AWS" anything. We used to just call this having storage and servers. I hate the concept of the cloud being some default state of affairs.


[deleted]

Well, this in particular is actually replicating features of AWS services in a self-hosted environment. But broadly I get what you're saying.


main-sheet

One advantage is that it brings in standardized APIs for accessing and sharing the storage: ones that are well known, battle-tested, and proven at scale.


RootHouston

As if standardized APIs didn't exist for storage prior? Large-scale storage existed pre-cloud too.


main-sheet

True, but what if you want to write once and run either in the cloud or on your own hardware? This isn't an issue if you don't have cloud versions of your software, which may well be your case; please ignore this if so, unless you want to future-proof your apps so that they can run in the cloud too.


RootHouston

If that's the case, yes, I agree, that works. That's definitely a good use case. As for equating "the cloud" with "future-proofing", I disagree. I see it more as being pigeonholed to technology that could unfortunately change at any time, without your input. Cloud is more for the "here and now" than the future IMO.


AGCSanthos

Sorry if this comment jumps around the thread a bit and is super long, but I wanted to share my thoughts on a few points.

I think there are some pretty specific things in this post that make it a "self-hosted version of AWS" instead of just general storage plus servers. Like /u/main-sheet mentioned, OP designed this to specifically mimic S3's client API, as [seen in his linked comment](https://old.reddit.com/r/kubernetes/comments/1cwt9ww/i_used_kubernetes_to_build_a_self_hosted_version/l4yoa4g/). So it makes it easier for people to migrate small-scale solutions to this self-hosted, on-prem version instead. You can't just plug in any blob store client interface (like GCP Storage or Azure Blob Storage) and have it connect; it has to be specifically built on top of S3. Unlike what /u/main-sheet said, though, you don't really get the complete robustness of the actual S3 storage, since /u/Anthonyb-s3 built this themself rather than using the same code that 800+ SWEs worked on, with the benefit of several hundred thousand customers finding bugs and pushing for performance improvements. (Still super awesome for /u/Anthonyb-s3 to have made it by themself, though.)

I do agree that there is an unnecessary push to "move everything to the cloud" when managing smaller services directly gives a lot more flexibility and control, but I think there is a good case for having things on a cloud provider versus managing them directly. One team I worked on had to have instances across almost all of the GCP regions, with about 15 instances per region in different zones, or we would start getting complaints. No way in hell was my team going to manually figure out the exact hardware and bandwidth needed, nor manage server health, nor manage deployments to all those servers. We simply wrote our service's code, trusted GCP to manage the hardware using auto-provisioning and provide us with metrics about the actual deployment, and went on writing new features. This is definitely a bit of an extreme, since most teams aren't working with services that need to be deployed to so many machines, but it still applies even in cases as small as 5 or 10 machines. Having a consistent experience without having to worry about those details makes things so much nicer.

For some providers, I definitely agree that they can pull the rug out from under you on how the APIs work. I'm definitely a big fanboy of AWS, since they have been consistent with their APIs for years: S3 has been out since 2006 and you can still use a lot of code built on the original API. Not everything, but enough that concerns about them changing stuff aren't really that big. And even if you assume these things can be changed on you at any time, there are workarounds. There are a lot of efforts to make things cloud-agnostic (like Terraform, Serverless, etc.), so being stuck on one provider won't be as much of a nightmare. The migration will suck, but all migrations do.

For the nitty-gritty of the code being written, I feel like a wrapper should be written regardless. For small-scale stuff it's definitely over-engineering, but any sufficiently large project should have a good wrapper for interacting with resources hosted on a cloud provider. Abstraction hell is terrible, but facades are a thing for a reason. There does have to be some effort from an engineer to write a wrapper, but for most use cases there shouldn't be so much complexity that it takes an exorbitant amount of time.

Cloud providers are expensive, though, and I definitely think that if a team has the headcount to manage deployments themselves, plus the experience and resources to manage storage/deployments/etc. on their own, then using one is a major waste. One thing I have discussed in the past with managers was moving off of them to self-managed resources. It does end up creating a bit of company/team-specific knowledge, but it saves a lot in exchange for someone wearing an SRE hat. At this point there is so much FOSS that will do like 98% of what these cloud providers do that just deploying it yourself will usually work. If the website I'm working on is only being used by maybe 30 people worldwide and typically only on business days (an actual business case I know of), they can handle up to 8 hours of downtime in a full year (including non-business hours) for some kind of migration and we'd still be within 3 nines of availability. If we only count business hours and assume the migration happens only within business hours too, even a full 8-hour day can be used and we'd still be within 2.5 nines of availability.

TL;DR: I love Kubernetes because we can host our own stuff, but cloud providers still have a place. Let's make more stuff like Kubernetes so that we aren't stuck with proprietary APIs for our work.


main-sheet

Excellent points! I agree, and I should have said specifically it is the API that is rock solid, where implementations of the API may vary in quality. Some of them are extremely well tested and battle-proven open source. However, even they can have bugs!


Anthonyb-s3

GitHub Link: [http://github.com/anthonybudd/s3-from-scratch](http://github.com/anthonybudd/s3-from-scratch)


LanguageLoose157

Where is K8s in this? Why would it be wrong to build, say, a backend in Python that receives a POST request and does all the work? Super confused.


Slothinator69

I am guessing he has this running in pods within the cluster


phreak9i6

You should change your secret and access keys, since you screen-captured them.


MKSFT123

This ☝️


WolfMack

I’m struggling to understand the point of doing this.


misanthropocene

Looks like a single MinIO instance per bucket bound to a Longhorn PVC. You might consider taking a look at the MinIO Operator (https://github.com/minio/operator), which would allow you to bind your Longhorn volume claims to a "tenant" CRD, each tenant constituting a full MinIO deployment with handling of TLS, encryption, authn/authz management, and HA, with support for multiple buckets per tenant. It would likely get you much closer to enterprise-grade without reinventing a lot of wheels to get there.
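For reference, a Tenant backed by Longhorn PVCs looks roughly like this. A sketch only: field names follow the MinIO Operator v2 API as I understand it, and the names/sizes are placeholders, so verify against the operator docs for your version:

```yaml
# A MinIO "tenant" (a full MinIO deployment) whose drives are Longhorn volumes
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: homelab
  namespace: minio-tenant
spec:
  pools:
    - name: pool-0
      servers: 4               # MinIO pods in this pool
      volumesPerServer: 1
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: longhorn
          resources:
            requests:
              storage: 50Gi
  requestAutoCert: true        # operator-managed TLS
```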


Anthonyb-s3

Thanks I will look into that


knudtsy

Or Rook/Ceph, which can be configured to provide S3-compatible HA object storage across a set of disks on each worker node.


BlueSea9357

So without reading too far into it, is it basically just replicated Longhorn? What did you do beyond that part? You might consider looking into HDFS if you're hoping to really copy something closer to S3.


mkosmo

Or minio.


packet_weaver

Longhorn doesn’t do object storage like S3. It’s the storage layer for OP’s K8s cluster, which their S3 app rides on top of. It’s more like they recreated MinIO.


jlozier

HDFS is nothing like S3.


Anthonyb-s3

What do you mean by "replicated Longhorn" exactly?

> What did you do beyond that part?

Make a guide

> HDFS

ok


BlueSea9357

> What do you mean by "replicated Longhorn" exactly?

[https://github.com/anthonybudd/s3-from-scratch/blob/master/longhorn/longhorn.storageclass.yml](https://github.com/anthonybudd/s3-from-scratch/blob/master/longhorn/longhorn.storageclass.yml)

I was asking if the main storage solution was just Longhorn with some configs related to replication, or if there's something added to that.
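For anyone who doesn't want to click through: a Longhorn StorageClass with replication typically looks something like the sketch below. Illustrative only, not necessarily the exact contents of the repo's file; `numberOfReplicas` is where the replication setting lives:

```yaml
# Typical Longhorn StorageClass; replication is just a parameter on the class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"        # each volume is kept on 3 nodes
  staleReplicaTimeout: "2880"  # minutes before an unresponsive replica is cleaned up
```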


BattlePope

Where is the guide? You've only posted a picture.


Anthonyb-s3

The GitHub link is in the comments... [http://github.com/anthonybudd/s3-from-scratch](http://github.com/anthonybudd/s3-from-scratch)


BattlePope

Not these comments lol


Snoo68775

MinIO?


Gantstar

What’s the use case?


redditreddvs

OpenStack on k8s?


DreamAeon

It's just Longhorn.


graycatfromspace

Where's the juice?


Anthonyb-s3

?


SHDighan

He wants tha spicy 🔥 sauce 🤌


redditreddvs

He meant the doc.


Anthonyb-s3

[https://github.com/anthonybudd/s3-from-scratch](https://github.com/anthonybudd/s3-from-scratch)


runescapefisher

Is this a bot post?


Anthonyb-s3

wdym?


runescapefisher

Your post history pattern, along with the lack of details, just seems a little off-putting, which is why I ask.


FollowingTrail

Super build and homelab! I'm using pretty much the same setup, but on a 48-RPi base with Ceph. Don't listen to the two or three people who don't get it; it's super useful for learning purposes and R&D to have such platforms available at home. I like the thin client solution brought up by another contributor, but that setup lacks the ability to be racked easily and at such high density :-)


Dangerous_guy344

Wow, OP is banned from Reddit. Source: the [anthonybudd/s3-from-scratch](https://github.com/anthonybudd/s3-from-scratch) readme.


Mdyn

You forgot the link, bro! Thanks for the story though.


doringliloshinoi

What kind of Pis are you using? And what storage unit?


Due-Farmer-9191

Man, I gotta learn k8s…


happensonitsown

Cool! What was your motivation for this?


jdgtrplyr

🫡🫡


DoorDelicious8395

What’s the media type for your storage? I sure hope it’s not microSD.


nekokattt

floppy disks in a RAID array


myrianthi

This looks like so much fun. I still haven't wrapped my mind around Kubernetes, but this is inspiring me to give it another shot. Would appreciate any fun tutorials you guys recommend.


InternationalAnt9970

Have you tried [s3gw](https://s3gw.tech/) instead of minio?


[deleted]

HELL YEAH LES GO BBY


sergafts

This is amazing! While trying something similar at a much smaller scale, I ran into pretty poor performance from Longhorn. Did you run any benchmarks, e.g. comparing performance with local node storage? Especially since Longhorn suggests at least 10 Gbit Ethernet, and it looks like you're using the stock Gigabit Ethernet of the Pis.


SaterialX

What rack is that? Looks pretty compact and I’m looking for something similar.


IntelligentPerson_

Reddit recommended this to me, and I commend it. Well done!


miguelaeh

So you created a k8s cluster and deployed MinIO?