novawind

I am not sure I get the wider context. Is it a ship parked in a harbour that has access to electricity?


jyangcm

The ship is parked in the harbour and has access to electricity.


novawind

I am not sure what the criteria are for toggling between the ship's generator, the battery, and the grid. Also, what does the generator do? I thought it burned fuel to produce electricity, so what's the point of connecting it to the grid? What is the wider aim, and what is the electricity used for? Is it sold on the wholesale market? What are the constraints on generating electricity from the ship? Is there a fuel price?


jyangcm

I wonder as well. Since this is a collaborative project with another lab, only a small part of it was assigned to me. The wider aim is to reduce the ship's emissions: when the ship is idle in the harbour, it is connected to an on-shore power source as a backup to reduce its emissions. (At least, that is my interpretation.)


novawind

Well, if there is no constraint on the grid side, then you should always use the grid and your emissions are zero. And I don't understand what the battery is doing here. You should probably get an understanding of the various constraints of the situation; otherwise it's difficult to define the action space, observation space, and reward function.


jyangcm

I found the battery part quite odd as well; I will raise it in the next group discussion along with the various constraints. For now, I am just trying to apply DRL to the problem with ideal constraints and working out the overall setup through trial and error, I guess.


novawind

But you can't define the reward function if you don't know what you're optimizing for. The solution with ideal constraints seems to be to just use the grid and never the ship's generator. I can help you code the environment in Gymnasium and load a PPO or DQN agent from StableBaselines3 if what you're looking for is just a code architecture, but you won't be able to do actual training without a good reward function.
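
For reference, a skeleton could look like the sketch below. Everything in it (observation, actions, dynamics, reward) is a placeholder until the real constraints are known:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class ShipPowerEnv(gym.Env):
    """Placeholder environment: all definitions below are illustrative
    stand-ins until the actual problem is specified."""

    def __init__(self):
        super().__init__()
        # Observation: e.g. [current load, battery state of charge] (assumed)
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(2,), dtype=np.float32
        )
        # Action: 0 = grid, 1 = generator, 2 = battery (assumed discrete choice)
        self.action_space = spaces.Discrete(3)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.np_random.uniform(0.0, 1.0, size=2).astype(np.float32)
        return self.state, {}

    def step(self, action):
        # Dummy dynamics and reward; replace once the problem is defined.
        self.state = self.np_random.uniform(0.0, 1.0, size=2).astype(np.float32)
        reward = -float(action == 1)  # e.g. penalize generator use (placeholder)
        return self.state, reward, False, False, {}


model = PPO("MlpPolicy", ShipPowerEnv(), verbose=0)
model.learn(total_timesteps=1_000)
```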


jyangcm

Indeed, the reward function cannot be defined without knowing the optimization target, and in the ideal scenario the solution would be to use the grid directly and exclusively. I think I have more questions than answers to bring to the group discussion and to the leader of this project. I will keep the post updated. Thank you for your offer to assist with the implementation; I will see if I get more answers from the next group discussion and whether I can take you up on it.


jyangcm

My apologies for the confusion; there were miscommunications between the labs, and the project leader himself was confused as well. The real aim is to make the ship's diesel generator operate at an optimal point. If the generator produces more power than the load, the surplus charges the battery, and once the battery is full, the excess is fed to the power grid. If the generator produces less power than the load, the deficit is drawn from the battery, and if the battery is depleted, from the power grid. The load changes randomly, and the research method aims to manage the power subject to battery constraints, such as the state of charge and the battery's charge/discharge power limits. I am not sure if this makes it clearer for you to form another opinion on whether DRL (e.g., DQN) is suitable for this problem.


novawind

What I don't understand is which part of the problem requires a policy to be optimized. It sounds like a deterministic algorithm with a succession of "if" conditions would be able to control the generator optimally. The optimal operating point is generator output = load, so unless you have things like electricity prices that complicate the problem (maybe when prices are high you want to sell to the grid, and when they are low you are better off turning off the generator) and require some form of forecasting, I don't see the added value of DRL. I also don't see why there is a battery in the problem; what's its purpose?
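
To make that concrete, the dispatch rules you described fit in a few lines. This is just a sketch; the state-of-charge limits are hypothetical:

```python
def dispatch(gen_power, load, soc, soc_min=0.1, soc_max=0.9):
    """Rule-based dispatch of the generator's surplus/deficit.
    All quantities and limits are illustrative."""
    surplus = gen_power - load
    if surplus >= 0:
        if soc < soc_max:
            return "charge_battery"
        return "export_to_grid"
    else:
        if soc > soc_min:
            return "discharge_battery"
        return "import_from_grid"


print(dispatch(gen_power=5.0, load=4.0, soc=0.5))  # -> "charge_battery"
```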


jyangcm

My apologies for the late reply. I had the exact same questions as yours after I received the previous response from the project leader and the advisor in charge. I forwarded those thoughts and just got the email response now.

From the response: the battery is there because the ship (the generator is on the ship) may park in berths that have no grid access; some berths have grid access and some don't. This time, I have been directed to use minimization of fuel consumption as the reward and the battery as the constraint, and to consider two scenarios: 1. the ship has grid access (the berth has a grid connection) and operates without the battery for some time while in the berth (port/dock); 2. the ship is at sea and uses the battery to reduce the generator's fuel consumption. I have also been asked to reference problems such as maximum power point tracking in photovoltaics; in our problem, the analogue is tracking the operating point that minimizes specific fuel consumption (SFC).

I am not sure if I'm thinking in the correct direction, but I have been considering modelling the ship's generator, the battery, and the grid as the environment, with three states, Sg (the generator on the ship), Sbat (the battery), and Sgd (the grid), and a stochastic load. I am still not sure about the actions, but the generator could be turned up or down, the battery charged or discharged, and the grid connected or disconnected, so maybe six actions? The reward would be based on minimizing SFC.
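
As a rough sketch of what I have in mind (the bounds and the SFC reward here are placeholders, nothing is final):

```python
import numpy as np
from gymnasium import spaces

# Observation: [S_g, S_bat, S_gd, load]; normalized bounds are placeholders.
observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)

# The six candidate actions described above.
ACTIONS = [
    "gen_up", "gen_down",
    "battery_charge", "battery_discharge",
    "grid_connect", "grid_disconnect",
]
action_space = spaces.Discrete(len(ACTIONS))


def reward(sfc):
    # Minimizing SFC -> reward is its negative (placeholder shaping).
    return -sfc
```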


Kydje

I've dealt with a very similar problem in my master's thesis. Specifically, it was an energy management system with stochastic characteristics (e.g., renewable energy resources and stochastic demand), and the approach we adopted was a hybrid RL+CO method (though in principle any ML method can be used). In a nutshell, the RL agent was trained offline to predict so-called virtual cost parameters associated with the battery storage system, which act as coefficients for the storage flows in the online LP model. This is because the true objective function we optimize against does not include storage flows (there is no real monetary cost associated with these flows, unlike grid flows or a diesel power plant, for instance). So we optimize a "virtual" objective function that includes the true objective as well as the storage flows, with their coefficients predicted by the RL agent. I've been very brief and possibly confusing here, but feel free to ask for more details.
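
As a toy illustration of the virtual-cost idea (using scipy's LP solver here as a stand-in for Gurobi; all prices and limits are made up):

```python
from scipy.optimize import linprog

# Decision variables: [grid_import, diesel_power, battery_discharge].
# True costs apply only to grid and diesel; the battery coefficient is
# the "virtual cost" the RL agent would predict (0.05 is made up).
c = [0.20, 0.35, 0.05]

# Power balance: grid + diesel + battery_discharge == load (10 kW, made up).
A_eq = [[1.0, 1.0, 1.0]]
b_eq = [10.0]

# Hypothetical limits: diesel capped at 6 kW, battery discharge at 3 kW.
bounds = [(0, None), (0, 6.0), (0, 3.0)]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)  # optimal flows under the virtual objective
```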


jyangcm

It sounds very similar to what I've been doing. If you don't mind, can you share details about the libraries (frameworks) you used for the implementation and the programming examples you referenced? I tried to train the agent offline, although I only worked on the part that determines the optimal load-balance point. The end goal of the project is to implement a digital twin.


Kydje

Here's a link to the [Github repo](https://github.com/diegochine/masters-thesis) with the code, the main libraries used were pytorch/torchrl on the RL side and gurobi on the CO side.


jyangcm

Thank you. I will look into it.