Environment Quick Summary#

[Animation: environment render (_images/render.gif)]

TradingEnv is a Gymnasium environment designed for trading on a single pair.

Action Space: Discrete(number_of_positions)

Observation Space: Box(-np.inf, +np.inf, shape=...)

Import: gymnasium.make("TradingEnv", df=df)

Important Parameters#

  • df (required): A pandas.DataFrame with a close column and a DatetimeIndex as its index. To perform a render, your DataFrame also needs to contain open, high, and low columns.

  • positions (optional, default: [-1, 0, 1]): The list of positions that your agent can take. Each position is represented by a number (as described in the Action Space section).

Documentation of all the parameters
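For illustration, a minimal DataFrame meeting these requirements could be built as follows (synthetic data; the values are arbitrary):

import numpy as np
import pandas as pd

# Synthetic OHLC data with a DatetimeIndex (illustrative only).
index = pd.date_range("2024-01-01", periods=100, freq="1h")
close = 100 + np.cumsum(np.random.randn(100))
open_ = close + np.random.randn(100) * 0.1
df = pd.DataFrame({
    "open": open_,                                                    # needed for rendering
    "high": np.maximum(open_, close) + np.abs(np.random.randn(100)),  # needed for rendering
    "low": np.minimum(open_, close) - np.abs(np.random.randn(100)),   # needed for rendering
    "close": close,                                                   # required
}, index=index)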

Action Space#

The action space is the list of positions given by the user. Each position is a number (anywhere from -inf to +inf) and corresponds to the ratio of the portfolio valuation engaged in that position (> 0 to bet on a rise, < 0 to bet on a fall).

Example with BTC/USDT pair (%pv means “Percent of the Portfolio Valuation”)#

Position | BTC (%pv) | USDT (%pv) | Borrowed BTC (%pv) | Borrowed USDT (%pv)
-------- | --------- | ---------- | ------------------ | -------------------
0        |           | 100        |                    |
1        | 100       |            |                    |
0.5      | 50        | 50         |                    |
2        | 200       |            |                    | 100
-1       |           | 200        | 100                |

If position < 0: the environment performs a SHORT (by borrowing BTC and selling it to get USDT), as the -1 row of the table shows.

If position > 1: the environment uses MARGIN trading (by borrowing USDT and buying BTC with it), as the 2 row of the table shows.
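The table follows directly from the position ratio. A hypothetical helper (not part of the library) that reproduces it:

def position_breakdown(position: float) -> dict:
    # Illustrative only: percent-of-portfolio-valuation breakdown
    # implied by a target position ratio.
    return {
        "BTC (%pv)": max(position, 0.0) * 100,
        "USDT (%pv)": max(1.0 - position, 0.0) * 100,
        "Borrowed BTC (%pv)": max(-position, 0.0) * 100,
        "Borrowed USDT (%pv)": max(position - 1.0, 0.0) * 100,
    }

for p in [0, 1, 0.5, 2, -1]:
    print(p, position_breakdown(p))  # reproduces the rows of the table above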

Observation Space#

The observation space is an np.array containing:

  • The current row of every DataFrame column whose name contains "feature", at the given step: the static features.

  • The dynamic features (by default, the last position taken by the agent and the current real position).

>>> df["feature_pct_change"] = df["close"].pct_change()
>>> df["feature_high"] = df["high"] / df["close"] - 1
>>> df["feature_low"] = df["low"] / df["close"] - 1
>>> df.dropna(inplace= True)
>>> env = gymnasium.make("TradingEnv", df = df, positions = [-1, 0, 1], initial_position= 1)
>>> observation, info = env.reset()
>>> observation
array([-2.2766300e-04,  1.0030895e+00,  9.9795288e-01,  1.0000000e+00], dtype=float32)

If the windows parameter is set to an integer W > 1, the observation is a stack of the last W states.

>>> env = gymnasium.make("TradingEnv", df=df, positions=[-1, 0, 1], initial_position=1, windows=3)
>>> observation, info = env.reset()
>>> observation
array([[-0.00231082,  1.0052915 ,  0.9991996 ,  1.        ],
       [ 0.01005705,  1.0078559 ,  0.98854125,  1.        ],
       [-0.00408145,  1.0069852 ,  0.99777853,  1.        ]],
       dtype=float32)

Reward#

The reward is given by the formula \(r_t = \ln\left(\frac{p_t}{p_{t-1}}\right)\), where \(p_t\) is the portfolio valuation at timestep \(t\). It is highly recommended to customize the reward function to your needs, for example as sketched below.
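Here is a sketch of a custom reward passed through the reward_function parameter. It assumes the History object supports history["portfolio_valuation", t] indexing, which is how the library's default basic_reward_function reads valuations; the asymmetric penalty is purely illustrative:

import numpy as np

def custom_reward(history):
    # Default log-return, penalized twice as hard on losses (illustrative choice).
    log_return = np.log(
        history["portfolio_valuation", -1] / history["portfolio_valuation", -2]
    )
    return log_return if log_return > 0 else 2 * log_return

env = gymnasium.make("TradingEnv", df=df, reward_function=custom_reward)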

Starting State#

The environment explores the given DataFrame and starts at its beginning (unless max_episode_duration is set to an integer, in which case each episode starts at a random point; see Arguments).

Episode Termination#

The episode finishes if:

1. The environment reaches the end of the DataFrame: truncated is returned as True.

2. The portfolio valuation reaches 0 (or below): done is returned as True. This can happen when taking margin positions (> 1 or < 0).
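A minimal interaction loop that handles both flags might look like this (random actions stand in for a real policy):

observation, info = env.reset()
done, truncated = False, False
while not (done or truncated):
    action = env.action_space.sample()  # replace with your agent's policy
    observation, reward, done, truncated, info = env.step(action)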

Arguments#

class gym_trading_env.environments.TradingEnv(df: ~pandas.core.frame.DataFrame, positions: list = [0, 1], dynamic_feature_functions=[<function dynamic_feature_last_position_taken>, <function dynamic_feature_real_position>], reward_function=<function basic_reward_function>, windows=None, trading_fees=0, borrow_interest_rate=0, portfolio_initial_value=1000, initial_position='random', max_episode_duration='max', verbose=1, name='Stock', render_mode='logs')#

An easy trading environment for OpenAI Gym. It is recommended to use it this way:

import gymnasium as gym
import gym_trading_env
env = gym.make('TradingEnv', ...)
Parameters
  • df (pandas.DataFrame) – The market DataFrame. It must contain 'open', 'high', 'low', and 'close' columns, and its index must be a DatetimeIndex. Any column whose name contains 'feature' will be returned as part of the observation at each step.

  • positions (optional - list[int or float]) – List of the positions allowed by the environment.

  • dynamic_feature_functions (optional - list) –

    The list of the dynamic features functions. By default, two dynamic features are added :

    • the last position taken by the agent.

    • the real position of the portfolio (that varies according to the price fluctuations)

  • reward_function (optional - function<History->float>) – Takes the History object of the environment and must return a float.

  • windows (optional - None or int) – Default is None. If set to an integer N, every step observation will return the past N observations. Recommended for Recurrent Neural Network based agents.

  • trading_fees (optional - float) – Transaction trading fees (applied to buy and sell operations). e.g. 0.01 corresponds to 1% fees.

  • borrow_interest_rate (optional - float) – Borrow interest rate per step (only applied when position < 0 or position > 1). e.g. 0.01 corresponds to a 1% borrow interest rate per STEP; if you know that your borrow interest rate is 0.05% per day and your timestep is 1 hour, divide it by 24 -> 0.05/100/24 (see the example after this parameter list).

  • portfolio_initial_value (float or int) – Initial valuation of the portfolio.

  • initial_position (optional - float or int) – You can specify the initial position of the environment or set it to 'random'. It must be contained in the 'positions' list parameter.

  • max_episode_duration (optional - int or 'max') – If an integer value is used, each episode will be truncated after reaching the desired max duration in steps (by returning truncated as True). When using a max duration, each episode starts at a random point.

  • verbose (optional - int) – If 0, no logs are output. If 1, the environment sends episode result logs.

  • name (optional - str) – The name of the environment (e.g. 'BTC/USDT').
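Putting the main arguments together, a typical instantiation might look as follows (the fee and interest values are illustrative, not recommendations):

import gymnasium as gym
import gym_trading_env

env = gym.make(
    "TradingEnv",
    df=df,                                 # prepared as described above
    positions=[-1, 0, 0.5, 1, 2],          # short, flat, half-long, long, 2x margin
    trading_fees=0.01 / 100,               # 0.01% per trade
    borrow_interest_rate=0.05 / 100 / 24,  # 0.05% per day on hourly timesteps
    portfolio_initial_value=1000,
    initial_position="random",
    max_episode_duration=500,
    name="BTC/USDT",
)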