Stock Market Predictor Using Machine Learning (Exploratory Analysis Webapp)

Overview

I developed a stock market prediction system that uses a machine learning model (LSTM) to forecast stock closing prices. The webapp also shows recent news about the selected stock, with the sentiment of each headline evaluated by a fine-tuned BERT model. The system lets users explore how well an LSTM predicts stock closing prices while offering an interactive user experience.

How to Use Demo

Choose a stock ticker, select a time frame, choose the features you want the model to train on, then press load. If you would like to see recent news and its sentiment, check ‘Load Recent News?’.

Key Features

  • Stock Predictions: Uses a combination of technical indicators and price-related features to predict the next day's closing price.
  • Advanced Sentiment Analysis: Scrapes stock headlines using Selenium and applies a fine-tuned BERT model to derive sentiment scores.
  • Dynamic Web Application: Built with Django, HTML, and JavaScript, providing real-time updates about the queue through WebSockets, and featuring a dynamic progress bar powered by celery-progress via GET requests to track long-running tasks.
  • Robust Backend Infrastructure: Ensures high performance and reliability with Nginx as a reverse proxy, Celery for asynchronous tasks, Redis for caching, Supervisor for process management, and Gunicorn, which serves the Django website and communicates with Nginx over a Unix socket.
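
As a rough illustration of the Stock Predictions feature, the sketch below builds one technical indicator (a simple moving average) next to the raw close and slices the series into fixed-length windows whose target is the next day's close, which is the input shape an LSTM typically expects. The indicator choice, the window length, the toy prices, and the function names `simple_moving_average` and `make_windows` are all assumptions for illustration; the project's actual feature set is not listed here.

```python
from typing import List, Tuple


def simple_moving_average(closes: List[float], period: int) -> List[float]:
    """Trailing mean of the last `period` closes, starting at index period - 1."""
    if period <= 0 or period > len(closes):
        raise ValueError("period must be between 1 and len(closes)")
    return [
        sum(closes[i - period + 1 : i + 1]) / period
        for i in range(period - 1, len(closes))
    ]


def make_windows(
    rows: List[List[float]], targets: List[float], window: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Pair each run of `window` consecutive feature rows with the NEXT day's target."""
    xs, ys = [], []
    for start in range(len(rows) - window):
        xs.append(rows[start : start + window])
        ys.append(targets[start + window])
    return xs, ys


# Toy closing prices (invented values, not real market data).
closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
sma = simple_moving_average(closes, period=3)   # first value covers days 0..2
aligned = closes[2:]                            # drop days that have no SMA value
rows = [[c, m] for c, m in zip(aligned, sma)]   # per-day features: [close, SMA]
X, y = make_windows(rows, aligned, window=2)
```

Each element of `X` is a window of consecutive days and the matching element of `y` is the close on the day after that window, so the model never sees the value it is asked to predict.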

Tech Stack Summary

  • Programming Languages: Python
  • Frontend Technologies: HTML, CSS, JavaScript
  • Backend Framework: Django, Python
  • Machine Learning Libraries: PyTorch (for LSTM), Transformers (for BERT)
  • Web Scraping Tools: Selenium
  • Real-Time Communication: WebSockets
  • Infrastructure Tools:
    • Nginx: Reverse proxy server for handling client requests
    • Celery: Task queue for handling asynchronous jobs
    • Redis: In-memory data store for caching and message brokering
    • Supervisor: Process control system that manages application processes and keeps the Django application running
    • Gunicorn: A Python WSGI HTTP server that serves the Django application, handling multiple requests simultaneously

Server and Deployment

Task Management

				
Celery (Worker)

This Bash script starts a Celery worker process for the Django project. It first sets a few variables: the process name, the project directory, and the user and group the process is meant to run as. It then changes to the project directory and activates the project’s Python virtual environment so that the correct dependencies are used. Finally, it starts the Celery worker with the chosen log level and a concurrency of 1, so the worker processes the background tasks defined in the Django project one at a time. Wrapping the startup in a script lets Supervisor start the worker.

				
					#!/bin/bash

NAME="celery"  # Name of the Celery process
DIR=/root/projects/smp/Stock_Market_Predictor_Web_App/website  # Directory where project is located
USER=root  # User to run this script as
GROUP=root  # Group to run this script as
LOG_LEVEL=info

# Move to the project directory
cd "$DIR" || { echo "Directory $DIR does not exist"; exit 1; }

# Activate the virtual environment
source /root/projects/smp/env/bin/activate || { echo "Failed to activate virtual environment"; exit 1; }

# Start Celery worker
exec /root/projects/smp/env/bin/celery -A website worker --loglevel=$LOG_LEVEL --concurrency=1
				
			

Celery (Beat)

As with the Celery worker, we configure a Celery beat process, which is responsible for scheduling tasks at regular intervals. In our case, a scheduled task deletes the task results stored in the database every 10 minutes, since they are no longer needed once they have been shown to the user. The Bash script below lets Supervisor start the beat process.
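
A 10-minute cleanup schedule like the one described above could be declared along these lines. This is a minimal sketch of a Django settings fragment, assuming the Celery app loads its settings with the `CELERY_` namespace; the entry name and the task path `website.tasks.purge_task_results` are hypothetical, since the real task is not shown in this write-up.

```python
from datetime import timedelta

# Fragment of a Django settings module. With the "CELERY_" namespace (assumed),
# this setting maps to Celery's `beat_schedule` option.
CELERY_BEAT_SCHEDULE = {
    "purge-task-results": {  # hypothetical entry name
        # Hypothetical dotted path to the cleanup task.
        "task": "website.tasks.purge_task_results",
        # Celery beat accepts a timedelta as the interval between runs.
        "schedule": timedelta(minutes=10),
    },
}
```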

				
					#!/bin/bash

NAME="celery_beat"  # Name of the Celery process
DIR=/root/projects/smp/Stock_Market_Predictor_Web_App/website  # Directory where project is located
USER=root  # User to run this script as
GROUP=root  # Group to run this script as
LOG_LEVEL=info

# Move to the project directory
cd $DIR || { echo "Directory $DIR does not exist"; exit 1; }

# Activate the virtual environment
source /root/projects/smp/env/bin/activate || { echo "Failed to activate virtual environment"; exit 1; }

# Start Celery beat
exec /root/projects/smp/env/bin/celery -A website beat --loglevel=$LOG_LEVEL
				
			

Redis

Redis is a commonly used message broker for Celery, allowing communication between the Celery workers and the task queue. In addition, we use Redis to store tasks, which in turn allows the queue counter on the website to work.
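
Pointing Celery at Redis usually comes down to a broker URL in the Django settings. The fragment below is a sketch under assumptions: a local Redis on the default port using database 0, the `CELERY_` settings namespace, and the `django-db` result backend from the django-celery-results package (assumed because task results are described as being stored in the database).

```python
from urllib.parse import urlparse

# Fragment of a Django settings module (assumes the "CELERY_" settings namespace).
REDIS_URL = "redis://127.0.0.1:6379/0"   # assumed: local Redis, default port, database 0

CELERY_BROKER_URL = REDIS_URL            # Redis carries the task messages
CELERY_RESULT_BACKEND = "django-db"      # assumed: results kept in the Django database

# Quick sanity check of the URL pieces before handing it to Celery.
broker = urlparse(CELERY_BROKER_URL)
print(broker.scheme, broker.hostname, broker.port, broker.path)
```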


To get Redis up and running, follow these steps:

1. Update the package lists:

				
					sudo apt-get update
				
			

2. Install Redis:

				
					sudo apt-get install redis-server
				
			

3. Start the Redis service:

				
					sudo systemctl start redis-server

				
			

4. Enable Redis to start on boot:

				
					sudo systemctl enable redis-server

				
			

5. Ensure Redis is active:

				
					sudo systemctl status redis-server
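
Beyond the service status, the application side can confirm that Redis actually answers. The sketch below sends an inline PING over a plain TCP socket and checks for the `+PONG` reply; the host, port, and function names are assumptions, and a Redis instance that requires a password would answer with an error instead, which this check reports as not reachable.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """True if `reply` is the simple-string answer Redis gives to PING."""
    return reply.strip() == b"+PONG"


def redis_ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 1.0) -> bool:
    """Send an inline PING and report whether the server answered +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return is_pong(conn.recv(64))
    except OSError:
        # Connection refused, timeout, or unreachable host.
        return False
```

Calling `redis_ping()` on the server should return `True` once the steps above have completed.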

				
			
				
Gunicorn

This Bash script automates starting the Django application with the Gunicorn WSGI server so that Supervisor can run it. It begins by defining several variables, including the application name, the directory where the Django project is located, the user and group Gunicorn runs as, the number of Gunicorn worker processes, and the socket file used for communication with Nginx. The script also specifies the Django settings and WSGI modules that Gunicorn will use, exports the settings module and project path to the environment, and then launches Gunicorn bound to the socket.

				
					#!/bin/bash

NAME="website"  # Django application name
DIR=/root/projects/smp/Stock_Market_Predictor_Web_App/website  # Directory where project is located
USER=root  # User to run this script as
GROUP=root  # Group to run this script as
WORKERS=1  # Number of workers that Gunicorn should spawn
SOCKFILE=unix:/root/projects/smp/sockets/gunicorn.sock  # This socket file will communicate with Nginx
DJANGO_SETTINGS_MODULE=website.settings  # Django settings module to use
DJANGO_WSGI_MODULE=website.wsgi  # WSGI module to use
LOG_LEVEL=debug

# Move to the project directory
cd "$DIR" || { echo "Directory $DIR does not exist"; exit 1; }

# Activate the virtual environment
source /root/projects/smp/env/bin/activate || { echo "Failed to activate virtual environment"; exit 1; }

# Export environment variables
export DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE
export PYTHONPATH=$DIR:$PYTHONPATH
echo "$PYTHONPATH"

# Start Gunicorn
exec /root/projects/smp/env/bin/gunicorn ${DJANGO_WSGI_MODULE}:application \
--name $NAME \
--workers $WORKERS \
--user=$USER \
--group=$GROUP \
--bind=$SOCKFILE \
--log-level=$LOG_LEVEL \
--log-file=-

				
			
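
The command-line flags the script passes to Gunicorn could alternatively live in a Python configuration file that Gunicorn loads with `-c`. The sketch below mirrors the script's values; the file name `gunicorn.conf.py` is an assumption, and the `wsgi_app` setting requires Gunicorn 20.1 or newer.

```python
# gunicorn.conf.py (assumed file name), loaded with: gunicorn -c gunicorn.conf.py

wsgi_app = "website.wsgi:application"   # same target as ${DJANGO_WSGI_MODULE}:application
proc_name = "website"                   # equivalent of --name
bind = "unix:/root/projects/smp/sockets/gunicorn.sock"  # Unix socket shared with Nginx
workers = 1                             # equivalent of --workers
user = "root"                           # equivalent of --user
group = "root"                          # equivalent of --group
loglevel = "debug"                      # equivalent of --log-level
errorlog = "-"                          # "-" writes the error log to stderr, like --log-file=-
```

Keeping the values in one file means the Bash wrapper only has to activate the virtual environment and export the Django settings module before calling Gunicorn.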

Server Setup

Supervisor

Supervisor is a process control system that allows you to manage and monitor processes on Unix-like operating systems, ensuring that critical services remain running and are automatically restarted if they fail. In this configuration, Supervisor is used to manage various components of a Django-based web application, including the Gunicorn server, Celery workers, Celery Beat scheduler, and Daphne server for handling ASGI (Asynchronous Server Gateway Interface) requests.

 

Each [program] block defines a different service, specifying the command to start it, the user it runs as, and options such as automatic restart and where standard output and error are logged. The taskset -c command pins each process to a specific CPU core. With autostart=true the services start whenever Supervisor itself starts (typically at system boot), and with autorestart=true they are restarted whenever they exit. All output is logged for debugging purposes. Together, these settings keep the web application and its related services running without manual intervention.

Start Supervisor:

1. Update Supervisor config:

After updating the config file in ‘/etc/supervisor/conf.d’ (create one if it does not exist), run the following commands in a terminal.

				
					sudo supervisorctl reread
sudo supervisorctl update
				
			

2. Start all the processes:

				
					sudo supervisorctl restart all
				
			

3. Check the status of all the processes:

				
					sudo supervisorctl status all
				
			

After running the command above, you should see every process reported as ‘RUNNING’.
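
The status check can also be scripted, for example as a post-deploy sanity test. The sketch below parses text in the layout that `supervisorctl status` prints (program name, state, details) and lists any program that is not running. The sample text is made up for illustration, with invented pids and uptimes, and the helper names are hypothetical.

```python
from typing import Dict, List


def parse_status(output: str) -> Dict[str, str]:
    """Map each program name to its state from `supervisorctl status` text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def not_running(output: str) -> List[str]:
    """Names of programs whose state is anything other than RUNNING."""
    return sorted(
        name for name, state in parse_status(output).items() if state != "RUNNING"
    )


# Made-up sample output; pids and uptimes are invented.
sample = """\
celery                           RUNNING   pid 2101, uptime 0:05:12
celery_beat                      RUNNING   pid 2102, uptime 0:05:12
daphne                           FATAL     Exited too quickly (process log may have details)
website                          RUNNING   pid 2104, uptime 0:05:12
"""

print(not_running(sample))  # prints ['daphne']
```

On the server, the real text could be captured with `subprocess.run(["supervisorctl", "status"], capture_output=True, text=True).stdout` and passed to `not_running`.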

				
					[program:website]
command=taskset -c 0 /root/projects/smp/env/bin/gunicorn_configuration
user=root
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/root/projects/smp/logs/gunicorn-error.log

[program:superfsmon_website]
command=taskset -c 0 /root/projects/smp/env/bin/superfsmon /root/projects/smp/Stock_Market_Predictor_Web_App/website website -r *.py
user=root
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/root/projects/smp/logs/superfsmon.log

[program:celery]
command=taskset -c 1 /root/projects/smp/env/bin/celery_configuration
directory=/root/projects/smp/Stock_Market_Predictor_Web_App/website
user=root
autostart=true
autorestart=true
stderr_logfile=/root/projects/smp/celery-error.log
stdout_logfile=/root/projects/smp/logs/celery.log

[program:celery_beat]
command=taskset -c 0 /root/projects/smp/env/bin/celery_beat_configuration
directory=/root/projects/smp/Stock_Market_Predictor_Web_App/website
user=root
autostart=true
autorestart=true
stderr_logfile=/root/projects/smp/celery_beat-error.log
stdout_logfile=/root/projects/smp/logs/celery_beat.log

[program:daphne]
command=taskset -c 0 /root/projects/smp/env/bin/daphne -p 8001 website.asgi:application
directory=/root/projects/smp/Stock_Market_Predictor_Web_App/website
user=root
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/root/projects/smp/logs/daphne.log
stderr_logfile=/root/projects/smp/logs/daphne_error.log
				
			

Nginx

Nginx is a high-performance web server and reverse proxy that is often used to serve static content, manage incoming web traffic, and distribute it to backend servers like Gunicorn, which runs the Django application. In this project, Nginx handles incoming HTTP requests (HTTPS once a certificate is installed), serves static files like CSS and JavaScript directly, and passes dynamic requests to the Gunicorn server via a Unix socket.

Configuring Nginx:

1. Update the package lists and install Nginx:

				
					sudo apt-get update
sudo apt-get install nginx

				
			

2. Start Nginx:

				
					sudo systemctl start nginx 
sudo systemctl enable nginx 
				
			

3. Create a Server Block:

				
					sudo nano /etc/nginx/sites-available/website
				
			

4. Configure the Server Block:

Add the following content (in the code block), adjusting paths as necessary. After this step you can use a tool like Certbot to obtain SSL certificates and enable HTTPS on your site.

				
					server {
    listen 80;
    server_name your_domain_or_IP;

    location / {
        proxy_pass http://unix:/root/projects/smp/sockets/gunicorn.sock;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /static/ {
        alias /root/projects/smp/Stock_Market_Predictor_Web_App/website/static/;
    }

    location /media/ {
        alias /root/projects/smp/Stock_Market_Predictor_Web_App/website/media/;
    }
}

				
			

5. Enable the Site:

				
					sudo ln -s /etc/nginx/sites-available/website /etc/nginx/sites-enabled/
				
			

6. Test the Configuration:

				
					sudo nginx -t

				
			

7. If successful, restart Nginx:

				
					sudo systemctl restart nginx
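
For the server block in step 4 to work end to end, the Django settings need to agree with it. The fragment below is a sketch of the matching settings, assuming the static and media directories sit directly under the project directory as the `alias` paths suggest; `your_domain_or_IP` is the same placeholder used for `server_name`.

```python
from pathlib import Path

# Fragment of a Django settings module; BASE_DIR is assumed to be the project directory.
BASE_DIR = Path("/root/projects/smp/Stock_Market_Predictor_Web_App/website")

ALLOWED_HOSTS = ["your_domain_or_IP"]   # placeholder, same value as server_name

# URL prefixes and directories line up with the Nginx `location` / `alias` pairs.
STATIC_URL = "/static/"
STATIC_ROOT = BASE_DIR / "static"       # where `collectstatic` gathers files
MEDIA_URL = "/media/"
MEDIA_ROOT = BASE_DIR / "media"

# Trust the X-Forwarded-Proto header set by the server block, so Django can
# tell when the original request reached Nginx over HTTPS.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```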
				
			
More Information

For more information, check out the GitHub repository for this project. Feel free to contact me if you have any questions.