Projects Related To Data Analytics
Posted: admin on 19.07.2019
Aug 1, 2018 - Side projects are awesome for plenty of reasons. A few relevant examples could include: machine learning and modeling; exploratory data analysis; metrics and experimentation; data visualization and communication.
The Big Data & Business Analytics Group is engaged in various exploratory and funded research and development projects addressing key issues in big data and analytics, and applications of big data techniques for solving real-life problems. The group has a sizable pool of data scientists, research students.
Data Pitch (H2020-ICT14): Data Pitch creates a transnational, Europe-wide data innovation ecosystem that brings together data owners and Big Data technology providers with startups and SMEs with fresh ideas for data-driven products and services. Knowledge from Data Pitch will help in developing the Big Data Healthcare Analytics Blueprint.
Data for a predictive analytics project can come from many different sources. Some of the most common sources are within your own organization; other common sources include data purchased from outside vendors.
Internal data sources include
Transactional data, such as customer purchases
Customer profiles, such as user-entered information from registration forms
Campaign histories, including whether customers responded to advertisements
Clickstream data, including the patterns of customers’ web clicks
Customer interactions, such as those from e-mails, chats, surveys, and customer-service calls
Machine-generated data, such as that from telematics, sensors, and smart meters
External data sources include
Social media such as Facebook, Twitter, and LinkedIn
Subscription services such as Bloomberg, Thomson Reuters, Esri, and Westlaw
By combining data from several disparate sources in your predictive models, you may get a better overall view of your customer and thus a more accurate model.
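As a minimal sketch of combining sources, the example below joins a small transactional table with a customer-profile table using pandas. The table contents and the column names (`customer_id`, `amount`, `signup_channel`) are invented for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical internal sources; all values are invented for illustration.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
profiles = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "signup_channel": ["web", "email", "web"],
})

# Summarize transactions to one row per customer before joining.
spend = (
    transactions.groupby("customer_id", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_spend"})
)

# An outer join keeps customers that appear in only one source;
# indicator=True adds a "_merge" column recording where each row came from.
combined = spend.merge(profiles, on="customer_id", how="outer", indicator=True)
print(combined)
```

The outer join is a deliberate choice here: it makes gaps between sources visible (customers with purchases but no profile, or the reverse) instead of silently dropping them.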
Certificate : https://graduation.udacity.com/confirm/KUM3F4AJ
This repository is mainly for projects I have done under Udacity-Data-Analysis-Nanodegree.
The Udacity online data analyst program prepares me for a career as a data analyst by helping me learn to clean and organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. I am developing proficiency in Python and its data analysis libraries (NumPy, pandas, Matplotlib) and in SQL as I build a portfolio of projects.
Tip: For data science projects with Python, I recommend installing these basic libraries: numpy, pandas, scipy, scikit-learn, matplotlib, and seaborn.
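One way to check which of these libraries are already available is sketched below. The helper name `check_installed` is hypothetical; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Map each package name to the module name used in `import` statements.
# Note that scikit-learn is imported as `sklearn`.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def check_installed(packages=PACKAGES):
    """Return {package name: True/False} depending on whether it can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


for name, found in check_installed().items():
    print(f"{name:<14}{'installed' if found else 'missing'}")
```

Anything reported as missing can usually be installed with `pip install <package name>` or through Anaconda.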
Part 1 - Intro to Data Analysis
Subjects Covered:
- Anaconda: Learn to use Anaconda to manage packages and environments for use with Python
- Jupyter Notebook: Learn to use this open-source web application
- Data Analysis Process
- NumPy for 1D and 2D Data
- pandas Series and DataFrames
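The last two subjects can be illustrated with a short sketch. The temperature values and the labels (`city_a`, `day_1`, and so on) are made up for the example.

```python
import numpy as np
import pandas as pd

# 1D NumPy array: vectorized arithmetic and summary statistics.
temps = np.array([12.0, 15.0, 11.0, 18.0])
print(temps.mean())        # average of the four values
print(temps[temps > 12])   # a boolean mask selects values above 12

# 2D NumPy array: rows are cities, columns are days (made-up values).
grid = np.array([[12.0, 15.0], [11.0, 18.0]])
print(grid.mean(axis=0))   # per-column (per-day) means
print(grid.mean(axis=1))   # per-row (per-city) means

# pandas Series: a 1D array with labels.
series = pd.Series(temps, index=["Mon", "Tue", "Wed", "Thu"])
print(series["Tue"])

# pandas DataFrame: a 2D table with labelled rows and columns.
frame = pd.DataFrame(grid, index=["city_a", "city_b"], columns=["day_1", "day_2"])
print(frame.loc["city_b", "day_2"])
```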
Project 1: Explore Weather Trends with weather forecast data
In this project, I chose one of Udacity's curated datasets and investigated it using NumPy and pandas. I completed the entire data analysis process, starting by posing a question and finishing by sharing the findings. (It may be better to place this section inside the README of Project 1.)
Project 2: Investigate a dataset called TMDb movie data.
I was provided a dataset reflecting data collected from an experiment. I used statistical techniques to answer questions about the data and presented my conclusions and recommendations in a report.
Part 2 - Practical Statistics
Subjects Covered:
- Probability
- Conditional Probability
- Binomial Distribution
- Sampling Distribution and Central Limit Theorem
- Descriptive Statistics
- Inferential Statistics
- Confidence Levels and Intervals
- Hypothesis Testing
- T-tests and A/B tests
- Regression
- Multiple Linear Regression
- Logistic Regression
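Two of the subjects above, sampling distributions and confidence intervals, can be sketched with NumPy alone. The exponential "population", the sample sizes, and the use of a bootstrap interval are assumptions chosen for illustration, not details taken from the course material.

```python
import numpy as np

rng = np.random.default_rng(42)

# A skewed "population" (exponential), chosen so it is clearly not normal.
population = rng.exponential(scale=2.0, size=100_000)

# Sampling distribution: means of many samples of size 50.
sample_means = np.array([
    rng.choice(population, size=50).mean() for _ in range(2_000)
])
# By the central limit theorem these means cluster around the population mean
# with a spread close to sigma / sqrt(n).
print(population.mean(), sample_means.mean())
print(population.std() / np.sqrt(50), sample_means.std())

# Bootstrap 95% confidence interval for the mean of one observed sample:
# resample the sample with replacement and take percentiles of the means.
sample = rng.choice(population, size=200)
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2_000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```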
Project 3: Analyze A/B Test Results with the company's ab_data.csv
Using Python, I gathered data from a variety of sources, assessed its quality and tidiness, then cleaned it. I documented the wrangling efforts in a Jupyter Notebook and showcased them through analyses and visualizations using Python and SQL. I then used A/B testing and regression methods to decide whether the company should launch a new webpage or keep the old one.
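The launch decision can be framed as a one-sided test on two conversion rates. The sketch below is one common way to do that (a pooled two-proportion z-test), not necessarily the method used in the project notebook; the function name and the counts are hypothetical and are not taken from ab_data.csv.

```python
import math


def two_proportion_z_test(conv_old, n_old, conv_new, n_new):
    """One-sided z-test of H0: p_new <= p_old against H1: p_new > p_old."""
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Upper-tail p-value from the standard normal CDF.
    p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


# Made-up counts for illustration only.
z, p_value = two_proportion_z_test(conv_old=1200, n_old=10_000,
                                   conv_new=1180, n_new=10_000)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject H0: the new page appears to convert better.")
else:
    print("Fail to reject H0: no evidence the new page converts better.")
```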
Part 3 - Data Extraction and Wrangling
Subjects Covered:
- GATHERING DATA
  - Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs
  - Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files
  - Store gathered data in a PostgreSQL database
- ASSESSING DATA
  - Assess data visually and programmatically using pandas
  - Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
  - Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
- CLEANING DATA
  - Identify each step of the data cleaning process (defining, coding, and testing)
  - Clean data using Python and pandas
  - Test cleaning code visually and programmatically using Python
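The gathering steps above can be sketched as follows. In-memory strings stand in for downloaded files, the column names (`tweet_id`, `rating`, `favorite_count`) are hypothetical, and SQLite is used in place of PostgreSQL purely so the example runs without a database server.

```python
import io
import json
import sqlite3

import pandas as pd

# Stand-ins for gathered files; real code would pass file paths or URLs.
tsv_text = "tweet_id\trating\n1\t12\n2\t10\n"
json_text = json.dumps([
    {"tweet_id": 1, "favorite_count": 250},
    {"tweet_id": 2, "favorite_count": 90},
])

ratings = pd.read_csv(io.StringIO(tsv_text), sep="\t")   # flat file (TSV)
counts = pd.read_json(io.StringIO(json_text))            # JSON records

gathered = ratings.merge(counts, on="tweet_id")

# Store the result. SQLite (in memory) is an assumption made for this sketch;
# it substitutes for the PostgreSQL database mentioned in the list.
conn = sqlite3.connect(":memory:")
gathered.to_sql("tweets", conn, index=False)
stored = pd.read_sql("SELECT * FROM tweets ORDER BY tweet_id", conn)
conn.close()
print(stored)
```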
Project 4: Wrangle and Analyze Data with WeRateDogs tweet data
I collected data from different sources, assessed it visually and programmatically, and cleaned it for later visualization and insight finding.
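A minimal sketch of the assess and clean steps, following the define, code, test pattern, might look like this. The records and column names are invented for illustration and are not the actual WeRateDogs data.

```python
import pandas as pd

# Made-up records with typical problems: a duplicate row, a missing name,
# a rating stored as text, and inconsistent capitalization.
raw = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "name": ["Bella", "max", "max", None],
    "rating": ["12/10", "10/10", "10/10", "13/10"],
})

# Assess programmatically.
raw.info()
print(raw.duplicated().sum())   # number of fully duplicated rows
print(raw.isna().sum())         # missing values per column

# Define: drop duplicates, split the rating into two numeric columns
#         (a tidiness issue), and capitalize names (a quality issue).
# Code:
clean = raw.drop_duplicates().copy()
parts = clean["rating"].str.split("/", expand=True).astype(int)
clean["rating_numerator"] = parts[0]
clean["rating_denominator"] = parts[1]
clean = clean.drop(columns="rating")
clean["name"] = clean["name"].str.capitalize()

# Test:
assert not clean.duplicated().any()
assert clean["rating_numerator"].dtype.kind == "i"
assert clean["name"].dropna().str[0].str.isupper().all()
```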
Part 4 - Data Visualisation
Subjects Covered:
- Univariate exploration of data (histograms, bar charts, axis limits and different scales)
- Bivariate exploration of data (scatter plots, clustered bar charts, violin plots and bar charts, faceting)
- Multivariate exploration of data (encodings, plot matrices, feature engineering)
- Explanatory visualizations (storytelling with data, polishing plots, creating a slide deck)
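A few of the univariate and bivariate techniques listed above can be sketched with Matplotlib. The data is synthetic, and the variable names `carat` and `price` as well as the output file name are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data standing in for a real dataset: price grows with carat.
carat = rng.uniform(0.2, 2.0, size=500)
price = 4000 * carat ** 1.7 * rng.lognormal(sigma=0.2, size=500)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Univariate: histogram of price on a linear scale.
axes[0].hist(price, bins=30)
axes[0].set(title="Price (linear scale)", xlabel="price", ylabel="count")

# Univariate with a different scale: log-spaced bins and a log x-axis.
log_bins = np.logspace(np.log10(price.min()), np.log10(price.max()), 30)
axes[1].hist(price, bins=log_bins)
axes[1].set_xscale("log")
axes[1].set(title="Price (log scale)", xlabel="price", ylabel="count")

# Bivariate: scatter plot of two numeric variables, with explicit axis limits.
axes[2].scatter(carat, price, alpha=0.3)
axes[2].set(title="Price vs. carat", xlabel="carat", ylabel="price")
axes[2].set_xlim(0, 2.2)

fig.tight_layout()
fig.savefig("exploration.png")
```

Log-spaced bins are paired with the log axis so that the bars keep a uniform visual width; using linear bins on a log axis would make the bars on the left appear much wider than those on the right.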
Project 5: Data Visualization with Diamond Data
I applied data visualization techniques to a dataset describing the characteristics of diamonds and their prices.
Project 6: Communicate Data Findings with Ford GoBike Sharing Data
In this project, I used Python’s data visualization tools to systematically explore the bike dataset for its properties and the relationships between variables. Then, I created a presentation that communicates the findings to others.